This document outlines the planned implementation of VFrames functions, prioritized by common usage in data analysis workflows.
Functions essential for basic data manipulation tasks.
| Status | Function | Description |
|---|---|---|
| [X] | to_csv | Export DataFrame to CSV file |
| [X] | to_json | Export DataFrame to JSON file |
| [X] | to_parquet | Export DataFrame to Parquet file |
| [X] | rename | Rename columns |
| [X] | rename_axis | Rename axis (alias for rename) |
| [X] | replace | Replace values in DataFrame |
| [X] | astype | Convert column data types |
| [X] | isin | Filter rows by list of values |
| [X] | value_counts | Count unique values |
| [X] | agg / aggregate | Aggregate functions (sum, mean, etc.) |
| [X] | describe | Basic statistics (already implemented) |
Functions frequently used in data cleaning and transformation.
| Status | Function | Description |
|---|---|---|
| [X] | merge | Merge two DataFrames |
| [X] | join | Join two DataFrames |
| [X] | concat | Concatenate DataFrames |
| [X] | pivot | Pivot table functionality |
| [X] | pivot_table | Advanced pivot with aggregation |
| [X] | melt | Unpivot DataFrame |
| [X] | drop_duplicates | Remove duplicate rows |
| [X] | sample | Random sample of rows |
| [X] | assign | Add new columns via assignment |
| [X] | select | Select columns (use subset instead) |
Useful for specific analytical workflows.
| Status | Function | Description |
|---|---|---|
| [X] | apply | Apply custom functions |
| [X] | map | Map function to elements |
| [X] | rank | Rank values |
| [X] | quantile | Calculate quantiles |
| [X] | corr | Correlation matrix |
| [X] | cov | Covariance matrix |
| [X] | rolling | Rolling window calculations |
| [X] | shift | Shift values |
| [X] | diff | Calculate differences |
| [X] | pct_change | Percentage change |
Advanced functions for specific use cases.
| Status | Function | Description |
|---|---|---|
| [ ] | cummax | Cumulative maximum |
| [ ] | cummin | Cumulative minimum |
| [ ] | cumprod | Cumulative product |
| [ ] | cumsum | Cumulative sum |
| [ ] | ewm | Exponentially weighted functions |
| [ ] | resample | Resample time series |
| [ ] | interpolate | Interpolate missing values |
| [ ] | get | Get value by label |
| [ ] | at | Access single value by label |
| [ ] | iat | Access single value by position |
Advanced index manipulation.
| Status | Function | Description |
|---|---|---|
| [ ] | loc | Label-based indexing |
| [ ] | iloc | Position-based indexing |
| [ ] | set_index | Set index column |
| [ ] | reset_index | Reset index (already implemented) |
| [ ] | reindex | Reindex DataFrame |
| [ ] | rename_axis | Rename axis |
Various export formats.
| Status | Function | Description |
|---|---|---|
| [ ] | to_dict | Export to dictionary |
| [ ] | to_string | String representation |
| [ ] | to_html | HTML table |
| [ ] | to_excel | Excel file |
| [ ] | to_sql | SQL table |
| [ ] | to_records | NumPy records |
| [ ] | to_markdown | Markdown table |
| [ ] | to_clipboard | Copy to clipboard |
| [ ] | to_orc | ORC file format |
Row/column iteration.
| Status | Function | Description |
|---|---|---|
| [ ] | iterrows | Iterate over rows |
| [ ] | itertuples | Iterate over rows as tuples |
| [ ] | items | Iterate over column pairs |
| [ ] | iterrows | (already listed above) |
Complex operations.
| Status | Function | Description |
|---|---|---|
| [ ] | unstack | Unstack pivot |
| [ ] | stack | Stack DataFrame |
| [ ] | explode | Explode list-like columns |
| [ ] | melt | Already in Phase 2 |
| [ ] | where | Conditional replacement |
| [ ] | mask | (already implemented) |
| [ ] | eval | Evaluate expressions |
| [ ] | query | (already implemented) |
- `isna` / `isnull` - Check for null values
- `notna` / `notnull` - Check for non-null values
- `fillna` - Fill null values (value, ffill, bfill)
- `ffill` / `bfill` - Forward/backward fill
- Error handling - Proper error propagation

- `sort_values` - Sort by column values

- `to_csv` - Export DataFrame to CSV file
- `to_json` - Export DataFrame to JSON file
- `to_parquet` - Export DataFrame to Parquet file
- `replace` - Replace values in DataFrame
- `astype` - Convert column data types
- `isin` - Filter rows by list of values
- `value_counts` - Count unique values
- `agg` / `aggregate` - Aggregate functions
- `rename` - Rename columns
- `rename_axis` - Rename axis

- `merge` - Merge two DataFrames
- `join` - Join two DataFrames
- `concat` - Concatenate DataFrames
- `pivot` / `pivot_table` - Pivot table functionality
- `melt` - Unpivot DataFrame
- `sample` - Random sample of rows
- `assign` - Add new columns via assignment
- `drop_duplicates` - Remove duplicate rows

- `apply` - Apply custom SQL functions
- `map` - Map function to elements (alias for apply)
- `rank` - Rank values (various methods)
- `quantile` - Calculate quantiles
- `corr` - Correlation matrix for numeric columns
- `cov` - Covariance matrix for numeric columns
- `rolling` - Rolling window calculations
- `shift` - Shift values by periods
- `diff` - Calculate differences between rows
- `pct_change` - Percentage change between rows
Many functions can leverage DuckDB's powerful SQL engine:
- `merge`, `join` - SQL JOIN operations
- `pivot`, `unstack` - SQL pivot capabilities
- `rolling`, `ewm` - SQL window functions
- `rank`, `dense_rank` - SQL window functions
- `corr`, `cov` - Statistical functions
- `quantile` - SQL quantile functions
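
To make the mapping above concrete, here is a minimal sketch of how two of these functions might be translated into SQL window functions. The helper names `shift_sql` and `rank_sql`, their parameters, and the `_shifted` / `_rank` column suffixes are assumptions for illustration only and are not part of the VFrames API. `LAG` and `RANK` are standard SQL window functions that DuckDB supports.

```v
// Hypothetical helper (not VFrames API): builds the SQL for a
// pandas-style shift() using the LAG window function.
fn shift_sql(table_name string, column string, order_by string, periods int) !string {
	if periods < 1 {
		return error('periods must be >= 1, got ${periods}')
	}
	return 'SELECT *, LAG("${column}", ${periods}) OVER (ORDER BY "${order_by}") AS "${column}_shifted" FROM "${table_name}"'
}

// Hypothetical helper (not VFrames API): builds the SQL for rank()
// using RANK(), which assigns tied rows the lowest rank of the group.
fn rank_sql(table_name string, column string) string {
	return 'SELECT *, RANK() OVER (ORDER BY "${column}") AS "${column}_rank" FROM "${table_name}"'
}

fn main() {
	query := shift_sql('prices', 'close', 'day', 1) or {
		eprintln('invalid shift: ${err.msg()}')
		return
	}
	println(query)
	println(rank_sql('prices', 'close'))
}
```

Note that `RANK()` roughly corresponds to pandas' `rank(method='min')`; other ranking methods would need different window functions such as `DENSE_RANK()`.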
Consider implementing:
- Automatic cleanup of intermediate tables
- Table naming strategy for garbage collection
- Memory-mapped file handling for large datasets
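
One possible shape for the first two items is sketched below: a small registry that hands out predictable names for intermediate tables and later produces the statements needed to drop them. `TableRegistry`, its fields, and the `vf_tmp` prefix are hypothetical names chosen for this sketch, not existing VFrames code, and the sketch only generates SQL text rather than executing it.

```v
// Hypothetical registry of intermediate tables; all names are illustrative.
struct TableRegistry {
	prefix string
mut:
	counter int
	live    []string
}

// next_name returns a fresh, predictable table name and records it.
fn (mut r TableRegistry) next_name() string {
	r.counter++
	name := '${r.prefix}_${r.counter}'
	r.live << name
	return name
}

// cleanup_sql returns one DROP statement per tracked table and
// then forgets the tracked names.
fn (mut r TableRegistry) cleanup_sql() []string {
	mut stmts := []string{}
	for name in r.live {
		stmts << 'DROP TABLE IF EXISTS "${name}"'
	}
	r.live.clear()
	return stmts
}

fn main() {
	mut reg := TableRegistry{
		prefix: 'vf_tmp'
	}
	first := reg.next_name()
	second := reg.next_name()
	println('${first}, ${second}')
	for stmt in reg.cleanup_sql() {
		println(stmt)
	}
}
```

A shared prefix also makes it possible to find and drop leftover tables from a previous run, since every intermediate table name can be recognized by pattern.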
When implementing new functions:
- Add tests in `src/funcs_test.v` or `src/mutation_test.v`
- Update this document when starting implementation
- Use consistent error handling (return `!DataFrame`)
- Document function with V doc comments
- Consider both in-memory and persisted contexts
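
As a rough illustration of the guidelines above, the sketch below shows a function that returns `!DataFrame`, carries a V doc comment, and has a matching test. The `DataFrame` struct here is a stand-in with a single `columns` field, and `drop_column` is a hypothetical function; the real VFrames types and signatures will differ.

```v
// Stand-in for the library's DataFrame type; the real struct will differ.
struct DataFrame {
	columns []string
}

// drop_column returns a new DataFrame without the column `name`.
// It returns an error if the column does not exist.
// Hypothetical function, used only to show the `!DataFrame` pattern.
fn (df DataFrame) drop_column(name string) !DataFrame {
	if name !in df.columns {
		return error('column "${name}" not found')
	}
	return DataFrame{
		columns: df.columns.filter(it != name)
	}
}

fn test_drop_column() {
	df := DataFrame{
		columns: ['a', 'b']
	}
	out := df.drop_column('a') or { panic(err) }
	assert out.columns == ['b']

	// The error should reach the caller instead of being swallowed.
	if res := df.drop_column('missing') {
		assert false, 'expected an error, got ${res.columns}'
	} else {
		assert err.msg() == 'column "missing" not found'
	}
}
```

Saved in a file whose name ends in `_test.v`, this can be run with `v test`. Returning the error rather than panicking inside the function leaves the choice of `or { ... }` handling or `!` propagation to the caller.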