Implementation Roadmap

This document outlines the planned implementation of VFrames functions, prioritized by common usage in data analysis workflows.

Priority Phases

Phase 1: Critical (Most Common - Daily Use)

Functions essential for basic data manipulation tasks.

Status	Function	Description
[X]	`to_csv`	Export DataFrame to CSV file
[X]	`to_json`	Export DataFrame to JSON file
[X]	`to_parquet`	Export DataFrame to Parquet file
[X]	`rename`	Rename columns
[X]	`rename_axis`	Rename axis (alias for rename)
[X]	`replace`	Replace values in DataFrame
[X]	`astype`	Convert column data types
[X]	`isin`	Filter rows by list of values
[X]	`value_counts`	Count unique values
[X]	`agg` / `aggregate`	Aggregate functions (sum, mean, etc.)
[X]	`describe`	Basic statistics (already implemented)

Phase 2: High Priority (Common - Weekly Use)

Functions frequently used in data cleaning and transformation.

Status	Function	Description
[X]	`merge`	Merge two DataFrames
[X]	`join`	Join two DataFrames
[X]	`concat`	Concatenate DataFrames
[X]	`pivot`	Pivot table functionality
[X]	`pivot_table`	Advanced pivot with aggregation
[X]	`melt`	Unpivot DataFrame
[X]	`drop_duplicates`	Remove duplicate rows
[X]	`sample`	Random sample of rows
[X]	`assign`	Add new columns via assignment
[X]	`select`	Select columns (use `subset` instead)
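
Of the Phase 2 functions, `melt` is the one whose output shape is least obvious from the name. The following is a minimal Python sketch of the unpivot semantics, not VFrames code: the parameter names (`id_vars`, `value_vars`, `var_name`, `value_name`) and the column-by-column output order are assumptions made for illustration.

```python
def melt(rows, id_vars, value_vars, var_name="variable", value_name="value"):
    """Unpivot wide rows: one output row per (value column, input row)."""
    long_rows = []
    for col in value_vars:          # outer loop: all rows for one column first
        for row in rows:
            out = {key: row[key] for key in id_vars}  # carry identifier columns
            out[var_name] = col                       # former column name
            out[value_name] = row[col]                # former cell value
            long_rows.append(out)
    return long_rows


wide = [
    {"id": 1, "q1": 10, "q2": 20},
    {"id": 2, "q1": 30, "q2": 40},
]
long_rows = melt(wide, id_vars=["id"], value_vars=["q1", "q2"])
# Two input rows and two value columns give four output rows.
```

Each wide row contributes one long row per value column, so the result has `len(rows) * len(value_vars)` rows.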

Phase 3: Medium Priority (Occasional Use)

Useful for specific analytical workflows.

Status	Function	Description
[X]	`apply`	Apply custom functions
[X]	`map`	Map function to elements
[X]	`rank`	Rank values
[X]	`quantile`	Calculate quantiles
[X]	`corr`	Correlation matrix
[X]	`cov`	Covariance matrix
[X]	`rolling`	Rolling window calculations
[X]	`shift`	Shift values
[X]	`diff`	Calculate differences
[X]	`pct_change`	Percentage change
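
`shift`, `diff`, and `pct_change` are closely related: the last two can be defined in terms of the first. The sketch below shows that relationship on a plain Python list. It is an illustration only; using `None` for missing leading values and supporting only positive `periods` are assumptions of this sketch, not statements about how VFrames behaves.

```python
def shift(values, periods=1):
    """Move values down by `periods` rows, padding the start with None."""
    if periods <= 0:
        raise ValueError("this sketch only handles positive periods")
    padded = [None] * periods + list(values)
    return padded[: len(values)]


def diff(values, periods=1):
    """Difference between each value and the one `periods` rows earlier."""
    previous = shift(values, periods)
    return [None if p is None else v - p for v, p in zip(values, previous)]


def pct_change(values, periods=1):
    """Fractional change relative to the value `periods` rows earlier.

    A previous value of zero raises ZeroDivisionError in this sketch.
    """
    previous = shift(values, periods)
    return [None if p is None else (v - p) / p for v, p in zip(values, previous)]


prices = [100.0, 110.0, 99.0]
shifted = shift(prices)      # [None, 100.0, 110.0]
deltas = diff(prices)        # [None, 10.0, -11.0]
changes = pct_change(prices)
```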

Phase 4: Lower Priority (Specialized Use)

Advanced functions for specific use cases.

Status	Function	Description
[ ]	`cummax`	Cumulative maximum
[ ]	`cummin`	Cumulative minimum
[ ]	`cumprod`	Cumulative product
[ ]	`cumsum`	Cumulative sum
[ ]	`ewm`	Exponentially weighted functions
[ ]	`resample`	Resample time series
[ ]	`interpolate`	Interpolate missing values
[ ]	`get`	Get value by label
[ ]	`at`	Access single value by label
[ ]	`iat`	Access single value by position
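
The four cumulative functions share one shape: a running reduction over the column with a different binary operator each time. A small Python sketch of the intended semantics, using the standard library's `itertools.accumulate` (this is an illustration, not the planned VFrames implementation, and it ignores null handling):

```python
from itertools import accumulate
import operator


def cumsum(values):
    """Running total."""
    return list(accumulate(values))


def cumprod(values):
    """Running product."""
    return list(accumulate(values, operator.mul))


def cummax(values):
    """Largest value seen so far."""
    return list(accumulate(values, max))


def cummin(values):
    """Smallest value seen so far."""
    return list(accumulate(values, min))


data = [3, 1, 4, 1, 5]
running_sum = cumsum(data)    # [3, 4, 8, 9, 14]
running_prod = cumprod(data)  # [3, 3, 12, 12, 60]
running_max = cummax(data)    # [3, 3, 4, 4, 5]
running_min = cummin(data)    # [3, 1, 1, 1, 1]
```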

Phase 5: DataFrame Operations (Index/Label Management)

Advanced index manipulation.

Status	Function	Description
[ ]	`loc`	Label-based indexing
[ ]	`iloc`	Position-based indexing
[ ]	`set_index`	Set index column
[X]	`reset_index`	Reset index (already implemented)
[ ]	`reindex`	Reindex DataFrame
[X]	`rename_axis`	Rename axis (completed in Phase 1)

Phase 6: Output/Export Formats

Various export formats.

Status	Function	Description
[ ]	`to_dict`	Export to dictionary
[ ]	`to_string`	String representation
[ ]	`to_html`	HTML table
[ ]	`to_excel`	Excel file
[ ]	`to_sql`	SQL table
[ ]	`to_records`	NumPy records
[ ]	`to_markdown`	Markdown table
[ ]	`to_clipboard`	Copy to clipboard
[ ]	`to_orc`	ORC file format

Phase 7: Iteration Helpers

Row/column iteration.

Status	Function	Description
[ ]	`iterrows`	Iterate over rows
[ ]	`itertuples`	Iterate over rows as tuples
[ ]	`items`	Iterate over (column name, column values) pairs

Phase 8: Advanced/Experimental

Complex operations.

Status	Function	Description
[ ]	`unstack`	Unstack pivot
[ ]	`stack`	Stack DataFrame
[ ]	`explode`	Explode list-like columns
[X]	`melt`	Unpivot DataFrame (completed in Phase 2)
[ ]	`where`	Conditional replacement
[X]	`mask`	(already implemented)
[ ]	`eval`	Evaluate expressions
[X]	`query`	(already implemented)

Implementation Notes

Completed Functions (v0.1.3+)

isna / isnull - Check for null values
notna / notnull - Check for non-null values
fillna - Fill null values (value, ffill, bfill)
ffill / bfill - Forward/backward fill
Error handling - Proper error propagation

Not Yet Implemented

sort_values - Sort by column values

Completed Functions (v0.1.4+ - Phase 1 & 2)

Phase 1 - Completed

to_csv - Export DataFrame to CSV file
to_json - Export DataFrame to JSON file
to_parquet - Export DataFrame to Parquet file
replace - Replace values in DataFrame
astype - Convert column data types
isin - Filter rows by list of values
value_counts - Count unique values
agg / aggregate - Aggregate functions
rename - Rename columns
rename_axis - Rename axis

Phase 2 - Completed

merge - Merge two DataFrames
join - Join two DataFrames
concat - Concatenate DataFrames
pivot / pivot_table - Pivot table functionality
melt - Unpivot DataFrame
sample - Random sample of rows
assign - Add new columns via assignment
drop_duplicates - Remove duplicate rows

Phase 3 - Completed

apply - Apply custom SQL functions
map - Map function to elements (alias for apply)
rank - Rank values (various methods)
quantile - Calculate quantiles
corr - Correlation matrix for numeric columns
cov - Covariance matrix for numeric columns
rolling - Rolling window calculations
shift - Shift values by periods
diff - Calculate differences between rows
pct_change - Percentage change between rows

DuckDB Backend

Many functions can be implemented directly on top of DuckDB's SQL engine:

merge, join - SQL JOIN operations
pivot, unstack - SQL pivot capabilities
rolling, ewm - SQL window functions
rank, dense_rank - SQL window functions
corr, cov - Statistical functions
quantile - SQL quantile functions
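
As an illustration of the mapping above, the sketch below expresses a rolling mean and two ranking variants as standard SQL window functions. To stay self-contained it runs the queries through Python's built-in `sqlite3` module as a stand-in engine; the table name `prices` and its columns are made up for the example, and nothing here is taken from the VFrames source.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (day INTEGER, price REAL)")
conn.executemany(
    "INSERT INTO prices VALUES (?, ?)",
    [(1, 10.0), (2, 20.0), (3, 30.0), (4, 20.0)],
)

# rolling(window=2).mean(): average over the current row and the one before it.
# The first row has no predecessor, so its window holds a single value.
ROLLING_SQL = """
SELECT day,
       AVG(price) OVER (
           ORDER BY day ROWS BETWEEN 1 PRECEDING AND CURRENT ROW
       ) AS rolling_mean
FROM prices
ORDER BY day
"""

# rank / dense_rank: ties share a rank; RANK leaves gaps, DENSE_RANK does not.
RANK_SQL = """
SELECT day,
       RANK() OVER (ORDER BY price DESC) AS price_rank,
       DENSE_RANK() OVER (ORDER BY price DESC) AS price_dense_rank
FROM prices
ORDER BY day
"""

rolling = conn.execute(ROLLING_SQL).fetchall()
ranks = conn.execute(RANK_SQL).fetchall()
```

Note that the SQL window averages whatever rows are available at the start of the table, so a wrapper that should return null until the window is full would need an extra row-count check.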

Memory Management

Consider implementing:

Automatic cleanup of intermediate tables
Table naming strategy for garbage collection
Memory-mapped file handling for large datasets
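
One possible shape for the first two items is a scope that hands out uniquely named intermediate tables and drops them all on exit. The Python sketch below shows the idea with the built-in `sqlite3` module standing in for the database; the `vf_tmp` prefix, the counter-based naming, and the helper names are all hypothetical and not part of VFrames.

```python
import itertools
import sqlite3
from contextlib import contextmanager

_table_ids = itertools.count(1)


def next_temp_name(prefix="vf_tmp"):
    """Return a unique table name; the shared prefix marks it as disposable."""
    return f"{prefix}_{next(_table_ids)}"


@contextmanager
def intermediate_tables(conn, prefix="vf_tmp"):
    """Yield a table factory and drop every table it created on exit."""
    created = []

    def new_table(select_sql):
        name = next_temp_name(prefix)
        conn.execute(f"CREATE TABLE {name} AS {select_sql}")
        created.append(name)
        return name

    try:
        yield new_table
    finally:
        # Runs on normal exit and on errors, so no intermediate table leaks.
        for name in created:
            conn.execute(f"DROP TABLE IF EXISTS {name}")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (x INTEGER)")
conn.executemany("INSERT INTO src VALUES (?)", [(1,), (2,), (3,)])

with intermediate_tables(conn) as new_table:
    doubled = new_table("SELECT x * 2 AS x FROM src")
    total = conn.execute(f"SELECT SUM(x) FROM {doubled}").fetchall()[0][0]

remaining = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]
```

Because every intermediate name shares one prefix, a separate sweep could also find and drop leftovers from a crashed session by matching on that prefix.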

Contributing

When implementing new functions:

Add tests in src/funcs_test.v or src/mutation_test.v
Update this document when starting implementation
Use consistent error handling (return !DataFrame)
Document function with V doc comments
Consider both in-memory and persisted contexts
