Switch over to .parquet files to back a Region2Vec dataset #15

Closed
nleroy917 wants to merge 1241 commits into master from r2v_atacformer_tokenization_updates

Conversation

@nleroy917 (Member) commented Sep 5, 2025

This directly addresses #14.

In brief, Region2VecDataset now reads from a single .parquet file instead of requiring a folder of .gtok files. I've also updated the documentation to reflect this change.


TODO:

  • Fix tests
  • Run linter
  • Wait for gtars release

@khoroshevskyi (Member) left a comment

Everything looks good, but we need to fix the tests.

@nleroy917 (Member, Author)

The geniml_dev to geniml mess-up completely broke this, so I'm closing it.

@nleroy917 closed this Sep 26, 2025