Issue: I am trying to create a dataset from a Kaggle kernel. My workflow converts an existing input dataset into TFRecord format, shard by shard. Because disk space in the kernel is limited, my plan was to build and upload the dataset incrementally, one shard at a time, deleting each local shard after upload to free space.
The problem is that each shard upload creates a new version of the dataset. This is inconvenient for any workflow that needs to upload a large dataset incrementally.

As the code stands, if the dataset already exists, dataset_upload() automatically creates a new version on every call. Ideally, there would be a way to append files to the same dataset version before publishing, so that large datasets could be uploaded incrementally without generating multiple versions. A sketch of the current loop, and of the API shape I have in mind, follows.
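For reference, here is a minimal sketch of the loop I am describing, assuming the kaggle Python package and a staging folder that already contains a dataset-metadata.json. The folder path, num_shards, and convert_shard_to_tfrecord are placeholders standing in for my actual conversion logic:

```python
import os
from kaggle.api.kaggle_api_extended import KaggleApi

# Staging folder; assumed to already contain a dataset-metadata.json.
folder = "/kaggle/working/my-dataset"
num_shards = 100  # however many shards the conversion produces


def convert_shard_to_tfrecord(shard_id: int, out_dir: str) -> str:
    """Hypothetical stand-in for the real TFRecord conversion step."""
    path = os.path.join(out_dir, f"shard-{shard_id:05d}.tfrecord")
    with open(path, "wb") as f:
        pass  # placeholder: the real code writes serialized tf.train.Examples
    return path


api = KaggleApi()
api.authenticate()

for shard_id in range(num_shards):
    shard_path = convert_shard_to_tfrecord(shard_id, folder)

    # Publishes immediately: every shard upload becomes its own
    # dataset version, which is the behavior described above.
    api.dataset_create_version(folder, version_notes=f"add shard {shard_id}")

    # Delete the local shard to free kernel disk space before the next one.
    os.remove(shard_path)
```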
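What I am asking for could look something like the sketch below. To be clear, every method in it (dataset_start_version, append_file, publish) is purely hypothetical and does not exist in the Kaggle API today; it reuses the names from the sketch above only to show the shape of the workflow:

```python
# Hypothetical feature sketch -- none of these methods exist today.
version = api.dataset_start_version(folder, version_notes="TFRecord shards")

for shard_id in range(num_shards):
    shard_path = convert_shard_to_tfrecord(shard_id, folder)
    version.append_file(shard_path)  # upload one shard; version stays unpublished
    os.remove(shard_path)            # reclaim kernel disk space immediately

version.publish()  # a single new version containing all of the shards
```

With something like this, only one new version would be created per run, no matter how many shards are uploaded, and the kernel would never need enough disk space to hold more than one shard at a time.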