[Feature Request] Support Dataset Updating on Same Version #266

@innat

Description

Issue: I am trying to create a dataset from a Kaggle kernel. My workflow involves converting an existing input dataset into TFRecord format, shard by shard. Because of limited disk space in the kernel, my plan was to create and upload the dataset incrementally for each shard, deleting the local shard after upload to free space.

The problem is that uploading the next shard creates a new version of the dataset each time. This behavior is inconvenient for workflows that require incremental uploads of large datasets.

Currently, if the dataset already exists, `dataset_upload()` automatically creates a new version on every call. Ideally, there would be a way to append files to the same unpublished dataset version, so that large datasets can be uploaded incrementally without generating multiple versions.
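For illustration, here is a minimal sketch of the shard-by-shard workflow described above. The `convert_shard` and `upload` functions are hypothetical stand-ins (the real calls would be the TFRecord conversion and `kagglehub.dataset_upload()`); the point is that each `upload` call in the loop would currently create a new dataset version, even though the shard is deleted locally only to save kernel disk space:

```python
import os
import tempfile

def convert_shard(shard_id: int, out_dir: str) -> str:
    # Stand-in for the real TFRecord conversion; writes a placeholder file.
    path = os.path.join(out_dir, f"shard-{shard_id:05d}.tfrecord")
    with open(path, "wb") as f:
        f.write(b"\x00" * 16)  # placeholder bytes
    return path

uploaded = []  # records what a real upload call would have sent

def upload(path: str) -> None:
    # Hypothetical stand-in for kagglehub.dataset_upload(handle, folder);
    # with the current behavior, each such call creates a NEW dataset version.
    uploaded.append(os.path.basename(path))

work_dir = tempfile.mkdtemp()
for shard_id in range(3):
    shard_path = convert_shard(shard_id, work_dir)
    upload(shard_path)      # would bump the dataset version each time
    os.remove(shard_path)   # free kernel disk space before the next shard
```

After the loop, all three shards have been "uploaded" and none remain on disk, but in the real API this would have produced three dataset versions instead of one.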
