
Partial write is actually O(N^2) on non-copy-on-write filesystems #2995

@qGentry


It seems that, to keep partial writes atomic, the snapshot code makes a recursive copy of the whole checkpoint for every new entry:
https://github.com/google/orbax/blob/main/checkpoint/orbax/checkpoint/_src/path/snapshot/snapshot.py#L72
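The sketch below uses only the standard library to illustrate the general copy-based snapshot pattern: copy the whole checkpoint, add the new entry to the copy, then swap the copy in. It is an assumption about the pattern, not a transcription of the linked `snapshot.py`. `write_entry_with_snapshot` and `simulate` are hypothetical names.

```python
import os
import shutil
import tempfile
from pathlib import Path


def write_entry_with_snapshot(ckpt_dir: Path, name: str, data: bytes) -> int:
    """Add one entry by first copying the whole checkpoint (hypothetical sketch).

    Returns how many bytes of existing checkpoint data had to be copied.
    """
    ckpt_dir = Path(ckpt_dir)
    staging = ckpt_dir.with_name(ckpt_dir.name + ".staging")
    backup = ckpt_dir.with_name(ckpt_dir.name + ".old")

    # 1. Recursively copy everything already in the checkpoint. On storage
    #    without copy-on-write, every existing byte is moved again.
    shutil.copytree(ckpt_dir, staging)
    copied = sum(p.stat().st_size for p in staging.rglob("*") if p.is_file())

    # 2. Write the new entry into the copy. The original stays untouched
    #    until the swap, so a failure here leaves the old checkpoint intact.
    (staging / name).write_bytes(data)

    # 3. Swap the copy in. Two renames are not one atomic step; this only
    #    shows where the per-update copy cost comes from.
    os.rename(ckpt_dir, backup)
    os.rename(staging, ckpt_dir)
    shutil.rmtree(backup)
    return copied


def simulate(num_entries: int, entry_size: int) -> list[int]:
    """Write equal-sized entries into a fresh checkpoint; return bytes copied per update."""
    with tempfile.TemporaryDirectory() as root:
        ckpt = Path(root) / "ckpt"
        ckpt.mkdir()
        return [
            write_entry_with_snapshot(ckpt, f"entry_{i}", b"x" * entry_size)
            for i in range(num_entries)
        ]


print(simulate(4, 100))  # [0, 100, 200, 300]
```

Each update copies everything written before it, so the per-update cost grows linearly with the number of existing entries.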

On regular NFS or S3 storage, which does not support copy-on-write, this leads to O(N^2) traffic because every update makes a full copy of the existing checkpoint. Is there a way to avoid this, even at the cost of giving up atomicity? I don't need atomicity in my case, but I still want to use partial writes.
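A rough cost model shows where the quadratic term comes from. Assume N updates of equal size s. The k-th update re-copies the (k-1)s bytes already present and then writes s new bytes, so the total is s(1 + 2 + ... + N) = sN(N+1)/2 bytes. Writing each entry in place, without a snapshot, costs sN. This simplified sketch counts bytes written only. It ignores reads, metadata operations, and whether a copy could run server-side. `bytes_written` is a hypothetical helper and is not part of Orbax.

```python
def bytes_written(entry_sizes: list[int], copy_on_update: bool = True) -> int:
    """Total bytes written after appending each entry in turn (hypothetical model).

    copy_on_update=True: re-copy the whole existing checkpoint before every update.
    copy_on_update=False: write each new entry in place, with no snapshot.
    """
    total = 0
    existing = 0  # size of the checkpoint before the current update
    for size in entry_sizes:
        if copy_on_update:
            total += existing  # full copy of what is already there
        total += size  # the new entry itself
        existing += size
    return total


for n in (10, 100, 1000):
    entries = [1] * n  # n entries of one unit each
    print(n, bytes_written(entries), bytes_written(entries, copy_on_update=False))
# 10 55 10
# 100 5050 100
# 1000 500500 1000
```

Going from 100 to 1000 entries multiplies the copy-on-update traffic by about 100, while in-place writes grow only tenfold.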
