branch-4.0: [fix](recycler) Add recycle state for rs meta to avoid data loss #58459 #59765
Merged
yiguolei merged 1 commit into apache:branch-4.0 on Jan 13, 2026
…che#58459)

Add a `RECYCLE` state for rowset/meta (rs meta) and update the recycler logic to mark metadata as `RECYCLE` before final deletion. This reduces the risk of accidental data loss.

## Problem

The recycler sometimes deletes rs meta too early (race conditions, restarts, or recovery cases), which can cause metadata and file inconsistencies or data loss.

## Solution

- Introduce a `RECYCLE` intermediate state for rs meta.
- When an item is chosen for cleanup, mark it `RECYCLE` and record a timestamp.
- Only perform the final delete after a confirmation window or additional checks.
- Make recovery/restart logic treat `RECYCLE` items as recoverable until final deletion.

## Main changes

- Add `RECYCLE` to the rs meta state enum.
- Update metadata APIs to set/query `RECYCLE`.
- Update recycler to use two-step deletion: ***mark -> confirm -> abort txn/job and delete***.
- Add logs and tests for the new flow.

## Test case

```
1. begin_txn -> prepare_rowset -> force_recycle -> commit_rowset -> commit_txn
2. start_job -> prepare_rowset -> force_recycle -> commit_rowset -> finish_job
Rowset will be marked as recycled to prevent commit_rowset and finish job/txn

3. begin_txn -> prepare_rowset -> commit_rowset -> force_recycle -> commit_txn
4. start_job -> prepare_rowset -> commit_rowset -> force_recycle -> finish_job
Rowset will be marked as recycled to prevent finish job/txn

5. begin_txn -> prepare_rowset -> force_recycle * 2 -> commit_rowset -> commit_txn
6. start_job -> prepare_rowset -> force_recycle * 2 -> commit_rowset -> finish_job
7. begin_txn -> prepare_rowset -> commit_rowset -> force_recycle * 2 -> commit_txn
9. start_job -> prepare_rowset -> commit_rowset -> force_recycle * 2 -> finish_job
10. delete_job -> commit_rowset -> force_recycle * 2 -> finish_job
11. delete_job -> prepare_rowset -> commit_rowset -> force_recycle * 2 -> finish_job
12. delete_job -> prepare_rowset -> force_recycle * 2 -> commit_rowset -> finish_job
Double recycle job will mark rowset as recycled and abort job/txn, then delete data and kv
```
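
To make the two-step flow (mark -> confirm -> abort txn/job and delete) concrete, here is a minimal in-memory sketch in Python. It is not the Doris recycler code: `MetaStore`, `RowsetMeta`, `recycle`, the `PREPARED`/`COMMITTED` states, and the fixed `confirm_window_s` are all hypothetical names and simplifications chosen for illustration. Only the `RECYCLE` state and the overall ordering come from the description above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional, Set


class RowsetState(Enum):
    PREPARED = "PREPARED"    # hypothetical name for a prepared rowset
    COMMITTED = "COMMITTED"  # hypothetical name for a committed rowset
    RECYCLE = "RECYCLE"      # intermediate state introduced by this PR


@dataclass
class RowsetMeta:
    rowset_id: str
    state: RowsetState = RowsetState.PREPARED
    recycle_marked_at: Optional[float] = None


class MetaStore:
    """Toy stand-in for the rs meta key-value store (assumption, not Doris code)."""

    def __init__(self, confirm_window_s: float) -> None:
        self.confirm_window_s = confirm_window_s
        self.metas: Dict[str, RowsetMeta] = {}
        self.aborted: Set[str] = set()  # rowsets whose txn/job was aborted

    def prepare_rowset(self, rowset_id: str) -> None:
        self.metas[rowset_id] = RowsetMeta(rowset_id)

    def commit_rowset(self, rowset_id: str) -> bool:
        """Refuse to commit a rowset that is missing or already marked RECYCLE."""
        meta = self.metas.get(rowset_id)
        if meta is None or meta.state is RowsetState.RECYCLE:
            return False
        meta.state = RowsetState.COMMITTED
        return True

    def recycle(self, rowset_id: str, now: float) -> str:
        """One recycler pass over a single rowset.

        First pass: mark RECYCLE and record the timestamp.
        Later pass: once the confirmation window has elapsed,
        abort the owning txn/job and delete the meta.
        """
        meta = self.metas.get(rowset_id)
        if meta is None:
            return "gone"
        if meta.state is not RowsetState.RECYCLE:
            meta.state = RowsetState.RECYCLE
            meta.recycle_marked_at = now
            return "marked"
        if now - meta.recycle_marked_at < self.confirm_window_s:
            return "waiting"  # still recoverable until final deletion
        self.aborted.add(rowset_id)
        del self.metas[rowset_id]
        return "deleted"
```

In this sketch, test case 1 corresponds to `prepare_rowset`, one `recycle` call, then a `commit_rowset` that returns `False`. The "force_recycle * 2" cases correspond to a second `recycle` call after the window has elapsed, which aborts the owner and removes the meta. The timestamp is passed in explicitly so the behaviour is deterministic; a real recycler would presumably read a clock and may use additional checks instead of a fixed window.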
run buildall
yiguolei approved these changes on Jan 13, 2026
pick: #58459