sql/inspect: add protected timestamp for "now" AOST case #160138
Conversation
fqazi
left a comment
@fqazi reviewed 3 files and all commit messages, and made 1 comment.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @spilchen).
Previously, INSPECT jobs without a historical AS OF SYSTEM TIME clause
would not create protected timestamp records, but still used an AOST
clause with the current timestamp. If span processing took a long time
(especially with BulkLowQoS admission control), garbage collection could
occur before the query completed, resulting in "batch timestamp must be
after replica GC threshold" errors.
This change adds per-span protected timestamp protection when INSPECT
uses "now" as the AOST. The implementation uses a coordinator-based
approach where:
1. When a processor starts processing a span and picks "now" as the
timestamp, it sends a new "span started" progress message containing
the span and timestamp via InspectProcessorProgress.
2. The coordinator's progress tracker receives this message and calls
TryToProtectBeforeGC for the relevant tables in that span. This
waits until 80% of the GC TTL has elapsed before creating a PTS,
avoiding unnecessary PTS creation for quick operations.
3. When span processing completes (existing behavior), the coordinator
cleans up the PTS for that span. Any remaining PTS records are
cleaned up when the tracker terminates (e.g., on job cancellation).
This coordinator-based design keeps PTS management centralized rather
than distributed across processors, simplifying cleanup and error
handling. PTS failures are logged but don't fail the job since the
protection is best-effort.
Resolves: cockroachdb#159866
Release note: None
In addition to checkpointing in the job, we now also log progress to text logs periodically to enhance observability.
Release note: None
Moving from span to object ID in

That said, doesn't this clobber the PTS ID stored in the job each time it is run? The switch in jobsprotectedts.Protect is actually on the list of to-be-deleted anti-patterns in jobs, at least the part that switches over specific job types to record the PTS ID in their individual legacy progress, since how any one job stores its own state should be wholly confined to its own implementation (and should stop using legacy progress).

Do we need to persist the individual PTS records separately in an info key (updated transactionally with creating them) to ensure they're all durably recorded for cleanup, rather than relying on the in-memory map of cleanups?

Aside, speaking of persisting: `InspectProcessorProgress` isn't persisted, right? So it should be in execinfrapb instead of jobspb?
Thanks for pointing that out. I didn't realize that the PTS system would clobber. I had thought it would protect all the timestamps until each one had its cleanup run.

I have reworked this (currently in a separate commit, but I can squash if that's preferred) so that INSPECT only protects the minimum timestamp that is currently being used by all processors. When that timestamp is done, that PTS record will be cleaned up and the coordinator finds the next smallest timestamp that needs to be protected.
I don't think so. We still need to know when a specific span starts being processed and when it is done being processed, since the timestamp that needs to be protected and later cleaned up is associated with that specific span.
Previously, TryToProtectBeforeGC accepted a catalog.TableDescriptor parameter but only used it to call GetID() in two places. This was unnecessarily restrictive and forced callers to load a full table descriptor just to pass the ID.

This change simplifies the function signature to accept a descpb.ID directly. The most significant improvement is in inspect/progress.go, where this eliminates an unnecessary DescsTxn call that was only used to load the descriptor for its ID.

Release note: None

Epic: None
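The shape of this refactor can be illustrated with a toy sketch. The types and function names below (`descID`, `tableDescriptor`, `protectBeforeGCOld/New`) are hypothetical stand-ins for descpb.ID and catalog.TableDescriptor, not the real API:

```go
package main

import "fmt"

// descID stands in for descpb.ID.
type descID uint32

// tableDescriptor is a simplified stand-in for catalog.TableDescriptor.
type tableDescriptor struct {
	id   descID
	name string
	// ...many more fields a caller would otherwise have to load...
}

func (d *tableDescriptor) GetID() descID { return d.id }

// Before: callers had to load a full descriptor just so the function could
// call GetID().
func protectBeforeGCOld(desc *tableDescriptor) string {
	return fmt.Sprintf("protecting table %d", desc.GetID())
}

// After: the function accepts the ID directly, so callers that already know
// the ID (e.g. from decoding a span's table prefix) skip the descriptor load.
func protectBeforeGCNew(id descID) string {
	return fmt.Sprintf("protecting table %d", id)
}

func main() {
	desc := &tableDescriptor{id: 52, name: "t"}
	fmt.Println(protectBeforeGCOld(desc))
	fmt.Println(protectBeforeGCNew(52)) // same result, no descriptor load needed
}
```

Narrowing the parameter to the minimal datum the function actually uses is what lets progress.go drop its DescsTxn round trip.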
spilchen
left a comment
Looks great. Just a couple of minor nits.
@spilchen partially reviewed 7 files and made 3 comments.
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @fqazi, @kev-cao, and @rafiss).
pkg/sql/inspect/progress.go line 342 at r8 (raw file):
if !needsNewPTS {
    log.VEventf(ctx, 2, "INSPECT: span %s at %s covered by existing PTS at %s",
        spanStarted, tsToProtect, t.mu.currentPTSTimestamp)
do we have a mutex held for t.mu.currentPTSTimestamp?
pkg/sql/inspect/progress.go line 361 at r8 (raw file):
_, tableID, err := t.codec.DecodeTablePrefix(spanStarted.Key)
if err != nil {
    log.Dev.Warningf(ctx, "failed to decode table ID from span %s: %v", spanStarted, err)
do we need to remove the span from activeSpanTimestamps if we get in here? Or can we extract the tableID sooner, right at the very start?
rafiss
left a comment
@rafiss made 2 comments.
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @fqazi, @kev-cao, and @spilchen).
pkg/sql/inspect/progress.go line 342 at r8 (raw file):
Previously, spilchen wrote…
do we have a mutex held for t.mu.currentPTSTimestamp?
nice catch; fixed
pkg/sql/inspect/progress.go line 361 at r8 (raw file):
Previously, spilchen wrote…
do we need to remove the span from activeSpanTimestamps if we get in here? Or can we extract the tableID sooner, right at the very start?
I don't think we need to remove it. Keeping it in activeSpanTimestamps actually seems desirable, since the span is still getting actively processed, even if we fail to add the protected timestamp (or fail to decode the key).
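The mutex issue spilchen caught follows from CockroachDB's convention of nesting lock-protected fields under a `mu` struct. A minimal sketch of that pattern, with hypothetical names (`tracker`, `setPTSTimestamp`, `getPTSTimestamp`) rather than the real progress-tracker code:

```go
package main

import (
	"fmt"
	"sync"
)

// tracker groups mutex-protected fields under a `mu` struct, so a reviewer
// can spot at a glance that mu.currentPTSTimestamp must only be read or
// written while mu is held.
type tracker struct {
	mu struct {
		sync.Mutex
		currentPTSTimestamp int64 // simplified stand-in for hlc.Timestamp
	}
}

func (t *tracker) setPTSTimestamp(ts int64) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.mu.currentPTSTimestamp = ts
}

// getPTSTimestamp reflects the fix the review converged on: take the lock
// even for a read that only feeds a log message.
func (t *tracker) getPTSTimestamp() int64 {
	t.mu.Lock()
	defer t.mu.Unlock()
	return t.mu.currentPTSTimestamp
}

func main() {
	var t tracker
	t.setPTSTimestamp(42)
	fmt.Println(t.getPTSTimestamp()) // 42
}
```

An unlocked read such as the one in the original log.VEventf call is a data race even when it only feeds logging, which is why the fix adds locking around the read as well.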
Previously, the INSPECT job called TryToProtectBeforeGC per span with different timestamps. Since the job only stores one PTS record, each new span's call to Protect would update the existing record's timestamp via UpdateTimestamp, which removes protection for older spans.

To address this, this patch changes the PTS strategy to track the minimum (oldest) timestamp across all active spans and protect only at that timestamp. Since PROTECT_AFTER mode protects all data at or after the specified timestamp, protecting at the minimum covers all active spans. When the oldest span completes, the PTS is updated to the new minimum timestamp, allowing GC of data between the old and new minimum.

Release note: None
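The minimum-timestamp strategy can be sketched as follows. This is an illustration with hypothetical names (`minTracker`, plain int64 timestamps instead of hlc.Timestamp), not the actual tracker:

```go
package main

import "fmt"

// minTracker protects only the minimum (oldest) timestamp across all active
// spans. With PROTECT_AFTER semantics, protecting the minimum covers every
// active span's timestamp.
type minTracker struct {
	active map[string]int64 // span key -> read timestamp (simplified)
}

func newMinTracker() *minTracker {
	return &minTracker{active: make(map[string]int64)}
}

func (m *minTracker) spanStarted(span string, ts int64) { m.active[span] = ts }
func (m *minTracker) spanCompleted(span string)         { delete(m.active, span) }

// protectedTimestamp returns the minimum active timestamp, or false when no
// span needs protection. When the oldest span completes, the PTS advances to
// the new minimum, allowing GC of data between the old and new minimum.
func (m *minTracker) protectedTimestamp() (int64, bool) {
	first := true
	var min int64
	for _, ts := range m.active {
		if first || ts < min {
			min, first = ts, false
		}
	}
	return min, !first
}

func main() {
	m := newMinTracker()
	m.spanStarted("a", 100)
	m.spanStarted("b", 150)
	ts, ok := m.protectedTimestamp()
	fmt.Println(ts, ok) // 100 true

	m.spanCompleted("a") // oldest span done; PTS advances
	ts, ok = m.protectedTimestamp()
	fmt.Println(ts, ok) // 150 true
}
```

Because only one timestamp is ever protected, the single PTS record stored in the job is never clobbered; each UpdateTimestamp call simply advances the protected point to the new minimum.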
tftr! bors r+
160138: sql/inspect: add protected timestamp for "now" AOST case r=rafiss a=rafiss
### sql/inspect: use minimum timestamp for PTS protection
Previously, the INSPECT job called TryToProtectBeforeGC per span with
different timestamps. Since the job only stores one PTS record, each
new span's call to Protect would update the existing record's timestamp
via UpdateTimestamp, which removes protection for older spans.
To address this, this patch changes the PTS strategy to track the
minimum (oldest) timestamp across all active spans and protect only at
that timestamp. Since PROTECT_AFTER mode protects all data at or after
the specified timestamp, protecting at the minimum covers all active
spans. When the oldest span completes, the PTS is updated to the new
minimum timestamp, allowing GC of data between the old and new minimum.
Resolves: #159866
Epic: None
Release note: None
160570: sql/ttl: enable TTL tests to run with secondary tenants r=rafiss a=rafiss
Previously, TTL tests used `TestIsForStuffThatShouldWorkWithSecondaryTenantsButDoesntYet` and manually controlled tenant creation. This prevented the tests from benefiting from the standard tenant randomization in the test framework.
This commit makes several changes to enable TTL tests to work with tenants:
1. Updates `newRowLevelTTLTestJobTestHelper` to use `ApplicationLayer(0)` instead of manually starting tenants, leveraging the built-in tenant randomization logic.
2. Fixes `SplitTable` in testcluster to use `TestingMakePrimaryIndexKeyForTenant` with the correct codec, so range splits work correctly for tenant tables.
3. Fixes external process tenant startup to propagate version settings from the parent, allowing tenants to start when the cluster is running at an older version (e.g., MinSupported).
4. Removes the `testMultiTenant` parameter from the test helper since tenant mode is now controlled by the framework's randomization.
Resolves: #109391
Release note: None
Co-authored-by: Rafi Shamim <[email protected]>
Build failed (retrying...)
Encountered an error creating backports. Some common things that can go wrong:

You might need to create your backport manually using the backport tool.

error creating merge commit from 0c988ff to blathers/backport-release-26.1-160138: POST https://api.github.com/repos/rafiss/cockroach/merges: 409 Merge conflict []

You may need to manually resolve merge conflicts with the backport tool. Backport to branch 26.1.x failed. See errors above.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.