Conversation

@rafiss
Collaborator

@rafiss rafiss commented Dec 24, 2025

Previously, INSPECT jobs without a historical AS OF SYSTEM TIME clause
would not create protected timestamp records, but still used an AOST
clause with the current timestamp. If span processing took a long time
(especially with BulkLowQoS admission control), garbage collection could
occur before the query completed, resulting in "batch timestamp must be
after replica GC threshold" errors.

This change adds per-span protected timestamp (PTS) protection when INSPECT
uses "now" as the AOST. The implementation uses a coordinator-based
approach where:

  1. When a processor starts processing a span and picks "now" as the
    timestamp, it sends a new "span started" progress message containing
    the span and timestamp via InspectProcessorProgress.

  2. The coordinator's progress tracker receives this message and calls
    TryToProtectBeforeGC for the relevant tables in that span. This
    waits until 80% of the GC TTL has elapsed before creating a PTS,
    avoiding unnecessary PTS creation for quick operations.

  3. When span processing completes (existing behavior), the coordinator
    cleans up the PTS for that span. Any remaining PTS records are
    cleaned up when the tracker terminates (e.g., on job cancellation).

This coordinator-based design keeps PTS management centralized rather
than distributed across processors, simplifying cleanup and error
handling. PTS failures are logged but don't fail the job since the
protection is best-effort.
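
To make the flow above concrete, here is a minimal, self-contained Go sketch. The names (tracker, releases, the simplified tryToProtectBeforeGC signature) are hypothetical stand-ins, and the helper's GC-TTL wait is condensed into a plain elapsed-time check; this is not the actual inspect code.

```go
package main

import (
	"fmt"
	"time"
)

// tracker is a stand-in for the coordinator's progress tracker. It
// reacts to the "span started" / "span finished" progress messages
// that processors send back over the DistSQL flow.
type tracker struct {
	releases map[string]func() // span -> PTS cleanup, if one was created
	gcTTL    time.Duration     // GC TTL of the inspected table
}

// spanStarted corresponds to steps 1 and 2: a processor picked "now"
// as its read timestamp, and the coordinator tries to protect it.
func (t *tracker) spanStarted(span string, readTS time.Time) {
	release, err := tryToProtectBeforeGC(readTS, t.gcTTL)
	if err != nil {
		// Best effort only: log and keep going, never fail the job.
		fmt.Printf("could not protect %s: %v\n", span, err)
		return
	}
	if release != nil {
		t.releases[span] = release
	}
}

// spanFinished corresponds to step 3: drop protection for spans that
// no longer need it.
func (t *tracker) spanFinished(span string) {
	if release, ok := t.releases[span]; ok {
		release()
		delete(t.releases, span)
	}
}

// terminate releases anything left over, e.g. on job cancellation.
func (t *tracker) terminate() {
	for span, release := range t.releases {
		release()
		delete(t.releases, span)
	}
}

// tryToProtectBeforeGC mimics the helper's contract as described
// above: only install a PTS once ~80% of the GC TTL has elapsed since
// the read timestamp, so quick spans never create one. It returns a
// cleanup func when it did.
func tryToProtectBeforeGC(readTS time.Time, gcTTL time.Duration) (func(), error) {
	if time.Since(readTS) < gcTTL*8/10 {
		return nil, nil // too fresh; GC cannot be close yet
	}
	fmt.Println("protected data at", readTS.Format(time.RFC3339))
	return func() { fmt.Println("released PTS for", readTS.Format(time.RFC3339)) }, nil
}

func main() {
	t := &tracker{releases: map[string]func(){}, gcTTL: time.Hour}
	t.spanStarted("/Table/104/1", time.Now().Add(-55*time.Minute)) // old enough: protected
	t.spanFinished("/Table/104/1")                                 // cleanup (step 3)
	t.terminate()                                                  // no-op here
}
```

The in-memory map of cleanup callbacks mirrors the "map of cleanups" questioned in the review below, and the second commit in this PR replaces the per-span records with a single record protected at the minimum timestamp.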

sql/inspect: use minimum timestamp for PTS protection

Previously, the INSPECT job called TryToProtectBeforeGC per span with
different timestamps. Since the job only stores one PTS record, each
new span's call to Protect would update the existing record's timestamp
via UpdateTimestamp, which removes protection for older spans.

To address this, this patch changes the PTS strategy to track the
minimum (oldest) timestamp across all active spans and protect only at
that timestamp. Since PROTECT_AFTER mode protects all data at or after
the specified timestamp, protecting at the minimum covers all active
spans. When the oldest span completes, the PTS is updated to the new
minimum timestamp, allowing GC of data between the old and new minimum.

Resolves: #159866
Epic: None

Release note: None
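
A minimal, self-contained sketch of that single-record strategy (minTimestampPTS, refresh, and the plain time.Time timestamps are illustrative stand-ins; the real code updates the job's record via UpdateTimestamp and works with hlc timestamps):

```go
package main

import (
	"fmt"
	"time"
)

// minTimestampPTS models the strategy above: the job owns exactly one
// PTS record, so it protects the oldest timestamp still in use by any
// active span. With PROTECT_AFTER semantics, everything at or after
// that timestamp is safe from GC, which covers every newer span too.
type minTimestampPTS struct {
	active      map[string]time.Time // span -> read timestamp
	protectedAt time.Time            // zero value => no PTS installed
}

// refresh recomputes the minimum and moves (or releases) the single
// record whenever that minimum changes, e.g. after a span starts or
// completes. Advancing the record is what lets GC reclaim data between
// the old and new minimum.
func (m *minTimestampPTS) refresh() {
	var min time.Time
	for _, ts := range m.active {
		if min.IsZero() || ts.Before(min) {
			min = ts
		}
	}
	switch {
	case min.IsZero():
		m.protectedAt = time.Time{} // nothing in flight: release the PTS
		fmt.Println("PTS released")
	case !min.Equal(m.protectedAt):
		m.protectedAt = min // UpdateTimestamp on the existing record
		fmt.Println("PTS protects at or after", m.protectedAt.Unix())
	}
}

func main() {
	m := &minTimestampPTS{active: map[string]time.Time{}}
	m.active["span-a"] = time.Unix(100, 0)
	m.refresh() // protects at t=100
	m.active["span-b"] = time.Unix(120, 0)
	m.refresh() // no change: span-b is already covered by PROTECT_AFTER
}
```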

@rafiss rafiss requested a review from spilchen December 24, 2025 22:14
@rafiss rafiss added the backport-26.1.x Flags PRs that need to be backported to 26.1 label Dec 24, 2025
@rafiss rafiss requested a review from a team as a code owner December 24, 2025 22:14
@cockroach-teamcity
Member

This change is Reviewable

Collaborator

@fqazi fqazi left a comment

:lgtm_strong:

@fqazi reviewed 3 files and all commit messages, and made 1 comment.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @spilchen).

@rafiss rafiss force-pushed the inspect-per-span-now-pts branch from 36f90d0 to 62a680e Compare December 30, 2025 06:36
@rafiss rafiss requested review from a team as code owners December 30, 2025 06:36
@rafiss rafiss requested review from kev-cao and removed request for a team December 30, 2025 06:36
@rafiss rafiss force-pushed the inspect-per-span-now-pts branch from 62a680e to 07daa4b Compare December 30, 2025 07:12
In addition to checkpointing in the job, now we also log progress to
text logs periodically in order to enhance observability.

Release note: None
@dt
Contributor

dt commented Dec 30, 2025

Moving from span to object ID in TryToProtectBeforeGC seems good to me, since the PTS system has been defined in terms of objects rather than spans for a while now. Could we go even further and use table ID in the message from the workers to the coordinator? I'd also imagine we only need to protect a given table once, even if there are multiple processors running on (spans of) it?

That said, doesn't this clobber the PTS ID stored in the job each time it is run? The switch in jobsprotectedts.Protect is actually on the list of to-be-deleted anti-patterns in jobs, at least the part that switches over specific job types to record the PTS ID in their individual legacy progress, since how any one job stores its own state should be wholly confined to its own implementation (and should stop using legacy progress). Do we need to persist the individual PTS separately in an info key (updated transactionally with creating them) to ensure they're all durably recorded for cleanup rather than the in-memory map of cleanups?

Aside, speaking of persisting: `InspectProcessorProgress` isn't persisted, right? So it should be in execinfrapb instead of jobspb?

@github-actions

github-actions bot commented Jan 5, 2026

Potential Bug(s) Detected

The three-stage Claude Code analysis has identified potential bug(s) in this PR that may warrant investigation.

Next Steps:
Please review the detailed findings in the workflow run.

Note: When viewing the workflow output, scroll to the bottom to find the Final Analysis Summary.

After you review the findings, please tag the issue as follows:

  • If the detected issue is real or was helpful in any way, please tag the issue with O-AI-Review-Real-Issue-Found
  • If the detected issue was not helpful in any way, please tag the issue with O-AI-Review-Not-Helpful

@github-actions github-actions bot added the o-AI-Review-Potential-Issue-Detected AI reviewer found potential issue. Never assign manually—auto-applied by GH action only. label Jan 5, 2026
@rafiss
Collaborator Author

rafiss commented Jan 5, 2026

doesn't this clobber the PTS ID stored in the job each time it is run?

Thanks for pointing that out. I didn't realize that the PTS system would clobber the existing record. I had thought it would protect all the timestamps until each one had its cleaner called, but on further reflection the way it works now makes sense.

I have reworked this (currently in a separate commit, but I can squash if that's preferred) so that INSPECT only protects the minimum timestamp that is currently being used by all processors. When that timestamp is done, that PTS record will be cleaned up and the coordinator finds the next smallest timestamp that needs to be protected.

Could we go even further and use table ID in the message from the workers to the coordinator?

I don't think so. We still need to know when a specific span starts being processed and is done being processed, since the timestamp that needs to be protected then later cleaned up is associated with that specific span.
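
A compressed, hypothetical walk-through of that behavior, with integers standing in for timestamps:

```go
package main

import "fmt"

// Each event recomputes the minimum across active spans; that minimum
// is what the job's single PTS record protects.
func main() {
	active := map[string]int64{} // span -> the "now" timestamp it picked
	minTS := func() int64 {
		var m int64
		for _, ts := range active {
			if m == 0 || ts < m {
				m = ts
			}
		}
		return m
	}

	active["A"] = 10
	fmt.Println("A started  -> protect at", minTS()) // 10
	active["B"] = 12
	fmt.Println("B started  -> protect at", minTS()) // still 10; B is covered
	delete(active, "A")
	fmt.Println("A finished -> protect at", minTS()) // 12; data between 10 and 12 can now be GC'd
	delete(active, "B")
	fmt.Println("B finished -> protect at", minTS()) // 0, i.e. release the PTS record
}
```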

@rafiss rafiss changed the title from sql/inspect: add per-span protected timestamp for "now" AOST case to sql/inspect: add protected timestamp for "now" AOST case Jan 5, 2026
@rafiss rafiss requested a review from fqazi January 5, 2026 20:20
Previously, TryToProtectBeforeGC accepted a catalog.TableDescriptor
parameter but only used it to call GetID() in two places. This was
unnecessarily restrictive and forced callers to load a full table
descriptor just to pass the ID.

This change simplifies the function signature to accept a descpb.ID
directly. The most significant improvement is in inspect/progress.go,
where this eliminates an unnecessary DescsTxn call that was only used
to load the descriptor for its ID.

Release note: None
Epic: None
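
A hypothetical before/after sketch of that signature change, with TableID and tableDescriptor standing in for descpb.ID and catalog.TableDescriptor:

```go
package main

import "fmt"

// TableID plays the role of descpb.ID: a plain integer identifier.
type TableID uint32

// tableDescriptor stands in for catalog.TableDescriptor; loading one
// can require a catalog transaction (the DescsTxn call mentioned above).
type tableDescriptor struct{ id TableID }

func (d *tableDescriptor) GetID() TableID { return d.id }

// Before: the helper only ever used desc.GetID(), yet callers had to
// load the whole descriptor first.
func tryToProtectBeforeGCOld(desc *tableDescriptor) {
	fmt.Println("protect table", desc.GetID())
}

// After: accept the ID directly; callers that already know the ID
// (e.g. from decoding a span's table prefix) skip the descriptor load.
func tryToProtectBeforeGCNew(id TableID) {
	fmt.Println("protect table", id)
}

func main() {
	tryToProtectBeforeGCOld(&tableDescriptor{id: 104})
	tryToProtectBeforeGCNew(104)
}
```
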
@rafiss rafiss force-pushed the inspect-per-span-now-pts branch from 4e881fe to c344d07 Compare January 5, 2026 20:22
@github-actions

github-actions bot commented Jan 5, 2026

Potential Bug(s) Detected

The three-stage Claude Code analysis has identified potential bug(s) in this PR that may warrant investigation.

Next Steps:
Please review the detailed findings in the workflow run.

Note: When viewing the workflow output, scroll to the bottom to find the Final Analysis Summary.

After you review the findings, please tag the issue as follows:

  • If the detected issue is real or was helpful in any way, please tag the issue with O-AI-Review-Real-Issue-Found
  • If the detected issue was not helpful in any way, please tag the issue with O-AI-Review-Not-Helpful

@rafiss rafiss added the O-AI-Review-Not-Helpful AI reviewer produced result which was incorrect or unhelpful label Jan 5, 2026
Contributor

@spilchen spilchen left a comment

Looks great. Just a couple of minor nits.
:lgtm:

@spilchen partially reviewed 7 files and made 3 comments.
Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @fqazi, @kev-cao, and @rafiss).


pkg/sql/inspect/progress.go line 342 at r8 (raw file):

	if !needsNewPTS {
		log.VEventf(ctx, 2, "INSPECT: span %s at %s covered by existing PTS at %s",
			spanStarted, tsToProtect, t.mu.currentPTSTimestamp)

do we have a mutex held for t.mu.currentPTSTimestamp?


pkg/sql/inspect/progress.go line 361 at r8 (raw file):

		_, tableID, err := t.codec.DecodeTablePrefix(spanStarted.Key)
		if err != nil {
			log.Dev.Warningf(ctx, "failed to decode table ID from span %s: %v", spanStarted, err)

do we need to remove the span from activeSpanTimestamps if we get in here? Or can we extract the tableID sooner, right at the very start?

Collaborator Author

@rafiss rafiss left a comment

@rafiss made 2 comments.
Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @fqazi, @kev-cao, and @spilchen).


pkg/sql/inspect/progress.go line 342 at r8 (raw file):

Previously, spilchen wrote…

do we have a mutex held for t.mu.currentPTSTimestamp?

nice catch; fixed
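
For reference, the fix presumably amounts to the usual mutex-guarded access pattern; a simplified, hypothetical sketch (not the actual progress.go code):

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// tracker mirrors the shape hinted at in the review thread: anything
// under t.mu must only be read or written while t.mu is held.
type tracker struct {
	mu struct {
		sync.Mutex
		currentPTSTimestamp time.Time
	}
}

// currentPTS reads the protected timestamp under the lock and returns
// a copy, so callers (such as the log.VEventf in the diff above) never
// touch t.mu.currentPTSTimestamp without holding t.mu.
func (t *tracker) currentPTS() time.Time {
	t.mu.Lock()
	defer t.mu.Unlock()
	return t.mu.currentPTSTimestamp
}

func main() {
	t := &tracker{}
	t.mu.Lock()
	t.mu.currentPTSTimestamp = time.Unix(100, 0)
	t.mu.Unlock()
	fmt.Println("covered by existing PTS at", t.currentPTS())
}
```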


pkg/sql/inspect/progress.go line 361 at r8 (raw file):

Previously, spilchen wrote…

do we need to remove the span from activeSpanTimestamps if we get in here? Or can we extract the tableID sooner, right at the very start?

I don't think we need to remove it. Keeping it in activeSpanTimestamps actually seems desirable, since the span is still being actively processed, even if we fail to add the protected timestamp (or fail to decode the key).

@rafiss rafiss force-pushed the inspect-per-span-now-pts branch from c344d07 to 3224624 Compare January 6, 2026 21:52
@github-actions

github-actions bot commented Jan 6, 2026

Potential Bug(s) Detected

The three-stage Claude Code analysis has identified potential bug(s) in this PR that may warrant investigation.

Next Steps:
Please review the detailed findings in the workflow run.

Note: When viewing the workflow output, scroll to the bottom to find the Final Analysis Summary.

After you review the findings, please tag the issue as follows:

  • If the detected issue is real or was helpful in any way, please tag the issue with O-AI-Review-Real-Issue-Found
  • If the detected issue was not helpful in any way, please tag the issue with O-AI-Review-Not-Helpful

@rafiss rafiss force-pushed the inspect-per-span-now-pts branch from 3224624 to 1e07873 Compare January 6, 2026 22:01
@rafiss
Collaborator Author

rafiss commented Jan 6, 2026

tftr!

bors r+

craig bot pushed a commit that referenced this pull request Jan 6, 2026
160138: sql/inspect: add protected timestamp for "now" AOST case r=rafiss a=rafiss

Previously, INSPECT jobs without a historical AS OF SYSTEM TIME clause
would not create protected timestamp records, but still used an AOST
clause with the current timestamp. If span processing took a long time
(especially with BulkLowQoS admission control), garbage collection could
occur before the query completed, resulting in "batch timestamp must be
after replica GC threshold" errors.

This change adds per-span protected timestamp protection when INSPECT
uses "now" as the AOST. The implementation uses a coordinator-based
approach where:

1. When a processor starts processing a span and picks "now" as the
    timestamp, it sends a new "span started" progress message containing
    the span and timestamp via InspectProcessorProgress.

2. The coordinator's progress tracker receives this message and calls
    TryToProtectBeforeGC for the relevant tables in that span. This
    waits until 80% of the GC TTL has elapsed before creating a PTS,
    avoiding unnecessary PTS creation for quick operations.

3. When span processing completes (existing behavior), the coordinator
    cleans up the PTS for that span. Any remaining PTS records are
    cleaned up when the tracker terminates (e.g., on job cancellation).

This coordinator-based design keeps PTS management centralized rather
than distributed across processors, simplifying cleanup and error
handling. PTS failures are logged but don't fail the job since the
protection is best-effort.

### sql/inspect: use minimum timestamp for PTS protection

Previously, the INSPECT job called TryToProtectBeforeGC per span with
different timestamps. Since the job only stores one PTS record, each
new span's call to Protect would update the existing record's timestamp
via UpdateTimestamp, which removes protection for older spans.

To address this, this patch changes the PTS strategy to track the
minimum (oldest) timestamp across all active spans and protect only at
that timestamp. Since PROTECT_AFTER mode protects all data at or after
the specified timestamp, protecting at the minimum covers all active
spans. When the oldest span completes, the PTS is updated to the new
minimum timestamp, allowing GC of data between the old and new minimum.

Resolves: #159866
Epic: None

Release note: None

160570: sql/ttl: enable TTL tests to run with secondary tenants r=rafiss a=rafiss

Previously, TTL tests used `TestIsForStuffThatShouldWorkWithSecondaryTenantsButDoesntYet` and manually controlled tenant creation. This prevented the tests from benefiting from the standard tenant randomization in the test framework.

This commit makes several changes to enable TTL tests to work with tenants:

1. Updates `newRowLevelTTLTestJobTestHelper` to use `ApplicationLayer(0)` instead of manually starting tenants, leveraging the built-in tenant randomization logic.

2. Fixes `SplitTable` in testcluster to use `TestingMakePrimaryIndexKeyForTenant` with the correct codec, so range splits work correctly for tenant tables.

3. Fixes external process tenant startup to propagate version settings from the parent, allowing tenants to start when the cluster is running at an older version (e.g., MinSupported).

4. Removes the `testMultiTenant` parameter from the test helper since tenant mode is now controlled by the framework's randomization.

Resolves: #109391

Release note: None

Co-authored-by: Rafi Shamim <[email protected]>
@craig
Contributor

craig bot commented Jan 6, 2026

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Jan 7, 2026

@craig craig bot merged commit 2d75bf2 into cockroachdb:master Jan 7, 2026
22 of 24 checks passed
@blathers-crl

blathers-crl bot commented Jan 7, 2026

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from 0c988ff to blathers/backport-release-26.1-160138: POST https://api.github.com/repos/rafiss/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 26.1.x failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.
