Skip to content

perf(auth): collapse N+1 in getComputingUnitAccess into one join#5648

Merged
Yicong-Huang merged 8 commits into
apache:mainfrom
Ma77Ball:perf/computing-unit-access-single-join
Jun 14, 2026
Merged

perf(auth): collapse N+1 in getComputingUnitAccess into one join#5648
Yicong-Huang merged 8 commits into
apache:mainfrom
Ma77Ball:perf/computing-unit-access-single-join

Conversation

@Ma77Ball

@Ma77Ball Ma77Ball commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

What changes were proposed in this PR?

  • Replace the two-DAO lookup in ComputingUnitAccess.getComputingUnitAccess (fetch the unit by cuid, then fetch all of the user's access rows by uid and filter them in memory) with a single LEFT JOIN of WORKFLOW_COMPUTING_UNIT to COMPUTING_UNIT_USER_ACCESS on (cuid, uid), returning the owner uid and the caller's privilege in one round-trip.
  • Resolve the privilege from that single row: the owner gets WRITE, a present access row yields its privilege, and a null join (no grant) yields NONE.
  • A missing computing unit now resolves to NONE instead of throwing; the sole caller (AccessControlResource) already maps both NONE and a thrown exception to 403 FORBIDDEN, so the externally observable behavior is unchanged.

Performance comparison (previous vs current logic)

Measured with a throwaway benchmark against the same embedded Postgres the unit tests use (MockTexeraDB), exercising the non-owner path (the only path that previously issued a second query). For each grant count the benchmark seeds a user holding that many grants, then times 2000 calls per implementation after a 200-call warmup. Microseconds per call, lower is better:

Grants held by the user Previous (2 queries) Current (1 join) Speedup
1 4382.5 us 2395.1 us 1.83x
10 3892.7 us 1786.2 us 2.18x
100 3260.0 us 1690.4 us 1.93x
1000 3650.8 us 1630.0 us 2.24x

So the rewrite is roughly 2x faster across the board. The speedup stays flat rather than growing with the grant count, which is itself informative: on this loopback setup the dominant cost is the extra database round-trip (two statements vs one), not the in-memory transfer-and-filter of the user's grant rows. Against a real database over a network, where per-statement latency is higher, halving the round-trips should matter at least as much. This addresses the fair concern that "more SQL queries do not necessarily mean slower": here the previous logic issued fewer total joins but paid for an extra round-trip, and measuring confirms the single join wins.

These numbers come from a local micro-benchmark, not committed to the suite, and are absolute timings on embedded Postgres; treat the ratio as the signal, not the raw microseconds.

Any related issues, documentation, discussions?

Closes: #5645

How was this PR tested?

  • Added ComputingUnitAccessSpec covering the owner (WRITE), shared-grant (granted privilege), no-grant (NONE), and missing-unit (NONE) cases.
  • sbt Auth/compile and sbt Auth/test, expect a clean compile of the rewritten jOOQ query and a passing spec.
  • Manual, via the access-control route backed by AccessControlResource: request a computing unit as its owner, expect WRITE; as a user holding a shared grant, expect that grant's privilege; as a user with no grant, or for a non-existent cuid, expect 403 FORBIDDEN.
  • Behavior-preservation check: confirm the only caller maps NONE to 403, so the missing-unit path (previously a thrown NPE, now NONE) still yields 403.

Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.8 (Claude Code), in compliance with the ASF Generative Tooling Guidance

Ma77Ball added 5 commits June 10, 2026 20:53
  into a single LEFT JOIN

  Replace the two-DAO lookup (fetch unit, then fetch
  all of the user's
  access rows and filter in memory) with one LEFT JOIN
  returning the unit
  owner and the caller's privilege for this cuid.
  Missing unit now resolves
  to NONE instead of throwing; the sole caller maps
  both to FORBIDDEN.
@github-actions github-actions Bot added ci changes related to CI common labels Jun 12, 2026
@codecov-commenter

codecov-commenter commented Jun 12, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 52.85%. Comparing base (aca41f3) to head (08e717e).

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #5648      +/-   ##
============================================
- Coverage     52.85%   52.85%   -0.01%     
  Complexity     2620     2620              
============================================
  Files          1090     1090              
  Lines         42176    42178       +2     
  Branches       4531     4530       -1     
============================================
  Hits          22293    22293              
- Misses        18573    18576       +3     
+ Partials       1310     1309       -1     
Flag Coverage Δ *Carryforward flag
access-control-service 70.91% <ø> (-0.52%) ⬇️
agent-service 34.36% <ø> (ø) Carriedforward from 4fd395b
amber 52.95% <100.00%> (+0.08%) ⬆️
computing-unit-managing-service 1.65% <ø> (ø)
config-service 56.71% <ø> (ø)
file-service 57.06% <ø> (ø)
frontend 47.86% <ø> (-0.07%) ⬇️ Carriedforward from 4fd395b
pyamber 90.72% <ø> (ø) Carriedforward from 4fd395b
python 90.74% <ø> (ø) Carriedforward from 4fd395b
workflow-compiling-service 58.69% <ø> (ø)

*This pull request uses carry forward flags. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@github-actions

github-actions Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

✅ No material benchmark regressions detected

🟢 2 better · 🔴 0 worse · ⚪ 13 noise (<±5%) · 0 without baseline

Compared against main aca41f3 benchmarked on this same runner, so the delta is largely free of cross-runner hardware noise. The "7d avg" column still reflects the gh-pages dashboard. Treat <±5% as noise unless repeated.

Dashboard · Run

config throughput MB/s latency max Δ latest / 7d
🟢 bs=10 sw=10 sl=64 435 0.265 21,840/30,613/30,613 us 🟢 -21.9% / 🟢 -12.5%
bs=100 sw=10 sl=64 857 0.523 116,870/144,742/144,742 us ⚪ within ±5% / ⚪ within ±5%
bs=1000 sw=10 sl=64 958 0.585 1,043,537/1,079,883/1,079,883 us ⚪ within ±5% / 🔴 -8.0%
Baseline details

Latest main aca41f3 from same runner

config metric PR latest main 7d avg Δ latest Δ 7d
bs=10 sw=10 sl=64 throughput 435 tuples/sec 419 tuples/sec 410.82 tuples/sec +3.8% +5.9%
bs=10 sw=10 sl=64 MB/s 0.265 MB/s 0.256 MB/s 0.251 MB/s +3.5% +5.7%
bs=10 sw=10 sl=64 p50 21,840 us 21,555 us 23,785 us +1.3% -8.2%
bs=10 sw=10 sl=64 p95 30,613 us 39,211 us 34,980 us -21.9% -12.5%
bs=10 sw=10 sl=64 p99 30,613 us 39,211 us 34,980 us -21.9% -12.5%
bs=100 sw=10 sl=64 throughput 857 tuples/sec 873 tuples/sec 891.94 tuples/sec -1.8% -3.9%
bs=100 sw=10 sl=64 MB/s 0.523 MB/s 0.533 MB/s 0.544 MB/s -1.9% -3.9%
bs=100 sw=10 sl=64 p50 116,870 us 112,435 us 112,277 us +3.9% +4.1%
bs=100 sw=10 sl=64 p95 144,742 us 141,802 us 139,802 us +2.1% +3.5%
bs=100 sw=10 sl=64 p99 144,742 us 141,802 us 139,802 us +2.1% +3.5%
bs=1000 sw=10 sl=64 throughput 958 tuples/sec 966 tuples/sec 1,041 tuples/sec -0.8% -8.0%
bs=1000 sw=10 sl=64 MB/s 0.585 MB/s 0.59 MB/s 0.635 MB/s -0.8% -7.9%
bs=1000 sw=10 sl=64 p50 1,043,537 us 1,026,400 us 972,714 us +1.7% +7.3%
bs=1000 sw=10 sl=64 p95 1,079,883 us 1,105,947 us 1,023,057 us -2.4% +5.6%
bs=1000 sw=10 sl=64 p99 1,079,883 us 1,105,947 us 1,023,057 us -2.4% +5.6%
Raw CSV
config_idx,batch_size,schema_width,string_len,num_batches,total_ms,total_tuples,total_bytes,tuples_per_sec,mb_per_sec,lat_p50_us,lat_p95_us,lat_p99_us
0,10,10,64,20,460.19,200,128000,435,0.265,21839.71,30613.21,30613.21
1,100,10,64,20,2334.19,2000,1280000,857,0.523,116870.05,144742.12,144742.12
2,1000,10,64,20,20881.12,20000,12800000,958,0.585,1043537.40,1079882.99,1079882.99

@Yicong-Huang Yicong-Huang left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The direction looks promising. @Ma77Ball a few changes needed:

  1. can you do a simple time comparison on previous logic vs current logic and put it into PR description? sometimes more SQL queries do not necessarily mean slower.
  2. we need test coverage.
  3. please remove unrelated files.

Comment thread .github/workflows/template-compliance-warning.yml
Comment thread .github/template-compliance-warning.txt
@github-actions github-actions Bot added dependencies Pull requests that update a dependency file and removed ci changes related to CI labels Jun 13, 2026
@Ma77Ball Ma77Ball requested a review from Yicong-Huang June 14, 2026 02:08

@Yicong-Huang Yicong-Huang left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the benchmark result!

@Yicong-Huang Yicong-Huang added this pull request to the merge queue Jun 14, 2026
Merged via the queue into apache:main with commit 01e070f Jun 14, 2026
23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

common dependencies Pull requests that update a dependency file

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Collapse N+1 query in getComputingUnitAccess into a single join

3 participants