feat(observability): query gateway core + dashboard shell#5378

Draft
Ma77Ball wants to merge 50 commits into
apache:mainfrom
Ma77Ball:obs/pr4/gateway-core

@Ma77Ball Ma77Ball commented Jun 5, 2026

Contributor

What changes were proposed in this PR?

Introduces the tenant-scoped read path the dashboard queries, plus the Angular shell that hosts the per-signal panels. This PR warrants the closest review: it enforces tenancy and rate limiting and is the first user-visible surface.
Backend:

  • BackendClient: HTTP client to the telemetry backends.
  • ScopeResolver: derives the caller's tenant scope and constrains every query to it.
  • RateLimiter, AuditLogger, GatewayContext: per-request rate limiting, audit logging, and shared request context.
  • dtos.scala: typed request objects with validators (time window, page size, free text, service name).
  • ObservabilityResources with the /observability/health endpoint, registered in TexeraWebApplication.
  • RequestContextMdcFilter and UserContextMdcFilter: inject request and user context into the logging MDC.
  • ObservabilityGatewayConfig and its configuration file.

Frontend:

  • Observability dashboard page, route, and navigation entry, plus observability.service, observability.types, and the traces-pivot.service.
  • Health gating: each tab is guarded by the per-signal reachability check; an unreachable signal renders an explicit state rather than a broken panel. Signal panels follow in PR5 through PR8.
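
The health gating described above can be sketched as a pure function that maps a per-signal reachability report to a tab state. This is a minimal illustration, not the PR's code: the signal names, the `HealthReport` shape, and the `gateTab` and `gateAll` helpers are all hypothetical, and the real types live in observability.types.

```typescript
// Hypothetical signal set; the PR's actual signals may differ.
type Signal = "traces" | "metrics" | "logs" | "profiles";

// Assumed shape of one entry in a health response.
interface SignalHealth {
  reachable: boolean;
  detail?: string;
}

type HealthReport = Partial<Record<Signal, SignalHealth>>;

// A tab is either ready to render its panel or shows an explicit
// "unavailable" state with a reason, never a half-rendered panel.
type TabState =
  | { kind: "ready"; signal: Signal }
  | { kind: "unavailable"; signal: Signal; reason: string };

function gateTab(signal: Signal, report: HealthReport): TabState {
  const health = report[signal];
  if (health === undefined) {
    // Missing data is treated as unreachable (fail closed).
    return { kind: "unavailable", signal, reason: "no health data reported" };
  }
  if (!health.reachable) {
    return {
      kind: "unavailable",
      signal,
      reason: health.detail ?? "backend unreachable",
    };
  }
  return { kind: "ready", signal };
}

function gateAll(signals: readonly Signal[], report: HealthReport): TabState[] {
  return signals.map((s) => gateTab(s, report));
}
```

Modelling the result as a discriminated union means a template has to handle the `unavailable` case explicitly, which is one way to get the "explicit state rather than a broken panel" behaviour.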

Any related issues, documentation, or discussions?

Closes: #5370
Part of #4070. Stacked on #5377.

How was this PR tested?

  • Backend specs for the gateway core, DTO validation, scope resolver, rate limiter, and MDC filters; sbt scalafmtCheckAll passes.
  • Frontend component and service specs; prettier-eslint and eslint pass.
  • Compilation and the full test suites run in this PR's CI.

Was this PR authored or co-authored using generative AI tooling?

Co-authored with Claude Opus 4.8 in compliance with ASF

Ma77Ball and others added 4 commits June 5, 2026 04:49
…, SDK bootstrap (default-off)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…nt scope, health, routing)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ca/eBPF profiling

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…tracing primitives

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added engine dependencies Pull requests that update a dependency file frontend Changes related to the frontend GUI docs Changes related to documentations dev common labels Jun 5, 2026
@codecov-commenter

codecov-commenter commented Jun 5, 2026


Codecov Report

❌ Patch coverage is 57.92350% with 385 lines in your changes missing coverage. Please review.
✅ Project coverage is 53.22%. Comparing base (5869492) to head (d93fa83).
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...apache/texera/web/observability/gateway/dtos.scala | 42.93% | 103 Missing and 6 partials ⚠️ |
| ...ala/org/apache/texera/observability/OtelInit.scala | 63.30% | 43 Missing and 8 partials ⚠️ |
| ...texera/web/observability/gateway/AuditLogger.scala | 0.00% | 41 Missing ⚠️ |
| ...observability/gateway/ObservabilityResources.scala | 0.00% | 38 Missing ⚠️ |
| ...xera/web/observability/gateway/ScopeResolver.scala | 19.44% | 27 Missing and 2 partials ⚠️ |
| ...era/web/observability/gateway/GatewayContext.scala | 0.00% | 19 Missing ⚠️ |
| ...a/org/apache/texera/web/TexeraWebApplication.scala | 0.00% | 14 Missing ⚠️ |
| ...org/apache/texera/observability/TexeraTracer.scala | 0.00% | 14 Missing ⚠️ |
| ...xera/web/observability/gateway/BackendClient.scala | 72.91% | 5 Missing and 8 partials ⚠️ |
| ...rg/apache/texera/observability/TexeraMetrics.scala | 84.93% | 4 Missing and 7 partials ⚠️ |

... and 11 more

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #5378      +/-   ##
============================================
+ Coverage     53.17%   53.22%   +0.04%     
- Complexity     2660     2697      +37     
============================================
  Files          1094     1117      +23     
  Lines         42363    43256     +893     
  Branches       4556     4717     +161     
============================================
+ Hits          22528    23022     +494     
- Misses        18507    18858     +351     
- Partials       1328     1376      +48     

| Flag | Coverage Δ | *Carryforward flag |
| --- | --- | --- |
| access-control-service | 70.44% <ø> (ø) | |
| agent-service | 34.36% <ø> (ø) | Carriedforward from 263093a |
| amber | 53.52% <53.72%> (-0.02%) ⬇️ | |
| computing-unit-managing-service | 1.65% <ø> (ø) | |
| config-service | 56.71% <ø> (ø) | |
| file-service | 57.06% <ø> (ø) | |
| frontend | 48.21% <93.75%> (+0.25%) ⬆️ | |
| pyamber | 89.84% <ø> (-0.29%) ⬇️ | Carriedforward from 263093a |
| python | 90.80% <ø> (ø) | Carriedforward from 263093a |
| workflow-compiling-service | 58.69% <ø> (ø) | |

*This pull request uses carry forward flags. Click here to find out more.


@github-actions github-actions Bot added the platform Non-amber Scala service paths label Jun 5, 2026
@github-actions

github-actions Bot commented Jun 12, 2026

Contributor

⚠️ Benchmark changes need a look

🟢 2 better · 🔴 5 worse · ⚪ 8 noise (<±5%) · 0 without baseline

Compared against main 5869492 benchmarked on this same runner, so the delta is largely free of cross-runner hardware noise. The "7d avg" column still reflects the gh-pages dashboard. Treat <±5% as noise unless repeated.
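
The reported deltas are consistent with a plain relative change against the baseline, combined with the ±5% noise band stated above. The sketch below reproduces that arithmetic. It assumes (this is not confirmed by the bot's source) that higher throughput and lower latency count as "better", and the function names are hypothetical.

```typescript
type Verdict = "better" | "worse" | "noise";

// Relative change of the PR value against a baseline, in percent.
function relativeDeltaPct(pr: number, baseline: number): number {
  if (baseline === 0) {
    throw new RangeError("baseline must be non-zero");
  }
  return ((pr - baseline) / baseline) * 100;
}

// Classify a delta. Anything strictly inside the noise band is "noise";
// otherwise the direction decides, depending on the metric's polarity.
function classify(
  deltaPct: number,
  higherIsBetter: boolean,
  noisePct: number = 5,
): Verdict {
  if (Math.abs(deltaPct) < noisePct) {
    return "noise";
  }
  const improved = higherIsBetter ? deltaPct > 0 : deltaPct < 0;
  return improved ? "better" : "worse";
}
```

With the figures from the baseline details, `relativeDeltaPct(38420, 33956)` rounds to +13.1% for the bs=10 p95 latency (worse), and `relativeDeltaPct(387, 417)` rounds to -7.2% for the bs=10 throughput (also worse, since throughput dropped).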

Dashboard · Run

| config | throughput | MB/s | latency | max Δ latest / 7d |
| --- | --- | --- | --- | --- |
| 🔴 bs=10 sw=10 sl=64 | 387 | 0.236 | 25,044/38,420/38,420 us | 🔴 +13.1% / 🔴 +9.8% |
| 🟢 bs=100 sw=10 sl=64 | 967 | 0.59 | 102,719/118,095/118,095 us | 🟢 -10.6% / 🟢 -15.5% |
| bs=1000 sw=10 sl=64 | 1,111 | 0.678 | 900,949/944,001/944,001 us | ⚪ within ±5% / 🟢 -7.7% |

Baseline details

Latest main 5869492 from same runner

| config | metric | PR | latest main | 7d avg | Δ latest | Δ 7d |
| --- | --- | --- | --- | --- | --- | --- |
| bs=10 sw=10 sl=64 | throughput | 387 tuples/sec | 417 tuples/sec | 410.82 tuples/sec | -7.2% | -5.8% |
| bs=10 sw=10 sl=64 | MB/s | 0.236 MB/s | 0.254 MB/s | 0.251 MB/s | -7.1% | -5.9% |
| bs=10 sw=10 sl=64 | p50 | 25,044 us | 22,947 us | 23,785 us | +9.1% | +5.3% |
| bs=10 sw=10 sl=64 | p95 | 38,420 us | 33,956 us | 34,980 us | +13.1% | +9.8% |
| bs=10 sw=10 sl=64 | p99 | 38,420 us | 33,956 us | 34,980 us | +13.1% | +9.8% |
| bs=100 sw=10 sl=64 | throughput | 967 tuples/sec | 954 tuples/sec | 891.94 tuples/sec | +1.4% | +8.4% |
| bs=100 sw=10 sl=64 | MB/s | 0.59 MB/s | 0.582 MB/s | 0.544 MB/s | +1.4% | +8.4% |
| bs=100 sw=10 sl=64 | p50 | 102,719 us | 103,873 us | 112,277 us | -1.1% | -8.5% |
| bs=100 sw=10 sl=64 | p95 | 118,095 us | 132,080 us | 139,802 us | -10.6% | -15.5% |
| bs=100 sw=10 sl=64 | p99 | 118,095 us | 132,080 us | 139,802 us | -10.6% | -15.5% |
| bs=1000 sw=10 sl=64 | throughput | 1,111 tuples/sec | 1,118 tuples/sec | 1,041 tuples/sec | -0.6% | +6.7% |
| bs=1000 sw=10 sl=64 | MB/s | 0.678 MB/s | 0.683 MB/s | 0.635 MB/s | -0.7% | +6.7% |
| bs=1000 sw=10 sl=64 | p50 | 900,949 us | 899,927 us | 972,714 us | +0.1% | -7.4% |
| bs=1000 sw=10 sl=64 | p95 | 944,001 us | 930,096 us | 1,023,057 us | +1.5% | -7.7% |
| bs=1000 sw=10 sl=64 | p99 | 944,001 us | 930,096 us | 1,023,057 us | +1.5% | -7.7% |

Raw CSV
config_idx,batch_size,schema_width,string_len,num_batches,total_ms,total_tuples,total_bytes,tuples_per_sec,mb_per_sec,lat_p50_us,lat_p95_us,lat_p99_us
0,10,10,64,20,516.21,200,128000,387,0.236,25044.44,38419.84,38419.84
1,100,10,64,20,2067.54,2000,1280000,967,0.590,102719.18,118094.86,118094.86
2,1000,10,64,20,17994.38,20000,12800000,1111,0.678,900948.57,944000.53,944000.53

Labels

common dependencies Pull requests that update a dependency file dev docs Changes related to documentations engine frontend Changes related to the frontend GUI platform Non-amber Scala service paths

Development

Successfully merging this pull request may close these issues.

[Observability] Tenant-scoped query gateway and dashboard shell

2 participants