
Improve Spark Connect compatibility for NDS and NDS-H #254

Merged
sayedbilalbari merged 2 commits into NVIDIA:dev from sayedbilalbari:cleanup/spark-connect-dev
Apr 20, 2026

Conversation

@sayedbilalbari
Collaborator

Summary

This PR tightens generic Spark Connect compatibility for the benchmark runners without adding router-specific behavior.

It makes the runner and reporting paths more tolerant of Spark Connect environments where sparkContext access, listener
registration, or config exposure differ from classic PySpark. It also documents the expected warning patterns seen during
smoke testing so healthy Spark Connect runs do not look broken.

Changes

  • add --spark_connect support to nds_power.py and nds_h_power.py
  • build remote sessions with SparkSession.builder.remote(...) when requested
  • stop assuming applicationId is always available through sparkContext
  • use spark_session.stop() for session shutdown in the Spark Connect path
  • harden both benchmark report implementations to fall back when sparkContext or listener registration is unavailable
  • document direct Spark Connect execution and expected smoke-test warnings in the NDS and NDS-H READMEs
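As a rough sketch of the flag-plus-remote-builder flow listed above (the function names, the minimal argument surface, and the `SC_URL` metavar are illustrative, not taken from the PR; `SparkSession.builder.remote(...)` is the standard PySpark entry point for Spark Connect):

```python
import argparse

def parse_args(argv):
    # Minimal argument surface for the sketch; the real runners take many more flags.
    parser = argparse.ArgumentParser(description='NDS power runner (sketch)')
    parser.add_argument('--spark_connect', metavar='SC_URL', default=None,
                        help='Spark Connect endpoint, e.g. sc://host:15002')
    return parser.parse_args(argv)

def build_session(spark_connect_url=None):
    # Import lazily so environments without PySpark can still import this module.
    from pyspark.sql import SparkSession
    builder = SparkSession.builder.appName('nds-power')
    if spark_connect_url:
        # Route execution through the Spark Connect server instead of a local driver.
        builder = builder.remote(spark_connect_url)
    return builder.getOrCreate()
```

Since `spark_session.stop()` is defined on the session in both modes, shutdown then works identically for the classic and Connect paths.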

Why

The previous benchmark code still had a few direct dependencies on classic Spark APIs that are not reliable in Spark Connect
runs, especially around app id lookup, listener registration, and config access.

The goal here is to keep the benchmark-side work generic:

  • no smart-router-specific flags or naming
  • no Aether-specific operational assumptions
  • preserve classic behavior when Spark Connect is not used

Validation

  • python3 -m py_compile nds/nds_power.py nds-h/nds_h_power.py nds/PysparkBenchReport.py utils/python_benchmark_reporter/PysparkBenchReport.py
  • smoke-tested in the broader AB + Spark Connect + Spark2A flow after cleanup work, with the benchmark-side Spark Connect
    compatibility fallbacks working for that path

Notes

Expected Spark Connect smoke-test warnings are now documented explicitly, including:

  • missing RAPIDS Python listener registration from the client runtime
  • Spark Connect rejecting non-updatable configs such as spark.locality.wait
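The second warning class can be handled generically by applying configs one at a time and downgrading rejections to warnings. A minimal sketch, assuming the runner sets configs through the session's runtime conf (`apply_configs_tolerantly` is a hypothetical helper name, not from the PR):

```python
def apply_configs_tolerantly(runtime_conf, configs, log=print):
    """Set each config individually; Spark Connect raises for non-updatable
    keys such as spark.locality.wait instead of silently accepting them."""
    applied, skipped = [], []
    for key, value in configs.items():
        try:
            runtime_conf.set(key, value)
            applied.append(key)
        except Exception as exc:
            # Expected for static configs under Spark Connect; warn and continue.
            skipped.append(key)
            log('WARN: could not set {}: {}'.format(key, exc))
    return applied, skipped
```

Setting keys individually means one rejected config (the documented expected warning) does not abort the rest of the benchmark setup.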

@greptile-apps

greptile-apps Bot commented Apr 7, 2026

Greptile Summary

This PR adds --spark_connect support to the NDS and NDS-H power runners and hardens the benchmark report implementations to be tolerant of Spark Connect environments where sparkContext access and listener registration may be unavailable. Classic Spark behaviour is fully preserved; both paths have been smoke-tested and the expected warning patterns are now documented in the READMEs.

Confidence Score: 5/5

PR is safe to merge; all remaining concerns are minor quality items

No P0/P1 findings. The two open items from prior review threads (hardcoded 'spark-connect' fallback ID and utils/ vs nds/ PysparkBenchReport divergence) are pre-existing P2 style/traceability concerns that do not block correct benchmark execution. The implementation is functionally correct, well-documented, and smoke-tested.

utils/python_benchmark_reporter/PysparkBenchReport.py has slightly less hardened Spark Connect handling than its nds/ counterpart, but it is functionally safe.

Important Files Changed

  • nds/nds_power.py: Adds _get_app_id helper and --spark_connect flag; session shutdown already correctly used spark_session.stop()
  • nds-h/nds_h_power.py: Mirrors nds_power.py changes: _get_app_id helper, --spark_connect flag, and spark_session.stop() replacing sparkContext.stop()
  • nds/PysparkBenchReport.py: Adds proactive is_remote() guard via _is_spark_400_or_later for clean Spark Connect handling; listener registration skipped properly in remote mode
  • utils/python_benchmark_reporter/PysparkBenchReport.py: Adds try/except fallback for sparkContext conf access and broadens listener exception type; reactive rather than proactive Spark Connect guard compared to nds/ sibling
  • nds/README.md: Documents Spark Connect execution path and expected smoke-test warnings for NDS power runs
  • nds-h/README.md: Documents Spark Connect execution path and expected smoke-test warnings for NDS-H power runs

Sequence Diagram

sequenceDiagram
    participant CLI as CLI Caller
    participant Runner as nds_power / nds_h_power
    participant Builder as SparkSession.builder
    participant SC as Spark Connect Server
    participant Classic as Classic SparkContext

    CLI->>Runner: --spark_connect sc://host:port
    Runner->>Builder: builder.remote(spark_connect)
    Builder->>SC: getOrCreate() via Spark Connect
    SC-->>Runner: SparkSession (remote)
    Runner->>Runner: _get_app_id(session)
    Note right of Runner: 1. conf.get(spark.app.id)<br/>2. sparkContext.applicationId<br/>3. fallback spark-connect
    Runner->>SC: run queries / setup tables
    SC-->>Runner: results
    Runner->>SC: spark_session.stop()

    CLI->>Runner: (no --spark_connect)
    Runner->>Builder: builder (classic)
    Builder->>Classic: getOrCreate()
    Classic-->>Runner: SparkSession (classic)
    Runner->>Runner: _get_app_id(session)
    Note right of Runner: conf.get fails → sparkContext.applicationId
    Runner->>Classic: run queries
    Classic-->>Runner: results
    Runner->>Classic: spark_session.stop()


Reviews (2): Last reviewed commit: "Document Spark Connect smoke-test expect..."

Comment thread utils/python_benchmark_reporter/PysparkBenchReport.py
Comment on lines +59 to +64

def _get_spark_conf(self):
    try:
        return self.spark_session.sparkContext._conf.getAll()
    except Exception:
        get_all = getattr(self.spark_session.conf, 'getAll', None)
        return get_all() if callable(get_all) else (get_all or [])

P2 Spark Connect handling diverges from nds/ sibling

nds/PysparkBenchReport.py received a proactive is_remote() guard (via _is_spark_400_or_later) that skips sparkContext entirely in Spark Connect mode, while this copy only adds a try/except fallback. Consequently, in Spark Connect runs this file still attempts sparkContext._conf.getAll(), prints a warning on failure, and does not have a clean skip path for PythonListener registration. The broadened except Exception on the listener (line 88) handles it, but the nds/ version avoids the failed attempt entirely. Aligning this file with the nds/ approach would make Spark Connect behaviour consistent across both report implementations.

Comment thread nds-h/nds_h_power.py
Comment on lines +61 to +69

def _get_app_id(spark_session):
    # Spark Connect may not expose applicationId through sparkContext.
    try:
        return spark_session.conf.get("spark.app.id")
    except Exception:
        try:
            return spark_session.sparkContext.applicationId
        except Exception:
            return "spark-connect"

P2 Hardcoded fallback app ID makes benchmark runs indistinguishable in CSV

When both conf.get("spark.app.id") and sparkContext.applicationId raise, every row in the time-log CSV is tagged application_id = "spark-connect". Multiple runs against the same endpoint will produce CSVs that cannot be distinguished by application_id. Appending a timestamp (e.g. "spark-connect-{}".format(int(time.time() * 1000))) would give each run a unique identifier. The identical pattern appears in nds/nds_power.py.

Suggested change (only the final fallback line differs):

def _get_app_id(spark_session):
    # Spark Connect may not expose applicationId through sparkContext.
    try:
        return spark_session.conf.get("spark.app.id")
    except Exception:
        try:
            return spark_session.sparkContext.applicationId
        except Exception:
-           return "spark-connect"
+           return "spark-connect-{}".format(int(time.time() * 1000))

Signed-off-by: Sayed Bilal Bari <sbari@nvidia.com>
@sayedbilalbari sayedbilalbari force-pushed the cleanup/spark-connect-dev branch from b485533 to 2c5ce65 Compare April 7, 2026 02:02
@sayedbilalbari sayedbilalbari self-assigned this Apr 7, 2026
@sayedbilalbari sayedbilalbari merged commit eb4c759 into NVIDIA:dev Apr 20, 2026
2 checks passed
3 participants