Improve Spark Connect compatibility for NDS and NDS-H #254
sayedbilalbari merged 2 commits into NVIDIA:dev from
Conversation
Greptile Summary

This PR adds …

Confidence Score: 5/5 — PR is safe to merge; all remaining concerns are minor quality items. No P0/P1 findings. The two open items from prior review threads (the hardcoded "spark-connect" fallback ID and the utils/ vs nds/ PysparkBenchReport divergence) are pre-existing P2 style/traceability concerns that do not block correct benchmark execution. The implementation is functionally correct, well-documented, and smoke-tested. utils/python_benchmark_reporter/PysparkBenchReport.py has slightly less hardened Spark Connect handling than its nds/ counterpart, but it is functionally safe.

Important Files Changed
Sequence Diagram

```mermaid
sequenceDiagram
    participant CLI as CLI Caller
    participant Runner as nds_power / nds_h_power
    participant Builder as SparkSession.builder
    participant SC as Spark Connect Server
    participant Classic as Classic SparkContext
    CLI->>Runner: --spark_connect sc://host:port
    Runner->>Builder: builder.remote(spark_connect)
    Builder->>SC: getOrCreate() via Spark Connect
    SC-->>Runner: SparkSession (remote)
    Runner->>Runner: _get_app_id(session)
    Note right of Runner: 1. conf.get(spark.app.id)<br/>2. sparkContext.applicationId<br/>3. fallback spark-connect
    Runner->>SC: run queries / setup tables
    SC-->>Runner: results
    Runner->>SC: spark_session.stop()
    CLI->>Runner: (no --spark_connect)
    Runner->>Builder: builder (classic)
    Builder->>Classic: getOrCreate()
    Classic-->>Runner: SparkSession (classic)
    Runner->>Runner: _get_app_id(session)
    Note right of Runner: conf.get fails → sparkContext.applicationId
    Runner->>Classic: run queries
    Classic-->>Runner: results
    Runner->>Classic: spark_session.stop()
```
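The session-creation branch the diagram describes can be sketched in runnable form. `FakeBuilder` below stands in for `pyspark.sql.SparkSession.builder` so the sketch runs without a Spark installation; `create_session` and `FakeBuilder` are illustrative names, not code taken from the PR.

```python
class FakeBuilder:
    """Illustrative stand-in for SparkSession.builder (no Spark required)."""
    def __init__(self):
        self.remote_url = None

    def remote(self, url):
        # Mirrors SparkSession.builder.remote("sc://host:port")
        self.remote_url = url
        return self

    def getOrCreate(self):
        # Return a dict instead of a real SparkSession so the sketch is testable.
        mode = "connect" if self.remote_url else "classic"
        return {"mode": mode, "url": self.remote_url}


def create_session(builder, spark_connect=None):
    """Route through builder.remote(...) when --spark_connect is given,
    otherwise fall back to a classic getOrCreate()."""
    if spark_connect:
        builder = builder.remote(spark_connect)
    return builder.getOrCreate()


print(create_session(FakeBuilder(), "sc://host:15002")["mode"])  # connect
print(create_session(FakeBuilder())["mode"])                     # classic
```

With a real PySpark installation, the same branch would call `SparkSession.builder.remote(spark_connect).getOrCreate()` in the Connect path and plain `SparkSession.builder.getOrCreate()` otherwise.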
Reviews (2) — last reviewed commit: "Document Spark Connect smoke-test expect..."
```python
def _get_spark_conf(self):
    try:
        return self.spark_session.sparkContext._conf.getAll()
    except Exception:
        get_all = getattr(self.spark_session.conf, 'getAll', None)
        return get_all() if callable(get_all) else (get_all or [])
```
Spark Connect handling diverges from nds/ sibling
nds/PysparkBenchReport.py received a proactive is_remote() guard (via _is_spark_400_or_later) that skips sparkContext entirely in Spark Connect mode, while this copy only adds a try/except fallback. Consequently, in Spark Connect runs this file still attempts sparkContext._conf.getAll(), prints a warning on failure, and does not have a clean skip path for PythonListener registration. The broadened except Exception on the listener (line 88) handles it, but the nds/ version avoids the failed attempt entirely. Aligning this file with the nds/ approach would make Spark Connect behaviour consistent across both report implementations.
```python
def _get_app_id(spark_session):
    # Spark Connect may not expose applicationId through sparkContext.
    try:
        return spark_session.conf.get("spark.app.id")
    except Exception:
        try:
            return spark_session.sparkContext.applicationId
        except Exception:
            return "spark-connect"
```
Hardcoded fallback app ID makes benchmark runs indistinguishable in CSV
When both conf.get("spark.app.id") and sparkContext.applicationId raise, every row in the time-log CSV is tagged application_id = "spark-connect". Multiple runs against the same endpoint will produce CSVs that cannot be distinguished by application_id. Appending a timestamp (e.g. "spark-connect-{}".format(int(time.time() * 1000))) would give each run a unique identifier. The identical pattern appears in nds/nds_power.py.
Suggested change:

```diff
 def _get_app_id(spark_session):
     # Spark Connect may not expose applicationId through sparkContext.
     try:
         return spark_session.conf.get("spark.app.id")
     except Exception:
         try:
             return spark_session.sparkContext.applicationId
         except Exception:
-            return "spark-connect"
+            return "spark-connect-{}".format(int(time.time() * 1000))
```
Signed-off-by: Sayed Bilal Bari <sbari@nvidia.com>
Force-pushed from b485533 to 2c5ce65.
Summary
This PR tightens generic Spark Connect compatibility for the benchmark runners without adding router-specific behavior.
It makes the runner and reporting paths more tolerant of Spark Connect environments where sparkContext access, listener registration, or config exposure differ from classic PySpark. It also documents the expected warning patterns seen during smoke testing so healthy Spark Connect runs do not look broken.
Changes
- Add --spark_connect support to nds_power.py and nds_h_power.py
- Use SparkSession.builder.remote(...) when requested
- Stop assuming applicationId is always available through sparkContext
- Use spark_session.stop() for session shutdown in the Spark Connect path
- Degrade gracefully when sparkContext or listener registration is unavailable
Why
The previous benchmark code still had a few direct dependencies on classic Spark APIs that are not reliable in Spark Connect
runs, especially around app id lookup, listener registration, and config access.
The goal here is to keep the benchmark-side work generic.
Validation
- python3 -m py_compile nds/nds_power.py nds-h/nds_h_power.py nds/PysparkBenchReport.py utils/python_benchmark_reporter/PysparkBenchReport.py
- Spark Connect smoke test, with the compatibility fallbacks working for that path
Notes
Expected Spark Connect smoke-test warnings are now documented explicitly, including:
- spark.locality.wait