Replace docker exec gnoi_client with native Python gRPC calls#367
Replace docker exec gnoi_client with native Python gRPC calls#367sigabrtv1-ui wants to merge 13 commits intosonic-net:masterfrom
Conversation
Proposes replacing subprocess-based gnoi_client invocations in gnoi_shutdown_daemon with direct Python gRPC calls. Documents the gnoi_client output format, a parsing bug in RebootStatus polling, and the difficulty of diagnosing RPC failures through Go panic stack traces. Ref: sonic-net#360 Signed-off-by: Dawei Huang <[email protected]>
…proto Signed-off-by: Dawei Huang <[email protected]>
…tion Signed-off-by: Dawei Huang <[email protected]>
- Add types_pb2.py to vendored stubs file list (types.proto dependency) - Fix RebootStatus enum access pattern for protobuf Python codegen - Fix test mock to implement grpc.Call interface (code()/details()) Signed-off-by: Dawei Huang <[email protected]>
- Correct stderr statement: _send_reboot_command uses suppress_stderr=True, so panic output is suppressed, not captured - Add packaging note: setup.py must include host_modules.gnoi in packages list for vendored stubs to be installed Signed-off-by: Dawei Huang <[email protected]>
Section 8 now links to gnoi_client, reboot.go, and system.proto instead of inlining full source code. Signed-off-by: Dawei Huang <[email protected]>
…rplate Signed-off-by: Dawei Huang <[email protected]>
Signed-off-by: Dawei Huang <[email protected]>
Signed-off-by: Dawei Huang <[email protected]>
Refactor gnoi_shutdown_daemon to use direct gRPC instead of shelling out to 'docker exec gnmi gnoi_client'. This addresses three issues: 1. Broken RebootStatus completion detection - the old code checked for 'reboot complete' in stdout, but gnoi_client outputs JSON with 'STATUS_SUCCESS', so the poll always timed out regardless of DPU state. 2. Suppressed error context - _send_reboot_command used suppress_stderr=True, discarding Go panic traces. Now gRPC errors carry status codes and details natively (e.g. UNAVAILABLE: connection refused). 3. Unnecessary NPU gnmi container dependency - the DPU's own gnmi server is the actual RPC endpoint; direct gRPC connects without involving the NPU gnmi container as an intermediary. Changes: - Vendor gNOI System proto Python stubs (from openconfig/gnoi) into host_modules/gnoi/ (system_pb2, types_pb2, common_pb2) - Add GnoiClient wrapper (client.py) with reboot(), reboot_status(), cancel_reboot() methods and context manager support - Refactor _send_reboot_command to use GnoiClient.reboot() with proper grpc.RpcError handling - Refactor _poll_reboot_status to check resp.active and resp.status.status protobuf fields directly instead of string matching - Add grpcio and protobuf to setup.py install_requires - Add host_modules.gnoi to setup.py packages list - Update all unit tests to mock GnoiClient instead of subprocess calls - Add new tests: gRPC error handling, transient error recovery, failure status detection Signed-off-by: sigabrtv1-ui <[email protected]> Signed-off-by: Dawei Huang <[email protected]>
|
/azp run |
|
Azure Pipelines will not run the associated pipelines, because the pull request was updated after the run command was issued. Review the pull request again and issue a new run command. |
Integrate upstream sonic-net#352 (_should_skip_gnoi_shutdown for offline/powered-down DPUs) with native gRPC refactor. All 37 tests pass. Signed-off-by: sigabrtv1-ui <[email protected]> Signed-off-by: Dawei Huang <[email protected]>
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
| from host_modules.gnoi import system_pb2_grpc | ||
|
|
||
|
|
||
| class GnoiClient: |
There was a problem hiding this comment.
We might have other gnoi services. Should not pin it to System.
There was a problem hiding this comment.
Good call — refactored in 5b12fba. GnoiClient now manages just the gRPC channel and exposes service stubs via properties (client.system). Future gNOI services (Healthz, Cert, etc.) can add their own @property or use client.channel directly for custom stubs.
There was a problem hiding this comment.
Call should be routed from through service. client.system.reboot instead of client.reboot.
There was a problem hiding this comment.
Updated in 9fa8000 — removed all convenience methods. Callers now go through client.system.Reboot() / client.system.RebootStatus() directly. GnoiClient just manages the channel + exposes service stubs via properties.
GnoiClient now manages the gRPC channel and exposes service stubs via properties (e.g. client.system). The System convenience methods (reboot, reboot_status, cancel_reboot) remain for backward compat, but future gNOI services (Healthz, Cert, etc.) can add their own stubs without modifying the client class. Also exposes client.channel for direct stub construction when needed. Signed-off-by: sigabrtv1-ui <[email protected]> Signed-off-by: Dawei Huang <[email protected]>
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
Remove convenience methods from GnoiClient. All RPCs now go through service stub properties: client.system.Reboot(), client.system.RebootStatus(). This keeps the client clean for future gNOI services — each service gets its own stub property, callers use the standard protobuf API. Signed-off-by: sigabrtv1-ui <[email protected]> Signed-off-by: Dawei Huang <[email protected]>
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
What I did
Refactored
gnoi_shutdown_daemonto use direct Python gRPC calls instead of shelling out todocker exec gnmi gnoi_client.Why I did it
Three issues in the current subprocess-based approach:
Broken RebootStatus detection — the poll checks
"reboot complete" in out_s.lower(), butgnoi_clientoutputs JSON ({"active":false,"status":{"status":"STATUS_SUCCESS"}}). The substring never matches, so the poll always times out regardless of DPU state.Suppressed error context —
_send_reboot_command()usessuppress_stderr=True, discarding Go panic traces fromgnoi_client. The daemon logs only"Reboot command failed"with no actionable detail.Unnecessary NPU gnmi container dependency — the current path shells into the NPU
gnmicontainer to rungnoi_client, but the DPU's own gnmi server is the actual RPC endpoint. Direct gRPC connects without this intermediary.How I did it
host_modules/gnoi/GnoiClientwrapper (client.py) withreboot(),reboot_status(),cancel_reboot()methods_send_reboot_command→GnoiClient.reboot()with propergrpc.RpcErrorhandling (status codes + details)_poll_reboot_status→ checksresp.activeandresp.status.statusprotobuf fields directlygrpcioandprotobuftosetup.pydependenciesHow to verify it
On a SmartSwitch with DPU:
docker execin logs)Design doc: #361