feat: add model health monitoring and trace context logging#58
feat: add model health monitoring and trace context logging#58vlordier wants to merge 1 commit intoLiquid4All:mainfrom
Conversation
1. Model Health Check + Auto-Recovery: - Add ModelStatus struct with key, display_name, base_url, healthy, model_name, error - Add InferenceClient::get_status() method for detailed health status - Health check hits /models endpoint to verify connectivity 2. Parallel MCP Server Startup: - Already implemented in lifecycle.rs using tokio::spawn - All servers start concurrently 3. Structured Logging with Trace Context: - Add trace_id (8-char UUID) to each send_message request - Log trace_id at start of message processing - Enables correlation across logs for debugging Testing: 365 tests pass, clippy clean
There was a problem hiding this comment.
Pull request overview
Adds health/status reporting for the currently selected inference model and introduces per-request trace ID logging to improve observability in the LocalCowork Tauri app.
Changes:
- Adds a
ModelStatusstruct for serializable model health/status reporting. - Introduces
InferenceClient::get_status()to return endpoint + health details. - Logs a generated
trace_idat the start ofsend_message()for log correlation.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| examples/localcowork/src-tauri/src/inference/types.rs | Adds ModelStatus type (serde-serializable) for health monitoring consumers. |
| examples/localcowork/src-tauri/src/inference/client.rs | Adds get_status() API and extends tests with an LM Studio model config case. |
| examples/localcowork/src-tauri/src/commands/chat.rs | Generates and logs a trace_id at the beginning of message processing. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| pub async fn get_status(&self) -> super::types::ModelStatus { | ||
| let url = format!("{}/models", self.current_model.base_url); | ||
| match self.http.get(&url).timeout(CONNECT_TIMEOUT).send().await { | ||
| Ok(resp) if resp.status().is_success() => super::types::ModelStatus { |
There was a problem hiding this comment.
This uses CONNECT_TIMEOUT (documented as a TCP connect timeout) as a per-request total timeout via .timeout(CONNECT_TIMEOUT). That’s confusing and easy to misuse because reqwest::RequestBuilder::timeout applies to the entire request/response. Consider introducing a dedicated short HEALTH_CHECK_TIMEOUT (or similar) for health/status calls, or updating the constant name/doc to match its actual use here.
| let trace_id = uuid::Uuid::new_v4().to_string()[..8].to_string(); | ||
|
|
||
| tracing::info!(trace_id = %trace_id, session_id = %session_id, content_len = content.len(), "starting message processing"); | ||
|
|
There was a problem hiding this comment.
trace_id is generated but only attached to a single log record. If the goal is request-level correlation across subsequent logs, consider creating a tracing span that includes trace_id (and entering it for the rest of the function) so downstream tracing::info!/debug! calls inherit the same context. Also, truncating to 8 hex chars (32 bits) increases the chance of collisions in logs; using the full UUID or a longer prefix would reduce that risk.
| let trace_id = uuid::Uuid::new_v4().to_string()[..8].to_string(); | |
| tracing::info!(trace_id = %trace_id, session_id = %session_id, content_len = content.len(), "starting message processing"); | |
| let trace_id = Uuid::new_v4().to_string(); | |
| // Create a span so all subsequent logs inherit the same trace_id/session_id context. | |
| let span = tracing::info_span!("send_message", trace_id = %trace_id, session_id = %session_id); | |
| let _span_guard = span.enter(); | |
| tracing::info!(content_len = content.len(), "starting message processing"); |
| Ok(resp) => super::types::ModelStatus { | ||
| key: self.current_model_key.clone(), | ||
| display_name: self.current_model.display_name.clone(), | ||
| base_url: self.current_model.base_url.clone(), | ||
| healthy: false, | ||
| model_name: None, | ||
| error: Some(format!("HTTP {}", resp.status())), |
There was a problem hiding this comment.
get_status sets model_name: None for non-2xx and transport errors, even though the client already knows the configured model name/key. This makes the status payload less useful for UIs/health dashboards because you lose which model is unhealthy. Consider always populating model_name (e.g., from current_model.model_name falling back to current_model_key) and reserve error/healthy for the health signal.
- Add 3 attempts with exponential backoff to health_check() - Improves reliability of PR Liquid4All#58 health monitoring by handling transient failures This helps avoid false negatives when model server is temporarily slow.
Add ModelStatus struct, get_status method, and trace_id logging