feat: add model health monitoring and trace context logging#58

Open
vlordier wants to merge 1 commit into Liquid4All:main from vlordier:feat/health-monitoring-trace-context

Conversation

@vlordier vlordier commented Mar 6, 2026

Add ModelStatus struct, get_status method, and trace_id logging

1. Model Health Check + Auto-Recovery:
   - Add ModelStatus struct with key, display_name, base_url, healthy, model_name, error
   - Add InferenceClient::get_status() method for detailed health status
   - Health check hits /models endpoint to verify connectivity

2. Parallel MCP Server Startup:
   - Already implemented in lifecycle.rs using tokio::spawn
   - All servers start concurrently

3. Structured Logging with Trace Context:
   - Add trace_id (8-char UUID) to each send_message request
   - Log trace_id at start of message processing
   - Enables correlation across logs for debugging

Testing: 365 tests pass, clippy clean
Copilot AI review requested due to automatic review settings March 6, 2026 20:01

Copilot AI left a comment

Pull request overview

Adds health/status reporting for the currently selected inference model and introduces per-request trace ID logging to improve observability in the LocalCowork Tauri app.

Changes:

  • Adds a ModelStatus struct for serializable model health/status reporting.
  • Introduces InferenceClient::get_status() to return endpoint + health details.
  • Logs a generated trace_id at the start of send_message() for log correlation.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File descriptions:

  • examples/localcowork/src-tauri/src/inference/types.rs: Adds the ModelStatus type (serde-serializable) for health-monitoring consumers.
  • examples/localcowork/src-tauri/src/inference/client.rs: Adds the get_status() API and extends tests with an LM Studio model config case.
  • examples/localcowork/src-tauri/src/commands/chat.rs: Generates and logs a trace_id at the beginning of message processing.

Comment on lines +429 to +432

    pub async fn get_status(&self) -> super::types::ModelStatus {
        let url = format!("{}/models", self.current_model.base_url);
        match self.http.get(&url).timeout(CONNECT_TIMEOUT).send().await {
            Ok(resp) if resp.status().is_success() => super::types::ModelStatus {

Copilot AI Mar 6, 2026

This uses CONNECT_TIMEOUT (documented as a TCP connect timeout) as a per-request total timeout via .timeout(CONNECT_TIMEOUT). That’s confusing and easy to misuse because reqwest::RequestBuilder::timeout applies to the entire request/response. Consider introducing a dedicated short HEALTH_CHECK_TIMEOUT (or similar) for health/status calls, or updating the constant name/doc to match its actual use here.

Comment on lines +1244 to +1247
let trace_id = uuid::Uuid::new_v4().to_string()[..8].to_string();

tracing::info!(trace_id = %trace_id, session_id = %session_id, content_len = content.len(), "starting message processing");

Copilot AI Mar 6, 2026

trace_id is generated but only attached to a single log record. If the goal is request-level correlation across subsequent logs, consider creating a tracing span that includes trace_id (and entering it for the rest of the function) so downstream tracing::info!/debug! calls inherit the same context. Also, truncating to 8 hex chars (32 bits) increases the chance of collisions in logs; using the full UUID or a longer prefix would reduce that risk.

Suggested change

    -let trace_id = uuid::Uuid::new_v4().to_string()[..8].to_string();
    -tracing::info!(trace_id = %trace_id, session_id = %session_id, content_len = content.len(), "starting message processing");
    +let trace_id = Uuid::new_v4().to_string();
    +// Create a span so all subsequent logs inherit the same trace_id/session_id context.
    +let span = tracing::info_span!("send_message", trace_id = %trace_id, session_id = %session_id);
    +let _span_guard = span.enter();
    +tracing::info!(content_len = content.len(), "starting message processing");

Comment on lines +440 to +446

    Ok(resp) => super::types::ModelStatus {
        key: self.current_model_key.clone(),
        display_name: self.current_model.display_name.clone(),
        base_url: self.current_model.base_url.clone(),
        healthy: false,
        model_name: None,
        error: Some(format!("HTTP {}", resp.status())),
Copilot AI Mar 6, 2026

get_status sets model_name: None for non-2xx and transport errors, even though the client already knows the configured model name/key. This makes the status payload less useful for UIs/health dashboards because you lose which model is unhealthy. Consider always populating model_name (e.g., from current_model.model_name falling back to current_model_key) and reserve error/healthy for the health signal.

vlordier added a commit to vlordier/cookbook that referenced this pull request Mar 6, 2026
- Add 3 attempts with exponential backoff to health_check()
- Improves reliability of PR Liquid4All#58 health monitoring by handling transient failures

This helps avoid false negatives when the model server is temporarily slow.