
ChatWithCrewFlow.__init__ makes blocking LLM call at module import, crashes containers on any LLM hiccup #5510

@jpr5

Description


Summary

ChatWithCrewFlow.__init__ in ag_ui_crewai.crews triggers synchronous, blocking LLM calls at module import time via crewai.cli.crew_chat.generate_crew_chat_inputs, which in turn calls LLM-backed description generators such as generate_input_description_with_ai (visible in the stack trace below).

For users deploying CrewAI behind a FastAPI server via ag_ui_crewai.endpoint.add_crewai_crew_fastapi_endpoint (the recommended integration for AG-UI / CopilotKit), these LLM calls fire during module import — BEFORE uvicorn binds to its HTTP port.
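The failure mode is easy to reproduce without any of the real packages. Below is a minimal, dependency-free sketch (all names are stand-ins, not the real CrewAI API) showing why an eager LLM call in `__init__` means the process dies before a server could ever bind its port:

```python
class FlakyLLM:
    """Stand-in for a chat LLM whose call() can fail transiently."""
    def __init__(self, fail):
        self.fail = fail

    def call(self, messages):
        if self.fail:
            raise ConnectionError("transient provider error")
        return "ok"


class EagerChatFlow:
    """Mirrors the problematic pattern: LLM call at construction time."""
    def __init__(self, llm):
        # Runs during endpoint registration, i.e. at import/startup time.
        self.crew_chat_inputs = llm.call(messages=["describe the crew inputs"])


def start_server(llm):
    flow = EagerChatFlow(llm)  # any provider hiccup raises right here
    return "listening"         # never reached when the LLM call fails
```

With `FlakyLLM(fail=True)`, `start_server` raises before returning "listening", which is exactly the container-crash-before-port-bind behavior described below.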

Failure mode

ANY LLM provider hiccup during container startup causes the Python process to crash before the HTTP server is listening:

  • OpenAI 500 / 503 / rate-limit
  • Network blip, DNS failure, slow cold-start on a mock/proxy server
  • Invalid credentials (even transient)
  • Litellm APIError, Timeout, or APIConnectionError

In orchestrated environments (Railway, Kubernetes, AWS ECS, Fly.io) the platform's readiness/health check fails because no process ever binds the port. The platform then marks the deploy failed and rolls back to the previous image, making the service effectively unresponsive to LLM-layer instability.

We hit this on our Railway-hosted CopilotKit showcase when our LLM mock (aimock) returned a transient schema error. The mock error was recoverable — the issue is that it shouldn't have been able to crash the entire container before the HTTP server was ready.

Actual stack trace we observed

File "/app/agent_server.py", line 27, in <module>
    add_crewai_crew_fastapi_endpoint(app, LatestAiDevelopment(), "/")
File ".../ag_ui_crewai/endpoint.py", line 250, in add_crewai_crew_fastapi_endpoint
    add_crewai_flow_fastapi_endpoint(app, ChatWithCrewFlow(crew=crew), path)
File ".../ag_ui_crewai/crews.py", line 56, in __init__
    self.crew_chat_inputs = crew_chat.generate_crew_chat_inputs(...)
File ".../crewai/cli/crew_chat.py", line 387, in generate_crew_chat_inputs
    description = generate_input_description_with_ai(input_name, crew, chat_llm)
File ".../crewai/cli/crew_chat.py", line 481, in generate_input_description_with_ai
    response = chat_llm.call(messages=[...])
File ".../crewai/llm.py", line 956, in call
    return self._handle_non_streaming_response(...)
...
APIError: <connection failure>

Container exits with code 1, never binds a port, orchestrator's health check fails, deploy rolls back.

Why this is a CrewAI concern, not just an ag-ui-crewai concern

While ChatWithCrewFlow lives in ag-ui-crewai, the two functions that block are part of CrewAI's public crewai.cli.crew_chat module. CrewAI is asking users to consume these helpers at import/init time without any of the standard production-server defenses:

  • No timeout
  • No retry/fallback
  • No try/except with a graceful default
  • No opt-out

Any consumer that instantiates a chat flow with them in a serving context inherits this fragility.

Suggested fixes (any or all)

  1. Lazy init at first request. Have ChatWithCrewFlow.__init__ store the crew and LLM but defer generate_crew_chat_inputs until the first actual chat turn. (A similar fix has already landed on ag-ui-protocol/ag-ui main for add_crewai_crew_fastapi_endpoint — deferring ChatWithCrewFlow construction to first-request. But the underlying functions in CrewAI still have no defenses.)
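A minimal sketch of what fix 1 could look like (names and structure are illustrative, not the actual ChatWithCrewFlow implementation): store the LLM, and only generate descriptions on first access, so a provider failure surfaces as a failed request rather than a dead process.

```python
class CountingLLM:
    """Stub LLM that records how many times call() runs."""
    def __init__(self):
        self.calls = 0

    def call(self, messages):
        self.calls += 1
        return "generated description"


class LazyChatFlow:
    """Sketch of fix 1: defer the LLM call to first use."""
    def __init__(self, llm):
        self.llm = llm
        self._inputs = None  # nothing generated yet; import stays cheap

    @property
    def crew_chat_inputs(self):
        if self._inputs is None:
            # First chat turn pays the cost; a failure here becomes a
            # per-request 5xx instead of killing the whole process.
            self._inputs = self.llm.call(messages=["describe the crew inputs"])
        return self._inputs
```

Construction performs zero LLM calls; repeated accesses after the first reuse the cached result.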

  2. Try/except with a static fallback inside the generator functions. If the LLM call fails for any reason, fall back to a generic string like "Input value for the crew's tasks and agents." or "A CrewAI crew.". These descriptions are only surfaced in the CrewAI chat UI — shipping a generic default on LLM failure is strictly better than crashing the process.
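Sketched as a standalone helper (the real generator functions take more arguments; this is just the shape of the defense), fix 2 is a one-line try/except around the call:

```python
FALLBACK_INPUT_DESCRIPTION = "Input value for the crew's tasks and agents."


def describe_input_with_fallback(input_name, chat_llm):
    """Sketch of fix 2: any LLM failure yields a generic static
    description instead of propagating and crashing startup."""
    try:
        return chat_llm.call(messages=[f"Describe the input {input_name!r}"])
    except Exception:
        return FALLBACK_INPUT_DESCRIPTION


class FailingLLM:
    """Stub that simulates a hung/broken provider."""
    def call(self, messages):
        raise TimeoutError("provider hung")
```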

  3. Make AI-generated descriptions opt-in. Accept a kwarg generate_descriptions: bool = True (default preserves current behavior), but let production users pass False to skip the LLM calls entirely.
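The opt-in kwarg could look roughly like this (a hypothetical signature, not the current generate_crew_chat_inputs API):

```python
STATIC_DESCRIPTION = "Input value for the crew's tasks and agents."


def generate_chat_inputs(input_names, chat_llm, generate_descriptions=True):
    """Sketch of fix 3: default preserves current behavior; passing
    generate_descriptions=False performs zero LLM calls."""
    if not generate_descriptions:
        return {name: STATIC_DESCRIPTION for name in input_names}
    return {name: chat_llm.call(messages=[f"Describe {name}"])
            for name in input_names}
```

Production callers get a fully deterministic, network-free startup path by passing `generate_descriptions=False`.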

  4. Timeout + bounded retry. At minimum, enforce a short timeout (e.g., 10s) on chat_llm.call in these two functions so a hung LLM can't indefinitely block process startup.
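One dependency-free way to sketch fix 4 (using stdlib threads to impose a hard deadline; a real implementation might instead use litellm/crewai's own timeout parameters if available):

```python
import time
from concurrent.futures import ThreadPoolExecutor


def call_with_timeout_and_retry(fn, attempts=2, timeout_s=10.0, backoff_s=0.5):
    """Sketch of fix 4: run fn with a hard per-attempt timeout and a
    small bounded retry, so a hung provider can't stall startup forever."""
    last_exc = None
    for attempt in range(attempts):
        # Fresh single-worker pool per attempt so a hung call from a
        # previous attempt can't block the next one.
        pool = ThreadPoolExecutor(max_workers=1)
        try:
            return pool.submit(fn).result(timeout=timeout_s)
        except Exception as exc:  # includes futures.TimeoutError
            last_exc = exc
            time.sleep(backoff_s * attempt)  # 0 before the first retry
        finally:
            pool.shutdown(wait=False)
    raise last_exc
```

Note the caveat of the thread-based approach: a timed-out call's thread keeps running in the background; the deadline only bounds how long startup waits for it.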

Our workaround

We're shipping a defensive monkey-patch in our showcase that replaces both functions with static-string stubs before ag_ui_crewai is imported. PR: CopilotKit/CopilotKit#3974

This is fragile (depends on private function names) and we'd much prefer an upstream fix so every AG-UI / CopilotKit / direct-CrewAI production deployment doesn't inherit this footgun.
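The pattern behind the workaround, sketched self-contained (a SimpleNamespace stands in for crewai.cli.crew_chat; with the real package you would assign onto the module object itself, which is exactly the private-name fragility noted above):

```python
import types

# Stand-in for crewai.cli.crew_chat.
crew_chat = types.SimpleNamespace()


def _blocking_generator(input_name, crew, chat_llm):
    # In the real module this performs a network LLM call at import time.
    raise ConnectionError("simulated blocking LLM call")


crew_chat.generate_input_description_with_ai = _blocking_generator


def _static_description(input_name, crew, chat_llm):
    return "Input value for the crew's tasks and agents."


# The patch: must be applied BEFORE importing ag_ui_crewai, so the eager
# __init__ resolves to the static stub instead of the network call.
crew_chat.generate_input_description_with_ai = _static_description
```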

Environment

  • crewai>=0.130.0
  • ag-ui-crewai==0.1.5 (latest released; main already has the deferred-construction fix in endpoint.py but no release yet)
  • Python 3.12
