fix: raise RLIMIT_NOFILE and use real cocoindex inflight env var (#306) #307

Merged
HumanBean17 merged 1 commit into master from fix/emfile-fd-limit-and-inflight-env
Jun 13, 2026

@HumanBean17

Owner

Summary

Re-fixes the LanceDB "Too many open files (os error 24)" error, which the fix for #293 (#300) failed to resolve and which recurred in #306.

Root cause of the recurrence

#300 set COCOINDEX_SOURCE_MAX_INFLIGHT_ROWS, an env var CocoIndex never reads. Verified against cocoindex 1.0.7: 0 matches in the Python package and 0 in the native core.abi3.so binary. The real semaphore env var is COCOINDEX_MAX_INFLIGHT_COMPONENTS (default 1024), so the throttle was a no-op and the default 1024 inflight components stayed in effect → enough concurrent LanceDB merge-inserts to exhaust OS file descriptors.
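
A check like the one described above can be approximated with a raw byte scan of the installed package. This is a minimal sketch, not the procedure actually used in this PR: `count_files_containing` is a hypothetical helper, and zero matches is strong evidence rather than proof, since a name could in principle be assembled at runtime.

```python
from pathlib import Path


def count_files_containing(root: Path, name: str) -> int:
    """Count files under `root` whose raw bytes contain `name`.

    Hypothetical helper for illustration. Reading bytes (not text) means
    compiled extension modules such as a .so file are scanned as well.
    """
    needle = name.encode("utf-8")
    hits = 0
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        try:
            if needle in path.read_bytes():
                hits += 1
        except OSError:
            # Unreadable file: skip it rather than abort the scan.
            continue
    return hits
```

Pointed at the directory of the installed `cocoindex` package (for example, one of the paths in `importlib.util.find_spec("cocoindex").submodule_search_locations`), a result of 0 for the old name and a non-zero result for the new one would match the finding reported here.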

Changes

Layer A — correctness (replace the dead env var):

  • New cocoindex_subprocess_env_defaults() in java_codebase_rag/config.py — single source of truth for the throttle, using the real COCOINDEX_MAX_INFLIGHT_COMPONENTS=256.
  • pipeline.run_cocoindex_update and server._cocoindex_subprocess_env both apply it via setdefault (operator override still wins). The bogus name is gone from all production code.
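
The `setdefault` pattern in Layer A can be sketched as follows. The function name and the value 256 come from this PR; the function body, its return type, and the `build_subprocess_env` helper are assumptions made for illustration and may differ from the code in java_codebase_rag/config.py.

```python
import os
from typing import Dict, Mapping, Optional


def cocoindex_subprocess_env_defaults() -> Dict[str, str]:
    """Defaults for cocoindex child processes (body is an assumed sketch)."""
    return {"COCOINDEX_MAX_INFLIGHT_COMPONENTS": "256"}


def build_subprocess_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Hypothetical helper: copy an environment and fill in missing defaults."""
    env = dict(os.environ if base is None else base)
    for key, value in cocoindex_subprocess_env_defaults().items():
        # setdefault only writes when the key is absent, so a value the
        # operator exported beforehand is left untouched.
        env.setdefault(key, value)
    return env
```

With an empty base environment the result carries the default of 256; with `COCOINDEX_MAX_INFLIGHT_COMPONENTS=64` already set, the 64 is preserved.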

Layer B — deterministic OS-resource fix (the robust layer):

  • New java_codebase_rag/_fdlimit.py::raise_fd_limit() raises the process's own soft RLIMIT_NOFILE toward its hard limit (capped at 65536, never RLIM_INFINITY — that breaks select()/kqueue on macOS). Best-effort, silent, no-op on Windows.
  • Called at the top of cli.main and server.main. RLIMIT_NOFILE is inherited across fork+exec, so every cocoindex / cocoindex-code child inherits the headroom.
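
A minimal sketch of what a best-effort `raise_fd_limit()` could look like, using only the standard `resource` module. The 65536 cap and the behaviour described above are from this PR; the body, the `cap` parameter, and the exact exceptions swallowed are assumptions, not the contents of java_codebase_rag/_fdlimit.py.

```python
_FD_CAP = 65536  # cap stated in the PR description


def raise_fd_limit(cap: int = _FD_CAP) -> None:
    """Raise the soft RLIMIT_NOFILE toward the hard limit, at most `cap`.

    Assumed sketch: best-effort and silent, never lowers the limit.
    """
    try:
        import resource  # POSIX only; the import fails on Windows
    except ImportError:
        return
    try:
        soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
        if soft == resource.RLIM_INFINITY:
            return  # already unlimited, nothing to raise
        # Compare against RLIM_INFINITY by equality: its numeric value is
        # platform-dependent, so min() alone would be unreliable.
        target = cap if hard == resource.RLIM_INFINITY else min(cap, hard)
        if soft < target:
            # The hard limit is left unchanged; only the soft limit moves.
            resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    except (ValueError, OSError):
        # The kernel may reject the request; keep the current limit.
        pass
```

Calling it a second time is a no-op, because the soft limit is then already at or above the target.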

Why both: Layer A bounds concurrency (fewer FDs needed, lower peak load) but is a probabilistic mitigation; Layer B is the deterministic fix for the resource exhaustion itself. macOS processes launched by GUI / launchd / IDE / MCP host inherit a 256 FD ceiling, not the shell's raised ulimit — which is why the error recurred even on hosts whose terminal reports a high limit.
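
To see which limit a child process actually receives in a given launch context, the soft limit can be read from inside a spawned interpreter. This is an illustrative sketch for POSIX hosts only; `child_soft_fd_limit` is a hypothetical helper and is not part of this PR.

```python
import subprocess
import sys

# One-liner executed in the child: print its own soft RLIMIT_NOFILE.
_CHILD_CODE = "import resource; print(resource.getrlimit(resource.RLIMIT_NOFILE)[0])"


def child_soft_fd_limit() -> int:
    """Spawn a fresh Python interpreter and return the soft FD limit it sees."""
    result = subprocess.run(
        [sys.executable, "-c", _CHILD_CODE],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())
```

Run from an IDE or MCP host before and after the parent raises its own limit, the returned value should track the parent's soft limit, which is the inheritance behaviour Layer B relies on.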

User-visible behaviour changes

  • The cocoindex throttle env var is renamed COCOINDEX_SOURCE_MAX_INFLIGHT_ROWS → COCOINDEX_MAX_INFLIGHT_COMPONENTS. Anyone who set the old name manually will need to switch (the old name never did anything anyway).
  • The java-codebase-rag CLI and MCP server now raise their own soft FD limit to ≤65536 at startup (no-op if already higher).

No re-index / schema impact

No ontology bump, no Lance/Kuzu schema change, no re-index required.

Validation

  • .venv/bin/ruff check . — clean
  • .venv/bin/python -m pytest tests -v: 748 passed, 11 skipped, 0 failed (heavy gate off)
  • Layer B verified at the OS level under the exact trigger from #293 ("Install error #2") and #306 ("Init error"):
    $ ulimit -n 256 && python -c 'from java_codebase_rag._fdlimit import raise_fd_limit; ...'
    BEFORE raise_fd_limit: soft=256 hard=9223372036854775807
    AFTER  raise_fd_limit: soft=65536 hard=9223372036854775807
    
    (Correct no-op when the limit is already high.)
  • New tests written RED→GREEN (TDD): tests/test_fd_limit.py (5), tests/test_config.py (+1), tests/test_mcp_tools.py (+1, server-wiring regression guard).

Fixes #306. Supersedes the broken #300 fix for #293.

🤖 Generated with Claude Code

The #293 fix (#300) set COCOINDEX_SOURCE_MAX_INFLIGHT_ROWS — an env var
CocoIndex never reads. The real semaphore var is
COCOINDEX_MAX_INFLIGHT_COMPONENTS (default 1024, see
cocoindex/_internal/app.py), so the throttle was a no-op and the EMFILE
"Too many open files (os error 24)" recurred (#306).

Layer A (correctness): centralize the throttle in
cocoindex_subprocess_env_defaults() using the real env var; both cocoindex
subprocess sites (pipeline.run_cocoindex_update + server._cocoindex_subprocess_env)
apply it via setdefault so an operator override still wins.

Layer B (deterministic): raise_fd_limit() raises the process soft
RLIMIT_NOFILE toward its hard limit (capped 65536, never infinity) at
cli.main / server.main startup. rlimits are inherited across fork+exec, so
cocoindex children get headroom regardless of launch context — macOS
GUI/launchd/IDE-launched processes inherit a 256 FD ceiling, not the shell's
raised limit, which is why the error recurred even on hosts whose terminal
shows a high ulimit.

No ontology/schema/re-index impact.

Fixes #306.

Co-Authored-By: Claude <noreply@anthropic.com>
@HumanBean17 force-pushed the fix/emfile-fd-limit-and-inflight-env branch from e5a0363 to c2d4d2a on June 13, 2026 14:09
@HumanBean17 merged commit 59de211 into master on Jun 13, 2026
1 check failed