fix: raise RLIMIT_NOFILE and use real cocoindex inflight env var (#306) #307
Merged
The #293 fix (#300) set COCOINDEX_SOURCE_MAX_INFLIGHT_ROWS, an env var CocoIndex never reads. The real semaphore var is COCOINDEX_MAX_INFLIGHT_COMPONENTS (default 1024, see cocoindex/_internal/app.py), so the throttle was a no-op and the EMFILE "Too many open files (os error 24)" recurred (#306).

Layer A (correctness): centralize the throttle in cocoindex_subprocess_env_defaults() using the real env var. Both cocoindex subprocess sites (pipeline.run_cocoindex_update + server._cocoindex_subprocess_env) apply it via setdefault, so an operator override still wins.

Layer B (deterministic): raise_fd_limit() raises the process soft RLIMIT_NOFILE toward its hard limit (capped at 65536, never infinity) at cli.main / server.main startup. Resource limits are inherited across fork+exec, so cocoindex children get headroom regardless of launch context. macOS GUI/launchd/IDE-launched processes inherit a 256 FD ceiling, not the shell's raised limit, which is why the error recurred even on hosts whose terminal shows a high ulimit.

No ontology/schema/re-index impact.

Fixes #306.

Co-Authored-By: Claude <noreply@anthropic.com>
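The two layers described in the commit message can be sketched roughly as below. This is a minimal illustration, not the code merged in this PR. Only the names `cocoindex_subprocess_env_defaults` and `raise_fd_limit`, the value 256, and the 65536 cap come from the description. `build_cocoindex_env` is a hypothetical helper that stands in for how `pipeline.run_cocoindex_update` and `server._cocoindex_subprocess_env` apply the defaults, and the exact exception handling is an assumption.

```python
from __future__ import annotations

import os

# Cap taken from the PR description: never request RLIM_INFINITY.
_FD_CAP = 65536


def cocoindex_subprocess_env_defaults() -> dict[str, str]:
    """Sketch: single source of truth for the cocoindex throttle (256 per the PR)."""
    return {"COCOINDEX_MAX_INFLIGHT_COMPONENTS": "256"}


def build_cocoindex_env(base: dict[str, str] | None = None) -> dict[str, str]:
    """Hypothetical helper: copy the environment and apply the defaults via
    setdefault, so a value the operator already exported is left untouched."""
    env = dict(os.environ if base is None else base)
    for key, value in cocoindex_subprocess_env_defaults().items():
        env.setdefault(key, value)
    return env


def raise_fd_limit(cap: int = _FD_CAP) -> None:
    """Best-effort sketch: raise the soft RLIMIT_NOFILE toward the hard limit, capped."""
    try:
        import resource  # POSIX-only module; absent on Windows, so this is a no-op there
    except ImportError:
        return
    try:
        soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
        # An unlimited hard limit is replaced by the cap, never by RLIM_INFINITY.
        ceiling = cap if hard == resource.RLIM_INFINITY else min(hard, cap)
        # No-op when the soft limit is already unlimited or already high enough.
        if soft != resource.RLIM_INFINITY and soft < ceiling:
            resource.setrlimit(resource.RLIMIT_NOFILE, (ceiling, hard))
    except (ValueError, OSError):
        pass  # silent: keep the existing limit if the kernel refuses the change
```

Because `setdefault` only fills in missing keys, an operator who exports `COCOINDEX_MAX_INFLIGHT_COMPONENTS=64` keeps 64. Because the soft limit is raised in the parent before any child is spawned, children started with `subprocess` inherit it.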
Force-pushed from e5a0363 to c2d4d2a.
Summary
Re-fixes the recurring `Too many open files (os error 24)` LanceDB error that the #293 fix (#300) failed to resolve and which recurred in #306.

Root cause of the recurrence
#300 set `COCOINDEX_SOURCE_MAX_INFLIGHT_ROWS`, an env var CocoIndex never reads. Verified against cocoindex 1.0.7: 0 matches in the Python package and 0 in the native `core.abi3.so` binary. The real semaphore env var is `COCOINDEX_MAX_INFLIGHT_COMPONENTS` (default 1024), so the throttle was a no-op and the default of 1024 inflight components stayed in effect. That allowed enough concurrent LanceDB merge-inserts to exhaust OS file descriptors.

Changes
Layer A: correctness (replace the dead env var)
- `cocoindex_subprocess_env_defaults()` in `java_codebase_rag/config.py` is the single source of truth for the throttle, using the real `COCOINDEX_MAX_INFLIGHT_COMPONENTS=256`.
- `pipeline.run_cocoindex_update` and `server._cocoindex_subprocess_env` both apply it via `setdefault`, so an operator override still wins.
- The bogus name is gone from all production code.

Layer B: deterministic OS-resource fix (the robust layer)
- `java_codebase_rag/_fdlimit.py::raise_fd_limit()` raises the process's own soft `RLIMIT_NOFILE` toward its hard limit (capped at 65536, never `RLIM_INFINITY`, which breaks select()/kqueue on macOS). It is best-effort, silent, and a no-op on Windows.
- It is called at startup in `cli.main` and `server.main`.
- `RLIMIT_NOFILE` is inherited across `fork`+`exec`, so every `cocoindex` / `cocoindex-code` child inherits the headroom.

Why both: Layer A bounds concurrency (fewer FDs needed, lower peak load) but is only a probabilistic mitigation. Layer B is the deterministic fix for the resource exhaustion itself. macOS processes launched by a GUI, launchd, an IDE, or an MCP host inherit a 256 FD ceiling, not the shell's raised `ulimit`, which is why the error recurred even on hosts whose terminal reports a high limit.

User-visible behaviour changes
- Throttle env var: `COCOINDEX_SOURCE_MAX_INFLIGHT_ROWS` → `COCOINDEX_MAX_INFLIGHT_COMPONENTS`. Anyone who set the old name manually will need to switch (the old name never did anything anyway).
- The `java-codebase-rag` CLI and MCP server now raise their own soft FD limit to ≤65536 at startup (a no-op if it is already higher).

No re-index / schema impact
No ontology bump, no Lance/Kuzu schema change, no re-index required.
Validation
- `.venv/bin/ruff check .`: clean.
- `.venv/bin/python -m pytest tests -v`: 748 passed, 11 skipped, 0 failed (heavy gate off).
- New tests: `tests/test_fd_limit.py` (5), `tests/test_config.py` (+1), `tests/test_mcp_tools.py` (+1, server-wiring regression guard).

Fixes #306. Supersedes the broken #300 fix for #293.
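The claim that `RLIMIT_NOFILE` is inherited across `fork`+`exec` can be checked directly. The sketch below is in the spirit of `tests/test_fd_limit.py` but is hypothetical, not one of its five tests. It is POSIX-only (the `resource` module does not exist on Windows). It temporarily lowers the parent's soft limit, which an unprivileged process is always allowed to do, spawns a fresh interpreter, reads the child's limit, and then restores the original value. `child_soft_nofile` and `rlimit_is_inherited` are illustrative names, and the target of 300 is an arbitrary assumption.

```python
import resource
import subprocess
import sys

# One-liner run in the child interpreter to print its soft FD limit.
_PROBE = "import resource; print(resource.getrlimit(resource.RLIMIT_NOFILE)[0])"


def child_soft_nofile() -> int:
    """Spawn a fresh interpreter (a new exec'd process) and return its soft limit."""
    result = subprocess.run(
        [sys.executable, "-c", _PROBE], capture_output=True, text=True, check=True
    )
    return int(result.stdout.strip())


def rlimit_is_inherited(target: int = 300) -> bool:
    """Temporarily set the parent's soft RLIMIT_NOFILE to `target` and check
    that a child process observes the same value."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY and target > hard:
        target = hard  # an unprivileged process cannot exceed its hard limit
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    try:
        return child_soft_nofile() == target
    finally:
        # Restore the original soft limit (it was <= hard, so this is permitted).
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))
```

The same mechanism is why raising the limit once in `cli.main` / `server.main` is enough: every cocoindex subprocess spawned afterwards starts with the parent's soft limit.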
🤖 Generated with Claude Code