-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Replace pickle with JSON in Auto3DSeg algo serialization #8695
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
Replace pickle with JSON in Auto3DSeg algo serialization #8695
Conversation
- Rename algo_to_pickle() to algo_to_json(): - Serialize algo state as JSON with _target_ for class reconstruction - Uses MONAI's ConfigParser pattern for dynamic instantiation - Truly pickle-free serialization (for algo metadata; model weights still use torch.save) - Add _make_json_serializable() to handle numpy arrays, tensors, Path objects - Rename algo_from_pickle() to algo_from_json() - Add deprecated aliases for backward compatibility - Update import_bundle_algo_history() to prefer .json files Fixes Project-MONAI#8586 Signed-off-by: Soumya Snigdha Kundu <[email protected]>
📝 WalkthroughWalkthroughThe pull request migrates Auto3DSeg's algorithm serialization from pickle to JSON format. The changes introduce two new public functions ( Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @monai/auto3dseg/utils.py:
- Around line 474-494: The loop over paths_to_try manipulates sys.path
non-atomically (sys.path.insert/remove) around ConfigParser/
parser.get_parsed_content which can race if concurrent; fix by introducing a
module-level lock (e.g., SYS_PATH_LOCK) and acquire it before any sys.path
modification and release it after cleanup, wrapping the path insert, parser
instantiation (ConfigParser and parser.get_parsed_content) and the removal in a
try/finally so removal always happens and the lock is released; reference the
paths_to_try loop, sys.path.insert/sys.path.remove, ConfigParser,
parser.get_parsed_content, and used_template_path when applying the lock.
🧹 Nitpick comments (10)
monai/apps/auto3dseg/utils.py (1)
67-70: Minor readability nit: condition can be simplified.The
or not only_trainedis a double negative. Consider inverting for clarity, though current logic is correct.monai/auto3dseg/utils.py (9)
281-304: LGTM with minor observation.Good coverage of common types. The fallback to
str(value)on line 304 silently converts unknown types—consider logging a warning for unexpected types to aid debugging.
307-312: Missing docstring parameter/return documentation.Per coding guidelines, docstrings should describe parameters and return values.
Suggested docstring
def _add_path_with_parent(paths: list[str], path: str | None) -> None: - """Add a path and its parent directory to the list if the path is a valid directory.""" + """ + Add a path and its parent directory to the list if the path is a valid directory. + + Args: + paths: List to append paths to (modified in-place). + path: Directory path to add; skipped if None or not a valid directory. + """
327-333: Hardcoded attribute list may become stale.Consider defining
SERIALIZABLE_ATTRSas a module-level constant or documenting why these specific attributes are serialized. Easier to maintain and extend.
346-348: Missing encoding parameter for file write.Explicit
encoding="utf-8"is recommended for cross-platform consistency.Fix
- with open(json_filename, "w") as f: + with open(json_filename, "w", encoding="utf-8") as f:
414-414: Unusedkwargsparameter.Static analysis (ARG001) notes
kwargsis unused. Docstring says "reserved for future use"—acceptable, but consider using_prefix convention or**_kwargsto suppress linter warnings.
446-447: Missing encoding parameter for file read.Same as write—explicit
encoding="utf-8"recommended.Fix
- with open(filename) as f: + with open(filename, encoding="utf-8") as f:
449-450: ConsiderTypeErrorfor type validation.Static analysis (TRY004) suggests
TypeErroroverValueErrorfor invalid types. Minor pedantic issue.
479-481: Simplify dictionary access.Static analysis (RUF019) notes unnecessary key check. Use
state.get("template_path")instead.Fix
algo_config: dict[str, Any] = {"_target_": target} - if "template_path" in state and state["template_path"]: - algo_config["template_path"] = state["template_path"] + if state.get("template_path"): + algo_config["template_path"] = state["template_path"]
499-502: State restoration silently skips unknown attributes.If the JSON contains an attribute not present on the algo object, it's silently ignored. This is likely intentional for forward compatibility, but logging a debug message would help troubleshoot version mismatches.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (3)
monai/apps/auto3dseg/utils.pymonai/auto3dseg/__init__.pymonai/auto3dseg/utils.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
⚙️ CodeRabbit configuration file
Review the Python code for quality and correctness. Ensure variable names adhere to PEP8 style guides, are sensible and informative in regards to their function, though permitting simple names for loop and comprehension variables. Ensure routine names are meaningful in regards to their function and use verbs, adjectives, and nouns in a semantically appropriate way. Docstrings should be present for all definition which describe each variable, return value, and raised exception in the appropriate section of the Google-style of docstrings. Examine code for logical error or inconsistencies, and suggest what may be changed to addressed these. Suggest any enhancements for code improving efficiency, maintainability, comprehensibility, and correctness. Ensure new or modified definitions will be covered by existing or new unit tests.
Files:
monai/apps/auto3dseg/utils.pymonai/auto3dseg/__init__.pymonai/auto3dseg/utils.py
🧬 Code graph analysis (2)
monai/apps/auto3dseg/utils.py (2)
monai/auto3dseg/utils.py (2)
algo_from_json(414-510)algo_to_json(315-350)monai/utils/enums.py (1)
AlgoKeys(687-699)
monai/auto3dseg/__init__.py (1)
monai/auto3dseg/utils.py (3)
algo_from_json(414-510)algo_from_pickle(669-670)algo_to_json(315-350)
🪛 Ruff (0.14.10)
monai/auto3dseg/utils.py
384-384: pickle and modules that wrap it can be unsafe when used to deserialize untrusted data, possible security issue
(S301)
391-391: pickle and modules that wrap it can be unsafe when used to deserialize untrusted data, possible security issue
(S301)
414-414: Unused function argument: kwargs
(ARG001)
450-450: Prefer TypeError exception for invalid type
(TRY004)
450-450: Avoid specifying long messages outside the exception class
(TRY003)
455-455: Avoid specifying long messages outside the exception class
(TRY003)
480-480: Unnecessary key check before dictionary access
Replace with dict.get
(RUF019)
497-497: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (19)
- GitHub Check: min-dep-py3 (3.12)
- GitHub Check: min-dep-os (macOS-latest)
- GitHub Check: min-dep-pytorch (2.7.1)
- GitHub Check: min-dep-py3 (3.11)
- GitHub Check: min-dep-pytorch (2.8.0)
- GitHub Check: min-dep-pytorch (2.6.0)
- GitHub Check: min-dep-pytorch (2.5.1)
- GitHub Check: min-dep-py3 (3.10)
- GitHub Check: min-dep-os (ubuntu-latest)
- GitHub Check: min-dep-py3 (3.9)
- GitHub Check: min-dep-os (windows-latest)
- GitHub Check: flake8-py3 (mypy)
- GitHub Check: quick-py3 (windows-latest)
- GitHub Check: quick-py3 (macOS-latest)
- GitHub Check: quick-py3 (ubuntu-latest)
- GitHub Check: build-docs
- GitHub Check: packaging
- GitHub Check: flake8-py3 (pytype)
- GitHub Check: flake8-py3 (codeformat)
🔇 Additional comments (7)
monai/apps/auto3dseg/utils.py (3)
17-17: LGTM!Import updated to use new JSON-based functions.
45-56: LGTM!Clean preference logic: JSON first, pickle fallback. Good migration strategy.
75-84: LGTM!Docstring and function body updated to use JSON serialization.
monai/auto3dseg/utils.py (3)
353-411: Legacy pickle loader retained for backward compatibility.Static analysis flags pickle security (S301). Acceptable here since it's deprecated legacy support with clear warnings. No action required beyond eventual removal.
504-504:used_template_pathoverwrites state-restoredtemplate_path.Line 501-502 restores
template_pathfrom state, but line 504 immediately overwrites it withused_template_path(which could be different). Verify this is the intended behavior.
662-669: LGTM!Clean deprecated aliases delegating to new JSON functions.
monai/auto3dseg/__init__.py (1)
27-39: LGTM!New JSON functions properly exported alongside deprecated pickle aliases.
| for path in paths_to_try: | ||
| try: | ||
| if path and path not in sys.path: | ||
| sys.path.insert(0, path) | ||
|
|
||
| algo_config: dict[str, Any] = {"_target_": target} | ||
| if "template_path" in state and state["template_path"]: | ||
| algo_config["template_path"] = state["template_path"] | ||
|
|
||
| parser = ConfigParser(algo_config) | ||
| algo = parser.get_parsed_content() | ||
| used_template_path = path | ||
|
|
||
| if path and path in sys.path: | ||
| sys.path.remove(path) | ||
| break | ||
| except (ModuleNotFoundError, ImportError) as e: | ||
| if path and path in sys.path: | ||
| sys.path.remove(path) | ||
| logging.debug(f"Failed to instantiate {target} with path {path}: {e}") | ||
| continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
# First, let's see the function containing this code
sed -n '450,500p' monai/auto3dseg/utils.pyRepository: Project-MONAI/MONAI
Length of output: 2103
🏁 Script executed:
# Check what function this is part of
grep -n "^def\|^class" monai/auto3dseg/utils.py | grep -B1 "474"Repository: Project-MONAI/MONAI
Length of output: 45
🏁 Script executed:
# Search for calls to algo_from_json or algo_from_pickle
rg "algo_from_json|algo_from_pickle" --type py -C 2Repository: Project-MONAI/MONAI
Length of output: 4135
🏁 Script executed:
# Check if there's any threading or async pattern in this file
rg "Thread|threading|asyncio|concurrent|Executor" monai/auto3dseg/utils.pyRepository: Project-MONAI/MONAI
Length of output: 45
🏁 Script executed:
# Check if the function is exposed in __init__.py or used in tests
rg "algo_from_json|algo_from_pickle" --type py | head -20Repository: Project-MONAI/MONAI
Length of output: 1560
🏁 Script executed:
# Look for tests of algo_from_json
find . -path ./build -prune -o -name "*test*.py" -type f -exec grep -l "algo_from_json\|algo_from_pickle" {} \;Repository: Project-MONAI/MONAI
Length of output: 45
🏁 Script executed:
# Check the context of calls in hpo_gen.py to see if concurrent
sed -n '1,100p' monai/apps/auto3dseg/hpo_gen.py | head -60Repository: Project-MONAI/MONAI
Length of output: 2408
🏁 Script executed:
# Get the full context of algo_from_json and algo_from_pickle function signatures
sed -n '420,510p' monai/auto3dseg/utils.pyRepository: Project-MONAI/MONAI
Length of output: 3791
sys.path manipulation lacks thread-safety guarantees, though current usage is sequential.
While sys.path.insert() and sys.path.remove() are individually atomic due to the GIL, the paired operations aren't atomic together and could race if called concurrently. Current callers invoke this sequentially during initialization (in hpo_gen.py.__init__ and utils.py iteration), so practical risk is low. Consider a lock or context manager if concurrent calls are anticipated.
🧰 Tools
🪛 Ruff (0.14.10)
480-480: Unnecessary key check before dictionary access
Replace with dict.get
(RUF019)
🤖 Prompt for AI Agents
In @monai/auto3dseg/utils.py around lines 474 - 494, The loop over paths_to_try
manipulates sys.path non-atomically (sys.path.insert/remove) around
ConfigParser/ parser.get_parsed_content which can race if concurrent; fix by
introducing a module-level lock (e.g., SYS_PATH_LOCK) and acquire it before any
sys.path modification and release it after cleanup, wrapping the path insert,
parser instantiation (ConfigParser and parser.get_parsed_content) and the
removal in a try/finally so removal always happens and the lock is released;
reference the paths_to_try loop, sys.path.insert/sys.path.remove, ConfigParser,
parser.get_parsed_content, and used_template_path when applying the lock.
Summary
algo_to_pickle()withalgo_to_json()using compact JSON + MONAI's existing_target_ConfigParser pattern for truly pickle-free algo metadata serialization_make_json_serializable()helper to handle numpy arrays, tensors, Path objectsalgo_from_json()can still load legacy.pklfiles (with deprecation warning)algo_to_pickle()/algo_from_pickle()as deprecated aliasesNote: Model weights still use
torch.saveseparately—this PR focuses on the algo object serialization.Test plan
./runtests.sh --codeformat,./runtests.sh --ruff)_make_json_serializable()and_add_path_with_parent()helperstests/apps/test_auto3dseg_bundlegen.py(requires optional deps)Fixes #8586