Feat/refractor llm provider#914
Conversation
… management; support for personalization module
There was a problem hiding this comment.
Code Review
This pull request refactors the MS-Agent framework by introducing a knowledge-driven skill system (v2) integrated directly into the standard agent loop, a data-driven LLM provider layer with adaptive retry, a pluggable retriever framework, and a robust project/session management system. The code review identified several critical security and stability issues, including path traversal bypass vulnerabilities in path resolution, potential TypeError and ValueError crashes during streaming merges and string splitting, ZeroDivisionError risks in the BM25 retriever, and Windows compatibility issues with Path.rename().
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| def _resolve_safe(self, rel_path: str) -> Path: | ||
| target = (self._root / rel_path).resolve() | ||
| if not str(target).startswith(str(self._root)): | ||
| raise PermissionError(f'Path traversal blocked: {rel_path}') | ||
| return target |
There was a problem hiding this comment.
Path Traversal Bypass Vulnerability
Using str(target).startswith(str(self._root)) to verify that a path is within the workspace root is vulnerable to path traversal bypass. For example, if self._root is /foo/bar and target resolves to /foo/bar_baz, the check will pass because /foo/bar_baz starts with /foo/bar, even though it is outside the root directory.
Use Path.relative_to() to securely verify that the resolved path is strictly under the root directory.
| def _resolve_safe(self, rel_path: str) -> Path: | |
| target = (self._root / rel_path).resolve() | |
| if not str(target).startswith(str(self._root)): | |
| raise PermissionError(f'Path traversal blocked: {rel_path}') | |
| return target | |
| def _resolve_safe(self, rel_path: str) -> Path: | |
| target = (self._root / rel_path).resolve() | |
| try: | |
| target.relative_to(self._root) | |
| except ValueError: | |
| raise PermissionError(f'Path traversal blocked: {rel_path}') | |
| return target |
| if not str(target).startswith(str(skill_root)): | ||
| return json.dumps({"error": "Path traversal not allowed"}) |
There was a problem hiding this comment.
Path Traversal Bypass Vulnerability
Using str(target).startswith(str(skill_root)) to verify that a path is within the skill root is vulnerable to path traversal bypass. For example, if skill_root is /foo/bar and target resolves to /foo/bar_baz, the check will pass because /foo/bar_baz starts with /foo/bar, even though it is outside the root directory.
Use Path.relative_to() to securely verify that the resolved path is strictly under the root directory.
| if not str(target).startswith(str(skill_root)): | |
| return json.dumps({"error": "Path traversal not allowed"}) | |
| try: | |
| target.relative_to(skill_root) | |
| except ValueError: | |
| return json.dumps({"error": "Path traversal not allowed"}) |
| message = deepcopy(pre_message_chunk) | ||
| message.reasoning_content += message_chunk.reasoning_content | ||
| message.content += message_chunk.content |
There was a problem hiding this comment.
Potential TypeError on None values during streaming merge
During streaming, pre_message_chunk.reasoning_content or pre_message_chunk.content can be None (especially for the first chunk of a tool call or reasoning, or if the previous message didn't have reasoning content). Attempting to perform += on a None value will raise a TypeError and crash the agent.
Ensure both operands are coalesced to empty strings before concatenation.
| message = deepcopy(pre_message_chunk) | |
| message.reasoning_content += message_chunk.reasoning_content | |
| message.content += message_chunk.content | |
| message = deepcopy(pre_message_chunk) | |
| message.reasoning_content = (message.reasoning_content or '') + (message_chunk.reasoning_content or '') | |
| message.content = (message.content or '') + (message_chunk.content or '') |
| if self.continue_gen_mode == 'prefix' and self.continue_gen_stop: | ||
| existing = list(kwargs.pop('stop', []) or []) | ||
| kwargs['stop'] = existing + list(self.continue_gen_stop) |
There was a problem hiding this comment.
Potential bug when stop is a string
If existing (popped from kwargs) is a string (e.g., "###"), calling list(existing) will split it into individual characters (e.g., ['#', '#', '#']), which is not the intended behavior for stop sequences.
Check if existing is a string and wrap it in a list to preserve the sequence.
if self.continue_gen_mode == 'prefix' and self.continue_gen_stop:
existing = kwargs.pop('stop', [])
if isinstance(existing, str):
existing = [existing]
else:
existing = list(existing or [])
kwargs['stop'] = existing + list(self.continue_gen_stop)| stop = list(kwargs.pop('stop', []) or []) | ||
| stop.append('```') | ||
| return self._call_llm( | ||
| messages=messages, tools=tools, stop=stop, **kwargs) |
There was a problem hiding this comment.
Potential bug when stop is a string
If stop (popped from kwargs) is a string, calling list(stop) will split it into individual characters, which is not the intended behavior for stop sequences.
Check if stop is a string and wrap it in a list to preserve the sequence.
| stop = list(kwargs.pop('stop', []) or []) | |
| stop.append('```') | |
| return self._call_llm( | |
| messages=messages, tools=tools, stop=stop, **kwargs) | |
| stop = kwargs.pop('stop', []) | |
| if isinstance(stop, str): | |
| stop = [stop] | |
| else: | |
| stop = list(stop or []) | |
| stop.append('```') | |
| return self._call_llm( | |
| messages=messages, tools=tools, stop=stop, **kwargs) |
| os.makedirs(local_dir, exist_ok=True) | ||
|
|
||
| _owner, name = skill_id.split("/", 1) | ||
| skill_dir = os.path.join(local_dir, name) |
There was a problem hiding this comment.
Potential ValueError on single-segment skill_id
If skill_id does not contain a / (e.g., a custom or local skill), calling skill_id.split("/", 1) will return a list of length 1, raising a ValueError when unpacking into _owner, name.
Check if / is present in skill_id before splitting.
| skill_dir = os.path.join(local_dir, name) | |
| if "/" in skill_id: | |
| _owner, name = skill_id.split("/", 1) | |
| else: | |
| name = skill_id |
| denom = tf + self.k1 * ( | ||
| 1 - self.b + self.b * (dl / self._avgdl)) | ||
| scores[i] += idf * (numer / denom) |
There was a problem hiding this comment.
Potential ZeroDivisionError
If all indexed documents are empty or contain no valid tokens, self._avgdl will be 0.0, causing a ZeroDivisionError when calculating dl / self._avgdl.
Guard against self._avgdl == 0 to prevent runtime crashes.
| denom = tf + self.k1 * ( | |
| 1 - self.b + self.b * (dl / self._avgdl)) | |
| scores[i] += idf * (numer / denom) | |
| denom = tf + self.k1 * ( | |
| 1 - self.b + self.b * (dl / self._avgdl if self._avgdl > 0 else 1.0)) | |
| scores[i] += idf * (numer / denom) |
| def parse_input(text: str) -> tuple[str, str]: | ||
| stripped = text.strip() | ||
| parts = stripped.split(maxsplit=1) | ||
| cmd = parts[0].lstrip('/').lower() | ||
| args = parts[1] if len(parts) > 1 else '' | ||
| return cmd, args |
There was a problem hiding this comment.
Potential IndexError on empty or whitespace-only input
If text is empty or contains only whitespace, stripped will be empty, and stripped.split(maxsplit=1) will return an empty list, causing an IndexError when accessing parts[0].
Add a guard to return empty strings early if stripped is empty.
| def parse_input(text: str) -> tuple[str, str]: | |
| stripped = text.strip() | |
| parts = stripped.split(maxsplit=1) | |
| cmd = parts[0].lstrip('/').lower() | |
| args = parts[1] if len(parts) > 1 else '' | |
| return cmd, args | |
| @staticmethod | |
| def parse_input(text: str) -> tuple[str, str]: | |
| stripped = text.strip() | |
| if not stripped: | |
| return '', '' | |
| parts = stripped.split(maxsplit=1) | |
| cmd = parts[0].lstrip('/').lower() | |
| args = parts[1] if len(parts) > 1 else '' |
| tmp = self._path.with_suffix('.tmp') | ||
| with open(tmp, 'w', encoding='utf-8') as f: | ||
| json.dump(data, f, ensure_ascii=False, indent=2) | ||
| tmp.rename(self._path) |
There was a problem hiding this comment.
Windows Compatibility Issue with Path.rename
On Windows, Path.rename() raises a FileExistsError if the destination file already exists. To perform a cross-platform atomic replace, use Path.replace() instead.
Note that this also applies to other files in this PR where rename is used on temporary files (e.g., ms_agent/personalization/profile.py, ms_agent/personalization/settings.py, and ms_agent/config/skills_manager.py).
| tmp.rename(self._path) | |
| tmp.replace(self._path) |
Change Summary
Related issue number
Checklist
pre-commit installandpre-commit run --all-filesbefore git commit, and passed lint check.