Skip to content

Feat/refractor llm provider#914

Open
alcholiclg wants to merge 7 commits into
modelscope:mainfrom
alcholiclg:feat/refractor_llm
Open

Feat/refractor llm provider#914
alcholiclg wants to merge 7 commits into
modelscope:mainfrom
alcholiclg:feat/refractor_llm

Conversation

@alcholiclg

Copy link
Copy Markdown
Collaborator

Change Summary

  1. refractor llm provider: data-driven provider layer with 10 providers

Related issue number

Checklist

  • The pull request title is a good summary of the changes - it will be used in the changelog
  • Unit tests for the changes exist
  • Run pre-commit install and pre-commit run --all-files before git commit, and passed lint check.
  • Documentation reflects the changes where applicable

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the MS-Agent framework by introducing a knowledge-driven skill system (v2) integrated directly into the standard agent loop, a data-driven LLM provider layer with adaptive retry, a pluggable retriever framework, and a robust project/session management system. The code review identified several critical security and stability issues, including path traversal bypass vulnerabilities in path resolution, potential TypeError and ValueError crashes during streaming merges and string splitting, ZeroDivisionError risks in the BM25 retriever, and Windows compatibility issues with Path.rename().

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +88 to +92
def _resolve_safe(self, rel_path: str) -> Path:
target = (self._root / rel_path).resolve()
if not str(target).startswith(str(self._root)):
raise PermissionError(f'Path traversal blocked: {rel_path}')
return target

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-critical critical

Path Traversal Bypass Vulnerability

Using str(target).startswith(str(self._root)) to verify that a path is within the workspace root is vulnerable to path traversal bypass. For example, if self._root is /foo/bar and target resolves to /foo/bar_baz, the check will pass because /foo/bar_baz starts with /foo/bar, even though it is outside the root directory.

Use Path.relative_to() to securely verify that the resolved path is strictly under the root directory.

Suggested change
def _resolve_safe(self, rel_path: str) -> Path:
target = (self._root / rel_path).resolve()
if not str(target).startswith(str(self._root)):
raise PermissionError(f'Path traversal blocked: {rel_path}')
return target
def _resolve_safe(self, rel_path: str) -> Path:
target = (self._root / rel_path).resolve()
try:
target.relative_to(self._root)
except ValueError:
raise PermissionError(f'Path traversal blocked: {rel_path}')
return target

Comment on lines +295 to +296
if not str(target).startswith(str(skill_root)):
return json.dumps({"error": "Path traversal not allowed"})

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-critical critical

Path Traversal Bypass Vulnerability

Using str(target).startswith(str(skill_root)) to verify that a path is within the skill root is vulnerable to path traversal bypass. For example, if skill_root is /foo/bar and target resolves to /foo/bar_baz, the check will pass because /foo/bar_baz starts with /foo/bar, even though it is outside the root directory.

Use Path.relative_to() to securely verify that the resolved path is strictly under the root directory.

Suggested change
if not str(target).startswith(str(skill_root)):
return json.dumps({"error": "Path traversal not allowed"})
try:
target.relative_to(skill_root)
except ValueError:
return json.dumps({"error": "Path traversal not allowed"})

Comment on lines +334 to +336
message = deepcopy(pre_message_chunk)
message.reasoning_content += message_chunk.reasoning_content
message.content += message_chunk.content

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Potential TypeError on None values during streaming merge

During streaming, pre_message_chunk.reasoning_content or pre_message_chunk.content can be None (especially for the first chunk of a tool call or reasoning, or if the previous message didn't have reasoning content). Attempting to perform += on a None value will raise a TypeError and crash the agent.

Ensure both operands are coalesced to empty strings before concatenation.

Suggested change
message = deepcopy(pre_message_chunk)
message.reasoning_content += message_chunk.reasoning_content
message.content += message_chunk.content
message = deepcopy(pre_message_chunk)
message.reasoning_content = (message.reasoning_content or '') + (message_chunk.reasoning_content or '')
message.content = (message.content or '') + (message_chunk.content or '')

Comment on lines +534 to +536
if self.continue_gen_mode == 'prefix' and self.continue_gen_stop:
existing = list(kwargs.pop('stop', []) or [])
kwargs['stop'] = existing + list(self.continue_gen_stop)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Potential bug when stop is a string

If existing (popped from kwargs) is a string (e.g., "###"), calling list(existing) will split it into individual characters (e.g., ['#', '#', '#']), which is not the intended behavior for stop sequences.

Check if existing is a string and wrap it in a list to preserve the sequence.

        if self.continue_gen_mode == 'prefix' and self.continue_gen_stop:
            existing = kwargs.pop('stop', [])
            if isinstance(existing, str):
                existing = [existing]
            else:
                existing = list(existing or [])
            kwargs['stop'] = existing + list(self.continue_gen_stop)

Comment on lines +41 to 44
stop = list(kwargs.pop('stop', []) or [])
stop.append('```')
return self._call_llm(
messages=messages, tools=tools, stop=stop, **kwargs)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Potential bug when stop is a string

If stop (popped from kwargs) is a string, calling list(stop) will split it into individual characters, which is not the intended behavior for stop sequences.

Check if stop is a string and wrap it in a list to preserve the sequence.

Suggested change
stop = list(kwargs.pop('stop', []) or [])
stop.append('```')
return self._call_llm(
messages=messages, tools=tools, stop=stop, **kwargs)
stop = kwargs.pop('stop', [])
if isinstance(stop, str):
stop = [stop]
else:
stop = list(stop or [])
stop.append('```')
return self._call_llm(
messages=messages, tools=tools, stop=stop, **kwargs)

Comment thread ms_agent/skill/catalog.py
os.makedirs(local_dir, exist_ok=True)

_owner, name = skill_id.split("/", 1)
skill_dir = os.path.join(local_dir, name)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Potential ValueError on single-segment skill_id

If skill_id does not contain a / (e.g., a custom or local skill), calling skill_id.split("/", 1) will return a list of length 1, raising a ValueError when unpacking into _owner, name.

Check if / is present in skill_id before splitting.

Suggested change
skill_dir = os.path.join(local_dir, name)
if "/" in skill_id:
_owner, name = skill_id.split("/", 1)
else:
name = skill_id

Comment on lines +118 to +120
denom = tf + self.k1 * (
1 - self.b + self.b * (dl / self._avgdl))
scores[i] += idf * (numer / denom)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Potential ZeroDivisionError

If all indexed documents are empty or contain no valid tokens, self._avgdl will be 0.0, causing a ZeroDivisionError when calculating dl / self._avgdl.

Guard against self._avgdl == 0 to prevent runtime crashes.

Suggested change
denom = tf + self.k1 * (
1 - self.b + self.b * (dl / self._avgdl))
scores[i] += idf * (numer / denom)
denom = tf + self.k1 * (
1 - self.b + self.b * (dl / self._avgdl if self._avgdl > 0 else 1.0))
scores[i] += idf * (numer / denom)

Comment on lines +116 to +121
def parse_input(text: str) -> tuple[str, str]:
stripped = text.strip()
parts = stripped.split(maxsplit=1)
cmd = parts[0].lstrip('/').lower()
args = parts[1] if len(parts) > 1 else ''
return cmd, args

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Potential IndexError on empty or whitespace-only input

If text is empty or contains only whitespace, stripped will be empty, and stripped.split(maxsplit=1) will return an empty list, causing an IndexError when accessing parts[0].

Add a guard to return empty strings early if stripped is empty.

Suggested change
def parse_input(text: str) -> tuple[str, str]:
stripped = text.strip()
parts = stripped.split(maxsplit=1)
cmd = parts[0].lstrip('/').lower()
args = parts[1] if len(parts) > 1 else ''
return cmd, args
@staticmethod
def parse_input(text: str) -> tuple[str, str]:
stripped = text.strip()
if not stripped:
return '', ''
parts = stripped.split(maxsplit=1)
cmd = parts[0].lstrip('/').lower()
args = parts[1] if len(parts) > 1 else ''

Comment thread ms_agent/project/store.py
tmp = self._path.with_suffix('.tmp')
with open(tmp, 'w', encoding='utf-8') as f:
json.dump(data, f, ensure_ascii=False, indent=2)
tmp.rename(self._path)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Windows Compatibility Issue with Path.rename

On Windows, Path.rename() raises a FileExistsError if the destination file already exists. To perform a cross-platform atomic replace, use Path.replace() instead.

Note that this also applies to other files in this PR where rename is used on temporary files (e.g., ms_agent/personalization/profile.py, ms_agent/personalization/settings.py, and ms_agent/config/skills_manager.py).

Suggested change
tmp.rename(self._path)
tmp.replace(self._path)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant