fix: support non-UTF-8 encodings in eval data loading by CodeForgeNet · Pull Request #4100 · microsoft/promptflow

CodeForgeNet · 2026-03-22T20:59:25Z

The eval SDK read JSONL files as UTF-8 only. If the data carried a BOM
(utf-8-sig), which is common for multilingual content generated on Windows,
loading failed immediately with ValueError: Expected object or value, an error
that gives no hint that encoding is the cause.

The fix adds BOM detection before reading and a fallback chain
(utf-8 → utf-8-sig → latin-1 → cp1252) so the loader handles real-world
files without requiring users to re-encode their data.
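
A minimal sketch of the BOM check and fallback chain described above, using only the standard library. The names detect_bom_encoding and load_jsonl_with_fallback are hypothetical; the PR's own helper is _detect_encoding(), whose body is not reproduced here, and the real loaders go through pd.read_json() rather than json.loads().

```python
import codecs
import json
from pathlib import Path

# Order taken from the PR description; the names below are illustrative only.
FALLBACK_ENCODINGS = ("utf-8", "utf-8-sig", "latin-1", "cp1252")


def detect_bom_encoding(raw: bytes):
    """Return 'utf-8-sig' when the data starts with a UTF-8 BOM, else None."""
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    return None


def load_jsonl_with_fallback(path):
    """Parse a JSONL file, trying the BOM-detected encoding first, then the chain."""
    raw = Path(path).read_bytes()
    bom = detect_bom_encoding(raw)
    candidates = ([bom] if bom else []) + [e for e in FALLBACK_ENCODINGS if e != bom]
    tried = []
    for enc in candidates:
        tried.append(enc)
        try:
            text = raw.decode(enc)
            # Split on "\n" only, so characters such as U+0085 inside a JSON
            # string are not mistaken for line breaks.
            return [json.loads(line) for line in text.split("\n") if line.strip()]
        except (UnicodeDecodeError, json.JSONDecodeError):
            continue  # wrong encoding or unparsable text: try the next candidate
    raise ValueError(
        f"Could not parse {path}; encodings attempted: {', '.join(tried)}"
    )
```

One property of this ordering is worth keeping in mind: latin-1 maps every byte to a code point, so decoding with it never raises. In this sketch the cp1252 entry is reached only when JSON parsing of the latin-1 text fails.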

Three files touched:

- promptflow/_utils/load_data.py: _pd_read_file() now detects encoding before calling pd.read_json() on .jsonl files
- evaluate/_evaluate.py: _validate_and_load_data() gets the same treatment
- evaluate/_utils.py: load_jsonl() updated with BOM detection + fallback

Added a utf-8-sig encoded test file with multilingual content and a unit test
that would have caught this from the start.
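
A rough sketch of what such a regression test could look like, written with unittest and a temporary file instead of a checked-in fixture. The read_jsonl function is a hypothetical stand-in for the loader under test, not the SDK's actual API, and the sample rows are invented for illustration.

```python
import json
import tempfile
import unittest
from pathlib import Path


def read_jsonl(path, encoding="utf-8-sig"):
    """Hypothetical stand-in for the loader under test."""
    with open(path, encoding=encoding) as f:
        return [json.loads(line) for line in f if line.strip()]


class TestBomJsonl(unittest.TestCase):
    def test_utf8_sig_multilingual(self):
        rows = [{"query": "こんにちは"}, {"query": "¿Qué tal?"}]
        with tempfile.TemporaryDirectory() as tmp:
            path = Path(tmp) / "bom.jsonl"
            payload = "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in rows)
            # Writing with utf-8-sig prepends the BOM bytes EF BB BF.
            path.write_text(payload, encoding="utf-8-sig")
            self.assertTrue(path.read_bytes().startswith(b"\xef\xbb\xbf"))

            # A BOM-aware read returns the original rows.
            self.assertEqual(read_jsonl(path), rows)

            # Reading as plain utf-8 leaves U+FEFF at the start of the first
            # line, which json.loads rejects (JSONDecodeError is a ValueError).
            with self.assertRaises(ValueError):
                read_jsonl(path, encoding="utf-8")
```

The second assertion reproduces the original failure mode, so the test fails if BOM handling regresses.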

Checklist

No breaking changes
Read the contribution guidelines
New dependencies are MIT compatible
CHANGELOG updated
Test coverage included for the change

Fixes microsoft#3670

pd.read_json defaulted to UTF-8 only. Files encoded with utf-8-sig (BOM) raised ValueError: Expected object or value.

- Added _detect_encoding() BOM detection in load_data.py, _evaluate.py, _utils.py
- Added fallback encoding chain: utf-8, utf-8-sig, latin-1, cp1252
- Improved error messages to show which encodings were attempted
- Added test case and utf-8-sig encoded test data file

CodeForgeNet · 2026-03-22T21:01:42Z

@microsoft-github-policy-service agree

CodeForgeNet requested review from a team as code owners March 22, 2026 20:59

github-actions bot added promptflow-core promptflow-evals external labels Mar 22, 2026

CodeForgeNet wants to merge 1 commit into microsoft:main from CodeForgeNet:fix/eval-utf8-encoding-support
