Feature request: Add error_handler_t support to json::parse() for tolerant handling of invalid UTF-8 in input strings #5053
aaronalbers started this conversation in Ideas
🚀 Feature Request: Tolerate / replace invalid UTF-8 during parsing
Current behavior
The library strictly enforces that input JSON text is valid UTF-8 (as per RFC 8259). When parsing a string value that contains invalid UTF-8 byte sequences, `json::parse()` throws a `json::parse_error` (usually code 101: "invalid string: ill-formed UTF-8 byte").
This is correct per the JSON specification, but in real-world applications we often need to process "dirty" JSON from untrusted sources (user input, legacy systems, corrupted network data, logs, scraped web content, etc.) where strings may contain invalid sequences.
Desired feature
Add an optional `error_handler_t` parameter (same enum already used in `dump()`) to the main `json::parse()` overloads, allowing callers to choose:
- `strict` (current/default): throw on invalid UTF-8 (existing behavior)
- `replace`: replace invalid sequences with U+FFFD (�) and continue parsing
- `ignore`: skip invalid bytes entirely and continue parsing
The proposed API would mirror the `dump()` style.