Add --escapes, \n-style escapes to allow multiline comments by madah81pnz1 · Pull Request #891 · xiph/flac

madah81pnz1 · 2026-03-05T06:40:09Z

Same as how it works in vorbiscomment, except \0 is not supported.

\0 is problematic to support in C code, and it is ambiguous what something like \012 should mean, given that octal escapes are not supported by vorbiscomment anyway.

Fixes #232

Same as how it works in vorbiscomment, except \0 is not supported. \0 is problematic to support in C code, and it is ambiguous what something like \012 should mean, given that octal escapes are not supported by vorbiscomment anyway. Fixes xiph#232

madah81pnz1 · 2026-03-05T06:46:29Z

I tried to be compatible with vorbiscomment, but decided to drop support for \0, since that doesn't really work in vorbiscomment anyway. I don't think it's worth supporting embedded \0, as many tools either have problems or just silently truncates the string.

The case for supporting \0 is that it makes it possible for the user to repair a faulty tag; by exporting the escaped tags, it should be easy to remove "\0" in any text editor, and recover the end part of the tag after it. Resaving such tag in other audio players or tools might truncate the tag at the first \0, which is what vorbiscomment itself claims to do.

\0 is also just a special case of an octal escape sequence. But those are a bit ambiguous, e.g. what should \012 mean; is it \0 + "12", or \n. This means for octal escape encoding, either always use 3 characters, or check if the next unescaped character is a number 0..7, and if so use 3 characters. So \0 + "0" would be encoded as \0000, and \0 + "A" as \0A. Interestingly, Rust does not support octal escape sequences, but does support \0 as a special case.

\x have similar problems, due to how it is defined in C and C++ as "arbitrary number of hexadecimal digits". Consider if you have a long string that only consists of 0-9 and a-f characters, then it's impossible to have a \x escape sequence in the middle of it, as there is no way to tell where it ends, e.g. "\x123456789". Python changed its \x in PEP-223 to be exactly two hex digits, Rust does the same.

I think it's more useful to support \u and \U instead, which would let you export and import without the risk of UTF-8 turning into mojibake. But even the need for this should be rare, and --no-utf8-convert can already be used as a workaround if needed. In most languages, \u is exactly 4 hex digits, and \U is 8. The exception is Bash that allows variable hex digits like for \x. Bash otherwise follows the C standard escaping, so this is some discrepancy.

Another thing to consider is that the exported tags might be processed further by other user tools and scripts, so it makes sense to keep the escaping simple. Replacing newlines with \n should be trivial in any language, but \u requires more parsing logic and a full UTF-8 encoder/decoder.

References:
https://en.cppreference.com/w/c/language/escape.html
https://en.cppreference.com/w/cpp/language/escape.html
https://peps.python.org/pep-0223/
https://docs.python.org/3/reference/lexical_analysis.html#escape-sequences
https://doc.rust-lang.org/reference/tokens.html#ascii-escapes

madah81pnz1 · 2026-03-09T04:36:39Z

src/share/grabbag/escapes.c

+#include <stddef.h>
+#include "share/grabbag.h"
+
+FLAC__bool grabbag__escape_string_needed(const char *src, size_t src_size)


Since this is just a performance optimization, and running escape/unescape over all tags shouldn't be a bottleneck, this function could be removed.

madah81pnz1 · 2026-03-09T04:38:00Z

src/share/grabbag/escapes.c

+		switch(src[s]) {
+			case '\\':
+				dst[d++] = '\\';
+				dst[d++] = '\\';


This might overrun d by one byte, but shouldn't be a problem in this specific usage, as the calling functions has +1 byte available for the null terminator.
Should be fixed, add a if (d + 1 < dst_size) to also prevent writing a partially escaped character.

madah81pnz1 · 2026-03-09T04:38:33Z

src/share/grabbag/escapes.c

+FLAC__bool grabbag__unescape_string_needed(const char *src, size_t src_size)
+{
+	for(size_t s = 0; s < src_size; ++s) {
+		if(src[s] == '\\') {


Could use strchr() instead here, but at the time I wasn't sure if handling non-null terminated strings would be important.

madah81pnz1 · 2026-03-09T04:39:14Z

src/share/grabbag/escapes.c

+	return false;
+}
+
+static size_t grabbag__unescape_string_size(const char *src, size_t src_size)


The size could also be done by the unescape_string() if provided a NULL dst buffer, which is preferable in case the escaping becomes more complicated in the future.

madah81pnz1 · 2026-03-09T04:39:50Z

src/share/grabbag/escapes.c

+						dst[d++] = '\n';
+						break;
+					default:
+						/* Unsupported escape character */


Could add a "ignore errors" mode, which keeps all invalid/unrecognized escape sequences as-is. But for now I don't think it is useful.

madah81pnz1 · 2026-03-09T04:42:58Z

src/metaflac/utils.c

+					entry_length = strlen(allocated_str);
+				}
+				else {
+					/* error is ignored, the original value will be used */


There is no way to return an error in this function. But also grabbag__create_escaped_string() can't really fail unless malloc fails.

Ignoring errors is also done for the utf8_decode() call below, which would print the original string in case it fails. Maybe could be improved in the future, since --show-tag might show the tag in the wrong encoding to the user, without giving any indiciation or error, which would be surprising.

Update usage, add tests for --show-tag and unsupported escapes

903f40d

madah81pnz1 commented Mar 9, 2026

View reviewed changes

Check for buffer overrun in grabbag__escape_string()

00c65e2

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add --escapes, \n-style escapes to allow multiline comments#891

Add --escapes, \n-style escapes to allow multiline comments#891
madah81pnz1 wants to merge 3 commits intoxiph:masterfrom
madah81pnz1:add-cmdline-escapes

madah81pnz1 commented Mar 5, 2026

Uh oh!

madah81pnz1 commented Mar 5, 2026

Uh oh!

madah81pnz1 Mar 9, 2026

Uh oh!

madah81pnz1 Mar 9, 2026

Uh oh!

madah81pnz1 Mar 9, 2026

Uh oh!

madah81pnz1 Mar 9, 2026

Uh oh!

madah81pnz1 Mar 9, 2026

Uh oh!

madah81pnz1 Mar 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

madah81pnz1 commented Mar 5, 2026

Uh oh!

madah81pnz1 commented Mar 5, 2026

Uh oh!

madah81pnz1 Mar 9, 2026

Choose a reason for hiding this comment

Uh oh!

madah81pnz1 Mar 9, 2026

Choose a reason for hiding this comment

Uh oh!

madah81pnz1 Mar 9, 2026

Choose a reason for hiding this comment

Uh oh!

madah81pnz1 Mar 9, 2026

Choose a reason for hiding this comment

Uh oh!

madah81pnz1 Mar 9, 2026

Choose a reason for hiding this comment

Uh oh!

madah81pnz1 Mar 9, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant