encode & decode 32-bit float (IEEE 754 standard) samples.#842
encode & decode 32-bit float (IEEE 754 standard) samples.#842aliheidary1381 wants to merge 3 commits intoxiph:masterfrom
Conversation
…standard) samples. I've also added support to encode & decode raw, wave riff & aiff float formats. The float encoding feature achieves a near 70% compression ratio, which is better than nothing. no oss-fuzz or tests have been added yet (sorry). the replay gain feature should also be expanded in the future to support the new feature. sorry for the big PR. it's pretty readable tho! I tried to make it as modular & independent as possible. documentation is good. should this make it to a release version, an update to the standard RFC could also be considered. the changes are backwards-compatible and are as follows: 1. when storing float samples, the bps bits in the streaminfo metadata block should be 0b00000. 2. when storing float samples, the zero-padded bit, originally reserved, after the bit-depth bits of each frame header should be 1. 3. when storing float samples, the actual data samples stored in each subframe are obtained by doing some bit manipulation before encoding and after decoding. That part is in src/libFLAC/transform_float.c and is necessary to boost the compression (more info in the file comments). Monkey's Audio .ape does something similar, but this one seems to achieve better ratios (~8% improvement). sorry for the unorthodox type conversions. will fix it if I see it being considered for a merge.
|
Hi, Many thanks for filing a PR. I haven't tested it, and as you might understand, this will need a lot of testing :) I've tried doing this in the past, but didn't get very far. float PCM coding has been a often-requested feature, so this would be a great addition in that regard. I have some "objections on principal grounds" to floating point audio in FLAC. For example, floating point audio isn't meant for playback. Also, because this breaks forward compatibility. It feels to me like this will cause confusion for a lot of users, as these FLAC files will not playback on current equipment. But then again, that was also the case (and still is for some newly manufactured equipment) for 24-bit audio. I'm not saying this won't be merged, but I'll need to hear a few opinions of other people involved in the FLAC project. Still, I already know a lot of people want this. I am surprised you didn't need any new coding methods (subframe types or residual coding methods). I think this is really nice. FLAC is also used for compression measurement signals (for scientific experiments and such) where this also might be useful. Not sure whether your bit manipulation works for that as well. After I test this myself (which might take a while), I would like to propose this change through various channels with FLAC enthousiasts. However, before that, I would like to make the format changes future-proof. The problem is that the reserved bit after the bit depth bits that you use, is the last remaining reserved bit. Use of this makes it impossible for the frame header to extended any further in the future. I would like this bit to have a different function: signal an extension to the frame header with one byte of extra flag bits (or feature bits) and a channel mask. Anyway, many thanks for taking on this task. Please be patient, as merging might take quite a while, and publishing a release after that a while longer. |
|
By the way, perhaps we should also define a new STREAMINFO metadata block variant. |
|
Thx!
There's also another reserved bit in the frame header, just after the sync bits at the very beginning (
I don't think it'd be necessary. A bps of 1-3 is forbidden, according to the current standard rfc. Using them seems like a no-brainer to me. It also suggests (to the older decoders) to stop playing these (cause of the forbidden value), and requires minimal changes in the standard and other implementations. p.s. Sorry for closing the PR, wrong button 😅 |
No, there isn't. The code still refers to it, but the RFC made it part of the sync code. That is to make it distinguishable from MPEG. See here: https://lists.xiph.org/pipermail/flac-dev/2008-December/002607.html So, we only have one bit remaining.
When it is necessary to deviate from the standard, I'd like to do it in a clear and conscious way. So, there'll be more features incorporated in a new streaminfo metadata block, like increasing the max number of samples or perhaps increasing the max samplerate and the total number of samples. Still really niche stuff, so it is really only necessary for exotic stuff. The thing is, like using floats (which is a niche), there are others using FLAC for stuff it wasn't made for, like RF captures. So, this will take a long time. |
Oh, OK. |
|
I completely agree both on
There is also the 64-bit float format (in both endiannesses) - not that it is any more urgently needed for listening, but for compatibility (edit: with DAW plugins, for example) it wouldn't be a bad thing for a FLAC plugin to be able to handle "everything" the application could save. (64-bits would likely need more than 5-bit Rice, but who cares if that element also makes today's decoders err out on something they cannot decode.) Here is a part of a possible solution, if we adopt the following view:
The FLAC format admits sample rate "0" for non-audio. It could also be used for "audio in files that are stored as non-audio", with the FLAC format being used to compress the file and not just the audio stream: interpret the use of "0" as "files you need to treat as a full file", and a player/DAW should then skip it unless it knows what it is doing. There are a bunch of APPLICATION block types left to be used. You got ones for foreign metadata already, but here you might consider ones for mandatory file headers/footers (I think footer would be potentially more crucial for float than for integer, with possible metadata chunks for volume?!), and maybe one for audio properties the decoder needs to reconstruct the files (endianness, and signendness although that isn't applicable for float) - and source file extension (like WavPack does)? So the workflow of a "player"/DAW would then be:
This could also make it possible to store AIFF(/CAF) with non-integer sampling rates - or object-based audio in the BW64 format (like Monkey's and WavPack do), as there would be no need for FLAC to "understand" the objects metadata. Sure the _en_coder does need to know what a sample and a channel is, and if it is not aware of the input format it could be force-fed with "raw format options" plus OptimFROG-stype --headersize and --tailsize. As long as the source file is structured in the order header--audio--footer and nothing between audio and audio, then it would future-proof against new file types? (And past-proof enough to encode .au and A-law and µ-law ... just what the world has not been waiting for.)
In frame headers then:
And more:
|
|
How does WavPack (or its patented competitor, DST) compress 1-bit PWM streams, I don't know. I suppose we also need to reserve some space for defining new subframe types later? AFAIK, with As for the frame header, I will use bit_depth= I'm more in favour of defining a new STREAMINFO_EXTENSION metadata block type (instead of wrapping it inside an APPLICATION block), with the same workflow you said.
|
|
Oof my bad, I wrote APPLICATION when I meant "metadata". I agree yes, different (new) type - there are 120 left, not running out soon. STREAMINFO_EXTENSION to inform a "new audio type-aware" decoder/player how to play, and a "this block type-aware but won't play" decoder how to order the decoded bits wrt. endianness and signedness and interleaving. Then I suggest types to inform about and store file header and footer when those are "mandatory" to get output right; If full file headers/footers are indeed indispensable, there should be a way to get them past the 16MiB by spanning them over several blocks if necessary; whether the STREAMINFO_EXTENSION contains the info about that or there are just block types for "first" and "continued" header and ditto footer ... there are many ways.
Though I tend to disagree (I think that "subset" should not become more permissible, that would put new demands on finalized software that claims to decode subset), and maybe it has to be sorted out before committing, but ... not needed yet. |
|
Well, now that we're modifying the STREAMINFO block, we can also replace the 3-byte block size indicator with an elias gamma representation... |
|
Sorry guys... Been busy lately. |
|
This needs a really thorough review and a lot of work to make it happen. Lately, I haven't had a lot of time to work on this, and it might take quite a while before I do. |
|
The only backwards compatible way to extend the 16 MiB block size is to add a new block type, e.g. FLAC__METADATA_TYPE_EXTENDED. This is then chained one after another, and is a container for other blocks, which can then be defined with a new header. For example, a 40 MiB metadata block would need to be split up in 3 of those extended blocks. But I don't think this is needed for this float proposal, which mostly needs a new FLAC__METADATA_TYPE_STREAMINFO2 block type. +1 for 64-bit float support, but if we're thinking ahead, don't forget (b)float16_t, float128_t. Extending the blocksize is useful for big pictures, but also for future-proofing. It would be useful to have a new way of encoding metadata blocks, since you could add optional checksums, compression, etc. I can start a new discussion topic about this, maybe others have good ideas about it too. |
| #define FLAC__STREAM_SYNC_LENGTH (4u) | ||
|
|
||
| typedef enum { NOT_SPECIFIED = -1, | ||
| INT = 0, |
There was a problem hiding this comment.
Since this is put into the global scope/namespace, I suggest to add a prefix like SAMPLE_TYPE_ for all enum values, or even FLAC__SAMPLE_TYPE_.
Especially INT is a bit too short and too common name as to not be confusing.
|
|
||
| typedef enum { NOT_SPECIFIED = -1, | ||
| INT = 0, | ||
| FLOAT = 1 } SampleType; // TODO: merge with endian & unsigned |
There was a problem hiding this comment.
The name of the type should be FLAC__SampleType, to follow naming style
| @@ -1,4 +1,5 @@ | |||
| option(ENABLE_64_BIT_WORDS "Set FLAC__BYTES_PER_WORD to 8, for 64-bit machines. For 32-bit machines, turning this off might give a tiny speed improvement" ON) | |||
| option(ENABLE_EXPERIMENTAL_FLOAT_SAMPLE_CODING "Enable experimental feature to encode & decode 32-bit float (IEEE 754 standard) samples" ON) | |||
There was a problem hiding this comment.
Should be OFF by default?
| #include "share/compat.h" | ||
| #include "decode.h" | ||
|
|
||
| #define sample_type(is_float) is_float == FLOAT ? "float" : "int" |
There was a problem hiding this comment.
Make this a static inline function instead, sample_type is too short name as to not be confused for a local variable somewhere. Especially also since it is used as sample_type(sample_type).
The name could be something more like FLAC__get_sample_type_string().
Don't forget about the undefined enum value (-1), for debugging purposes I'd recommend to also check and print that one instead of "int" for everything else.
| /* first part of GUID */ | ||
| if(!read_uint16(e->fin, /*big_endian=*/false, &x, e->inbasefilename)) | ||
| return false; | ||
| if(x != 1) { |
There was a problem hiding this comment.
Here you should also check for 3, since that is when WAVEFORMATEXTENSIBLE is used for 32-bit float samples.
| else if(0 == strcmp(opt, "set-sample-type")) { | ||
| FLAC__ASSERT(0 != option_argument); | ||
| op = append_shorthand_operation(options, OP__SET_SAMPLE_TYPE); | ||
| if(strcmp(option_argument, "float") == 0 || strcmp(option_argument, "IEEE754") == 0 || strcmp(option_argument, "binary32") == 0) { |
There was a problem hiding this comment.
same here, only allow "float", and "int" below
| subtracting an (automatically recognised) DC offset from the "exponents" channel | ||
| and storing the offset in each frame header (i.e. unsigned to signed conversion) | ||
| could be better (haven't tried it yet. hard for me to implement). | ||
| my guess would be a *consistent* ~60% ratio, at least. |
There was a problem hiding this comment.
a 10% improvement in compression might be worth investigating a bit further? Would need more discussions on how to split up the signal in "sub-channels", maybe too much complexity to be worth it.
How does the current (choice 2) compare to other coders like WavPack? If it is already as good or very close, then it might not be worth it.
| return false; | ||
|
|
||
| /* GUID = {0x00000001, 0x0000, 0x0010, {0x80, 0x00, 0x00, 0xaa, 0x00, 0x38, 0x9b, 0x71}} */ | ||
| if(flac__utils_fwrite("\x01\x00\x00\x00\x00\x00\x10\x00\x80\x00\x00\xaa\x00\x38\x9b\x71", 1, 16, f) != 16) |
There was a problem hiding this comment.
You need to handle float here also for waveformatextensible, the first value of the GUID would be 3 for float
|
|
||
| #if ENABLE_EXPERIMENTAL_FLOAT_SAMPLE_CODING | ||
| /* sanity-check the sample type */ | ||
| if(decoder_session->sample_type != -1) { |
There was a problem hiding this comment.
Use the enum value, don't hardcode -1
| #if ENABLE_EXPERIMENTAL_FLOAT_SAMPLE_CODING | ||
| if(decoder_session->sample_type == FLOAT && metadata->data.stream_info.sample_type != FLOAT) { | ||
| stats_print_name_and_stream_number(1, decoder_session->inbasefilename, decoder_session->stream_counter); | ||
| flac__utils_printf(stderr, 1, "ERROR, this link's STREAMINFO is set to float PCM format but was int PCM in previous one\n"); |
There was a problem hiding this comment.
Use the sample_type string function here, so the name is consistent everywhere
I've also added support to encode & decode raw, wave riff, & aiff float formats.
The float encoding feature achieves a near 70% compression ratio, which is better than nothing. No oss-fuzz or tests have been added yet (sorry). The replay gain feature should also be expanded in the future to support the new feature. Sorry for the big PR. It's pretty readable tho! I tried to make it as modular & independent as possible. Documentation is good. Should this make it to a release version, an update to the standard RFC could also be considered. The changes are backwards-compatible and are as follows: