Cache H264/H265 GOPs in order to allow readers to decode frames immediately#4189
jean343 wants to merge 16 commits into bluenviron:main
|
This is great work. The important thing is that you confirm that the feature works, and in which scenarios (protocols, codecs, with/without B-frames).
|
|
Thanks @aler9 for your kind comment! The feature does work in the following scenarios: Protocol:
Codecs:
I have only tried videos without B-frames, as WebRTC does not support B-frames. There might be adjustments to make when dealing with B-frames over RTSP.
To reduce the risk of RAM exhaustion, we do not cache anything until we get a key frame. This prevents unsupported codecs from storing anything, and saves a little memory for supported codecs. As for additional codecs, I could not find a reliable way to detect their keyframes.
In the WebRTC playback scenario, the PTS and Timestamp need to be modified to prevent gaps. WebRTC will pause and stop playback if they are set incorrectly.
For example, an incorrect Timestamp will look like: Untitled.mov
A correct Timestamp will look like: Screen.Recording.2025-01-23.at.1.40.16.PM.mov
|
You are missing fixes needed to align with the base branch. You can reference them here:
|
@angry-beaver I need to finish a couple of other things, then I'll focus on this. In the meantime, @jean343 and @yairzahavi can try to go on by themselves.
@jean343 I'll try to do it soon, including the AV1 and RAM exhaustion prevention, and I'll ping you for a review. 🙌
|
Thanks everyone for the help. I merged from master and fixed the build! @yairzahavi, the AV1 work is awesome. I merged the AV1 work into this branch and fixed the merge conflicts. I did not change the CacheLength logic, as it's not clear that the additional logic helps performance. We should aim at making a final PR as coauthors!
	if s.CachedUnits != nil {
		s.CachedUnits = append(s.CachedUnits, u)
	}
	l := len(s.CachedUnits)
	if l > maxCachedGOPSize {
		s.CachedUnits = s.CachedUnits[l-maxCachedGOPSize:]
		sf.decodeErrLogger.Log(logger.Warn, "GOP cache is full, dropping packets")
	}
}
Thanks everyone for the help. I merged from master and fixed the build!
@yairzahavi, the AV1 work is awesome. I merged the AV1 work into this branch and fixed the merge conflicts. I did not change the CacheLength logic, as it's not clear that the additional logic helps performance.
You are right that the change is not clear; additionally, it doesn't really work and I have yet to figure out why.
But the reason I tried to change it is that I inspected this code block, and it seems there is another memory allocation when you surpass maxCachedGOPSize.
If you truncate afterwards, the memory allocation has already happened.
Additionally, I tried to reduce memory allocations and usage by allocating a fixed size.
I hope you have a solution or idea for this.
We could drop the entire cache once it reaches maxCachedGOPSize. That would avoid the extra memory allocations. Once the cache reaches maxCachedGOPSize, the cached GOP is no longer intact anyway, so the video player will need to wait for the next key frame regardless.
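A minimal sketch of that drop-on-overflow idea, for discussion only. The `unit` and `gopCache` types, the `push` method, and the value of `maxCachedGOPSize` are assumptions for illustration and are not taken from the PR. The backing slice is allocated once at full capacity, so `append` never has to grow it.

```go
package main

import "fmt"

// maxCachedGOPSize reuses the constant name from the diff; the value is an assumption.
const maxCachedGOPSize = 4

// unit is a hypothetical stand-in for a cached access unit.
type unit struct {
	keyFrame bool
	seq      int
}

// gopCache is a hypothetical cache backed by a single preallocated slice.
type gopCache struct {
	units []unit
	valid bool // false until a key frame arrives, and again after an overflow
}

func newGOPCache() *gopCache {
	return &gopCache{units: make([]unit, 0, maxCachedGOPSize)}
}

// push stores u and reports whether it was cached.
func (c *gopCache) push(u unit) bool {
	if u.keyFrame {
		c.units = c.units[:0] // start a new GOP, reusing the backing array
		c.valid = true
	}
	if !c.valid {
		return false // nothing is cached until a key frame is seen
	}
	if len(c.units) == maxCachedGOPSize {
		// The cache is full: drop everything and wait for the next key frame.
		c.units = c.units[:0]
		c.valid = false
		return false
	}
	c.units = append(c.units, u) // never reallocates: len < cap here
	return true
}

func main() {
	c := newGOPCache()
	for i := 0; i < 6; i++ {
		c.push(unit{keyFrame: i == 0, seq: i})
	}
	fmt.Println(len(c.units), cap(c.units), c.valid) // prints: 0 4 false
}
```

The trade-off is that a GOP longer than the limit is never served from the cache, so readers joining such a stream wait for the next key frame as they would without the feature.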
|
Another point I noticed is that the GOP cache causes higher WebRTC jitter.
|
Hello, I've tested the patch. While the working principle is present, there are some aspects that can be improved:
The GOP caching feature has always been difficult to implement because it has to take into consideration how players react when receiving a bunch of access units at the same time. It involves testing all possible ways to send the GOP, and digging into the source code of all players and into codec specifications. The feature can be merged into the main branch only when a high level of compatibility with all major protocols and players is reached.
out.mp4
|
Thank you @aler9 for testing and for your feedback. I did not expect to uncover this many corner cases when I started the implementation :)
|
	atomic.AddUint64(s.bytesReceived, size)

	if sf.gopCache && medi.Type == description.MediaTypeVideo {
		if isKeyFrame(u) {
Would it be better to start recording from the SPS/PPS? Or save the most recent SPS/PPS and prepend them here?
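To make the SPS/PPS suggestion concrete, here is a rough sketch for H.264 only. The NAL unit type values (5 for IDR, 7 for SPS, 8 for PPS) come from the H.264 specification; the `paramSets` type and the `observe`, `withParamSets`, and `contains` helpers are hypothetical names and are not part of the PR or of MediaMTX. An access unit is modeled as a slice of NAL units without start codes.

```go
package main

import "fmt"

// H.264 NAL unit types (ITU-T H.264, Table 7-1).
const (
	nalTypeIDR = 5
	nalTypeSPS = 7
	nalTypePPS = 8
)

// paramSets remembers the most recent SPS and PPS seen on a stream (hypothetical helper).
type paramSets struct {
	sps, pps []byte
}

// nalType extracts the 5-bit nal_unit_type from the first header byte.
func nalType(nalu []byte) byte {
	if len(nalu) == 0 {
		return 0
	}
	return nalu[0] & 0x1F
}

// contains reports whether au holds a NAL unit of any of the given types.
func contains(au [][]byte, types ...byte) bool {
	for _, n := range au {
		for _, t := range types {
			if nalType(n) == t {
				return true
			}
		}
	}
	return false
}

// observe records any SPS or PPS carried by the access unit.
func (p *paramSets) observe(au [][]byte) {
	for _, n := range au {
		switch nalType(n) {
		case nalTypeSPS:
			p.sps = n
		case nalTypePPS:
			p.pps = n
		}
	}
}

// withParamSets prepends the stored SPS and PPS to an IDR access unit that
// carries neither. Any other access unit is returned unchanged.
func (p *paramSets) withParamSets(au [][]byte) [][]byte {
	if !contains(au, nalTypeIDR) || contains(au, nalTypeSPS, nalTypePPS) {
		return au
	}
	if p.sps == nil || p.pps == nil {
		return au
	}
	out := make([][]byte, 0, len(au)+2)
	out = append(out, p.sps, p.pps)
	return append(out, au...)
}

func main() {
	p := &paramSets{}
	// First access unit: SPS (0x67), PPS (0x68), IDR slice (0x65).
	p.observe([][]byte{{0x67, 0x42}, {0x68, 0xCE}, {0x65, 0x88}})
	// A later IDR arrives without parameter sets.
	out := p.withParamSets([][]byte{{0x65, 0x99}})
	fmt.Println(len(out), nalType(out[0]), nalType(out[1]), nalType(out[2])) // prints: 3 7 8 5
}
```

H.265 would need its own type table and a VPS in addition to SPS/PPS, which this sketch does not cover.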
udpMaxPayloadSize: 1472
# Enable GOP cache to improve initial playback experience for new clients.
# Note: will increase memory usage.
gopCache: false
Maybe we need a parameter (in bytes? per path?) for more transparent memory control?
The cache is in-memory and does not need a path. I like the idea of specifying the cache size, in bytes or packets.
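As a starting point for the size discussion, a byte-based limit could look roughly like the sketch below. The `byteBoundedCache` type and its `maxBytes` field are hypothetical; no such setting exists in the PR, and how it would map to a configuration parameter is left open.

```go
package main

import "fmt"

// byteBoundedCache is a hypothetical GOP cache limited by total payload size.
type byteBoundedCache struct {
	maxBytes int      // assumed to come from a configuration parameter
	curBytes int      // sum of the sizes of the cached payloads
	units    [][]byte // cached payloads, starting at a key frame
	valid    bool     // false until a key frame arrives, and again after an overflow
}

// reset releases all cached payloads so they can be garbage collected.
func (c *byteBoundedCache) reset() {
	c.units = nil
	c.curBytes = 0
}

// push stores payload and reports whether it was cached.
func (c *byteBoundedCache) push(payload []byte, keyFrame bool) bool {
	if keyFrame {
		c.reset()
		c.valid = true
	}
	if !c.valid {
		return false
	}
	if c.curBytes+len(payload) > c.maxBytes {
		// Over budget: drop the GOP and wait for the next key frame.
		c.reset()
		c.valid = false
		return false
	}
	c.units = append(c.units, payload)
	c.curBytes += len(payload)
	return true
}

func main() {
	c := &byteBoundedCache{maxBytes: 10}
	c.push(make([]byte, 6), true)  // key frame, 6 bytes cached
	c.push(make([]byte, 3), false) // 9 bytes cached
	c.push(make([]byte, 2), false) // would reach 11 bytes: cache dropped
	fmt.Println(c.curBytes, len(c.units), c.valid) // prints: 0 0 false
}
```

A byte budget makes the worst-case memory per stream easy to reason about, whereas a packet count depends on the bitrate of the stream.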
|
It would be interesting to see this feature merged. Any progress here? Thanks
I would definitely like to see this merged. I cannot find a good solution for issue 1 in this comment. If we accelerate playback too much, the client refuses to load the video. Maybe we could indicate it in the docs and leave the feature disabled by default.
|
This feature would be great for me too |
Thanks for your contribution! It seems that handling the previous frames (those that occur before the new reader connects to the server) is a big problem? The point about combining the P-frames reminded me of a paper I found that implements the feature in a very similar and effective way! In short, the article introduces a video decoder and a temporary video encoder on the server side, always providing newcomers with the newest frames (which are re-encoded from the normal stream into a temporary GOP that starts with an I-frame). I believe that this mechanism is effective and elegant. The only problem is that it needs something like FFmpeg on the server side, and I'm not sure whether we could implement that via something like an external command hook rather than embedding the decoder and encoder inside MediaMTX. Refs:
GOP Cache
This PR introduces Group of Pictures (GOP) caching to MediaMTX, enhancing its performance and reducing latency in streaming scenarios. By caching the last GOP for each stream, new subscribers can immediately receive the latest video data without waiting for the next keyframe, improving the user experience, especially for streams with long keyframe intervals.
This works for both H264 and H265, as well as for RTSP and WebRTC.
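The caching behaviour described above can be sketched as follows. This is a simplified illustration and not the code in this PR: the `accessUnit` and `stream` types and the `write` and `snapshot` methods are hypothetical, and locking, size limits, and timestamp rewriting are left out.

```go
package main

import "fmt"

// accessUnit is a hypothetical, simplified video access unit.
type accessUnit struct {
	pts      int64
	keyFrame bool
}

// stream keeps the access units of the current GOP (hypothetical type).
type stream struct {
	cached []accessUnit
}

// write is called for every incoming access unit.
func (s *stream) write(u accessUnit) {
	if u.keyFrame {
		s.cached = s.cached[:0] // a key frame starts a new GOP
	} else if len(s.cached) == 0 {
		return // nothing is cached until the first key frame
	}
	s.cached = append(s.cached, u)
}

// snapshot returns a copy of the cached GOP for a newly connected reader,
// which can then decode immediately instead of waiting for a key frame.
func (s *stream) snapshot() []accessUnit {
	return append([]accessUnit(nil), s.cached...)
}

func main() {
	s := &stream{}
	for pts, key := range []bool{false, true, false, false, true, false} {
		s.write(accessUnit{pts: int64(pts), keyFrame: key})
	}
	snap := s.snapshot()
	fmt.Println(len(snap), snap[0].pts, snap[0].keyFrame) // prints: 2 4 true
}
```

A real implementation would also have to guard the cache against concurrent access from the publisher and from readers.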
Configurable Cache Settings:
Introduced a new configuration parameter gopCache in mediamtx.yml for enabling/disabling GOP caching.
Fix: #1209