Audiovisual dataset loader#698
Audiovisual dataset loader#698vivekvjyn wants to merge 20 commits intomir-dataset-loaders:masterfrom
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #698 +/- ##
==========================================
- Coverage 97.13% 97.07% -0.06%
==========================================
Files 71 72 +1
Lines 7825 7942 +117
==========================================
+ Hits 7601 7710 +109
- Misses 224 232 +8 🚀 New features to boost your workflow:
|
yujin-kimmm
left a comment
There was a problem hiding this comment.
Hi @vivekvjyn , Thank you for this PR! I left some comments in the review. Please check it and let me know if you have any question about it.
|
|
||
|
|
||
| def load_video(video_path): | ||
| """Load a Saraga Audiovisual video file. |
There was a problem hiding this comment.
For Video dependencies, mirdata is using moviepy instead of opencv. Therefore, we suggest to use moviepy for loading video frames. You can check load_video from Multivox loader as a reference.
| return load_metadata(self.metadata_path) | ||
|
|
||
| @core.cached_property | ||
| def audio(self): |
There was a problem hiding this comment.
I'm not sure why the audio should be cached for this loader. Is there specific reason for this?
| license_info=LICENSE_INFO, | ||
| ) | ||
|
|
||
| def load_audio(self, *args, **kwargs): |
There was a problem hiding this comment.
I believe this is the old API that is deprecated from version 0.3.4. We suggest to follow the instruction and example for loader module here.
|
|
||
|
|
||
| @io.coerce_to_string_io | ||
| def load_metadata(fhandle): |
| self.metadata_path = self.get_path("metadata") | ||
|
|
||
| @core.cached_property | ||
| def metadata(self): |
|
|
||
| * - Saraga Audiovisual | ||
| - - audio: ✅ | ||
| - annotations: ✅ |
There was a problem hiding this comment.
There is indent error here. It should be same indent as audio above. You can check the error here
| "1.0": core.Index( | ||
| filename="saraga_audiovisual_index.json", | ||
| url="https://zenodo.org/records/18291024/files/saraga_audiovisual_index.json?download=1", # TODO | ||
| checksum="b847ca946f2a88956569c897b186a148 ", # TODO |
There was a problem hiding this comment.
...saraga_audiovisual_index.json has an MD5 checksum (b847ca946f2a88956569c897b186a148) differing from expected (b847ca946f2a88956569c897b186a148 ), file may be corrupted.
At the end of the checsum string, there is an empty space, and that occurs the checksum error.
| ) | ||
| audio_vocal = (audio_vocal_path, audio_vocal_checksum) | ||
|
|
||
| video_path = os.path.join(DATASET + " visual", concert, song, song + ".mov") |
There was a problem hiding this comment.
| - .. image:: https://img.shields.io/badge/License-MIT-blue.svg | ||
| :target: https://lbesson.mit-license.org/ | ||
|
|
||
| * - .. :: |
There was a problem hiding this comment.
Mind if I ask why this is added again? It's already in line 460
| self.confidence = confidence | ||
|
|
||
|
|
||
| class GestureData(Annotation): |
There was a problem hiding this comment.
Question: For gesture data (keypoints, scores), can't it be just a Track attribute instead of adding having a separate class for this? or am I missing some context?
saraga_audiovisual. Link: zenodoload_videohas an additional optional dependency "opencv-python"GestureDatais used withkeypointsandscores.