Audiovisual dataset loader by vivekvjyn · Pull Request #698 · mir-dataset-loaders/mirdata

vivekvjyn · 2026-02-07T16:23:43Z

New dataset loader saraga_audiovisual. Link: zenodo
Dataset contains video recordings and pose estimations.
load_video has an additional optional dependency "opencv-python"
For pose estimations, a new annotation GestureData is used with keypoints and scores.

codecov · 2026-02-19T12:15:42Z

Codecov Report

❌ Patch coverage is 93.16239% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 97.07%. Comparing base (b95bf38) to head (9a41eb1).

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #698      +/-   ##
==========================================
- Coverage   97.13%   97.07%   -0.06%     
==========================================
  Files          71       72       +1     
  Lines        7825     7942     +117     
==========================================
+ Hits         7601     7710     +109     
- Misses        224      232       +8

yujin-kimmm

Hi @vivekvjyn, thank you for this PR! I left some comments in the review. Please check them and let me know if you have any questions.

yujin-kimmm · 2026-02-25T20:23:01Z

mirdata/datasets/saraga_audiovisual.py

+
+
+def load_video(video_path):
+    """Load a Saraga Audiovisual video file.


For video dependencies, mirdata uses moviepy rather than opencv, so we suggest using moviepy to load video frames. You can check load_video in the Multivox loader as a reference.
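
A rough sketch of what a moviepy-based load_video could look like. This is not the Multivox implementation, which may differ. The `(frames, fps)` return value and the import fallback between moviepy 2.x and 1.x are assumptions made for illustration.

```python
import os


def load_video(video_path):
    """Load a video file as an array of RGB frames (illustrative sketch).

    Args:
        video_path (str): path to the video file

    Returns:
        tuple: (frames, fps), where frames is an np.ndarray of shape
            (n_frames, height, width, 3)
    """
    if not os.path.exists(video_path):
        raise FileNotFoundError(f"Video file not found: {video_path}")

    # Lazy imports keep moviepy optional for users who never load video.
    import numpy as np

    try:
        from moviepy import VideoFileClip  # moviepy >= 2.0
    except ImportError:
        from moviepy.editor import VideoFileClip  # moviepy 1.x

    clip = VideoFileClip(video_path)
    try:
        # iter_frames yields one (height, width, 3) array per frame.
        frames = np.stack(list(clip.iter_frames()))
        fps = clip.fps
    finally:
        clip.close()
    return frames, fps
```

Stacking every frame holds the whole video in memory, which may be too much for full concert recordings, so returning the clip object or a frame iterator instead could be worth considering.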

yujin-kimmm · 2026-02-25T20:30:39Z

mirdata/datasets/saraga_audiovisual.py

+        return load_metadata(self.metadata_path)
+
+    @core.cached_property
+    def audio(self):


I'm not sure why the audio should be cached for this loader. Is there a specific reason for this?
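
To illustrate the concern, here is a toy comparison of a plain property and a cached one. It uses `functools.cached_property` as a stand-in for `core.cached_property`, assuming the two behave similarly. The `Track` class and its fields are hypothetical.

```python
from functools import cached_property


class Track:
    """Toy track that counts how often each loader runs."""

    def __init__(self):
        self.audio_loads = 0
        self.metadata_loads = 0

    @property
    def audio(self):
        # Recomputed on every access; nothing large is retained on the object.
        self.audio_loads += 1
        return [0.0] * 4  # stand-in for a large waveform

    @cached_property
    def metadata(self):
        # Computed once, then stored on the instance for later accesses.
        self.metadata_loads += 1
        return {"raga": "unknown"}


track = Track()
_ = track.audio
_ = track.audio
_ = track.metadata
_ = track.metadata
print(track.audio_loads, track.metadata_loads)  # 2 1
```

With caching, the decoded waveform would stay in memory for as long as the Track object lives, which adds up quickly when iterating over many tracks. Small annotations and metadata are cheaper to keep around.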

yujin-kimmm · 2026-02-25T20:38:28Z

mirdata/datasets/saraga_audiovisual.py

+            license_info=LICENSE_INFO,
+        )
+
+    def load_audio(self, *args, **kwargs):


I believe this is the old API, which has been deprecated since version 0.3.4. We suggest following the instructions and example for the loader module here.

yujin-kimmm · 2026-02-25T20:39:30Z

mirdata/datasets/saraga_audiovisual.py

+
+
+@io.coerce_to_string_io
+def load_metadata(fhandle):


Please check this comment from Class Dataset.

yujin-kimmm · 2026-02-25T20:42:09Z

mirdata/datasets/saraga_audiovisual.py

+        self.metadata_path = self.get_path("metadata")
+
+    @core.cached_property
+    def metadata(self):


Please check this comment from Class Dataset.

yujin-kimmm · 2026-02-25T21:08:53Z

docs/source/table.rst


+   * - Saraga Audiovisual
+     - - audio: ✅
+         - annotations: ✅


There is an indentation error here. It should have the same indentation as the audio line above. You can check the error here

yujin-kimmm · 2026-02-25T21:47:00Z

mirdata/datasets/saraga_audiovisual.py

+    "1.0": core.Index(
+        filename="saraga_audiovisual_index.json",
+        url="https://zenodo.org/records/18291024/files/saraga_audiovisual_index.json?download=1",  # TODO
+        checksum="b847ca946f2a88956569c897b186a148 ",  # TODO


...saraga_audiovisual_index.json has an MD5 checksum (b847ca946f2a88956569c897b186a148) differing from expected (b847ca946f2a88956569c897b186a148 ), file may be corrupted.

There is a trailing space at the end of the checksum string, and that causes the checksum error.
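
A small standalone demonstration of why the comparison fails, using only `hashlib`. The payload bytes are made up for illustration, and this is not mirdata's own validation code.

```python
import hashlib


def md5_of_bytes(data):
    """Return the MD5 hex digest of a bytes object."""
    return hashlib.md5(data).hexdigest()


payload = b"example index contents"  # hypothetical file contents
actual = md5_of_bytes(payload)
expected_with_space = actual + " "  # same digest, plus a trailing space

# A plain string comparison treats the trailing space as a difference.
print(actual == expected_with_space)  # False
# Removing the stray whitespace makes the digests match.
print(actual == expected_with_space.strip())  # True
```

Removing the space from the checksum string in the `core.Index` entry should be enough.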

yujin-kimmm · 2026-02-25T22:08:56Z

scripts/make_saraga_audiovisual_index.py

+                        )
+                        audio_vocal = (audio_vocal_path, audio_vocal_checksum)
+
+            video_path = os.path.join(DATASET + " visual", concert, song, song + ".mov")


On Zenodo (and also on the remote), not every audio track (or concert, in this context) has a matching video. However, this script creates a video path for every track, which makes dataset.validate() fail. Please check this and update the index accordingly.
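
One possible way to handle this in the index script is sketched below, under the assumption that a missing file is recorded in the index as `(None, None)`. The helper names `md5` and `index_entry` and the `data_home` parameter are hypothetical and not taken from the script.

```python
import hashlib
import os


def md5(file_path):
    """Return the MD5 hex digest of a file, read in chunks."""
    hash_md5 = hashlib.md5()
    with open(file_path, "rb") as fhandle:
        for chunk in iter(lambda: fhandle.read(4096), b""):
            hash_md5.update(chunk)
    return hash_md5.hexdigest()


def index_entry(data_home, relative_path):
    """Return (relative_path, checksum), or (None, None) if the file is absent."""
    full_path = os.path.join(data_home, relative_path)
    if not os.path.exists(full_path):
        # Do not invent a path for a video that was never released.
        return (None, None)
    return (relative_path, md5(full_path))
```

The script could then build the relative video path as it does now and pass it through `index_entry`, so tracks without a video get an empty entry instead of a path that fails validation.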

yujin-kimmm · 2026-02-25T22:16:11Z

docs/source/table.rst

     - .. image:: https://img.shields.io/badge/License-MIT-blue.svg
          :target: https://lbesson.mit-license.org/

+   * - .. ::


Mind if I ask why this is added again? It's already on line 460.

yujin-kimmm · 2026-02-25T23:11:14Z

mirdata/annotations.py

        self.confidence = confidence


+class GestureData(Annotation):


Question: can't the gesture data (keypoints, scores) just be a Track attribute instead of having a separate class? Or am I missing some context?

vivekvjyn and others added 16 commits November 24, 2025 15:15

audiovisual loader

9da8d13

audiovisual loader

1669422

writing tests

4898a9d

writing tests

4676a26

writing tests

9933911

writing tests

973735b

tests passed

a493767

tests passed

22949a3

tests and docs

006bf83

Merge branch 'master' into audiovisual

7cdef87

opencv for git workflow

690b0ac

made changes to include IOExceptions

e00eb80

made changes to include IOExceptions

a897cc2

array length error fix

1944920

remove array length validation for gesture because the dataset do not demand that

8e8f7bb

remove array length validation for gesture because the dataset do not demand that

5aa2210

codecov full covered

f21eeb6

vivekvjyn changed the title ~~Audiovisual dataset loader~~ [WIP] Audiovisual dataset loader Feb 19, 2026

vivekvjyn added 3 commits February 19, 2026 13:48

codecov full covered

8746032

fixed readthedocs errors

9d389ad

fixed readthedocs errors

9a41eb1

vivekvjyn changed the title ~~[WIP] Audiovisual dataset loader~~ Audiovisual dataset loader Feb 20, 2026

yujin-kimmm self-requested a review February 25, 2026 20:17

yujin-kimmm reviewed Feb 25, 2026

vivekvjyn wants to merge 20 commits into mir-dataset-loaders:master from vivekvjyn:audiovisual
