Audiovisual dataset loader#698

Open
vivekvjyn wants to merge 20 commits into mir-dataset-loaders:master from vivekvjyn:audiovisual
Conversation

@vivekvjyn
Contributor

  • New dataset loader saraga_audiovisual. Link: zenodo
  • Dataset contains video recordings and pose estimations.
  • load_video has an additional optional dependency "opencv-python"
  • For pose estimations, a new annotation GestureData is used with keypoints and scores.

@codecov

codecov bot commented Feb 19, 2026

Codecov Report

❌ Patch coverage is 93.16239% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 97.07%. Comparing base (b95bf38) to head (9a41eb1).

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #698      +/-   ##
==========================================
- Coverage   97.13%   97.07%   -0.06%     
==========================================
  Files          71       72       +1     
  Lines        7825     7942     +117     
==========================================
+ Hits         7601     7710     +109     
- Misses        224      232       +8     

@vivekvjyn vivekvjyn changed the title Audiovisual dataset loader [WIP] Audiovisual dataset loader Feb 19, 2026
@vivekvjyn vivekvjyn changed the title [WIP] Audiovisual dataset loader Audiovisual dataset loader Feb 20, 2026
@yujin-kimmm yujin-kimmm self-requested a review February 25, 2026 20:17
Collaborator

@yujin-kimmm yujin-kimmm left a comment

Hi @vivekvjyn, thank you for this PR! I left some comments in the review. Please check them and let me know if you have any questions.



def load_video(video_path):
"""Load a Saraga Audiovisual video file.
Collaborator

For video dependencies, mirdata uses moviepy rather than opencv, so we suggest using moviepy to load video frames. You can use load_video from the Multivox loader as a reference.
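
A minimal sketch of what a moviepy-based load_video could look like. This is an illustration only, not the Multivox loader's actual code: returning a VideoFileClip, the lazy import, and the error raised for a missing file are all assumptions made here.

```python
import os


def load_video(video_path):
    """Open a video file with moviepy (illustrative sketch, not the PR's code)."""
    if not os.path.exists(video_path):
        raise FileNotFoundError(f"video_path {video_path} does not exist")

    # Import lazily so the optional dependency is only required
    # when a video is actually loaded.
    try:
        from moviepy import VideoFileClip  # moviepy 2.x
    except ImportError:
        from moviepy.editor import VideoFileClip  # moviepy 1.x

    # The clip decodes frames on demand instead of loading them all at once.
    return VideoFileClip(video_path)
```

Checking the path before importing keeps the failure mode for a missing file independent of whether moviepy is installed.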

return load_metadata(self.metadata_path)

@core.cached_property
def audio(self):
Collaborator

I'm not sure why the audio should be cached for this loader. Is there a specific reason for this?
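
To make the trade-off behind this question concrete, here is a small sketch comparing a plain property with a cached one. It uses functools.cached_property from the standard library as a stand-in for core.cached_property, on the assumption that the two behave similarly; the ToyTrack class and its fake loader are hypothetical.

```python
from functools import cached_property


class ToyTrack:
    """Toy stand-in for a track; not mirdata's Track class."""

    def __init__(self):
        self.loads = 0  # counts how often the (fake) audio is read

    def _load(self):
        self.loads += 1
        return [0.0] * 1000  # placeholder for a large audio array

    @property
    def audio(self):
        # Re-read on every access; nothing is kept on the instance.
        return self._load()

    @cached_property
    def audio_cached(self):
        # Read once; the result stays in memory for the instance's lifetime.
        return self._load()
```

With a cached property, every track whose audio has been accessed keeps its full signal in memory, which can add up quickly when iterating over a whole dataset.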

license_info=LICENSE_INFO,
)

def load_audio(self, *args, **kwargs):
Collaborator

I believe this is the old API, which has been deprecated since version 0.3.4. We suggest following the instructions and example for the loader module here.



@io.coerce_to_string_io
def load_metadata(fhandle):
Collaborator

Please check this comment from Class Dataset.

self.metadata_path = self.get_path("metadata")

@core.cached_property
def metadata(self):
Collaborator

Please check this comment from Class Dataset.


* - Saraga Audiovisual
- - audio: ✅
- annotations: ✅
Collaborator

There is an indentation error here. This line should have the same indentation as audio above. You can check the error here.

"1.0": core.Index(
filename="saraga_audiovisual_index.json",
url="https://zenodo.org/records/18291024/files/saraga_audiovisual_index.json?download=1", # TODO
checksum="b847ca946f2a88956569c897b186a148 ", # TODO
Collaborator

...saraga_audiovisual_index.json has an MD5 checksum (b847ca946f2a88956569c897b186a148) differing from expected (b847ca946f2a88956569c897b186a148 ), file may be corrupted.

There is a trailing space at the end of the checksum string, and that causes the checksum error.
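
The mismatch can be reproduced in isolation. The sketch below assumes a strict string comparison between the computed and expected MD5 digests, as the error message suggests; the helper names and the sample bytes are hypothetical.

```python
import hashlib


def md5_of_bytes(data: bytes) -> str:
    """Return the hexadecimal MD5 digest of a byte string."""
    return hashlib.md5(data).hexdigest()


def checksums_match(computed: str, expected: str) -> bool:
    """Strict comparison: any extra whitespace counts as a mismatch."""
    return computed == expected


data = b'{"version": "1.0"}'  # placeholder content, not the real index file
computed = md5_of_bytes(data)
expected_with_space = computed + " "  # same digest plus a trailing space

print(checksums_match(computed, expected_with_space))          # strict match fails
print(checksums_match(computed, expected_with_space.strip()))  # matches once stripped
```

The straightforward fix is to remove the trailing space from the checksum string in the index declaration rather than to strip it at comparison time.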

)
audio_vocal = (audio_vocal_path, audio_vocal_checksum)

video_path = os.path.join(DATASET + " visual", concert, song, song + ".mov")
Collaborator

On Zenodo (and also on the remote), not every audio track (or concert, in this context) has a matching video. However, this script creates a video path for every track, which makes dataset.validate() fail. Please check which videos actually exist and reflect that in the index.
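
One way the index script could handle missing videos is sketched below. It assumes that an absent file is recorded as (None, None) in the index, which is my understanding of the mirdata convention and should be checked against the contributing guide; video_index_entry and md5_of_file are hypothetical helpers, not functions from the PR.

```python
import hashlib
import os


def md5_of_file(path):
    """Compute the MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fhandle:
        for chunk in iter(lambda: fhandle.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()


def video_index_entry(data_home, rel_path):
    """Return (rel_path, checksum) if the video exists, else (None, None)."""
    full_path = os.path.join(data_home, rel_path)
    if not os.path.exists(full_path):
        # Assumed convention: a missing file is recorded as (None, None).
        return (None, None)
    return (rel_path, md5_of_file(full_path))
```

In the index script, the result of video_index_entry(data_home, video_path) would replace the unconditional (video_path, checksum) pair, so tracks without a video no longer point at a file that does not exist.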

- .. image:: https://img.shields.io/badge/License-MIT-blue.svg
:target: https://lbesson.mit-license.org/

* - .. ::
Collaborator

Mind if I ask why this is added again? It's already on line 460.

self.confidence = confidence


class GestureData(Annotation):
Collaborator

Question: could the gesture data (keypoints, scores) simply be a Track attribute instead of a separate class? Or am I missing some context?

2 participants