Thank you for sharing your insightful research. I am currently working on reproducing your study and would like to request some clarification regarding the dataset preparation.
Specifically, I have the following questions:
Dataset Acquisition: What is the recommended procedure for downloading the OpenVid-1M dataset?
File Preparation: How should I generate the openvid-1m.parquet file required to run the build_rag_database.py script? If specific preprocessing steps or conversion scripts are needed to convert the raw data into this parquet file, could you please share them?
I look forward to your reply. Thank you for your time and for contributing to the community.
Best regards,