implementations/multimedia_rag/README.md
60 additions & 19 deletions
@@ -51,35 +51,81 @@ This is required for embedding video/audio segments.
 mkdir -p data
 ```

-### VQA JSON Files (Place in `data/`)
+### VQA JSON Files

-Download:
+These are included in the GCP download below for convenience; they are the same files as the SONIC-O1 HuggingFace dataset. They define the multiple-choice video QA tasks.
Files are placed correctly after extraction — no manual reorganisation needed.
+
+#### 3) Clean up temporary files

-* video
-* audio
-* caption
+```bash
+rm -rf __MACOSX data.zip data/.DS_Store
+```
+
+The zip contains everything needed to run the notebooks:
+
+```
+data/
+├── Customer_Service_Interactions/
+│   ├── audio/                   # base audio files
+│   ├── video/                   # base video files
+│   ├── caption/                 # base caption files
+│   ├── process-audio/           # pre-generated, can be regenerated
+│   ├── process-video/           # pre-generated, can be regenerated
+│   ├── segment-audio_30s/       # pre-generated, can be regenerated
+│   ├── segment-video_30s/       # pre-generated, can be regenerated
+│   ├── segment-caption_30s/     # pre-generated, can be regenerated
+│   ├── audio_embeddings.pt      # pre-generated, can be regenerated
+│   ├── video_embeddings.pt      # pre-generated, can be regenerated
+│   └── caption_embeddings.pt    # pre-generated, can be regenerated
+├── Job_Interviews/                              # same structure as above
+├── Patient-Doctor_Consultations/                # same structure as above
+├── global_embeddings/                           # pre-generated, can be regenerated
+├── Customer_Service_Interactions.json
+├── Customer_Service_Interactions_filtered.json  # pre-generated, can be regenerated
+├── Job_Interviews.json
+├── Job_Interviews_filtered.json                 # pre-generated, can be regenerated
+├── Patient-Doctor_Consultations.json
+└── Patient-Doctor_Consultations_filtered.json   # pre-generated, can be regenerated
+```
+
+Pre-generated files (`process-*`, `segment-*`, `*.pt` embeddings, `global_embeddings/`, `*_filtered.json`) are included to save time, but can all be reproduced by running the notebooks from scratch.

 ---

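The layout listed in the hunk above could be sanity-checked after extraction with a short script. The following is only a sketch: the directory and file names are copied from the tree in the diff, it assumes `Job_Interviews/` and `Patient-Doctor_Consultations/` really do mirror `Customer_Service_Interactions/` as the tree comments say, and the helper name `find_missing` is hypothetical (it is not part of the repository).

```python
from pathlib import Path

# Names copied from the tree in the README diff. `find_missing` is a
# hypothetical helper for illustration, not something the repo provides.
SCENARIOS = [
    "Customer_Service_Interactions",
    "Job_Interviews",
    "Patient-Doctor_Consultations",
]
BASE_DIRS = ["audio", "video", "caption"]  # base inputs shipped in the zip
REGENERABLE = [  # pre-generated items the notebooks can reproduce
    "process-audio", "process-video",
    "segment-audio_30s", "segment-video_30s", "segment-caption_30s",
    "audio_embeddings.pt", "video_embeddings.pt", "caption_embeddings.pt",
]


def find_missing(data_root):
    """Return (missing_base, missing_regenerable) as lists of relative paths."""
    root = Path(data_root)
    missing_base, missing_regen = [], []
    for scenario in SCENARIOS:
        # Each scenario has a VQA JSON file next to its folder.
        if not (root / f"{scenario}.json").is_file():
            missing_base.append(f"{scenario}.json")
        for name in BASE_DIRS:
            if not (root / scenario / name).is_dir():
                missing_base.append(f"{scenario}/{name}")
        for name in REGENERABLE:
            if not (root / scenario / name).exists():
                missing_regen.append(f"{scenario}/{name}")
        if not (root / f"{scenario}_filtered.json").is_file():
            missing_regen.append(f"{scenario}_filtered.json")
    if not (root / "global_embeddings").exists():
        missing_regen.append("global_embeddings")
    return missing_base, missing_regen


if __name__ == "__main__":
    base, regen = find_missing("data")
    print("missing base inputs:", base or "none")
    print("missing pre-generated outputs:", regen or "none")
```

Splitting the result in two reflects the distinction the README draws: a missing base input means the download or extraction went wrong, while a missing pre-generated item only means the corresponding notebook step has to be rerun.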
@@ -122,11 +168,6 @@ This installs everything needed for both the Video RAG (ImageBind embedding + re
***Do not download the ```train_raw.parquet```, use the ```train_sponsor_filtered.parquet``` for data_sky or ```train_singleturn_sponsor_filtered.parquet``` for data_hh_rlhf***
Files are placed correctly after extraction — no manual reorganisation needed.
+
+#### 3) Clean up temporary files:
+
+```bash
+rm -rf __MACOSX data.zip data/.DS_Store
+```

-After downloading, place the ```.parquet``` file inside one of the following folders (create the folder if it does not exist):
-```data_sky/``` or
-```data_hh_rlhf/```
-Then proceed with:
+> **Note:** Use `train_sponsor_filtered.parquet` (for `data_sky`) and `train_singleturn_sponsor_filtered.parquet` (for `data_hh_rlhf`).

-```01_dataset_construction.ipynb```
+Then proceed with `01_dataset_construction.ipynb`.
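Before opening the notebook, the file choice from the note could be checked with a few lines of Python. This is a sketch under assumptions: it supposes the parquet files sit directly inside `data_sky/` or `data_hh_rlhf/` after extraction (the README only says files are "placed correctly"), and `check_parquet` is a hypothetical helper name, not part of the repository.

```python
from pathlib import Path

# Folder-to-file pairing taken from the README note. `check_parquet` is a
# hypothetical helper; the exact location of the files is an assumption.
EXPECTED = {
    "data_sky": "train_sponsor_filtered.parquet",
    "data_hh_rlhf": "train_singleturn_sponsor_filtered.parquet",
}


def check_parquet(folder):
    """Return a list of problems found in one dataset folder (empty if fine)."""
    folder = Path(folder)
    expected = EXPECTED.get(folder.name)
    if expected is None:
        return [f"unknown dataset folder: {folder.name}"]
    problems = []
    if not (folder / expected).is_file():
        problems.append(f"missing {expected}")
    # The README warns against using the raw file.
    if (folder / "train_raw.parquet").exists():
        problems.append("train_raw.parquet is present but should not be used")
    return problems
```

Calling `check_parquet("data_sky")` returns an empty list when only the filtered file is present, and a list of human-readable problems otherwise.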

 ## Using Your Own Dataset

@@ -196,7 +214,7 @@ source .venv/bin/activate
 `flash-attn` requires CUDA headers and `setuptools` at compile time and cannot be installed via `uv sync`. After activating the venv, install it manually: