🚀 Acoustically-Driven Hierarchical Alignment with Differential Attention for Weakly-Supervised Audio-Visual Video Parsing

This is the official code for Acoustically-Driven Hierarchical Alignment with Differential Attention for Weakly-Supervised Audio-Visual Video Parsing (ADDA).


💻 Machine environment

  • Ubuntu version: 20.04.6 LTS (Focal Fossa)
  • CUDA version: 12.2
  • PyTorch: 1.12.1
  • Python: 3.10.12
  • GPU: NVIDIA A100-SXM4-40GB

🛠 Environment Setup

A conda environment named adda can be created and activated with:

conda env create -f environment.yaml
conda activate adda
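
After activation, you can optionally confirm that PyTorch and the GPU are visible from Python. The snippet below is a minimal sanity-check sketch, not part of the released code; the file name check_env.py is hypothetical.

# check_env.py -- optional sanity check (hypothetical helper, not part of ADDA)
import torch

print("PyTorch:", torch.__version__)              # expected: 1.12.1
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))  # e.g. NVIDIA A100-SXM4-40GB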

📂 Data Preparation

Annotation files

Please download the LLP dataset annotations (6 CSV files) from AVVP-ECCV20 and place them in data/.
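
As a quick way to confirm the annotations are in place, the sketch below loads each CSV with pandas and prints its size. It is a hypothetical helper, not part of ADDA; the tab separator is an assumption based on the AVVP-ECCV20 annotation format.

# check_annotations.py -- hypothetical helper, not part of ADDA
import os
import pandas as pd

CSV_FILES = [
    "AVVP_dataset_full.csv", "AVVP_train.csv", "AVVP_val_pd.csv",
    "AVVP_test_pd.csv", "AVVP_eval_audio.csv", "AVVP_eval_visual.csv",
]
for name in CSV_FILES:
    df = pd.read_csv(os.path.join("data", name), sep="\t")  # separator is an assumption
    print(f"{name}: {len(df)} rows, columns = {list(df.columns)}")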

CLAP- & CLIP-extracted features

Please download the CLAP-extracted features (CLAP.7z) and the CLIP-extracted features (CLIP.7z) from this link, unzip the two archives, and place the decompressed CLAP-related files in data/feats_CLAP/ and the CLIP-related files in data/feats_CLIP/.

File structure for datasets

Please make sure that the file structure matches the following layout; a small sanity-check sketch is given after the tree.

data/
│   ├── AVVP_dataset_full.csv
│   ├── AVVP_eval_audio.csv
│   ├── AVVP_eval_visual.csv
│   ├── AVVP_test_pd.csv
│   ├── AVVP_train.csv
│   ├── AVVP_val_pd.csv
│   ├── feats/
│   │   ├── CLIP/
│   │   │   ├── -0A9suni5YA.npy
│   │   │   ├── -0BKyt8iZ1I.npy
│   │   │   └── ...
│   │   ├── CLAP/
│   │   │   ├── -0A9suni5YA.npy
│   │   │   ├── -0BKyt8iZ1I.npy
│   │   │   └── ...
│   │   └── ...

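To verify that the features were unpacked correctly, the sketch below counts the .npy files per modality and loads one sample to print its shape. It is a hypothetical helper, not part of ADDA; the directory names follow the tree above, and the exact array shapes depend on the released features.

# check_feats.py -- hypothetical helper, not part of ADDA
import glob
import os
import numpy as np

for modality in ("CLIP", "CLAP"):
    feat_dir = os.path.join("data", "feats", modality)
    files = sorted(glob.glob(os.path.join(feat_dir, "*.npy")))
    print(f"{modality}: {len(files)} feature files in {feat_dir}")
    if files:
        feats = np.load(files[0])
        print(f"  sample {os.path.basename(files[0])}: shape {feats.shape}, dtype {feats.dtype}")
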
🎓 Download trained models

Please download the trained models from this link and place each model in its corresponding model directory.

🔥 Training and Inference

We provide bash scripts for a quick start.

For Training

bash train.sh

For Inference

bash test.sh

🤝 Acknowledgement

We build the ADDA codebase heavily on the codebases of AVVP-ECCV20 and VALOR. We sincerely thank the authors for open-sourcing their code! We also thank the authors of CLIP and CLAP for open-sourcing their pre-trained models.
