This repository is the official implementation of **Seeing is Believing? Enhancing Vision-Language Navigation using Visual Perturbations** (accepted at IJCNN 2025).
Vision-and-language navigation (VLN) requires an agent to navigate to a remote location in a 3D environment by following natural language instructions. To represent the previously visited environment, most VLN approaches implement memory using recurrent states, topological maps, or top-down semantic maps. In contrast, we build a top-down, egocentric, and dynamically growing Grid Memory Map (i.e., GridMM) to structure the visited environment. From a global perspective, historical observations are projected into a unified top-down grid map, which better represents the spatial relations of the environment. From a local perspective, we further propose an instruction-relevance aggregation method to capture fine-grained visual clues in each grid region. Extensive experiments on the REVERIE, R2R, and SOON datasets in discrete environments, and on the R2R-CE dataset in continuous environments, demonstrate the superiority of our proposed method.
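The grid-map construction described above can be sketched as follows. This is a minimal, self-contained illustration, not the repository's implementation: the grid size, cell resolution, and mean-pooling aggregation (standing in for the paper's instruction-relevance aggregation) are all assumptions.

```python
import numpy as np

def project_to_grid(points_xyz, feats, grid_size=14, cell_m=0.5):
    """Project egocentric 3D points into a top-down grid memory map.

    points_xyz: (N, 3) points in the agent's frame (x right, y forward, z up).
    feats:      (N, D) visual features, one per point.
    Returns a (grid_size, grid_size, D) map where each cell averages the
    features of the points that fall inside it.
    """
    half = grid_size * cell_m / 2.0
    # Discretize x (right) and y (forward) into grid indices centered on the agent.
    ix = np.floor((points_xyz[:, 0] + half) / cell_m).astype(int)
    iy = np.floor((points_xyz[:, 1] + half) / cell_m).astype(int)
    # Drop points that land outside the map.
    keep = (ix >= 0) & (ix < grid_size) & (iy >= 0) & (iy < grid_size)
    ix, iy, feats = ix[keep], iy[keep], feats[keep]

    grid = np.zeros((grid_size, grid_size, feats.shape[1]))
    count = np.zeros((grid_size, grid_size, 1))
    np.add.at(grid, (iy, ix), feats)   # accumulate features per cell
    np.add.at(count, (iy, ix), 1.0)    # count points per cell
    return grid / np.maximum(count, 1.0)

# Two nearby points fall in the same cell; the far point is discarded.
pts = np.array([[0.1, 0.2, 1.5], [0.1, 0.2, 1.4], [10.0, 0.0, 1.0]])
ft = np.ones((3, 4))
gmap = project_to_grid(pts, ft)
print(gmap.shape)  # (14, 14, 4)
```

As the agent moves, newly observed points can be transformed into the current egocentric frame and re-projected, which is what lets the map grow dynamically.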
- Install the Matterport3D simulator for R2R, REVERIE, and SOON: follow the instructions here.

```shell
export PYTHONPATH=Matterport3DSimulator/build:$PYTHONPATH
```
- Install requirements:

```shell
conda create --name MBA python=3.8.5
conda activate MBA
pip install -r requirements.txt
```
- Download data from Dropbox, including processed annotations, features, and pretrained models for the REVERIE, SOON, R2R, and R4R datasets. Put the data in the `datasets/` directory.
- Download the pretrained LXMERT model:

```shell
mkdir -p datasets/pretrained
wget https://nlp.cs.unc.edu/data/model_LXRT.pth -P datasets/pretrained
```
- Download the CLIP-based RGB features and depth features (gibson and imagenet) from Baidu Netdisk (link: https://pan.baidu.com/s/1lKend8xnwuy1uxn-aIDBtw?pwd=n8gv, extraction code: n8gv). The ground-truth depth images (undistorted_depth_images) are obtained from the Matterport simulator, and depth view features are extracted with:

```shell
python get_depth.py
```
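Matterport3D's undistorted_depth_images are stored as 16-bit PNGs whose raw values are typically divided by a fixed shift to obtain meters (4000 below, a common convention for this dataset; verify against the extraction script). A minimal sketch, using a synthetic array in place of a loaded PNG:

```python
import numpy as np

DEPTH_SHIFT = 4000.0  # raw units per meter (assumed Matterport3D convention)

def raw_depth_to_meters(raw, max_depth_m=10.0):
    """Convert a raw 16-bit depth image to metric depth.

    Zeros mark missing measurements and stay 0; values beyond max_depth_m
    are clipped, mirroring typical preprocessing before feature extraction.
    """
    depth = raw.astype(np.float32) / DEPTH_SHIFT
    return np.clip(depth, 0.0, max_depth_m)

# Synthetic 2x2 "depth image": missing, 1 m, 2 m, and an out-of-range value.
raw = np.array([[0, 4000], [8000, 65535]], dtype=np.uint16)
depth_m = raw_depth_to_meters(raw)
print(depth_m)
```

In practice the converted depth map would be fed to a depth encoder to produce the per-view depth features used alongside the RGB features.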
The extraction code references HAMT and here.
The pretrained checkpoints for REVERIE, R2R, and SOON are available here. You can also pretrain the model yourself by changing DUET's pretraining RGB features from ViT-based to CLIP-based. Combine behavior cloning and auxiliary proxy tasks in pretraining:

```shell
cd pretrain_src
bash run_r2r.sh  # (run_reverie.sh, run_soon.sh)
```
Use the pseudo-interactive demonstrator to fine-tune the model:

```shell
cd map_nav_src
bash scripts/run_r2r.sh  # (run_reverie.sh, run_soon.sh)
```
The test-set results we report come from the official Eval.AI leaderboards:

- R2R: https://eval.ai/web/challenges/challenge-page/97/submission
- REVERIE: https://eval.ai/web/challenges/challenge-page/606/overview
- SOON: https://eval.ai/web/challenges/challenge-page/1275/overview

- Panoramic trajectory visualization is provided by Speaker-Follower.
- Top-down maps for Matterport3D are available in NRNS.
- Instructions for extracting image features from Matterport3D scenes can be found in VLN-HAMT.
We extend our gratitude to all the authors for their significant contributions and for sharing their resources.
```bibtex
@article{zhang2024seeing,
  title={Seeing is Believing? Enhancing Vision-Language Navigation using Visual Perturbations},
  author={Zhang, Xuesong and Li, Jia and Xu, Yunbo and Hu, Zhenzhen and Hong, Richang},
  journal={arXiv preprint arXiv:2409.05552},
  year={2024}
}
```

Our code is based on VLN-DUET and partially references HAMT for extracting view features. Thanks for their great work!
