Multimodal AI Landscape - Explorer🔍

Navigate with the Interactive Explorer.

Click the image to explore.

This repository analyses arXiv preprints from 2019 to 2025 to reveal emerging trends in multimodal AI research. The raw data were sourced from the Kaggle arXiv Dataset and filtered using targeted search queries.

As arXiv metadata are subject to retrospective updates, counts may vary between dataset snapshots. In addition, the query process is inherently approximate. The reported numbers should therefore be interpreted as indicators of overall trends rather than exact totals.

The paper version is archived as the v2025 release.

Data Filtering

1. Identifying Multimodal AI and Specific Modalities

We first identified relevant AI preprints by searching titles and abstracts for common AI terms, then narrowed the results to preprints that also match the multimodal terms. We then categorised these preprints by modality, using a targeted query for each one. The search terms and queries are listed in the query table below.

2. Query Table

Terms	Queries
AI	"AI", "A.I.", "artificial intelligence", "machine learning", "deep learning", "neural network"
Multimodal	"multimodal", "multi-modal"
Vision	"vision", "image", "video", "visual"
Language	"text", "language", "textual"
Time series	"time series", "temporal"
Graph	"graph", "relational"
Audio	"audio", "acoustic", "speech", "sound", "voice", "phonetic", "music"
Spatial	"spatial", "geospatial", "geographic", "GIS"
Sensor	"sensor", "IoT", "sensory", "wearable", "RFID", "LiDAR", "radar", "Internet of Things"
Tabular	"tabular", "structured", "spreadsheet", "table", "categorical"
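
The filtering described above can be sketched in Python as follows. This is a minimal illustration, not the repository's actual script: the function names (`compile_terms`, `classify`) are hypothetical, and the matching rules (case-insensitive, whole-term matches only, so "images" would not match "image") are assumptions, since the README does not specify them. Only three modality rows are copied from the table to keep the sketch short.

```python
import re

# Query terms copied from the query table (subset of modalities for brevity).
AI_TERMS = ["AI", "A.I.", "artificial intelligence", "machine learning",
            "deep learning", "neural network"]
MULTIMODAL_TERMS = ["multimodal", "multi-modal"]
MODALITY_TERMS = {
    "Vision": ["vision", "image", "video", "visual"],
    "Language": ["text", "language", "textual"],
    "Audio": ["audio", "acoustic", "speech", "sound", "voice", "phonetic", "music"],
}


def compile_terms(terms):
    """Build one case-insensitive pattern matching any term as a whole token.

    Assumption: lookarounds are used instead of \\b so that terms ending in
    punctuation, such as "A.I.", still match.
    """
    alternation = "|".join(re.escape(term) for term in terms)
    return re.compile(rf"(?<!\w)(?:{alternation})(?!\w)", re.IGNORECASE)


AI_RE = compile_terms(AI_TERMS)
MULTIMODAL_RE = compile_terms(MULTIMODAL_TERMS)
MODALITY_RES = {name: compile_terms(terms) for name, terms in MODALITY_TERMS.items()}


def classify(title, abstract):
    """Return the sorted modalities of a multimodal AI preprint, else None."""
    text = f"{title} {abstract}"
    # Step 1: keep only preprints matching both an AI term and a multimodal term.
    if not (AI_RE.search(text) and MULTIMODAL_RE.search(text)):
        return None
    # Step 2: tag each modality whose query matches the title or abstract.
    return sorted(name for name, rx in MODALITY_RES.items() if rx.search(text))
```

For example, `classify("Multimodal deep learning", "We fuse image and audio signals.")` tags the preprint with the Audio and Vision modalities, while a title without any AI term is rejected with `None`.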

Data Files

The data folder contains the preprint counts produced by the filtering, in the following files:

overall-preprint-counts.csv: Counts of multimodal AI preprints.
preprint-counts-by-combined-modality-number.csv: Counts of preprints for different numbers of modalities combined.
preprint-counts-by-modality.csv: Counts of individual modalities.
modality-combination-breakdown.csv: Counts of pairwise, triple, and quadruple modality combinations.
modality-pairs-YYYY.csv: Counts of preprints by modality pair for a given year (e.g., 2024, 2025).
other-modality-combinations-by-year.csv: Counts for preprints using less common modality combinations.
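
As a rough illustration of how counts like those in the files above could be derived once each preprint has a list of modality labels, here is a small Python sketch. It is not taken from the repository's scripts folder: the function names and the sample labels are hypothetical, and it assumes that a preprint with three modalities contributes to every pair it contains, which may differ from the convention used in the published CSV files.

```python
from collections import Counter
from itertools import combinations


def count_combinations(labelled_preprints, size):
    """Count modality combinations of a given size (2 = pairs, 3 = triples).

    Assumption: a preprint contributes to every combination of `size`
    modalities it contains, not only when it has exactly `size` modalities.
    """
    counts = Counter()
    for modalities in labelled_preprints:
        # Sort and de-duplicate so ("Language", "Vision") is always one key.
        for combo in combinations(sorted(set(modalities)), size):
            counts[combo] += 1
    return counts


def count_by_modality_number(labelled_preprints):
    """Count preprints by how many distinct modalities each one combines."""
    return Counter(len(set(modalities)) for modalities in labelled_preprints)


# Hypothetical labels for three preprints, for illustration only.
sample = [
    ["Vision", "Language"],
    ["Vision", "Language", "Audio"],
    ["Audio"],
]
pair_counts = count_combinations(sample, 2)
number_counts = count_by_modality_number(sample)
```

Here `pair_counts[("Language", "Vision")]` is 2, because both the first and second sample preprints contain that pair.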

Citation

@article{liu2025towards,
  title={Towards deployment-centric multimodal AI beyond vision and language},
  author={Liu, Xianyuan and Zhang, Jiayang and Zhou, Shuo and van der Plas, Thijs L. and Vijayaraghavan, Avish and Grishina, Anastasiia and Zhuang, Mengdie and Schofield, Daniel and Tomlinson, Christopher and others},
  journal={Nature Machine Intelligence},
  volume={7},
  pages={1612--1624},
  year={2025},
  doi={10.1038/s42256-025-01116-5}
}
