multimodalAI/multimodal-ai-landscape

Multimodal AI Landscape - Explorer🔍

DOI: 10.1038/s42256-025-01116-5

Navigate with the Interactive Explorer.

This repository analyses arXiv preprints from 2019 to 2025 to reveal emerging trends in multimodal AI research. The raw data were sourced from the Kaggle arXiv Dataset and filtered using targeted search queries.
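As a minimal sketch of the first step, the snippet below streams arXiv metadata records and keeps those from 2019 to 2025. It is an illustration, not the repository's own code. It assumes the Kaggle snapshot is a JSON Lines file whose records carry `id`, `title`, `abstract` and `versions` fields, and that a preprint's year is the creation date of its first version. The function name `iter_preprints` and the file name in the usage comment are hypothetical.

```python
import json
from email.utils import parsedate_to_datetime


def iter_preprints(lines, first_year=2019, last_year=2025):
    """Yield preprints whose first version falls within the year range.

    `lines` is any iterable of JSON strings, one record per line.
    Assumption: each record has a `versions` list whose entries hold an
    RFC 2822 style `created` date, e.g. "Mon, 2 Apr 2007 19:18:42 GMT".
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        versions = record.get("versions") or []
        if not versions:
            continue  # no submission date available
        # Assumption: the first version's date defines the preprint's year.
        year = parsedate_to_datetime(versions[0]["created"]).year
        if first_year <= year <= last_year:
            yield {
                "id": record.get("id"),
                "year": year,
                "title": record.get("title", ""),
                "abstract": record.get("abstract", ""),
            }


# Usage (the file name is an assumption about the downloaded snapshot):
# with open("arxiv-metadata-oai-snapshot.json", encoding="utf-8") as f:
#     for preprint in iter_preprints(f):
#         ...
```

Streaming line by line avoids loading the whole snapshot into memory. Using the last update date instead of the first version would shift some preprints into later years, so the choice of date field affects the yearly counts.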

As arXiv metadata are subject to retrospective updates, counts may vary between dataset snapshots. In addition, the query process is inherently approximate. The reported numbers should therefore be interpreted as indicators of overall trends rather than exact totals.

The paper version is archived as the v2025 release.

Data Filtering

1. Identifying Multimodal AI and Specific Modalities

We first identified relevant AI preprints by searching titles and abstracts for common AI terms, then narrowed the results to multimodal work using the multimodal terms. We then categorised these preprints by modality, running a targeted query for each one. The search terms and queries are listed in the query table below.

2. Query Table

| Terms | Queries |
| --- | --- |
| AI | "AI", "A.I.", "artificial intelligence", "machine learning", "deep learning", "neural network" |
| Multimodal | "multimodal", "multi-modal" |
| Vision | "vision", "image", "video", "visual" |
| Language | "text", "language", "textual" |
| Time series | "time series", "temporal" |
| Graph | "graph", "relational" |
| Audio | "audio", "acoustic", "speech", "sound", "voice", "phonetic", "music" |
| Spatial | "spatial", "geospatial", "geographic", "GIS" |
| Sensor | "sensor", "IoT", "sensory", "wearable", "RFID", "LiDAR", "radar", "Internet of Things" |
| Tabular | "tabular", "structured", "spreadsheet", "table", "categorical" |
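
The filtering steps and query table above can be sketched as follows. This is an illustration under stated assumptions, not the repository's own implementation: the source does not say how terms were matched, so the sketch assumes case-insensitive whole-word matching with an optional plural "s". The names `QUERY_TERMS`, `compile_terms` and `classify` are hypothetical.

```python
import re

# Search terms copied from the query table.
QUERY_TERMS = {
    "AI": ["AI", "A.I.", "artificial intelligence", "machine learning",
           "deep learning", "neural network"],
    "Multimodal": ["multimodal", "multi-modal"],
    "Vision": ["vision", "image", "video", "visual"],
    "Language": ["text", "language", "textual"],
    "Time series": ["time series", "temporal"],
    "Graph": ["graph", "relational"],
    "Audio": ["audio", "acoustic", "speech", "sound", "voice", "phonetic", "music"],
    "Spatial": ["spatial", "geospatial", "geographic", "GIS"],
    "Sensor": ["sensor", "IoT", "sensory", "wearable", "RFID", "LiDAR", "radar",
               "Internet of Things"],
    "Tabular": ["tabular", "structured", "spreadsheet", "table", "categorical"],
}


def compile_terms(terms):
    """Build one regex that matches any of the terms as a whole word.

    Assumption: matching is case-insensitive and tolerates a plural "s".
    Lookarounds are used instead of \\b so that "A.I." matches before a space.
    """
    alternatives = "|".join(re.escape(term) for term in terms)
    return re.compile(rf"(?<!\w)(?:{alternatives})s?(?!\w)", re.IGNORECASE)


PATTERNS = {name: compile_terms(terms) for name, terms in QUERY_TERMS.items()}
MODALITIES = [name for name in QUERY_TERMS if name not in ("AI", "Multimodal")]


def classify(title, abstract):
    """Return the matched modalities, or None if not a multimodal AI preprint."""
    text = f"{title} {abstract}"
    # Step 1: keep only preprints that mention both AI and multimodal terms.
    if not PATTERNS["AI"].search(text) or not PATTERNS["Multimodal"].search(text):
        return None
    # Step 2: tag the preprint with every modality whose terms appear.
    return [name for name in MODALITIES if PATTERNS[name].search(text)]
```

Whole-word matching keeps "AI" from matching inside words such as "main", at the cost of missing inflected forms other than a simple plural. A preprint can carry several modality tags, so per-modality counts can sum to more than the number of multimodal preprints.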

Data Files

The data folder contains the raw data, as follows:

Citation

@article{liu2025towards,
  title={Towards deployment-centric multimodal AI beyond vision and language},
  author={Liu, Xianyuan and Zhang, Jiayang and Zhou, Shuo and van der Plas, Thijs L. and Vijayaraghavan, Avish and Grishina, Anastasiia and Zhuang, Mengdie and Schofield, Daniel and Tomlinson, Christopher and others},
  journal={Nature Machine Intelligence},
  volume={7},
  pages={1612--1624},
  year={2025},
  doi={10.1038/s42256-025-01116-5}
}
