Azure/realtimevideogen
πŸ“„πŸ”‰ Real-Time Video Generation πŸ“½οΈπŸ–ΌοΈ

Modular, adaptive serving stack for real-time multi-modal generation (e.g., video, audio, images). It dynamically balances latency, cost, and quality, and supports both streaming generation (real-time playback) and offline workloads.

The stack is built around a cluster manager called StreamWise. Multiple applications run on top of StreamWise; for example, StreamCast generates real-time video podcasts from input documents (e.g., PDFs).


Important

This project focuses on systems research: specifically the infrastructure, scheduling, provisioning, and serving aspects of multi-modal generation workloads. The application workloads are used to stress-test and evaluate the system, not to assess or guarantee the quality of the generated content. Outputs may be inconsistent, contain visual artifacts, or be otherwise degraded; this is irrelevant to the research goals. This project is not designed for production use.


πŸš€ Features

  • Model on-boarding for 25+ multi-modal models (video, audio, image, LLMs)
  • Provisioning of GPUs, replicas, and model variants
  • Deadline-aware request scheduler for streaming workloads
  • Adaptive quality (resolution, FPS, sampling steps)
  • Multi-GPU + cross-region support
  • Spot-aware optimization to reduce cost
  • Caching, batching, and GPU frequency scaling

πŸ— Architecture

StreamWise consists of four layers:

  • Model on-boarding: packaging and standardizing multi-modal models
  • Provisioning: selecting hardware, GPUs, and model replicas
  • Scheduling: orchestrating requests under latency constraints
  • Execution: running requests efficiently inside a model instance

*(Architecture diagram)*

πŸ“¦ Model wrapper and on-boarding

We package each model as a Docker container, based on an NVIDIA image with GPU drivers and runtime tools. Each container embeds our Instance Manager, which standardizes the interface for executing inference requests. We adapt existing inference code (typically from Hugging Face) to this interface and bundle it with the model weights. A Python wrapper exposes an HTTP endpoint for existing multi-modal generation models (e.g., Flux or Wan). It allows triggering multi-modal generations (e.g., video from text) and collecting statistics. The manager also handles request batching and adjusts GPU frequencies to optimize resource usage.
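To make the standardized interface concrete, here is a minimal sketch in Python. The class and field names (`InstanceManager`, `GenerationRequest`, `execute`, the `stats` keys) are illustrative assumptions, not the project's actual API; a dummy function stands in for the adapted Hugging Face inference code.

```python
# Hypothetical sketch of an Instance Manager-style interface: it wraps a
# model's inference code behind one standard call and collects statistics.
# All names here are assumptions for illustration, not the real API.
import time
from dataclasses import dataclass, field


@dataclass
class GenerationRequest:
    prompt: str
    resolution: tuple = (512, 512)  # (width, height)
    num_frames: int = 16
    sampling_steps: int = 20


@dataclass
class GenerationResult:
    frames: list
    stats: dict = field(default_factory=dict)


class InstanceManager:
    """Standardized wrapper around adapted model inference code."""

    def __init__(self, model_fn):
        self.model_fn = model_fn  # adapted inference code + weights

    def execute(self, request: GenerationRequest) -> GenerationResult:
        start = time.perf_counter()
        frames = self.model_fn(request)  # run the actual generation
        elapsed = time.perf_counter() - start
        return GenerationResult(
            frames=frames,
            stats={"latency_s": elapsed, "num_frames": len(frames)},
        )


# Dummy model standing in for a real video model such as Wan or Flux.
def dummy_model(req: GenerationRequest):
    return [f"frame-{i}" for i in range(req.num_frames)]


manager = InstanceManager(dummy_model)
result = manager.execute(GenerationRequest(prompt="a cat on a beach"))
print(result.stats["num_frames"])  # → 16
```

In the real system this interface would sit behind the HTTP endpoint described above, so every model looks identical to the scheduler regardless of its internal inference code.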

For the complete list of wrapped models with full details and classification, see Model Wrapper documentation.

The characteristics of each model are listed in `services.json`. These characteristics include quality (Elo ranking), frame rate (FPS), maximum number of frames (video length), number of attention heads, VAE compression ratios, supported resolutions, and other relevant attributes. More details here.
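As a sketch of how such a catalog might be consumed, the snippet below loads a `services.json`-style document and filters by the listed characteristics. The field names and values are invented for illustration; the actual schema lives in the repository's `services.json`.

```python
# Illustrative sketch: query a services.json-style model catalog by
# quality (Elo) and supported resolution. Field names and values are
# assumptions, not the file's actual schema.
import json

services_json = """
{
  "models": [
    {"name": "wan-t2v", "elo": 1180, "fps": 16, "max_frames": 81,
     "resolutions": ["480p", "720p"]},
    {"name": "flux-t2i", "elo": 1240, "fps": null, "max_frames": 1,
     "resolutions": ["1024x1024"]}
  ]
}
"""

catalog = json.loads(services_json)

# Pick the highest-quality model that can generate 720p video
# (more than one frame, 720p among its supported resolutions).
video_models = [m for m in catalog["models"]
                if m["max_frames"] > 1 and "720p" in m["resolutions"]]
best = max(video_models, key=lambda m: m["elo"])
print(best["name"])  # → wan-t2v
```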

βš™οΈ Provisioning hardware and models

We frame hardware and model selection for a workload (e.g., a 10-minute medium-quality video podcast) as an optimization problem. After selecting a configuration, the hardware and model provisioners handle setup accordingly. More details here.
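A toy version of this optimization can be written as: among all (GPU, model) configurations, pick the cheapest one that meets the workload's quality floor and real-time throughput requirement. The configurations, prices, and thresholds below are made-up numbers, not data from the project.

```python
# Toy sketch of provisioning as constrained optimization: choose the
# cheapest (gpu, model) pair satisfying quality and throughput floors.
# All configurations and numbers are invented for illustration.
configs = [
    # (gpu, model, cost_per_hour_usd, elo, generated_fps)
    ("A100", "model-small", 3.7, 1050, 24.0),
    ("A100", "model-large", 3.7, 1210, 6.0),
    ("H100", "model-large", 7.0, 1210, 18.0),
]


def provision(min_elo, min_fps):
    """Return the cheapest feasible configuration, or None."""
    feasible = [c for c in configs if c[3] >= min_elo and c[4] >= min_fps]
    if not feasible:
        return None
    return min(feasible, key=lambda c: c[2])  # minimize $/hour


# A medium-quality real-time stream: at least 1150 Elo at 16 FPS.
choice = provision(min_elo=1150, min_fps=16.0)
print(choice)  # → ('H100', 'model-large', 7.0, 1210, 18.0)
```

The real provisioner presumably considers many more dimensions (replicas, regions, spot availability), but the structure is the same: feasibility constraints plus a cost objective.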

πŸ“… Request scheduler

The request scheduler orchestrates execution using a live, iterative version of our greedy algorithm informed by the request DAG. More details here.
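To illustrate what deadline-aware scheduling over a request DAG can look like, here is a small sketch using earliest-deadline-first over the ready set. This is a plausible stand-in for the greedy algorithm, not the project's actual scheduler; the example DAG (audio segments that must precede their video segments) is invented.

```python
# Sketch of deadline-aware greedy scheduling over a request DAG:
# repeatedly run the ready request with the earliest deadline.
# The DAG and deadlines are illustrative, not from the project.
import heapq

# segment -> prerequisites (e.g., audio before the matching video).
deps = {"audio1": [], "video1": ["audio1"],
        "audio2": [], "video2": ["audio2"]}
deadline = {"audio1": 2.0, "video1": 4.0,
            "audio2": 6.0, "video2": 8.0}


def schedule(deps, deadline):
    done, order = set(), []
    # Seed the ready heap with requests that have no prerequisites.
    ready = [(deadline[n], n) for n, d in deps.items() if not d]
    heapq.heapify(ready)
    while ready:
        _, node = heapq.heappop(ready)  # earliest deadline first
        done.add(node)
        order.append(node)
        # Any request whose prerequisites are now all done becomes ready.
        for n, d in deps.items():
            if (n not in done and all(p in done for p in d)
                    and all(n != r for _, r in ready)):
                heapq.heappush(ready, (deadline[n], n))
    return order


print(schedule(deps, deadline))
# → ['audio1', 'video1', 'audio2', 'video2']
```

Note that EDF interleaves the two podcast segments correctly: `video1` (deadline 4.0) preempts `audio2` (deadline 6.0) even though both are ready.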

βš™οΈ Applications

We implemented multiple workflows for multi-modal generation. More details here.

πŸš€ Deployment

We build StreamWise on top of a Kubernetes (K8s) cluster: a widely adopted cluster manager that enables modular deployment, auto-scaling, service discovery, and fault tolerance. More details here.

☸️ Kubernetes

Our Docker containers run on K8s.

☁️ Azure Kubernetes Service (AKS)

To deploy on Azure Kubernetes Service (AKS), follow the instructions here.

πŸ“„ Citation

If you use StreamWise in research, please cite:

@article{streamwise2026,
  title={{StreamWise: Adaptive Serving for Real-Time Multi-Modal Generation}},
  author={Haoran Qiu and Gohar Irfan Chaudry and Chaojie Zhang and Íñigo Goiri and Esha Choukse and Rodrigo Fonseca and Ricardo Bianchini},
  journal={arXiv:2603.05800},
  year={2026}
}

🀝 Contributing

Pull requests are welcome! Please open an issue for major changes. More details here.

πŸ“œ License

MIT License.
