The Echo Storyteller: Interactive AI Audio Experiences

The Echo Storyteller is a reference implementation for building immersive, low-latency AI voice applications on the web. It demonstrates how to combine advanced Generative AI models with real-time streaming audio to create a fluid "Choose Your Own Adventure" experience.

🎯 Objective

This project highlights the power of the Google Cloud AI stack for building next-generation web experiences:

- Interactive Storytelling: Uses Gemini 3 Pro (Preview) to generate creative narratives that adapt to user choices.
- Real-Time Voice: Uses Google Cloud TTS (Gemini Voices) with StreamingSynthesize to speak the story as it is being written, with near-instant latency.
- Visual Context: Uses Gemini 3 Pro Image to generate cinematic illustrations for every chapter on the fly.
- Adaptive UI: Features a responsive layout that transitions between a linear mobile feed and a side-by-side "Book & Illustration" desktop view.
- True Web Streaming: Demonstrates a robust WebSocket + Web Audio API architecture that bypasses standard browser media limitations for gapless, low-latency PCM streaming.

🎮 How to Use

Start a Story:
- Open the app and select a Voice (e.g., Puck, Zephyr) and TTS Model (Flash, Lite, Pro).
- Type a topic (e.g., "A cyberpunk detective finding a lost cat") or click the Refresh button to get AI-generated ideas.
- Click Go (Auto-Awesome).
Listen & Watch:
- The story begins immediately. Text streams in, audio plays in sync, and a unique illustration fades in.
- The app handles "Infinite Scrolling" so you can read back through previous chapters.
Choose Your Path:
- At the end of a chapter, the AI suggests 3 "What happens next?" options.
- Click one to continue the story seamlessly, or type your own custom action.
- The story context is preserved, creating a coherent multi-chapter narrative.
Reset:
- Click the "End Story" chip to clear the context and start a fresh adventure.

🚀 Tech Stack Highlights

Frontend: Flutter Web (WASM ready).
- Audio Engine: Custom PcmPlayer using dart:js_interop and the Web Audio API (AudioContext) for raw PCM playback. Standard audio players cannot handle this low-latency stream.
- State: "Rolling Summary" context management for infinite story depth.
Backend: Go (Golang) 1.25+.
- Orchestration: A Producer-Consumer concurrent pipeline handles Text Generation, Image Generation, and Audio Synthesis in parallel to minimize TTFB (Time To First Byte).
- Gemini 3 Pro: Powering the core narrative and image generation.
- Gemini 2.5 Flash: Powering the high-speed summarization and option generation.
- Quantized Streaming: Implements a robust re-connection strategy for Gemini TTS to bypass server-side context limits while maintaining a continuous stream.
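
The producer-consumer shape described above can be sketched in a few lines of Go. This is a minimal illustration, not the repository's code: `splitSentences`, `runPipeline`, and the `synth` callback (a stand-in for the TTS call) are hypothetical names, the sentence splitter is deliberately naive, and image generation is left out.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// splitSentences reads streamed text fragments from in and emits complete
// sentences on out. It splits on '.', '!' and '?' only, which is a
// simplification of whatever sentence buffering a real backend would need.
func splitSentences(in <-chan string, out chan<- string) {
	defer close(out)
	var buf strings.Builder
	for frag := range in {
		for _, r := range frag {
			buf.WriteRune(r)
			if r == '.' || r == '!' || r == '?' {
				if s := strings.TrimSpace(buf.String()); s != "" {
					out <- s
				}
				buf.Reset()
			}
		}
	}
	// Flush trailing text that never received a terminator.
	if s := strings.TrimSpace(buf.String()); s != "" {
		out <- s
	}
}

// runPipeline connects a producer (text fragments) to a consumer (synth)
// through channels and returns the consumer's results in sentence order.
// synth is a hypothetical stand-in for a per-sentence TTS request.
func runPipeline(fragments []string, synth func(sentence string) string) []string {
	text := make(chan string)
	sentences := make(chan string, 4) // small buffer so the producer can run ahead

	var wg sync.WaitGroup
	wg.Add(2)
	go func() { // producer: emits text as it is "generated"
		defer wg.Done()
		defer close(text)
		for _, f := range fragments {
			text <- f
		}
	}()
	go func() { // sentence buffer between producer and consumer
		defer wg.Done()
		splitSentences(text, sentences)
	}()

	var out []string
	for s := range sentences { // consumer: one synth call per sentence
		out = append(out, synth(s))
	}
	wg.Wait()
	return out
}

func main() {
	fragments := []string{"The cat van", "ished. Rain fell", " on neon streets! Who took it"}
	fakeTTS := func(s string) string { return "audio(" + s + ")" }
	for _, chunk := range runPipeline(fragments, fakeTTS) {
		fmt.Println(chunk)
	}
}
```

Because the consumer starts work as soon as the first sentence is complete, audio for sentence one can be requested while later text is still arriving, which is the property that keeps time to first byte low.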

🛠️ The Architecture

The Problem

Flutter's standard audio packages (just_audio, audioplayers) rely on the browser's <audio> tag or Media Source Extensions (MSE).

- MSE requires valid container headers (MP4/WebM). Google TTS streams raw frames or Ogg pages that often fail MSE validation in Chrome/Safari.
- Standard Playback (HTTP) requires a complete, valid file structure, which a live stream does not provide.
- Raw PCM cannot be played by <audio> tags directly.

The Solution

The Echo Storyteller bypasses the browser's media demuxer entirely by using the Web Audio API.

Backend (Go):
- Receives Text Topic.
- Producer: Generates Story (Gemini 3 Pro) & Image (Gemini 3 Pro Image) concurrently.
- Consumer: Buffers sentences and calls tts.StreamingSynthesize (LINEAR16) for each sentence to ensure stable prosody.
- Forwards raw AudioContent bytes to WebSocket.
Frontend (Flutter):
- Receives Uint8List chunks.
- Converts Int16 (PCM) bytes to Float32 audio data.
- Schedules AudioBuffer playback precisely using AudioContext.currentTime.
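
The two frontend steps, sample conversion and gapless scheduling, are simple arithmetic. The real player is written in Dart against the Web Audio API; the sketch below shows the same logic in Go for illustration only. The names `pcm16ToFloat32` and `scheduler` are hypothetical, and the 24000 Hz sample rate is an assumption, so use whatever rate the TTS request actually specifies.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// pcm16ToFloat32 converts little-endian signed 16-bit PCM bytes (LINEAR16)
// into float samples in [-1.0, 1.0), the range Web Audio buffers expect.
// A trailing odd byte is ignored.
func pcm16ToFloat32(pcm []byte) []float32 {
	out := make([]float32, len(pcm)/2)
	for i := range out {
		s := int16(binary.LittleEndian.Uint16(pcm[2*i:]))
		out[i] = float32(s) / 32768.0
	}
	return out
}

// scheduler tracks where the previously queued chunk ends so that the next
// chunk can start exactly there, with no gap and no overlap.
type scheduler struct {
	sampleRate float64 // samples per second (assumed value set by the caller)
	nextStart  float64 // seconds on the audio clock
}

// schedule returns the start time for a chunk of n samples. now plays the
// role of AudioContext.currentTime. If the queue has run dry, playback
// restarts at now instead of at a time in the past.
func (s *scheduler) schedule(now float64, n int) float64 {
	start := s.nextStart
	if start < now {
		start = now
	}
	s.nextStart = start + float64(n)/s.sampleRate
	return start
}

func main() {
	samples := pcm16ToFloat32([]byte{0x00, 0x00, 0x00, 0xC0, 0x00, 0x80})
	fmt.Println(samples)

	sched := &scheduler{sampleRate: 24000} // assumed sample rate
	fmt.Println(sched.schedule(0.0, 12000)) // first chunk starts immediately
	fmt.Println(sched.schedule(0.1, 12000)) // second chunk starts where the first ends
}
```

Scheduling against the audio clock rather than starting each chunk "when it arrives" is what makes playback gapless: network jitter only matters if a chunk arrives after the previous one has already finished playing.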

🚦 Quick Start

1. Prerequisites

- Go 1.25+
- Flutter 3.x
- Google Cloud Project with Billing enabled.
- gcloud CLI installed and configured.

2. Infrastructure Setup

Use the provided script to enable APIs and create a dedicated Service Account:

./setup_sa.sh

This will create a Service Account with Vertex AI User and Logging Writer roles.

3. Configuration

Create a .env file in the root directory:

GOOGLE_CLOUD_PROJECT=your-project-id
GOOGLE_CLOUD_LOCATION=us-central1
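
For orientation, here is one way a Go server could read these two values at startup. This is a sketch under assumptions, not the repository's code: `config` and `loadConfig` are hypothetical names, the fallback to us-central1 is an assumed default, and Go does not parse .env files on its own, so the variables are presumed to be exported into the process environment by the run script or the shell.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// config holds the two settings from the .env file.
type config struct {
	Project  string
	Location string
}

// loadConfig reads settings through getenv (normally os.Getenv), which keeps
// the function easy to test. The project ID is required; the location falls
// back to us-central1, which is an assumed default for this sketch.
func loadConfig(getenv func(string) string) (config, error) {
	cfg := config{
		Project:  getenv("GOOGLE_CLOUD_PROJECT"),
		Location: getenv("GOOGLE_CLOUD_LOCATION"),
	}
	if cfg.Project == "" {
		return config{}, errors.New("GOOGLE_CLOUD_PROJECT is not set")
	}
	if cfg.Location == "" {
		cfg.Location = "us-central1"
	}
	return cfg, nil
}

func main() {
	cfg, err := loadConfig(os.Getenv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Printf("project=%s location=%s\n", cfg.Project, cfg.Location)
}
```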

4. Run Locally

./dev.sh

This script builds the Flutter web app and starts the Go server on port 8080.

Open http://localhost:8080.
Click the Play icon. This initializes the AudioContext, which browsers only allow to start after a user gesture.
Type a topic and hit Send.

5. Deploy to Cloud Run

./deploy.sh

Make sure to uncomment the Service Account line in deploy.sh (or rely on the script's auto-detection) to use the secure identity created in step 2.

📂 Project Structure

backend/: Go server implementation.
frontend/: Flutter application.
- lib/audio/pcm_player.dart: Core Logic. The custom Web Audio API player.
docs/: Detailed architectural findings and decision logs.
