Browser ML inference framework with task scheduling and smart caching.
Documentation · Examples · API Reference · English | 中文
- 📋 Task Scheduler - Priority queue, concurrency control, task cancellation
- 🔄 Batch Processing - Efficient batch inference out of the box
- 💾 Memory Management - Automatic memory tracking and cleanup with scopes
- 📥 Smart Model Loading - Preloading, sharding, resume download support
- 💿 Offline Caching - IndexedDB-based model caching for offline use
- ⚡ Multi-Backend - ONNX Runtime with WebGPU/WASM execution providers, automatic fallback
- 🤗 HuggingFace Hub - Direct model download with one line
- 🔤 Real Tokenizers - BPE & WordPiece tokenizers, load tokenizer.json directly
- 👷 Web Worker Support - Run inference in background threads
- 📦 Batteries Included - ONNX Runtime bundled, zero configuration needed
- 🎯 TypeScript First - Full type support with intuitive APIs
```bash
npm install edgeflowjs
# or
yarn add edgeflowjs
# or
pnpm add edgeflowjs
```
Note: ONNX Runtime is included as a dependency. No additional setup required.
Run the interactive demo locally to test all features:
```bash
# Clone and install
git clone https://github.com/user/edgeflow.js.git
cd edgeflow.js
npm install

# Build and start demo server
npm run demo
```
Open http://localhost:3000 in your browser:
1. Load Model - Enter a Hugging Face ONNX model URL and click "Load Model", e.g. `https://huggingface.co/Xenova/distilbert-base-uncased-finetuned-sst-2-english/resolve/main/onnx/model_quantized.onnx`
2. Test Features:
- 🧮 Tensor Operations - Test tensor creation, math ops, softmax, relu
- 📝 Text Classification - Run sentiment analysis on text
- 🔍 Feature Extraction - Extract embeddings from text
- 📋 Task Scheduler - Test priority-based task scheduling
- 💾 Memory Management - Test allocation and cleanup
```js
import { pipeline } from 'edgeflowjs';

// Create a sentiment analysis pipeline
const sentiment = await pipeline('sentiment-analysis');

// Run inference
const result = await sentiment.run('I love this product!');
console.log(result);
// { label: 'positive', score: 0.98, processingTime: 12.5 }

// Native batch processing support
const results = await sentiment.run([
  'This is amazing!',
  'This is terrible.',
  "It's okay, I guess."
]);
console.log(results);
// [
//   { label: 'positive', score: 0.95 },
//   { label: 'negative', score: 0.92 },
//   { label: 'neutral', score: 0.68 }
// ]
```

```js
import { pipeline } from 'edgeflowjs';

// Create multiple pipelines
const classifier = await pipeline('text-classification');
const extractor = await pipeline('feature-extraction');

// Run in parallel with Promise.all
const [classification, features] = await Promise.all([
  classifier.run('Sample text'),
  extractor.run('Sample text')
]);
```

```js
import { pipeline } from 'edgeflowjs';

const classifier = await pipeline('image-classification');

// From URL
const urlResult = await classifier.run('https://example.com/image.jpg');

// From HTMLImageElement
const img = document.getElementById('myImage');
const elementResult = await classifier.run(img);

// Batch
const batchResults = await classifier.run([img1, img2, img3]);
```

```js
import { pipeline } from 'edgeflowjs';

const generator = await pipeline('text-generation');

// Simple generation
const result = await generator.run('Once upon a time', {
  maxNewTokens: 50,
  temperature: 0.8,
});
console.log(result.generatedText);

// Streaming output
for await (const event of generator.stream('Hello, ')) {
  process.stdout.write(event.token);
  if (event.done) break;
}
```

```js
import { pipeline } from 'edgeflowjs';

const classifier = await pipeline('zero-shot-classification');

const result = await classifier.classify(
  'I love playing soccer on weekends',
  ['sports', 'politics', 'technology', 'entertainment']
);
console.log(result.labels[0], result.scores[0]);
// 'sports', 0.92
```

```js
import { pipeline } from 'edgeflowjs';

const qa = await pipeline('question-answering');

const result = await qa.run({
  question: 'What is the capital of France?',
  context: 'Paris is the capital and largest city of France.'
});
console.log(result.answer); // 'Paris'
```

```js
import { fromHub, fromTask } from 'edgeflowjs';

// Load by model ID (auto-downloads model, tokenizer, config)
const bundle = await fromHub('Xenova/distilbert-base-uncased-finetuned-sst-2-english');
console.log(bundle.tokenizer); // Tokenizer instance
console.log(bundle.config);    // Model config

// Load by task name (uses recommended model)
const sentimentBundle = await fromTask('sentiment-analysis');
```

```js
import { runInWorker, WorkerPool, isWorkerSupported } from 'edgeflowjs';

// Simple: run inference in a background thread
if (isWorkerSupported()) {
  const outputs = await runInWorker(modelUrl, inputs);
}

// Advanced: use a worker pool for parallel processing
const pool = new WorkerPool({ numWorkers: 4 });
await pool.init();
const modelId = await pool.loadModel(modelUrl);
const results = await pool.runBatch(modelId, batchInputs);
pool.terminate();
```

| Task | Pipeline | Status |
|---|---|---|
| Text Generation | `text-generation` | ✅ Production (TinyLlama, streaming, KV cache) |
| Image Segmentation | `image-segmentation` | ✅ Production (SlimSAM, interactive prompts) |
| Text Classification | `text-classification` | 🧪 Experimental |
| Sentiment Analysis | `sentiment-analysis` | 🧪 Experimental |
| Feature Extraction | `feature-extraction` | 🧪 Experimental |
| Image Classification | `image-classification` | 🧪 Experimental |
| Object Detection | `object-detection` | 🧪 Experimental |
| Speech Recognition | `automatic-speech-recognition` | 🧪 Experimental |
| Zero-shot Classification | `zero-shot-classification` | 🧪 Experimental |
| Question Answering | `question-answering` | 🧪 Experimental |

Note: Experimental pipelines work for demos and testing the API surface. For production accuracy, provide a real ONNX model via `options.model` or use the transformers.js adapter backend to leverage HuggingFace's model ecosystem.
edgeFlow.js is not a replacement for transformers.js — it is a production orchestration layer that can wrap any inference engine (including transformers.js) and add the features real apps need.
| Feature | Inference engines alone | With edgeFlow.js |
|---|---|---|
| Task Scheduling | None — run and hope | Priority queue with concurrency limits |
| Task Cancellation | Not possible | Cancel pending/queued tasks |
| Batch Processing | Manual | Built-in batching with configurable size |
| Memory Management | Manual cleanup | Automatic scopes, leak detection, GC hints |
| Model Preloading | Manual | Background preloading with priority queue |
| Resume Download | Start over on failure | Chunked download with automatic resume |
| Model Caching | Basic or none | IndexedDB cache with stats and eviction |
| Pipeline Composition | Not available | Chain multiple models (ASR → translate → TTS) |
| Device Adaptation | Manual model selection | Auto-select model variant by device capability |
| Performance Monitoring | External tooling needed | Built-in dashboard and alerting |
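The scheduling rows in the table above can be illustrated with a minimal, self-contained sketch of a priority queue with a concurrency limit. This is a conceptual illustration only, not edgeFlow.js internals:

```javascript
// Minimal priority scheduler sketch: higher-priority tasks run first,
// and at most `limit` tasks execute concurrently. Illustrative only.
class TinyScheduler {
  constructor(limit) {
    this.limit = limit;
    this.running = 0;
    this.queue = []; // pending entries: { priority, job, resolve, reject }
  }

  // Enqueue an async job; resolves with the job's result.
  schedule(job, priority = 0) {
    return new Promise((resolve, reject) => {
      this.queue.push({ priority, job, resolve, reject });
      // Stable sort: equal priorities keep FIFO order
      this.queue.sort((a, b) => b.priority - a.priority);
      this.drain();
    });
  }

  // Start queued jobs while under the concurrency limit.
  drain() {
    while (this.running < this.limit && this.queue.length > 0) {
      const { job, resolve, reject } = this.queue.shift();
      this.running++;
      Promise.resolve()
        .then(job)
        .then(resolve, reject)
        .finally(() => {
          this.running--;
          this.drain();
        });
    }
  }
}
```

With a limit of 1, a high-priority task scheduled while another is running jumps ahead of lower-priority tasks already waiting in the queue.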
Use edgeFlow.js as an orchestration layer on top of transformers.js to get access to 1000+ HuggingFace models with scheduling, caching, and memory management:
```js
import { pipeline as tfPipeline } from '@xenova/transformers';
import { useTransformersBackend, pipeline } from 'edgeflowjs';

// Register transformers.js as the inference backend
useTransformersBackend({
  pipelineFactory: tfPipeline,
  device: 'webgpu', // GPU acceleration
  dtype: 'fp16',    // Half precision
});

// Use the edgeFlow.js API — scheduling, caching, memory management included
const classifier = await pipeline('text-classification', {
  model: 'Xenova/distilbert-base-uncased-finetuned-sst-2-english',
});

const result = await classifier.run('I love this product!');
```

Why? transformers.js is excellent at loading and running single models. edgeFlow.js adds the production features you need when running multiple models, managing memory on constrained devices, caching for offline use, and scheduling concurrent inference.
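The adapter approach described above, registering an external engine behind a common pipeline interface, can be sketched in a few lines. This is a conceptual illustration and not the actual edgeFlow.js backend registry; the names `registerBackend` and `createPipeline` are hypothetical:

```javascript
// Minimal pluggable-backend registry sketch: any engine that can
// produce a { run } object can be swapped in behind one API.
const backends = new Map();

function registerBackend(name, factory) {
  backends.set(name, factory);
}

async function createPipeline(task, { backend = 'default', ...opts } = {}) {
  const factory = backends.get(backend);
  if (!factory) throw new Error(`Unknown backend: ${backend}`);
  return factory(task, opts);
}

// Register a toy "engine" in place of a real one like transformers.js
registerBackend('echo', async (task) => ({
  run: async (input) => ({ task, input }),
}));
```

The orchestration layer (scheduling, caching, memory scopes) then wraps whatever `run` function the registered backend returns.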
```js
import { pipeline } from 'edgeflowjs';

// Automatic backend selection (recommended)
const model = await pipeline('text-classification');

// Or specify a runtime explicitly
const gpuModel = await pipeline('text-classification', {
  runtime: 'webgpu' // or 'webnn', 'wasm', 'auto'
});
```

```js
import { pipeline, getMemoryStats, gc } from 'edgeflowjs';

const model = await pipeline('text-classification');

// Use the model
await model.run('text');

// Check memory usage
console.log(getMemoryStats());
// { allocated: 50MB, used: 45MB, peak: 52MB, tensorCount: 12 }

// Explicit cleanup
model.dispose();

// Force garbage collection
gc();
```

```js
import { configureScheduler } from 'edgeflowjs';

configureScheduler({
  maxConcurrentTasks: 4,
  maxConcurrentPerModel: 1,
  defaultTimeout: 30000,
  enableBatching: true,
  maxBatchSize: 32,
});
```

```js
import { pipeline, Cache } from 'edgeflowjs';

// Create a cache
const cache = new Cache({
  strategy: 'lru',
  maxSize: 100 * 1024 * 1024, // 100MB
  persistent: true,           // Use IndexedDB
});

const model = await pipeline('text-classification', {
  cache: true
});
```

```js
import { loadModel, runInference } from 'edgeflowjs';

// Load from URL with caching, sharding, and resume support
const model = await loadModel('https://example.com/model.bin', {
  runtime: 'webgpu',
  quantization: 'int8',
  cache: true,     // Enable IndexedDB caching (default: true)
  resumable: true, // Enable resume download (default: true)
  chunkSize: 5 * 1024 * 1024, // 5MB chunks for large models
  onProgress: (progress) => console.log(`Loading: ${progress * 100}%`)
});

// Run inference
const outputs = await runInference(model, inputs);

// Cleanup
model.dispose();
```

```js
import { preloadModel, preloadModels, getPreloadStatus } from 'edgeflowjs';

// Preload a single model in the background (with priority)
preloadModel('https://example.com/model1.onnx', { priority: 10 });

// Preload multiple models
preloadModels([
  { url: 'https://example.com/model1.onnx', priority: 10 },
  { url: 'https://example.com/model2.onnx', priority: 5 },
]);

// Check preload status
const status = getPreloadStatus('https://example.com/model1.onnx');
// 'pending' | 'loading' | 'complete' | 'error' | 'not_found'
```

```js
import {
  isModelCached,
  getCachedModel,
  deleteCachedModel,
  clearModelCache,
  getModelCacheStats
} from 'edgeflowjs';

// Check if a model is cached
if (await isModelCached('https://example.com/model.onnx')) {
  console.log('Model is cached!');
}

// Get cached model data directly
const modelData = await getCachedModel('https://example.com/model.onnx');

// Delete a specific cached model
await deleteCachedModel('https://example.com/model.onnx');

// Clear all cached models
await clearModelCache();

// Get cache statistics
const stats = await getModelCacheStats();
console.log(`${stats.models} models cached, ${stats.totalSize} bytes total`);
```

Large model downloads automatically resume from where they left off:
```js
import { loadModelData } from 'edgeflowjs';

// Download with progress and resume support
const modelData = await loadModelData('https://example.com/large-model.onnx', {
  resumable: true,
  chunkSize: 10 * 1024 * 1024, // 10MB chunks
  parallelConnections: 4,      // Download 4 chunks in parallel
  onProgress: (progress) => {
    console.log(`${progress.percent.toFixed(1)}% downloaded`);
    console.log(`Speed: ${(progress.speed / 1024 / 1024).toFixed(2)} MB/s`);
    console.log(`ETA: ${(progress.eta / 1000).toFixed(0)}s`);
    console.log(`Chunk ${progress.currentChunk}/${progress.totalChunks}`);
  }
});
```

```js
import { quantize } from 'edgeflowjs/tools';

const quantized = await quantize(model, {
  method: 'int8',
  calibrationData: samples,
});
console.log(`Compression: ${quantized.compressionRatio}x`);
// Compression: 3.8x
```

```js
import { benchmark } from 'edgeflowjs/tools';

const result = await benchmark(
  () => model.run('sample text'),
  { warmupRuns: 5, runs: 100 }
);
console.log(result);
// {
//   avgTime: 12.5,
//   minTime: 10.2,
//   maxTime: 18.3,
//   throughput: 80 // inferences/sec
// }
```

```js
import { withMemoryScope, tensor } from 'edgeflowjs';

const result = await withMemoryScope(async (scope) => {
  // Tensors tracked in this scope
  const a = scope.track(tensor([1, 2, 3]));
  const b = scope.track(tensor([4, 5, 6]));

  // Process...
  const output = process(a, b);

  // Keep the result, dispose the rest
  return scope.keep(output);
});
// a and b are automatically disposed
```

```js
import { tensor, zeros, ones, matmul, softmax, relu } from 'edgeflowjs';

// Create tensors
const a = tensor([[1, 2], [3, 4]]);
const b = zeros([2, 2]);
const c = ones([2, 2]);

// Operations
const d = matmul(a, c);
const probs = softmax(d);
const activated = relu(d);

// Cleanup
a.dispose();
b.dispose();
c.dispose();
```

| Browser | WebGPU | WebNN | WASM |
|---|---|---|---|
| Chrome 113+ | ✅ | ✅ | ✅ |
| Edge 113+ | ✅ | ✅ | ✅ |
| Firefox 118+ | ❌ | ✅ | ✅ |
| Safari 17+ | ❌ | ✅ | ✅ |
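Runtime selection follows the support table above: probe for WebGPU, then WebNN, then fall back to WASM, which is available everywhere. A minimal feature-detection sketch; the `pickRuntime` helper is hypothetical and not part of the edgeFlow.js API:

```javascript
// Pick an execution backend by probing browser capabilities.
// WebGPU exposes `navigator.gpu`, WebNN exposes `navigator.ml`;
// WebAssembly is the universal fallback. Illustrative only.
function pickRuntime(env = globalThis) {
  if (env.navigator && 'gpu' in env.navigator) return 'webgpu';
  if (env.navigator && 'ml' in env.navigator) return 'webnn';
  return 'wasm';
}
```

Passing `runtime: 'auto'` to `pipeline()` performs this kind of detection for you.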
Core:
- `pipeline(task, options?)` - Create a pipeline for a task
- `loadModel(url, options?)` - Load a model from a URL
- `runInference(model, inputs)` - Run model inference
- `getScheduler()` - Get the global scheduler
- `getMemoryManager()` - Get the memory manager
- `runInWorker(url, inputs)` - Run inference in a Web Worker
- `WorkerPool` - Manage multiple workers for parallel inference

Pipelines:
- `TextClassificationPipeline` - Text/sentiment classification
- `SentimentAnalysisPipeline` - Sentiment analysis
- `FeatureExtractionPipeline` - Text embeddings
- `ImageClassificationPipeline` - Image classification
- `TextGenerationPipeline` - Text generation with streaming
- `ObjectDetectionPipeline` - Object detection with bounding boxes
- `AutomaticSpeechRecognitionPipeline` - Speech to text
- `ZeroShotClassificationPipeline` - Classify without training
- `QuestionAnsweringPipeline` - Extractive QA

HuggingFace Hub:
- `fromHub(modelId, options?)` - Load a model bundle from HuggingFace
- `fromTask(task, options?)` - Load the recommended model for a task
- `downloadTokenizer(modelId)` - Download the tokenizer only
- `downloadConfig(modelId)` - Download the config only
- `POPULAR_MODELS` - Registry of popular models by task

Utilities:
- `Tokenizer` - BPE/WordPiece tokenization with HuggingFace support
- `ImagePreprocessor` - Image preprocessing with HuggingFace config support
- `AudioPreprocessor` - Audio preprocessing for Whisper/wav2vec
- `Cache` - LRU caching utilities

Tools (`edgeflowjs/tools`):
- `quantize(model, options)` - Quantize a model
- `prune(model, options)` - Prune model weights
- `benchmark(fn, options)` - Benchmark inference
- `analyzeModel(model)` - Analyze model structure
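For intuition about the `Cache` utility's LRU strategy, here is a minimal size-bounded eviction sketch. It is illustrative only, not the actual edgeFlow.js implementation:

```javascript
// Tiny size-bounded LRU sketch. A Map preserves insertion order, so
// deleting and re-inserting an entry on access moves it to the
// "most recently used" end; eviction walks from the oldest end.
class TinyLRU {
  constructor(maxSize) {
    this.maxSize = maxSize; // byte budget
    this.size = 0;
    this.entries = new Map(); // key -> { bytes, value }
  }

  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    this.entries.delete(key); // refresh recency
    this.entries.set(key, entry);
    return entry.value;
  }

  set(key, value, bytes) {
    if (this.entries.has(key)) {
      this.size -= this.entries.get(key).bytes;
      this.entries.delete(key);
    }
    this.entries.set(key, { bytes, value });
    this.size += bytes;
    // Evict least-recently-used entries until back under budget
    for (const k of this.entries.keys()) {
      if (this.size <= this.maxSize) break;
      this.size -= this.entries.get(k).bytes;
      this.entries.delete(k);
    }
  }
}
```

The same idea, with entries persisted to IndexedDB, underlies offline model caching: recently used models stay, and the oldest are evicted when the size budget is exceeded.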
We welcome contributions! Please see our Contributing Guide for details.
1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
MIT © edgeFlow.js Contributors
Get Started · API Docs · Examples
Made with ❤️ for the edge AI community