Browser ML inference framework with task scheduling and smart caching.
Documentation · Examples · API Reference · English | 中文
- 📋 Task Scheduler - Priority queue, concurrency control, task cancellation
- 🔄 Batch Processing - Efficient batch inference out of the box
- 💾 Memory Management - Automatic memory tracking and cleanup with scopes
- 📥 Smart Model Loading - Preloading, sharding, resume download support
- 💿 Offline Caching - IndexedDB-based model caching for offline use
- ⚡ Multi-Backend - ONNX Runtime with WebGPU/WASM execution providers, automatic fallback
- 🤗 HuggingFace Hub - Direct model download with one line
- 🔤 Real Tokenizers - BPE & WordPiece tokenizers, load tokenizer.json directly
- 👷 Web Worker Support - Run inference in background threads
- 📦 Batteries Included - ONNX Runtime bundled, zero configuration needed
- 🎯 TypeScript First - Full type support with intuitive APIs
```bash
npm install edgeflowjs
# or
yarn add edgeflowjs
# or
pnpm add edgeflowjs
```
Note: ONNX Runtime is included as a dependency. No additional setup required.
Run the interactive demo locally to test all features:
```bash
# Clone and install
git clone https://github.com/user/edgeflow.js.git
cd edgeflow.js
npm install

# Build and start demo server
npm run demo
```
Open http://localhost:3000 in your browser:
1. Load Model - Enter a Hugging Face ONNX model URL and click "Load Model", e.g. `https://huggingface.co/Xenova/distilbert-base-uncased-finetuned-sst-2-english/resolve/main/onnx/model_quantized.onnx`
2. Test Features:
- 🧮 Tensor Operations - Test tensor creation, math ops, softmax, relu
- 📝 Text Classification - Run sentiment analysis on text
- 🔍 Feature Extraction - Extract embeddings from text
- 📋 Task Scheduler - Test priority-based task scheduling
- 💾 Memory Management - Test allocation and cleanup
```js
import { pipeline } from 'edgeflowjs';

// Create a sentiment analysis pipeline
const sentiment = await pipeline('sentiment-analysis');

// Run inference
const result = await sentiment.run('I love this product!');
console.log(result);
// { label: 'positive', score: 0.98, processingTime: 12.5 }

// Native batch processing support
const results = await sentiment.run([
  'This is amazing!',
  'This is terrible.',
  "It's okay, I guess."
]);
console.log(results);
// [
//   { label: 'positive', score: 0.95 },
//   { label: 'negative', score: 0.92 },
//   { label: 'neutral', score: 0.68 }
// ]
```

```js
import { pipeline } from 'edgeflowjs';

// Create multiple pipelines
const classifier = await pipeline('text-classification');
const extractor = await pipeline('feature-extraction');

// Run in parallel with Promise.all
const [classification, features] = await Promise.all([
  classifier.run('Sample text'),
  extractor.run('Sample text')
]);
```

```js
import { pipeline } from 'edgeflowjs';

const classifier = await pipeline('image-classification');

// From URL
const urlResult = await classifier.run('https://example.com/image.jpg');

// From HTMLImageElement
const img = document.getElementById('myImage');
const elementResult = await classifier.run(img);

// Batch
const batchResults = await classifier.run([img1, img2, img3]);
```

```js
import { pipeline } from 'edgeflowjs';

const generator = await pipeline('text-generation');

// Simple generation
const result = await generator.run('Once upon a time', {
  maxNewTokens: 50,
  temperature: 0.8,
});
console.log(result.generatedText);

// Streaming output
for await (const event of generator.stream('Hello, ')) {
  process.stdout.write(event.token);
  if (event.done) break;
}
```

```js
import { pipeline } from 'edgeflowjs';

const classifier = await pipeline('zero-shot-classification');

const result = await classifier.classify(
  'I love playing soccer on weekends',
  ['sports', 'politics', 'technology', 'entertainment']
);
console.log(result.labels[0], result.scores[0]);
// 'sports', 0.92
```

```js
import { pipeline } from 'edgeflowjs';

const qa = await pipeline('question-answering');

const result = await qa.run({
  question: 'What is the capital of France?',
  context: 'Paris is the capital and largest city of France.'
});
console.log(result.answer); // 'Paris'
```

```js
import { fromHub, fromTask } from 'edgeflowjs';

// Load by model ID (auto-downloads model, tokenizer, config)
const bundle = await fromHub('Xenova/distilbert-base-uncased-finetuned-sst-2-english');
console.log(bundle.tokenizer); // Tokenizer instance
console.log(bundle.config);    // Model config

// Load by task name (uses recommended model)
const sentimentBundle = await fromTask('sentiment-analysis');
```

```js
import { runInWorker, WorkerPool, isWorkerSupported } from 'edgeflowjs';

// Simple: run inference in a background thread
if (isWorkerSupported()) {
  const outputs = await runInWorker(modelUrl, inputs);
}

// Advanced: use a worker pool for parallel processing
const pool = new WorkerPool({ numWorkers: 4 });
await pool.init();
const modelId = await pool.loadModel(modelUrl);
const results = await pool.runBatch(modelId, batchInputs);
pool.terminate();
```

| Task | Pipeline | Status |
|---|---|---|
| Text Generation | `text-generation` | ✅ Production (TinyLlama, streaming, KV cache) |
| Image Segmentation | `image-segmentation` | ✅ Production (SlimSAM, interactive prompts) |
| Text Classification | `text-classification` | 🧪 Experimental |
| Sentiment Analysis | `sentiment-analysis` | 🧪 Experimental |
| Feature Extraction | `feature-extraction` | 🧪 Experimental |
| Image Classification | `image-classification` | 🧪 Experimental |
| Object Detection | `object-detection` | 🧪 Experimental |
| Speech Recognition | `automatic-speech-recognition` | 🧪 Experimental |
| Zero-shot Classification | `zero-shot-classification` | 🧪 Experimental |
| Question Answering | `question-answering` | 🧪 Experimental |

Note: Experimental pipelines work for demos and testing the API surface. For production accuracy, provide a real ONNX model via `options.model` or use the transformers.js adapter backend to leverage HuggingFace's model ecosystem.
edgeFlow.js is not a replacement for transformers.js — it is a production orchestration layer that can wrap any inference engine (including transformers.js) and add the features real apps need.
| Feature | Inference engines alone | With edgeFlow.js |
|---|---|---|
| Task Scheduling | None — run and hope | Priority queue with concurrency limits |
| Task Cancellation | Not possible | Cancel pending/queued tasks |
| Batch Processing | Manual | Built-in batching with configurable size |
| Memory Management | Manual cleanup | Automatic scopes, leak detection, GC hints |
| Model Preloading | Manual | Background preloading with priority queue |
| Resume Download | Start over on failure | Chunked download with automatic resume |
| Model Caching | Basic or none | IndexedDB cache with stats and eviction |
| Pipeline Composition | Not available | Chain multiple models (ASR → translate → TTS) |
| Device Adaptation | Manual model selection | Auto-select model variant by device capability |
| Performance Monitoring | External tooling needed | Built-in dashboard and alerting |
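The scheduling rows in the table above can be illustrated with a minimal, self-contained sketch of a priority queue with a concurrency limit. This is a conceptual illustration only, not edgeFlow.js internals:

```javascript
// Minimal priority scheduler sketch: higher-priority tasks run first,
// and at most `limit` tasks execute concurrently. Illustrative only.
class TinyScheduler {
  constructor(limit) {
    this.limit = limit;
    this.running = 0;
    this.queue = []; // pending entries: { priority, job, resolve, reject }
  }

  // Enqueue an async job; resolves with the job's result.
  schedule(job, priority = 0) {
    return new Promise((resolve, reject) => {
      this.queue.push({ priority, job, resolve, reject });
      // Stable sort: equal priorities keep FIFO order
      this.queue.sort((a, b) => b.priority - a.priority);
      this.drain();
    });
  }

  // Start queued jobs while under the concurrency limit.
  drain() {
    while (this.running < this.limit && this.queue.length > 0) {
      const { job, resolve, reject } = this.queue.shift();
      this.running++;
      Promise.resolve()
        .then(job)
        .then(resolve, reject)
        .finally(() => {
          this.running--;
          this.drain();
        });
    }
  }
}
```

With a limit of 1, a high-priority task scheduled while another is running jumps ahead of lower-priority tasks already waiting in the queue.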
Use edgeFlow.js as an orchestration layer on top of transformers.js to get access to 1000+ HuggingFace models with scheduling, caching, and memory management:
```js
import { pipeline as tfPipeline } from '@xenova/transformers';
import { useTransformersBackend, pipeline } from 'edgeflowjs';

// Register transformers.js as the inference backend
useTransformersBackend({
  pipelineFactory: tfPipeline,
  device: 'webgpu', // GPU acceleration
  dtype: 'fp16',    // Half precision
});

// Use the edgeFlow.js API — scheduling, caching, memory management included
const classifier = await pipeline('text-classification', {
  model: 'Xenova/distilbert-base-uncased-finetuned-sst-2-english',
});

const result = await classifier.run('I love this product!');
```

Why? transformers.js is excellent at loading and running single models. edgeFlow.js adds the production features you need when running multiple models, managing memory on constrained devices, caching for offline use, and scheduling concurrent inference.
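The adapter approach described above, registering an external engine behind a common pipeline interface, can be sketched in a few lines. This is a conceptual illustration and not the actual edgeFlow.js backend registry; the names `registerBackend` and `createPipeline` are hypothetical:

```javascript
// Minimal pluggable-backend registry sketch: any engine that can
// produce a { run } object can be swapped in behind one API.
const backends = new Map();

function registerBackend(name, factory) {
  backends.set(name, factory);
}

async function createPipeline(task, { backend = 'default', ...opts } = {}) {
  const factory = backends.get(backend);
  if (!factory) throw new Error(`Unknown backend: ${backend}`);
  return factory(task, opts);
}

// Register a toy "engine" in place of a real one like transformers.js
registerBackend('echo', async (task) => ({
  run: async (input) => ({ task, input }),
}));
```

The orchestration layer (scheduling, caching, memory scopes) then wraps whatever `run` function the registered backend returns.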
```js
import { pipeline } from 'edgeflowjs';

// Automatic backend selection (recommended)
const model = await pipeline('text-classification');

// Or specify a runtime explicitly
const gpuModel = await pipeline('text-classification', {
  runtime: 'webgpu' // or 'webnn', 'wasm', 'auto'
});
```

```js
import { pipeline, getMemoryStats, gc } from 'edgeflowjs';

const model = await pipeline('text-classification');

// Use the model
await model.run('text');

// Check memory usage
console.log(getMemoryStats());
// { allocated: 50MB, used: 45MB, peak: 52MB, tensorCount: 12 }

// Explicit cleanup
model.dispose();

// Force garbage collection
gc();
```

```js
import { configureScheduler } from 'edgeflowjs';

configureScheduler({
  maxConcurrentTasks: 4,
  maxConcurrentPerModel: 1,
  defaultTimeout: 30000,
  enableBatching: true,
  maxBatchSize: 32,
});
```

```js
import { pipeline, Cache } from 'edgeflowjs';

// Create a cache
const cache = new Cache({
  strategy: 'lru',
  maxSize: 100 * 1024 * 1024, // 100MB
  persistent: true,           // Use IndexedDB
});

const model = await pipeline('text-classification', {
  cache: true
});
```

```js
import { loadModel, runInference } from 'edgeflowjs';

// Load from URL with caching, sharding, and resume support
const model = await loadModel('https://example.com/model.bin', {
  runtime: 'webgpu',
  quantization: 'int8',
  cache: true,     // Enable IndexedDB caching (default: true)
  resumable: true, // Enable resume download (default: true)
  chunkSize: 5 * 1024 * 1024, // 5MB chunks for large models
  onProgress: (progress) => console.log(`Loading: ${progress * 100}%`)
});

// Run inference
const outputs = await runInference(model, inputs);

// Cleanup
model.dispose();
```

```js
import { preloadModel, preloadModels, getPreloadStatus } from 'edgeflowjs';

// Preload a single model in the background (with priority)
preloadModel('https://example.com/model1.onnx', { priority: 10 });

// Preload multiple models
preloadModels([
  { url: 'https://example.com/model1.onnx', priority: 10 },
  { url: 'https://example.com/model2.onnx', priority: 5 },
]);

// Check preload status
const status = getPreloadStatus('https://example.com/model1.onnx');
// 'pending' | 'loading' | 'complete' | 'error' | 'not_found'
```

```js
import {
  isModelCached,
  getCachedModel,
  deleteCachedModel,
  clearModelCache,
  getModelCacheStats
} from 'edgeflowjs';

// Check if a model is cached
if (await isModelCached('https://example.com/model.onnx')) {
  console.log('Model is cached!');
}

// Get cached model data directly
const modelData = await getCachedModel('https://example.com/model.onnx');

// Delete a specific cached model
await deleteCachedModel('https://example.com/model.onnx');

// Clear all cached models
await clearModelCache();

// Get cache statistics
const stats = await getModelCacheStats();
console.log(`${stats.models} models cached, ${stats.totalSize} bytes total`);
```

Large model downloads automatically resume from where they left off:
```js
import { loadModelData } from 'edgeflowjs';

// Download with progress and resume support
const modelData = await loadModelData('https://example.com/large-model.onnx', {
  resumable: true,
  chunkSize: 10 * 1024 * 1024, // 10MB chunks
  parallelConnections: 4,      // Download 4 chunks in parallel
  onProgress: (progress) => {
    console.log(`${progress.percent.toFixed(1)}% downloaded`);
    console.log(`Speed: ${(progress.speed / 1024 / 1024).toFixed(2)} MB/s`);
    console.log(`ETA: ${(progress.eta / 1000).toFixed(0)}s`);
    console.log(`Chunk ${progress.currentChunk}/${progress.totalChunks}`);
  }
});
```

```js
import { quantize } from 'edgeflowjs/tools';

const quantized = await quantize(model, {
  method: 'int8',
  calibrationData: samples,
});
console.log(`Compression: ${quantized.compressionRatio}x`);
// Compression: 3.8x
```

```js
import { benchmark } from 'edgeflowjs/tools';

const result = await benchmark(
  () => model.run('sample text'),
  { warmupRuns: 5, runs: 100 }
);
console.log(result);
// {
//   avgTime: 12.5,
//   minTime: 10.2,
//   maxTime: 18.3,
//   throughput: 80 // inferences/sec
// }
```

```js
import { withMemoryScope, tensor } from 'edgeflowjs';

const result = await withMemoryScope(async (scope) => {
  // Tensors tracked in this scope
  const a = scope.track(tensor([1, 2, 3]));
  const b = scope.track(tensor([4, 5, 6]));

  // Process...
  const output = process(a, b);

  // Keep the result, dispose the rest
  return scope.keep(output);
});
// a and b are automatically disposed
```

```js
import { tensor, zeros, ones, matmul, softmax, relu } from 'edgeflowjs';

// Create tensors
const a = tensor([[1, 2], [3, 4]]);
const b = zeros([2, 2]);
const c = ones([2, 2]);

// Operations
const d = matmul(a, c);
const probs = softmax(d);
const activated = relu(d);

// Cleanup
a.dispose();
b.dispose();
c.dispose();
```

| Browser | WebGPU | WebNN | WASM |
|---|---|---|---|
| Chrome 113+ | ✅ | ✅ | ✅ |
| Edge 113+ | ✅ | ✅ | ✅ |
| Firefox 118+ | ❌ | ✅ | ✅ |
| Safari 17+ | ❌ | ✅ | ✅ |
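Runtime selection follows the support table above: probe for WebGPU, then WebNN, then fall back to WASM, which is available everywhere. A minimal feature-detection sketch; the `pickRuntime` helper is hypothetical and not part of the edgeFlow.js API:

```javascript
// Pick an execution backend by probing browser capabilities.
// WebGPU exposes `navigator.gpu`, WebNN exposes `navigator.ml`;
// WebAssembly is the universal fallback. Illustrative only.
function pickRuntime(env = globalThis) {
  if (env.navigator && 'gpu' in env.navigator) return 'webgpu';
  if (env.navigator && 'ml' in env.navigator) return 'webnn';
  return 'wasm';
}
```

Passing `runtime: 'auto'` to `pipeline()` performs this kind of detection for you.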
Core:
- `pipeline(task, options?)` - Create a pipeline for a task
- `loadModel(url, options?)` - Load a model from a URL
- `runInference(model, inputs)` - Run model inference
- `getScheduler()` - Get the global scheduler
- `getMemoryManager()` - Get the memory manager
- `runInWorker(url, inputs)` - Run inference in a Web Worker
- `WorkerPool` - Manage multiple workers for parallel inference

Pipelines:
- `TextClassificationPipeline` - Text/sentiment classification
- `SentimentAnalysisPipeline` - Sentiment analysis
- `FeatureExtractionPipeline` - Text embeddings
- `ImageClassificationPipeline` - Image classification
- `TextGenerationPipeline` - Text generation with streaming
- `ObjectDetectionPipeline` - Object detection with bounding boxes
- `AutomaticSpeechRecognitionPipeline` - Speech to text
- `ZeroShotClassificationPipeline` - Classify without training
- `QuestionAnsweringPipeline` - Extractive QA

HuggingFace Hub:
- `fromHub(modelId, options?)` - Load a model bundle from HuggingFace
- `fromTask(task, options?)` - Load the recommended model for a task
- `downloadTokenizer(modelId)` - Download the tokenizer only
- `downloadConfig(modelId)` - Download the config only
- `POPULAR_MODELS` - Registry of popular models by task

Utilities:
- `Tokenizer` - BPE/WordPiece tokenization with HuggingFace support
- `ImagePreprocessor` - Image preprocessing with HuggingFace config support
- `AudioPreprocessor` - Audio preprocessing for Whisper/wav2vec
- `Cache` - LRU caching utilities

Tools (`edgeflowjs/tools`):
- `quantize(model, options)` - Quantize a model
- `prune(model, options)` - Prune model weights
- `benchmark(fn, options)` - Benchmark inference
- `analyzeModel(model)` - Analyze model structure
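For intuition about the `Cache` utility's LRU strategy, here is a minimal size-bounded eviction sketch. It is illustrative only, not the actual edgeFlow.js implementation:

```javascript
// Tiny size-bounded LRU sketch. A Map preserves insertion order, so
// deleting and re-inserting an entry on access moves it to the
// "most recently used" end; eviction walks from the oldest end.
class TinyLRU {
  constructor(maxSize) {
    this.maxSize = maxSize; // byte budget
    this.size = 0;
    this.entries = new Map(); // key -> { bytes, value }
  }

  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    this.entries.delete(key); // refresh recency
    this.entries.set(key, entry);
    return entry.value;
  }

  set(key, value, bytes) {
    if (this.entries.has(key)) {
      this.size -= this.entries.get(key).bytes;
      this.entries.delete(key);
    }
    this.entries.set(key, { bytes, value });
    this.size += bytes;
    // Evict least-recently-used entries until back under budget
    for (const k of this.entries.keys()) {
      if (this.size <= this.maxSize) break;
      this.size -= this.entries.get(k).bytes;
      this.entries.delete(k);
    }
  }
}
```

The same idea, with entries persisted to IndexedDB, underlies offline model caching: recently used models stay, and the oldest are evicted when the size budget is exceeded.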
We welcome contributions! Please see our Contributing Guide for details.
1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
MIT © edgeFlow.js Contributors
Get Started · API Docs · Examples
Made with ❤️ for the edge AI community