Skip to content

s-zx/edgeFlow.js

Repository files navigation

edgeFlow.js

Browser ML inference framework with task scheduling and smart caching.

npm version install size license

Documentation · Examples · API Reference · English | 中文


✨ Features

  • 📋 Task Scheduler - Priority queue, concurrency control, task cancellation
  • 🔄 Batch Processing - Efficient batch inference out of the box
  • 💾 Memory Management - Automatic memory tracking and cleanup with scopes
  • 📥 Smart Model Loading - Preloading, sharding, resume download support
  • 💿 Offline Caching - IndexedDB-based model caching for offline use
  • Multi-Backend - ONNX Runtime with WebGPU/WASM execution providers, automatic fallback
  • 🤗 HuggingFace Hub - Direct model download with one line
  • 🔤 Real Tokenizers - BPE & WordPiece tokenizers, load tokenizer.json directly
  • 👷 Web Worker Support - Run inference in background threads
  • 📦 Batteries Included - ONNX Runtime bundled, zero configuration needed
  • 🎯 TypeScript First - Full type support with intuitive APIs

📦 Installation

npm install edgeflowjs
yarn add edgeflowjs
pnpm add edgeflowjs

Note: ONNX Runtime is included as a dependency. No additional setup required.

🚀 Quick Start

Try the Demo

Run the interactive demo locally to test all features:

# Clone and install
git clone https://github.com/user/edgeflow.js.git
cd edgeflow.js
npm install

# Build and start demo server
npm run demo

Open http://localhost:3000 in your browser:

  1. Load Model - Enter a Hugging Face ONNX model URL and click "Load Model"

    https://huggingface.co/Xenova/distilbert-base-uncased-finetuned-sst-2-english/resolve/main/onnx/model_quantized.onnx
    
  2. Test Features:

    • 🧮 Tensor Operations - Test tensor creation, math ops, softmax, relu
    • 📝 Text Classification - Run sentiment analysis on text
    • 🔍 Feature Extraction - Extract embeddings from text
    • Task Scheduling - Test priority-based scheduling
    • 📋 Task Scheduler - Test priority-based task scheduling
    • 💾 Memory Management - Test allocation and cleanup

Basic Usage

import { pipeline } from 'edgeflowjs';

// Create a sentiment analysis pipeline
const sentiment = await pipeline('sentiment-analysis');

// Run inference
const result = await sentiment.run('I love this product!');
console.log(result);
// { label: 'positive', score: 0.98, processingTime: 12.5 }

Batch Processing

// Native batch processing support
const results = await sentiment.run([
  'This is amazing!',
  'This is terrible.',
  'It\'s okay I guess.'
]);

console.log(results);
// [
//   { label: 'positive', score: 0.95 },
//   { label: 'negative', score: 0.92 },
//   { label: 'neutral', score: 0.68 }
// ]

Multiple Pipelines

import { pipeline } from 'edgeflowjs';

// Create multiple pipelines
const classifier = await pipeline('text-classification');
const extractor = await pipeline('feature-extraction');

// Run in parallel with Promise.all
const [classification, features] = await Promise.all([
  classifier.run('Sample text'),
  extractor.run('Sample text')
]);

Image Classification

import { pipeline } from 'edgeflowjs';

const classifier = await pipeline('image-classification');

// From URL
const result = await classifier.run('https://example.com/image.jpg');

// From HTMLImageElement
const img = document.getElementById('myImage');
const result = await classifier.run(img);

// Batch
const results = await classifier.run([img1, img2, img3]);

Text Generation (Streaming)

import { pipeline } from 'edgeflowjs';

const generator = await pipeline('text-generation');

// Simple generation
const result = await generator.run('Once upon a time', {
  maxNewTokens: 50,
  temperature: 0.8,
});
console.log(result.generatedText);

// Streaming output
for await (const event of generator.stream('Hello, ')) {
  process.stdout.write(event.token);
  if (event.done) break;
}

Zero-shot Classification

import { pipeline } from 'edgeflowjs';

const classifier = await pipeline('zero-shot-classification');

const result = await classifier.classify(
  'I love playing soccer on weekends',
  ['sports', 'politics', 'technology', 'entertainment']
);

console.log(result.labels[0], result.scores[0]);
// 'sports', 0.92

Question Answering

import { pipeline } from 'edgeflowjs';

const qa = await pipeline('question-answering');

const result = await qa.run({
  question: 'What is the capital of France?',
  context: 'Paris is the capital and largest city of France.'
});

console.log(result.answer); // 'Paris'

Load from HuggingFace Hub

import { fromHub, fromTask } from 'edgeflowjs';

// Load by model ID (auto-downloads model, tokenizer, config)
const bundle = await fromHub('Xenova/distilbert-base-uncased-finetuned-sst-2-english');
console.log(bundle.tokenizer); // Tokenizer instance
console.log(bundle.config);    // Model config

// Load by task name (uses recommended model)
const sentimentBundle = await fromTask('sentiment-analysis');

Web Workers (Background Inference)

import { runInWorker, WorkerPool, isWorkerSupported } from 'edgeflowjs';

// Simple: run inference in background thread
if (isWorkerSupported()) {
  const outputs = await runInWorker(modelUrl, inputs);
}

// Advanced: use worker pool for parallel processing
const pool = new WorkerPool({ numWorkers: 4 });
await pool.init();

const modelId = await pool.loadModel(modelUrl);
const results = await pool.runBatch(modelId, batchInputs);

pool.terminate();

🎯 Supported Tasks

Task Pipeline Status
Text Generation text-generation ✅ Production (TinyLlama, streaming, KV cache)
Image Segmentation image-segmentation ✅ Production (SlimSAM, interactive prompts)
Text Classification text-classification ⚠️ Experimental (heuristic, provide own model)
Sentiment Analysis sentiment-analysis ⚠️ Experimental (heuristic, provide own model)
Feature Extraction feature-extraction ⚠️ Experimental (mock embeddings, provide own model)
Image Classification image-classification ⚠️ Experimental (heuristic, provide own model)
Object Detection object-detection ⚠️ Experimental (real NMS/IoU, needs own model)
Speech Recognition automatic-speech-recognition ⚠️ Experimental (preprocessing only, needs model)
Zero-shot Classification zero-shot-classification ⚠️ Experimental (random scoring, needs NLI model)
Question Answering question-answering ⚠️ Experimental (word overlap heuristic, needs model)

Note: Experimental pipelines work for demos and testing the API surface. For production accuracy, provide a real ONNX model via options.model or use the transformers.js adapter backend to leverage HuggingFace's model ecosystem.

⚡ Key Differentiators

edgeFlow.js is not a replacement for transformers.js — it is a production orchestration layer that can wrap any inference engine (including transformers.js) and add the features real apps need.

What edgeFlow.js adds on top of inference engines

Feature Inference engines alone With edgeFlow.js
Task Scheduling None — run and hope Priority queue with concurrency limits
Task Cancellation Not possible Cancel pending/queued tasks
Batch Processing Manual Built-in batching with configurable size
Memory Management Manual cleanup Automatic scopes, leak detection, GC hints
Model Preloading Manual Background preloading with priority queue
Resume Download Start over on failure Chunked download with automatic resume
Model Caching Basic or none IndexedDB cache with stats and eviction
Pipeline Composition Not available Chain multiple models (ASR → translate → TTS)
Device Adaptation Manual model selection Auto-select model variant by device capability
Performance Monitoring External tooling needed Built-in dashboard and alerting

🔌 transformers.js Adapter (Recommended)

Use edgeFlow.js as an orchestration layer on top of transformers.js to get access to 1000+ HuggingFace models with scheduling, caching, and memory management:

import { pipeline as tfPipeline } from '@xenova/transformers';
import { useTransformersBackend, pipeline } from 'edgeflowjs';

// Register transformers.js as the inference backend
useTransformersBackend({
  pipelineFactory: tfPipeline,
  device: 'webgpu',    // GPU acceleration
  dtype: 'fp16',       // Half precision
});

// Use edgeFlow.js API — scheduling, caching, memory management included
const classifier = await pipeline('text-classification', {
  model: 'Xenova/distilbert-base-uncased-finetuned-sst-2-english',
});

const result = await classifier.run('I love this product!');

Why? transformers.js is excellent at loading and running single models. edgeFlow.js adds the production features you need when running multiple models, managing memory on constrained devices, caching for offline use, and scheduling concurrent inference.

🔧 Configuration

Runtime Selection

import { pipeline } from 'edgeflowjs';

// Automatic (recommended)
const model = await pipeline('text-classification');

// Specify runtime
const model = await pipeline('text-classification', {
  runtime: 'webgpu' // or 'webnn', 'wasm', 'auto'
});

Memory Management

import { pipeline, getMemoryStats, gc } from 'edgeflowjs';

const model = await pipeline('text-classification');

// Use the model
await model.run('text');

// Check memory usage
console.log(getMemoryStats());
// { allocated: 50MB, used: 45MB, peak: 52MB, tensorCount: 12 }

// Explicit cleanup
model.dispose();

// Force garbage collection
gc();

Scheduler Configuration

import { configureScheduler } from 'edgeflowjs';

configureScheduler({
  maxConcurrentTasks: 4,
  maxConcurrentPerModel: 1,
  defaultTimeout: 30000,
  enableBatching: true,
  maxBatchSize: 32,
});

Caching

import { pipeline, Cache } from 'edgeflowjs';

// Create a cache
const cache = new Cache({
  strategy: 'lru',
  maxSize: 100 * 1024 * 1024, // 100MB
  persistent: true, // Use IndexedDB
});

const model = await pipeline('text-classification', {
  cache: true
});

🛠️ Advanced Usage

Custom Model Loading

import { loadModel, runInference } from 'edgeflowjs';

// Load from URL with caching, sharding, and resume support
const model = await loadModel('https://example.com/model.bin', {
  runtime: 'webgpu',
  quantization: 'int8',
  cache: true,           // Enable IndexedDB caching (default: true)
  resumable: true,       // Enable resume download (default: true)
  chunkSize: 5 * 1024 * 1024, // 5MB chunks for large models
  onProgress: (progress) => console.log(`Loading: ${progress * 100}%`)
});

// Run inference
const outputs = await runInference(model, inputs);

// Cleanup
model.dispose();

Preloading Models

import { preloadModel, preloadModels, getPreloadStatus } from 'edgeflowjs';

// Preload a single model in background (with priority)
preloadModel('https://example.com/model1.onnx', { priority: 10 });

// Preload multiple models
preloadModels([
  { url: 'https://example.com/model1.onnx', priority: 10 },
  { url: 'https://example.com/model2.onnx', priority: 5 },
]);

// Check preload status
const status = getPreloadStatus('https://example.com/model1.onnx');
// 'pending' | 'loading' | 'complete' | 'error' | 'not_found'

Model Caching

import { 
  isModelCached, 
  getCachedModel, 
  deleteCachedModel, 
  clearModelCache,
  getModelCacheStats 
} from 'edgeflowjs';

// Check if model is cached
if (await isModelCached('https://example.com/model.onnx')) {
  console.log('Model is cached!');
}

// Get cached model data directly
const modelData = await getCachedModel('https://example.com/model.onnx');

// Delete a specific cached model
await deleteCachedModel('https://example.com/model.onnx');

// Clear all cached models
await clearModelCache();

// Get cache statistics
const stats = await getModelCacheStats();
console.log(`${stats.models} models cached, ${stats.totalSize} bytes total`);

Resume Downloads

Large model downloads automatically support resuming from where they left off:

import { loadModelData } from 'edgeflowjs';

// Download with progress and resume support
const modelData = await loadModelData('https://example.com/large-model.onnx', {
  resumable: true,
  chunkSize: 10 * 1024 * 1024, // 10MB chunks
  parallelConnections: 4,      // Download 4 chunks in parallel
  onProgress: (progress) => {
    console.log(`${progress.percent.toFixed(1)}% downloaded`);
    console.log(`Speed: ${(progress.speed / 1024 / 1024).toFixed(2)} MB/s`);
    console.log(`ETA: ${(progress.eta / 1000).toFixed(0)}s`);
    console.log(`Chunk ${progress.currentChunk}/${progress.totalChunks}`);
  }
});

Model Quantization

import { quantize } from 'edgeflowjs/tools';

const quantized = await quantize(model, {
  method: 'int8',
  calibrationData: samples,
});

console.log(`Compression: ${quantized.compressionRatio}x`);
// Compression: 3.8x

Benchmarking

import { benchmark } from 'edgeflowjs/tools';

const result = await benchmark(
  () => model.run('sample text'),
  { warmupRuns: 5, runs: 100 }
);

console.log(result);
// {
//   avgTime: 12.5,
//   minTime: 10.2,
//   maxTime: 18.3,
//   throughput: 80 // inferences/sec
// }

Memory Scope

import { withMemoryScope, tensor } from 'edgeflowjs';

const result = await withMemoryScope(async (scope) => {
  // Tensors tracked in scope
  const a = scope.track(tensor([1, 2, 3]));
  const b = scope.track(tensor([4, 5, 6]));
  
  // Process...
  const output = process(a, b);
  
  // Keep result, dispose others
  return scope.keep(output);
});
// a and b automatically disposed

🔌 Tensor Operations

import { tensor, zeros, ones, matmul, softmax, relu } from 'edgeflowjs';

// Create tensors
const a = tensor([[1, 2], [3, 4]]);
const b = zeros([2, 2]);
const c = ones([2, 2]);

// Operations
const d = matmul(a, c);
const probs = softmax(d);
const activated = relu(d);

// Cleanup
a.dispose();
b.dispose();
c.dispose();

🌐 Browser Support

Browser WebGPU WebNN WASM
Chrome 113+
Edge 113+
Firefox 118+ ⚠️ Flag
Safari 17+ ⚠️ Preview

📖 API Reference

Core

  • pipeline(task, options?) - Create a pipeline for a task
  • loadModel(url, options?) - Load a model from URL
  • runInference(model, inputs) - Run model inference
  • getScheduler() - Get the global scheduler
  • getMemoryManager() - Get the memory manager
  • runInWorker(url, inputs) - Run inference in a Web Worker
  • WorkerPool - Manage multiple workers for parallel inference

Pipelines

  • TextClassificationPipeline - Text/sentiment classification
  • SentimentAnalysisPipeline - Sentiment analysis
  • FeatureExtractionPipeline - Text embeddings
  • ImageClassificationPipeline - Image classification
  • TextGenerationPipeline - Text generation with streaming
  • ObjectDetectionPipeline - Object detection with bounding boxes
  • AutomaticSpeechRecognitionPipeline - Speech to text
  • ZeroShotClassificationPipeline - Classify without training
  • QuestionAnsweringPipeline - Extractive QA

HuggingFace Hub

  • fromHub(modelId, options?) - Load model bundle from HuggingFace
  • fromTask(task, options?) - Load recommended model for task
  • downloadTokenizer(modelId) - Download tokenizer only
  • downloadConfig(modelId) - Download config only
  • POPULAR_MODELS - Registry of popular models by task

Utilities

  • Tokenizer - BPE/WordPiece tokenization with HuggingFace support
  • ImagePreprocessor - Image preprocessing with HuggingFace config support
  • AudioPreprocessor - Audio preprocessing for Whisper/wav2vec
  • Cache - LRU caching utilities

Tools

  • quantize(model, options) - Quantize a model
  • prune(model, options) - Prune model weights
  • benchmark(fn, options) - Benchmark inference
  • analyzeModel(model) - Analyze model structure

🤝 Contributing

We welcome contributions! Please see our Contributing Guide for details.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📄 License

MIT © edgeFlow.js Contributors


Get Started · API Docs · Examples

Made with ❤️ for the edge AI community

About

Browser ML inference framework with task scheduling and smart caching.

Resources

Contributing

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors