
# Tensor Cache API Reference

Complete type reference, configuration tables, storage patterns, error types, and method signatures for the `tensor_cache` crate.

## Core Types

| Type | Description |
|------|-------------|
| `Cache` | Main API: multi-layer LLM response cache |
| `CacheConfig` | Configuration (capacity, TTL, eviction, metrics) |
| `CacheHit` | Successful cache lookup result |
| `CacheStats` | Thread-safe statistics with atomic counters |
| `StatsSnapshot` | Point-in-time snapshot for reporting |
| `CacheLayer` | Enum: `Exact`, `Semantic`, `Embedding` |
| `CacheError` | Error types for cache operations |

## Configuration Types

| Type | Description |
|------|-------------|
| `EvictionStrategy` | `LRU`, `LFU`, `CostBased`, `Hybrid` |
| `EvictionManager` | Background eviction task controller |
| `EvictionScorer` | Calculates eviction priority scores |
| `EvictionHandle` | Handle for controlling background eviction |
| `EvictionConfig` | Interval, batch size, and strategy settings |
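The `Hybrid` strategy blends recency, frequency, and cost signals using the integer weights from the config. A minimal sketch of how such a priority score could be combined (the field names `idle_secs`, `access_count`, `cost_saved` and the normalizations are illustrative assumptions, not the crate's actual scorer):

```rust
// Hypothetical per-entry metadata; higher score = better eviction candidate.
struct EntryMeta {
    idle_secs: u64,    // seconds since last access (LRU signal)
    access_count: u64, // lifetime accesses (LFU signal)
    cost_saved: f64,   // dollars saved by keeping the entry (cost signal)
}

/// Combine the three signals with integer weights (e.g. 40/30/30).
fn eviction_score(m: &EntryMeta, lru_w: u32, lfu_w: u32, cost_w: u32) -> f64 {
    let lru = m.idle_secs as f64;                  // older = more evictable
    let lfu = 1.0 / (1.0 + m.access_count as f64); // rarely used = more evictable
    let cost = 1.0 / (1.0 + m.cost_saved);         // cheap to lose = more evictable
    lru_w as f64 * lru + lfu_w as f64 * lfu + cost_w as f64 * cost
}

fn main() {
    let hot = EntryMeta { idle_secs: 5, access_count: 100, cost_saved: 2.0 };
    let cold = EntryMeta { idle_secs: 3600, access_count: 1, cost_saved: 0.001 };
    // A cold, rarely used, low-value entry should rank first for eviction.
    assert!(eviction_score(&cold, 40, 30, 30) > eviction_score(&hot, 40, 30, 30));
}
```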

## Token Counting Types

| Type | Description |
|------|-------------|
| `TokenCounter` | GPT-4-compatible token counting via tiktoken |
| `ModelPricing` | Predefined pricing for GPT-4, Claude 3, etc. |

## Index Types (Internal)

| Type | Description |
|------|-------------|
| `CacheIndex` | HNSW wrapper with key-to-node mapping |
| `IndexSearchResult` | Semantic search result with similarity score |

## Error Types

| Error | Description | Recovery |
|-------|-------------|----------|
| `NotFound` | Cache entry not found | Check that the key exists |
| `DimensionMismatch` | Embedding dimension does not match config | Verify embedding size |
| `StorageError` | Underlying tensor store error | Check store health |
| `SerializationError` | Serialization/deserialization failed | Verify data format |
| `TokenizerError` | Token counting failed | Falls back to estimation |
| `CacheFull` | Cache capacity exceeded | Run eviction or increase capacity |
| `InvalidConfig` | Invalid configuration provided | Fix config values |
| `Cancelled` | Operation was cancelled | Retry the operation |
| `LockPoisoned` | Internal lock was poisoned | Restart the cache |

### Error Conversion

```rust
impl From<tensor_store::TensorStoreError> for CacheError {
    fn from(e: tensor_store::TensorStoreError) -> Self {
        Self::StorageError(e.to_string())
    }
}

impl From<bitcode::Error> for CacheError {
    fn from(e: bitcode::Error) -> Self {
        Self::SerializationError(e.to_string())
    }
}
```

## Cache Methods

| Method | Description |
|--------|-------------|
| `new()` | Create with the default config |
| `with_config(config)` | Create with a custom config |
| `with_store(store, config)` | Create with a shared `TensorStore` |
| `get(prompt, embedding)` | Look up a cached response |
| `get_with_metric(prompt, embedding, metric)` | Look up with an explicit metric |
| `put(prompt, embedding, response, model, ttl)` | Store a response |
| `get_embedding(source, content)` | Get a cached embedding |
| `put_embedding(source, content, embedding, model)` | Store an embedding |
| `get_or_compute_embedding(source, content, model, compute)` | Get or compute an embedding |
| `get_simple(key)` | Simple key-value lookup |
| `put_simple(key, value)` | Simple key-value store |
| `invalidate(prompt)` | Remove an exact entry |
| `invalidate_version(version)` | Remove entries by version |
| `invalidate_embeddings(source)` | Remove embeddings by source |
| `evict(count)` | Manually evict entries |
| `cleanup_expired()` | Remove expired entries |
| `clear()` | Clear all entries |
| `stats()` | Get a statistics reference |
| `stats_snapshot()` | Get a statistics snapshot |
| `config()` | Get a configuration reference |
| `len()` | Total cached entries |
| `is_empty()` | Check whether the cache is empty |

## CacheHit Fields

| Field | Type | Description |
|-------|------|-------------|
| `response` | `String` | Cached response text |
| `layer` | `CacheLayer` | Which layer matched |
| `similarity` | `Option<f32>` | Similarity score (semantic only) |
| `input_tokens` | `usize` | Input tokens saved |
| `output_tokens` | `usize` | Output tokens saved |
| `cost_saved` | `f64` | Estimated cost saved (dollars) |
| `metric_used` | `Option<DistanceMetric>` | Metric used (semantic only) |
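`cost_saved` follows from the token counts and the per-1K-token rates configured in `input_cost_per_1k` / `output_cost_per_1k`. A sketch of the arithmetic (the function name is illustrative):

```rust
/// Estimated dollars saved by a cache hit, given per-1K-token pricing.
fn cost_saved(input_tokens: usize, output_tokens: usize,
              input_per_1k: f64, output_per_1k: f64) -> f64 {
    (input_tokens as f64 / 1000.0) * input_per_1k
        + (output_tokens as f64 / 1000.0) * output_per_1k
}

fn main() {
    // 2,000 input + 500 output tokens at the default rates ($0.0015 / $0.002):
    let saved = cost_saved(2000, 500, 0.0015, 0.002);
    assert!((saved - 0.004).abs() < 1e-9); // 0.003 + 0.001
}
```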

## StatsSnapshot Fields

| Field | Type | Description |
|-------|------|-------------|
| `exact_hits` | `u64` | Exact cache hits |
| `exact_misses` | `u64` | Exact cache misses |
| `semantic_hits` | `u64` | Semantic cache hits |
| `semantic_misses` | `u64` | Semantic cache misses |
| `embedding_hits` | `u64` | Embedding cache hits |
| `embedding_misses` | `u64` | Embedding cache misses |
| `tokens_saved_in` | `u64` | Total input tokens saved |
| `tokens_saved_out` | `u64` | Total output tokens saved |
| `cost_saved_dollars` | `f64` | Total cost saved |
| `evictions` | `u64` | Total evictions |
| `expirations` | `u64` | Total expirations |
| `exact_size` | `usize` | Current exact cache size |
| `semantic_size` | `usize` | Current semantic cache size |
| `embedding_size` | `usize` | Current embedding cache size |
| `uptime_secs` | `u64` | Cache uptime in seconds |

## Cache Layers

| Layer | Complexity | Key Prefix | Best For |
|-------|------------|------------|----------|
| Exact | O(1) | `_cache:exact:` | Identical queries (FAQ, canned responses) |
| Semantic | O(log n) | `_cache:sem:` | Natural-language variations |
| Embedding | O(1) | `_cache:emb:` | Precomputed embedding reuse |
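A sketch of how an exact-layer key could be built from the documented prefix plus a 64-bit hash of the prompt. The exact key scheme beyond the prefix is not specified here; `DefaultHasher` stands in for whatever hash the crate actually uses:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical exact-layer key: documented prefix + 64-bit prompt hash.
fn exact_key(prompt: &str) -> String {
    let mut h = DefaultHasher::new();
    prompt.hash(&mut h);
    format!("_cache:exact:{:016x}", h.finish())
}

fn main() {
    let k = exact_key("What is Rust?");
    assert!(k.starts_with("_cache:exact:"));
    // Same prompt, same key; different prompt, (almost surely) different key.
    assert_eq!(k, exact_key("What is Rust?"));
    assert_ne!(k, exact_key("What is Go?"));
}
```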

## Storage Format

Cache entries are stored as `TensorData` with standardized fields:

| Field | Type | Description |
|-------|------|-------------|
| `_response` | String | Cached response text |
| `_embedding` | Vector/Sparse | Embedding (semantic/embedding layers) |
| `_embedding_dim` | Int | Embedding dimension |
| `_input_tokens` | Int | Input token count |
| `_output_tokens` | Int | Output token count |
| `_model` | String | Model identifier |
| `_layer` | String | Cache layer (exact/semantic/embedding) |
| `_created_at` | Int | Creation timestamp (millis) |
| `_expires_at` | Int | Expiration timestamp (millis) |
| `_access_count` | Int | Access count for LFU |
| `_last_access` | Int | Last-access timestamp for LRU |
| `_version` | String | Optional version tag |
| `_source` | String | Embedding source identifier |
| `_content_hash` | Int | Content hash for deduplication |

## Distance Metrics

| Metric | Best For | Range | Formula |
|--------|----------|-------|---------|
| Cosine | Dense embeddings (default) | -1 to 1 | `dot(a,b) / (\|a\| * \|b\|)` |
| Angular | Linear angle relationships | 0 to PI | `acos(cosine_sim)` |
| Jaccard | Sparse/binary embeddings | 0 to 1 | `\|A ∩ B\| / \|A ∪ B\|` |
| Euclidean | Absolute distances | 0 to inf | `sqrt(sum((a-b)^2))` |
| WeightedJaccard | Sparse with magnitudes | 0 to 1 | Weighted set similarity |

Auto-selection: when `auto_select_metric` is `true`, the cache uses Jaccard for sparse embeddings (sparsity >= threshold, default 70%) and the configured metric otherwise.
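The selection rule above reduces to a zero-count. A sketch (the local enum and names are illustrative, not the crate's actual types):

```rust
#[derive(Debug, PartialEq)]
enum Metric { Cosine, Jaccard }

/// Sparse embeddings get Jaccard; dense ones keep the configured metric.
fn select_metric(embedding: &[f32], configured: Metric, threshold: f32) -> Metric {
    let zeros = embedding.iter().filter(|v| **v == 0.0).count();
    let sparsity = zeros as f32 / embedding.len() as f32;
    if sparsity >= threshold { Metric::Jaccard } else { configured }
}

fn main() {
    let sparse = [0.0, 0.0, 0.0, 1.0]; // 75% zeros, above the 0.7 default
    let dense = [0.1, 0.2, 0.3, 0.4];  // 0% zeros
    assert_eq!(select_metric(&sparse, Metric::Cosine, 0.7), Metric::Jaccard);
    assert_eq!(select_metric(&dense, Metric::Cosine, 0.7), Metric::Cosine);
}
```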

## Configuration

### Default Configuration

```rust
CacheConfig {
    exact_capacity: 10_000,
    semantic_capacity: 5_000,
    embedding_capacity: 50_000,
    default_ttl: Duration::from_secs(3600),
    max_ttl: Duration::from_secs(86400),
    semantic_threshold: 0.92,
    embedding_dim: 1536,
    eviction_strategy: EvictionStrategy::Hybrid {
        lru_weight: 40,
        lfu_weight: 30,
        cost_weight: 30,
    },
    eviction_interval: Duration::from_secs(60),
    eviction_batch_size: 100,
    input_cost_per_1k: 0.0015,
    output_cost_per_1k: 0.002,
    inline_threshold: 4096,
    distance_metric: DistanceMetric::Cosine,
    auto_select_metric: true,
    sparsity_metric_threshold: 0.7,
}
```

### Configuration Presets

| Preset | Use Case | Exact | Semantic | Embedding | Eviction Batch |
|--------|----------|-------|----------|-----------|----------------|
| `default()` | General purpose | 10,000 | 5,000 | 50,000 | 100 |
| `high_throughput()` | High-traffic server | 50,000 | 20,000 | 100,000 | 500 |
| `low_memory()` | Memory-constrained | 1,000 | 500 | 5,000 | 50 |
| `development()` | Dev/testing | 100 | 50 | 200 | 10 |
| `sparse_embeddings()` | Sparse vectors | 10,000 | 5,000 | 50,000 | 100 |

### Configuration Validation

```rust
pub fn validate(&self) -> Result<(), String> {
    if self.semantic_threshold < 0.0 || self.semantic_threshold > 1.0 {
        return Err("semantic_threshold must be between 0.0 and 1.0".to_string());
    }
    if self.embedding_dim == 0 {
        return Err("embedding_dim must be greater than 0".to_string());
    }
    if self.eviction_batch_size == 0 {
        return Err("eviction_batch_size must be greater than 0".to_string());
    }
    if self.default_ttl > self.max_ttl {
        return Err("default_ttl cannot exceed max_ttl".to_string());
    }
    if self.sparsity_metric_threshold < 0.0 || self.sparsity_metric_threshold > 1.0 {
        return Err("sparsity_metric_threshold must be between 0.0 and 1.0".to_string());
    }
    Ok(())
}
```

## Model Pricing

| Model | Input/1K | Output/1K | Notes |
|-------|----------|-----------|-------|
| GPT-4o | $0.005 | $0.015 | Best for complex tasks |
| GPT-4o mini | $0.00015 | $0.0006 | Cost-effective |
| GPT-4 Turbo | $0.01 | $0.03 | High capability |
| GPT-3.5 Turbo | $0.0005 | $0.0015 | Budget option |
| Claude 3 Opus | $0.015 | $0.075 | Highest quality |
| Claude 3 Sonnet | $0.003 | $0.015 | Balanced |
| Claude 3 Haiku | $0.00025 | $0.00125 | Fast and cheap |

## CacheIndex Structure

```rust
pub struct CacheIndex {
    index: RwLock<HNSWIndex>,            // HNSW graph
    config: HNSWConfig,                  // For recreation on clear
    key_to_node: DashMap<String, usize>, // Cache key -> HNSW node
    node_to_key: DashMap<usize, String>, // HNSW node -> Cache key
    dimension: usize,                    // Expected embedding dimension
    entry_count: AtomicUsize,            // Entry count
    distance_metric: DistanceMetric,     // Default metric
}
```

### Insert Methods

| Method | Description |
|--------|-------------|
| `insert(key, embedding)` | Dense embedding insert |
| `insert_sparse(key, embedding)` | Sparse embedding insert (memory-efficient) |
| `insert_auto(key, embedding, threshold)` | Auto-select based on sparsity |

## Performance

### Benchmarks (10,000 entries, 128-dim embeddings)

| Operation | Time | Notes |
|-----------|------|-------|
| Exact lookup (hit) | ~50 ns | Hash lookup + TensorStore get |
| Exact lookup (miss) | ~30 ns | Hash lookup only |
| Semantic lookup | ~5 µs | HNSW search + re-scoring |
| Put (exact + semantic) | ~10 µs | Two stores + HNSW insert |
| Eviction (100 entries) | ~200 µs | Batch deletion |
| Clear (full index) | ~1 ms | HNSW recreation |

### Distance Metric Performance (128-dim, 1,000 entries)

| Metric | Search Time | Notes |
|--------|-------------|-------|
| Cosine | 21 µs | Default; best for dense |
| Jaccard | 18 µs | Best for sparse |
| Angular | 23 µs | Adds `acos` overhead |
| Euclidean | 19 µs | Absolute distance |

### Auto-Selection Overhead

| Operation | Time |
|-----------|------|
| Sparsity check | ~50 ns |
| Metric selection | ~10 ns |

### Memory Efficiency

| Storage Type | Memory per Entry | Best For |
|--------------|------------------|----------|
| Dense Vector | `4 * dim` bytes | Low sparsity (<50% zeros) |
| Sparse Vector | `8 * nnz` bytes | High sparsity (>50% zeros) |

## Edge Cases and Gotchas

### TTL Behavior

- Entries with `expires_at = 0` never expire
- Expired entries return `None` on lookup but remain in storage until cleanup
- `cleanup_expired()` must be called explicitly or via background eviction
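The rules above amount to a sentinel check plus a millisecond comparison. A sketch, assuming Unix-epoch millisecond timestamps as in the storage format:

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// `expires_at == 0` means never expire; otherwise compare against now (ms).
fn is_expired(expires_at_millis: u64) -> bool {
    if expires_at_millis == 0 {
        return false; // sentinel: never expires
    }
    let now = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("clock before epoch")
        .as_millis() as u64;
    now >= expires_at_millis
}

fn main() {
    assert!(!is_expired(0));        // sentinel never expires
    assert!(is_expired(1));         // 1 ms after the epoch is long past
    assert!(!is_expired(u64::MAX)); // far future
}
```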

### Capacity Limits

- `put()` fails with `CacheFull` when capacity is reached
- Capacity is checked per layer (exact, semantic, embedding)
- No automatic eviction on `put()`; eviction must be explicit

### Hash Collisions

- Extremely unlikely with 64-bit hashes (~1 in 18 quintillion)
- If a collision occurs, the exact cache will return the wrong response
- The semantic cache provides a fallback for semantically different queries

### Metric Re-scoring

- HNSW always uses cosine similarity for graph navigation
- Re-scoring with a different metric may change result order
- The search retrieves 3x candidates to account for re-ranking
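The re-ranking step described above can be sketched as: take the cosine-ordered candidates (3x over-fetched), re-score each with the requested metric, sort, and keep the top k. The `(key, score)` tuple shape and the scoring closure are illustrative, not the crate's internals:

```rust
/// Re-score HNSW candidates with the requested metric and keep the top k.
fn rerank(
    mut candidates: Vec<(String, f32)>,
    k: usize,
    rescore: impl Fn(&str) -> f32,
) -> Vec<(String, f32)> {
    for c in candidates.iter_mut() {
        c.1 = rescore(&c.0); // replace the cosine score with the requested metric
    }
    candidates.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    candidates.truncate(k);
    candidates
}

fn main() {
    // Cosine ordered "a" first, but the new metric prefers "c".
    let cands = vec![
        ("a".to_string(), 0.9),
        ("b".to_string(), 0.8),
        ("c".to_string(), 0.7),
    ];
    let top = rerank(cands, 1, |key| if key == "c" { 1.0 } else { 0.5 });
    assert_eq!(top[0].0, "c");
}
```

This is why result order under `get_with_metric` can differ from plain cosine search.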

### Sparse Storage Threshold

- Uses the sparse format when `nnz * 2 <= len` (at least 50% zeros)
- Different from the auto-metric-selection threshold (default 70%)
- Both thresholds are configurable
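Both the format decision and the per-entry memory figures reduce to arithmetic on the nonzero count. A sketch, assuming 4 bytes per dense `f32` component and ~8 bytes per stored nonzero (index + value), as in the Memory Efficiency table:

```rust
/// Sparse when `nnz * 2 <= len`, i.e. at least half the components are zero.
fn use_sparse(values: &[f32]) -> bool {
    let nnz = values.iter().filter(|v| **v != 0.0).count();
    nnz * 2 <= values.len()
}

/// Approximate bytes per entry under the chosen format.
fn entry_bytes(values: &[f32]) -> usize {
    let nnz = values.iter().filter(|v| **v != 0.0).count();
    if use_sparse(values) { 8 * nnz } else { 4 * values.len() }
}

fn main() {
    let mostly_zero = [0.0, 0.0, 0.0, 2.5];
    assert!(use_sparse(&mostly_zero));
    assert_eq!(entry_bytes(&mostly_zero), 8); // 1 nonzero * 8 bytes
    let dense = [1.0, 2.0, 3.0, 4.0];
    assert!(!use_sparse(&dense));
    assert_eq!(entry_bytes(&dense), 16); // 4 components * 4 bytes
}
```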

## Shell Commands

```
CACHE INIT     Initialize semantic cache
CACHE STATS    Show cache statistics
CACHE CLEAR    Clear all cache entries
```

## Dependencies

| Crate | Purpose |
|-------|---------|
| `tensor_store` | HNSW index implementation, `TensorStore` |
| `tiktoken-rs` | GPT-compatible token counting |
| `dashmap` | Concurrent hash maps |
| `tokio` | Async runtime for background eviction |
| `uuid` | Unique ID generation |
| `thiserror` | Error type derivation |
| `serde` | Configuration serialization |
| `bincode` | Binary serialization |

## Related Modules

- `tensor_store`: backing storage and HNSW index
- `query_router`: cache integration for query execution
- `neumann_shell`: CLI commands for cache management

## See Also