MUG uses a hybrid compression strategy combining Zstandard (zstd) for native efficiency and Flate2/zlib for Git compatibility. The architecture provides abstraction through trait-based compression that allows for extensibility and fallback mechanisms.
Usage: Native MUG pack files and object storage
Characteristics:
- 5-10x faster than zlib at similar compression levels
- Better compression ratios than zlib
- Streaming support for large files
- Preferred for new MUG operations
Implementation:
pub struct ZstdCompressor {
level: i32, // 3 (fast) to 10 (best)
}Compression Levels:
- Level 3 (Fast): ~40% compression, ~100x faster
- Level 10 (Default): ~45% compression, balanced speed/ratio
Used In:
src/pack/pack_builder.rs- Creating pack filessrc/pack/pack_reader.rs- Reading pack filessrc/pack/pack_file.rs- Writing/reading individual chunkssrc/remote/parallel_fetch.rs- Downloading chunks
Usage: Git compatibility and fallback compression
Characteristics:
- Standard zlib format (gzip wrapper)
- Compatible with Git's compression
- Slower than zstd but more universally supported
- Used for legacy data and Git object handling
Implementation:
pub struct FlateCompressor;Used In:
src/remote/git_compat.rs- Git object decompression- Fallback for incompatible data formats
All compression implementations follow the Compressor trait:
pub trait Compressor {
fn compress(&self, data: &[u8]) -> std::io::Result<Vec<u8>>;
fn decompress(&self, data: &[u8]) -> std::io::Result<Vec<u8>>;
}This trait-based design allows:
- Runtime compression selection based on data type
- Easy fallback from zstd to flate2
- Future compression algorithms without breaking existing code
- Testing with mock compressors
src/pack/
├── compression.rs
│ ├── Compressor (trait)
│ ├── ZstdCompressor
│ ├── FlateCompressor
│ └── Tests
├── pack_builder.rs (uses ZstdCompressor)
├── pack_reader.rs (uses ZstdCompressor)
├── pack_file.rs (uses ZstdCompressor)
├── manifest.rs (tracks compression metadata)
└── packer.rs (high-level packing orchestration)
src/remote/
├── git_compat.rs (uses FlateCompressor for Git)
└── parallel_fetch.rs (uses ZstdCompressor for downloads)
Object Data
↓
[Chunking] (rolling hash deduplication)
↓
[Compression] (ZstdCompressor::fast())
↓
[Pack Storage] (.mug/objects/packs/)
↓
[Manifest Update] (track compressed sizes)
Pack File (on disk)
↓
[Load Chunk Header] (read size metadata)
↓
[Read Compressed Data]
↓
[Decompression] (ZstdCompressor::decompress())
↓
[Reconstruct Object]
Git Object (zlib compressed)
↓
[FlateCompressor::decompress()]
↓
[Parse Content]
↓
[Import to MUG] (recompress with zstd if needed)
The compression architecture supports automatic fallback:
// Try primary compression (zstd)
match zstd_compressor.decompress(data) {
Ok(result) => Ok(result),
Err(_) => {
// Fall back to flate2 if zstd fails
flate2_compressor.decompress(data)
}
}- Git Repository Migration: Importing from Git uses zlib
- Legacy Data: Old zlib-compressed objects automatically decompress
- External Data: Data from non-MUG sources may use different compression
- Format Detection: Automatic format detection based on headers
| Operation | zstd (L3) | zstd (L10) | zlib |
|---|---|---|---|
| Compress 1GB | ~10s | ~100s | ~400s |
| Decompress 1GB | ~1s | ~1s | ~30s |
| Compression Ratio | 45% | 47% | 43% |
// Fast operations (real-time packing)
ZstdCompressor::fast() // Level 3
// Balanced operations (normal pack creation)
ZstdCompressor::default() // Level 10
// Custom levels
ZstdCompressor::new(5) // Custom level- Compression: ~1-2 MB buffer (level dependent)
- Decompression: ~64 KB window buffer
- Streaming: Constant memory regardless of file size
- Compression: ~32 KB
- Decompression: ~32 KB
- Streaming: Constant memory regardless of file size
[Pack Header]
- Format version
- Chunk count
- Manifest offset
[Compressed Chunks] (repeating)
- Chunk Hash (32 bytes)
- Compressed Size (4 bytes, little-endian)
- Compressed Data
- CRC32 checksum (4 bytes, optional)
[Manifest]
- Chunk registry (hash -> offset mapping)
- Compression metadata
- Deduplication info
Each chunk entry in the manifest tracks:
{
"hash": "sha256hash",
"size": 1048576,
"compressed_size": 262144,
"compression": "zstd",
"offset": 65536,
"dedup_count": 3
}let compressor = ZstdCompressor::fast();
let compressed = compressor.compress(&chunk_data)?;
current_pack.data.write_all(&compressed)?;
manifest.add_chunk_compressed(
chunk_hash,
chunk_data.len() as u64,
compressed.len() as u64,
"zstd".to_string(),
)?;let compressor = ZstdCompressor::default();
let compressed = read_chunk_from_disk()?;
let decompressed = compressor.decompress(&compressed)?;// Downloads use fast compression for bandwidth
let compressor = ZstdCompressor::fast();Currently, compression levels are hardcoded for optimal balance:
- Fast operations: Level 3 (real-time feedback)
- Batch operations: Level 10 (maximum efficiency)
Planned enhancements include:
[compression]
default_level = 10
pack_level = 10
remote_level = 3
enable_zstd = true
enable_flate2 = truecargo test core::pack::compressionTests verify:
- Round-trip compression/decompression
- Compression ratio expectations
- Empty data handling
- Large data handling
- Different compression levels
#[test]
fn test_zstd_compression() {
let compressor = ZstdCompressor::default();
let data = b"hello world".repeat(100);
let compressed = compressor.compress(&data).unwrap();
let decompressed = compressor.decompress(&compressed).unwrap();
assert_eq!(data.to_vec(), decompressed);
assert!(compressed.len() < data.len());
}
#[test]
fn test_compression_ratio() {
let compressor = ZstdCompressor::default();
let data = vec![b'a'; 10000];
let compressed = compressor.compress(&data).unwrap();
let ratio = compressed.len() as f64 / data.len() as f64;
// Highly repetitive data should compress < 1%
assert!(ratio < 0.01);
}If mug pack create is slow:
- Check compression level: Default is level 10 (most time-consuming)
- Use streaming: For real-time operations, use
ZstdCompressor::fast() - Hardware: Increase available CPU cores (multi-threading supported)
If unpacking fails with "decompression error":
- Check format: Verify pack file isn't corrupted (
mug pack verify) - Try fallback: System should auto-fallback to zlib if needed
- Verify repository: Run
mug verifyto check integrity - Garbage collection: Run
mug gcto clean corrupted data
If compression isn't achieving expected ratios:
- Data type: Already-compressed data (images, video) won't compress well
- Level settings: Verify using appropriate level for data type
- Metadata overhead: Small files have high header overhead
- zstd:
0.13.3(Zstandard compression) - flate2:
1.1.5(gzip/zlib compression) - sha2:
0.10.9(chunk hashing)
- Adaptive Compression: Auto-select level based on data type
- Parallel Compression: Multi-threaded compression for large files
- Compression Hints: User-provided hints for incompressible data
- Dictionary-based: Use compression dictionaries for similar objects
- Network Compression: Transparent compression for remote transfers
- Benchmarking: Built-in compression benchmarking tools