██████╗ ███████╗██████╗ ██╗██████╗ ██████╗ ████████╗
██╔══██╗██╔════╝██╔══██╗██║██╔══██╗██╔═══██╗╚══██╔══╝
██████╔╝█████╗ ██████╔╝██║██║ ██║██║ ██║ ██║
██╔═══╝ ██╔══╝ ██╔══██╗██║██║ ██║██║ ██║ ██║
██║ ███████╗██║ ██║██║██████╔╝╚██████╔╝ ██║
╚═╝ ╚══════╝╚═╝ ╚═╝╚═╝╚═════╝ ╚═════╝ ╚═╝
⚠️ BETA RELEASE
Medical research module under active validation. Report issues →
Engineered by uncoalesced
GPU-accelerated, air-gapped AI runtime with defense-in-depth security.
Zero telemetry. Zero cloud dependency. Absolute user sovereignty.
Peridot is a local LLM runtime with permission-based function calling that runs entirely on your hardware and logs every action it takes.
Most AI assistants are surveillance infrastructure with a chat interface. Peridot is the opposite.
┌─────────────────────────────────────────────────────────┐
│ USER INPUT │
│ │ │
│ ▼ │
│ SECURITY GATE │
│ • Input Sanitization (XSS/Code Injection) │
│ • File Access Blacklist (.env, .ssh/, /etc/) │
│ • Path Traversal Prevention │
│ │ │
│ ▼ │
│ PERMISSION LAYER │
│ • constitution.json (user-controlled) │
│ • Function call authorization │
│ │ │
│ ▼ │
│ INFERENCE ENGINE │
│ • Llama-3-8B-Instruct (Q4_K_M) │
│ • localhost:5000 (air-gapped) │
│ • 45-55 tokens/sec sustained │
│ │ │
│ ▼ │
│ AUDIT LOG │
│ • SHA-256 verified │
│ • Append-only (immutable) │
│ • Security events logged separately │
└─────────────────────────────────────────────────────────┘
The kernel actively blocks access to sensitive files and directories:
Blocked Files:
- `.env` (environment variables)
- `.ssh/id_rsa` (SSH private keys)
- `passwords.txt` (credential stores)
- `auth.token` (authentication tokens)
Blocked Directories:
- `C:\Windows\` (Windows system files)
- `/etc/` (Linux configuration)
- `/root/` (Linux root home)
- `/boot/` (bootloader files)
Path traversal attacks (../../../etc/passwd) are automatically neutralized via path normalization.
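The normalization step can be sketched in a few standard-library lines. This is a hypothetical illustration, not Peridot's actual API: the function name and the `BLOCKED_DIRS` constant are invented for the example.

```python
import os

# Illustrative subset of the blocked-directory list above (POSIX paths).
BLOCKED_DIRS = ("/etc", "/root", "/boot")

def is_path_allowed(requested: str, blocked=BLOCKED_DIRS) -> bool:
    """Resolve '..' segments BEFORE consulting the blacklist, so a payload
    like '../../../etc/passwd' normalizes to '/etc/passwd' and is caught."""
    resolved = os.path.realpath(requested)
    return not any(
        resolved == d or resolved.startswith(d + os.sep) for d in blocked
    )
```

The key design point is ordering: normalize first, then check, so the blacklist sees the path the filesystem would actually open.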
API keys are generated cryptographically in RAM at boot and destroyed on shutdown.
Features:
- Zero disk footprint (CWE-312 mitigation)
- `secrets.compare_digest()` prevents timing attacks
- Keys exist only in `os.environ` (process memory)
- Automatic cleanup on exit
No API key files are ever written to disk.
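The pattern described above fits in a few lines of standard-library Python. This is a sketch with hypothetical names (the `PERIDOT_API_KEY` variable name is illustrative), not the actual implementation:

```python
import os
import secrets

def generate_session_key() -> str:
    """Create the API key in process memory at boot; never write it to disk."""
    key = secrets.token_hex(32)
    os.environ["PERIDOT_API_KEY"] = key  # lives only as long as the process
    return key

def verify_key(candidate: str) -> bool:
    """Constant-time comparison closes the timing side channel that a
    plain '==' string comparison would leak."""
    return secrets.compare_digest(candidate, os.environ.get("PERIDOT_API_KEY", ""))
```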
Local inference API enforces strict request throttling:
60 requests per minute per client IP
Prevents:
- Local DoS attacks
- Automation abuse
- Runaway scripts
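A minimal sliding-window limiter illustrating the 60-requests-per-minute policy. This is a sketch only; Peridot's real throttler may differ:

```python
import time
from collections import defaultdict, deque

class RateLimiter:
    def __init__(self, limit: int = 60, window: float = 60.0):
        self.limit = limit        # max requests per window
        self.window = window      # window length in seconds
        self.hits = defaultdict(deque)  # client IP -> recent timestamps

    def allow(self, client_ip: str) -> bool:
        now = time.monotonic()
        q = self.hits[client_ip]
        # Drop timestamps that have aged out of the window.
        while q and now - q[0] > self.window:
            q.popleft()
        if len(q) >= self.limit:
            return False  # throttled
        q.append(now)
        return True
```

Tracking per-IP deques keeps one misbehaving client from starving others on the same host.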
Medical research integration (Folding@Home) uses hardcoded command whitelist:
```python
ALLOWED_COMMANDS = ("pause", "unpause", "finish", "shutdown")
```

Any other command is immediately rejected and logged as a security violation.
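The reject-and-log behavior can be sketched as follows. The function name and log wording are illustrative, not Peridot's actual code:

```python
ALLOWED_COMMANDS = ("pause", "unpause", "finish", "shutdown")

def dispatch_fah_command(cmd: str, security_log: list) -> bool:
    """Return True only for whitelisted commands; flag everything else."""
    if cmd not in ALLOWED_COMMANDS:
        # In Peridot this entry would be appended to logs/security.log.
        security_log.append(f"SECURITY VIOLATION: rejected command {cmd!r}")
        return False
    return True
```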
Peridot v1.2.2 introduced a hardened defense-in-depth security architecture protecting the inference engine from malicious input and unauthorized access.
All user prompts are sanitized before reaching the LLM. Blocked patterns include:
```
<script>       # XSS attacks
eval()         # code execution
os.system()    # shell injection
__import__    # Python import abuse
subprocess.    # subprocess exploitation
```

Malicious inputs are immediately rejected and logged to `logs/security.log`.
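A pattern-based sanitizer of this kind can be sketched with the standard `re` module. The patterns mirror the list above; the function name is illustrative, and a production filter would be considerably more thorough:

```python
import re

BLOCKED_PATTERNS = [
    r"<script",          # XSS
    r"\beval\s*\(",      # code execution
    r"os\.system\s*\(",  # shell injection
    r"__import__",       # Python import abuse
    r"subprocess\.",     # subprocess exploitation
]
_BLOCKLIST = re.compile("|".join(BLOCKED_PATTERNS), re.IGNORECASE)

def sanitize_prompt(prompt: str) -> str:
    """Raise on any blocked pattern; the caller logs the rejection."""
    if _BLOCKLIST.search(prompt):
        raise ValueError("Malicious input rejected")
    return prompt
```

Compiling the alternation once (and matching case-insensitively) keeps the check cheap enough to run on every prompt.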
If constitution.json is missing or corrupted, Peridot automatically falls back to a locked-down safe mode:
```json
{
  "allow_file_read": false,
  "allow_file_write": false,
  "allow_code_execute": false,
  "allow_web_fetch": false
}
```

No privileged operations occur without explicit user authorization.
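The fail-closed loader behind this behavior can be sketched as follows (names are illustrative). The key point is that a missing or corrupt file falls back to the locked-down defaults rather than to permissive ones:

```python
import json

# Locked-down defaults used whenever the constitution cannot be trusted.
SAFE_MODE = {
    "allow_file_read": False,
    "allow_file_write": False,
    "allow_code_execute": False,
    "allow_web_fetch": False,
}

def load_constitution(path: str = "constitution.json") -> dict:
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return dict(SAFE_MODE)  # fail closed, never fail open
```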
For full threat model and vulnerability disclosure process, see SECURITY.md.
Measured on real hardware. No overclocking. No cherry-picked runs.
Test Hardware:
- GPU: NVIDIA GeForce RTX 5050 Laptop (8GB VRAM)
- CPU: AMD Ryzen 7 250 AI
- Model: Llama-3-8B-Instruct (Q4_K_M quantization)
Note: Benchmark images will be added in the upcoming commits.
Current data is measured via the `\benchmarking` folder in the repository.
| Task | Output Tokens | Throughput |
|---|---|---|
| Short Response (chat) | ~50 tokens | ~55 t/s |
| Medium Response (logic) | ~150 tokens | ~50 t/s |
| Long Response (creative) | ~512 tokens | ~45 t/s |
Measured sustained throughput: 45–55 tokens/sec
Cold start: ~6.2 seconds (model load into VRAM)
For comparison: Average human reading speed is ~4 tokens/sec.
Peridot generates text ~12× faster than you can read it.
Dynamic GPU resource reallocation between Folding@Home and inference.
Note: VRAM handoff benchmark images will be added in the upcoming commits.
Measured Latencies:
- VRAM Hot-Swap: 6.55 ms (pause command → VRAM freed)
- Post-Handoff Inference: ~50 t/s sustained (no degradation)
Technical Implementation:
When a user query arrives, the system sends a WebSocket pause command to Folding@Home. The FAHClient releases GPU memory in 6.55 ms, allowing the inference engine to immediately reclaim VRAM without performance loss.
Zero overhead. Inference always takes priority.
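The handoff sequence can be sketched with the transport stubbed out. In the real system the pause travels over a WebSocket to FAHClient; everything below (names, structure) is illustrative:

```python
import time

def pause_folding() -> None:
    """Stand-in for the WebSocket 'pause' command sent to FAHClient.
    The real call releases GPU memory (~6.55 ms measured)."""
    pass

def handle_query(run_inference):
    """Free VRAM first, then run inference; report the handoff latency."""
    start = time.perf_counter()
    pause_folding()                         # 1. folding releases the GPU
    handoff_ms = (time.perf_counter() - start) * 1000
    result = run_inference()                # 2. inference reclaims VRAM
    return result, handoff_ms
```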
Peridot is built as a set of independent, composable modules. Each subsystem can be enabled, configured, or disabled without touching the core kernel.
1. High-Velocity RAG Pipeline (Layer 1 RAM Cache)
The Retrieval-Augmented Generation (RAG) engine operates entirely in-memory for zero-latency context retrieval.
- Vector Search Engine: Uses `faiss-cpu` for localized, high-density vector indexing and similarity search. This avoids the overhead of external vector databases, keeping the embedding search strictly within local RAM.
- Semantic Embeddings: Powered by `sentence-transformers`, generating dense vector representations of textual data locally.
- Context Injection: Fetches mathematically relevant document chunks and injects them into the LLM context window prior to generation, ensuring grounded, context-aware responses without internet access.
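The retrieval step can be illustrated without faiss at all. This toy stand-in ranks in-RAM chunks by cosine similarity, which is the same operation the `faiss-cpu` index accelerates; the embeddings here are hand-written stand-ins, not real model output:

```python
import math

def cosine(a, b):
    """Cosine similarity between two (nonzero) vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vec, index, k=2):
    """index: list of (chunk_text, embedding). Return the top-k chunks,
    which would then be injected into the LLM context window."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]
```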
2. Optimized Local Inference Engine
The generation layer is built for hardware efficiency and execution speed.
- GGUF/Quantized Execution: Built on `llama-cpp-python`, allowing the kernel to run heavily quantized LLMs (e.g., 4-bit or 8-bit). Strictly optimized to keep large models within an 8GB VRAM threshold while offloading secondary layers to the system CPU.
- PyTorch Backend: Integrated `torch` support for custom tensor operations, embedding generation, and potential multimodal routing.
3. Dynamic Hardware Telemetry & Load Balancing
The kernel does not operate blindly; it maintains real-time awareness of the host hardware state to prevent thermal throttling and out-of-memory (OOM) crashes.
- GPU Monitoring: Uses `nvidia-ml-py` to track VRAM allocation, GPU utilization, and core temperatures on NVIDIA hardware (e.g., RTX 50-series) at the driver level.
- System Telemetry: Employs `psutil` to monitor system RAM (optimized for 16GB environments) and CPU thread saturation (tuned for Ryzen architectures).
- Adaptive Throttling: The pipeline can dynamically adjust batch sizes or queue requests when telemetry detects resource exhaustion.
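The adaptive-throttling policy can be sketched as a pure function of the telemetry reading. Thresholds and names are illustrative; in Peridot the readings would come from `nvidia-ml-py` and `psutil`:

```python
def next_batch_size(current: int, vram_used_frac: float,
                    high: float = 0.90, low: float = 0.60,
                    minimum: int = 1, maximum: int = 32) -> int:
    """Halve the batch under memory pressure, grow it when headroom returns,
    hold steady in between."""
    if vram_used_frac >= high:
        return max(minimum, current // 2)
    if vram_used_frac <= low:
        return min(maximum, current * 2)
    return current
```

Keeping the policy a pure function makes it trivial to test against recorded telemetry traces.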
4. Asynchronous API & Gateway Services
Peridot acts as a localized server backbone, ready to interface with client applications.
- RESTful Backbone: Built on `Flask` and `Werkzeug` to provide secure, local HTTP endpoints for client requests.
- Real-Time Bi-Directional Streaming: Integrates `websocket-client` for continuous, low-latency data streams, critical for real-time transcription, screen-sharing analysis, or ongoing chat generation without HTTP overhead.
- Cross-Origin Support: `flask-cors` ensures seamless integration with separate front-end interfaces or local network applications.
5. Persistent State & Thread-Safe Caching
Maintains structural integrity during continuous read/write operations.
- File Locking: Uses `filelock` to prevent race conditions when multiple concurrent kernel processes read or write the same memory banks or configuration files.
- High-Speed Disk Caching: Uses `diskcache`, backed by SQLite, for fast retrieval of frequent queries or intermediate tensor states, reducing redundant computation.
- Multimodal Readiness: Incorporates `Pillow` for localized image processing and transformation before passing visual data into the inference or RAG pipelines.
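The exclusive-lock pattern that `filelock` provides can be illustrated with the standard library alone. `O_EXCL` makes lock-file creation atomic, so two processes cannot both acquire the lock. This is a sketch of the technique, not Peridot's code:

```python
import os

class SimpleFileLock:
    """Minimal lock-file context manager (no retry/timeout handling)."""

    def __init__(self, path: str):
        self.path = path

    def __enter__(self):
        # Atomic create-if-absent: raises FileExistsError if another
        # process already holds the lock.
        self.fd = os.open(self.path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        return self

    def __exit__(self, *exc):
        os.close(self.fd)
        os.remove(self.path)  # release the lock
```

The real `filelock` library adds timeouts, reentrancy, and cross-platform handling on top of this core idea.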
Core LLM runtime built on llama-cpp-python with cuBLAS GPU acceleration.
Model: Llama-3-8B-Instruct (GGUF · Q4_K_M)
Backend: llama-cpp-python + cuBLAS
Endpoint: localhost:5000 (no external routing)
Context: 8192 tokens (sliding window)
Precision: 4-bit quantization (optimal VRAM/quality balance)
Why Llama-3-8B?
Best instruction-following accuracy at the 8B parameter scale. Fits comfortably in 6GB VRAM with Q4 quantization, leaving headroom for system processes.
Local audio processing. No cloud APIs.
Auditory System — powered by OpenAI Whisper
Voice-to-text transcription
Hands-free command input
100% offline (no audio transmission)
Function-call interceptor that blocks execution before any action runs.
Edit constitution.json to control behavior:
```json
{
  "allow_file_read": true,
  "allow_file_write": false,
  "allow_code_execute": false,
  "allow_web_fetch": true,
  "approved_domains": ["arxiv.org", "pubmed.ncbi.nlm.nih.gov"],
  "blocked_domains": ["example-malicious-site.com"]
}
```

To loosen restrictions: edit `constitution.json` and set the relevant `allow_*` flags to `true`.

To enable safe mode: delete `constitution.json` and restart. Peridot regenerates the file with all permissions disabled.
Append-only log of every query, action, and permission decision.
[2026-03-14 14:32:01] QUERY "analyze this data"
[2026-03-14 14:32:01] PERMISSION read(data.csv) → ALLOWED
[2026-03-14 14:32:01] ACTION file_read(data.csv) → SUCCESS
[2026-03-14 14:32:03] RESPONSE delivered (312 tokens, 5.2s)
[2026-03-14 14:32:03] HASH sha256: a3f9c2e8...
SHA-256 session hashing applied at shutdown to cryptographically verify log integrity.
Security events are logged separately to logs/security.log for forensic analysis.
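The seal-at-shutdown idea can be sketched as follows (class and method names are illustrative): hash the full session transcript once at shutdown, so any later edit to the log changes the digest and is detectable.

```python
import hashlib

class AuditLog:
    def __init__(self):
        self._entries: list[str] = []  # append-only in this sketch

    def append(self, line: str) -> None:
        self._entries.append(line)

    def seal(self) -> str:
        """SHA-256 digest of the whole session, recorded at shutdown.
        Verifiers recompute it over the log file and compare."""
        data = "\n".join(self._entries).encode("utf-8")
        return hashlib.sha256(data).hexdigest()
```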
When Peridot is idle, your GPU contributes to medical research via Folding@home (Stanford University).
Idle State:
GPU Utilization: <5%
Action: Folding@home activated
Research: Cancer protein dynamics, Alzheimer's, COVID-19 variants
Contribution: ~400,000 points/day (varies by GPU)
Active State:
User query detected
Action: WebSocket pause command sent
Latency: 6.55 ms (VRAM freed)
GPU Utilization: 85% (inference)
Features:
- Opt-in (disabled by default)
- Audited (all sessions logged)
- Zero overhead (inference always takes priority)
- Transparent (see exactly when GPU contributed)
Diseases targeted: Alzheimer's, Cancer, Parkinson's, COVID-19 variants
Commands:
research enable # Activate medical research contribution
research disable # Disable (VRAM locked to inference only)
research status # Check current folding state + VRAM stats
Custom tkinter interface designed for technical users.
Features:
- Real-time hardware telemetry (CPU/RAM/GPU VRAM)
- Drag-and-drop image input (for future vision modules)
- Persistent conversation history
- Medical research status indicator
- Command palette
Not designed to look like a consumer product. Designed to be functional.
Asynchronous security event logger:
logs/security.log
Records:
- Authentication failures
- Blocked file access attempts
- Malicious input rejections
- Constitution validation errors
Runs silently without affecting UI performance.
Built-in red team test suite validates security measures:
```shell
python tests/security_tests.py
```

Tests include:
- API authentication bypass attempts
- File blacklist enforcement
- Input sanitization effectiveness
- Path traversal attack prevention
See SECURITY.md for:
- Formal threat model
- Security assumptions
- Active defense mechanisms
- Responsible vulnerability disclosure process
| Tier | Hardware | Mode | Expected Speed |
|---|---|---|---|
| ✅ Full Support | NVIDIA RTX 3060+ (6GB+) | Standard | 40–70 t/s |
| ✅ Full Support | NVIDIA RTX 4050+ (8GB+) | Standard | 50–80 t/s |
| ✅ Full Support | NVIDIA RTX 5050 (8GB) | Standard | 45–55 t/s (tested) |
| ⚙️ CPU Fallback | Any modern x64 CPU | CPU-Only | 8–12 t/s |
| ⚙️ Lite Mode | AMD Radeon 680M/780M | Phi-3 | 8–15 t/s |
| ⚙️ Lite Mode | Intel Iris Xe | Phi-3 | 5–10 t/s |
| 🛠️ Community | AMD RX 6000/7000 series | ROCm (Linux) | 35–50 t/s |
| 🛠️ Community | Intel Arc A750/A770 | Vulkan | 25–40 t/s |
Lite Mode: Automatically selects Phi-3 Mini and reduces context to 2048 tokens.
Community Builds: Maintained by contributors. See COMMUNITY_INSTALL.md.
OS: Windows 10/11 (64-bit)
GPU: NVIDIA RTX Series, 6GB+ VRAM recommended
Python: 3.11
Storage: ~10GB free (SSD strongly recommended)
1. Clone the repository
```shell
git clone https://github.com/uncoalesced/Peridot.git
cd Peridot
```

2. Create virtual environment
```shell
python -m venv venv
.\venv\Scripts\activate
```

3. Run smart installer
The setup wizard performs a hardware audit, selects the correct CUDA build, and downloads the appropriate model automatically.
```shell
python setup.py
```

Expected output:

```
PERIDOT SETUP WIZARD
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
[✓] NVIDIA GPU detected: RTX 5050 (8.0GB VRAM)
[✓] CUDA 12.1 compatible
[✓] Recommended model: Llama-3-8B-Instruct (Q4_K_M)
[✓] Installing CUDA-enabled llama-cpp-python...
[✓] Downloading model (4.7GB)...
[✓] Writing config...
Setup complete. Run: python launcher.py
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
```
4. Launch Peridot

```shell
python launcher.py
```

Wait for initialization:

```
>> Initializing Peridot Sovereign Kernel...
>> [1/2] Igniting Neural Engine (server.py)...
>> [WAIT] Verifying VRAM and API health...
>> [2/2] Launching Interface (main.py)...
[OK] Inference engine online — localhost:5000
[OK] Audio Subsystem: [ONLINE]
[OK] VRAM State Machine: [ACTIVE]
[OK] Peridot ready.
```
| Command | Description |
|---|---|
| `help` | Show all available commands |
| `clear` | Clear chat history and screen |
| `status` | Display system diagnostics (Audio, VRAM, Brain) |
| `research enable` | Activate Folding@home contribution |
| `research disable` | Disable research (lock VRAM to inference) |
| `research status` | Check folding state + free VRAM |
| `exit` | Shut down Peridot gracefully |
All other input is treated as natural language and processed by the inference engine.
Edit constitution.json to modify Peridot's permissions and behavior:
```json
{
  "system_prompt": "You are Peridot, a sovereign AI assistant...",
  "allow_file_read": true,
  "allow_file_write": false,
  "allow_code_execute": false,
  "allow_web_fetch": true,
  "approved_domains": ["arxiv.org", "pubmed.ncbi.nlm.nih.gov"],
  "blocked_domains": []
}
```

Reset to defaults: Delete `constitution.json` and restart. Peridot regenerates it automatically.
[████████████████████] v1.0 Core Inference Engine (NVIDIA/Windows)
[████████████████████] v1.1 Performance Optimization (BETA)
[████████████████████] v1.2 Stability + VRAM Handoff + Medical Research
[████████████████████] v1.2.2 Security Hardening + Benchmarking
[████████████████████] v1.3 RAG Engine (Document Analysis)
[██░░░░░░░░░░░░░░░░░░] v1.4 Performance Optimization (RAM, CPU & VRAM usage)
[░░░░░░░░░░░░░░░░░░░░] v1.4.3 TurboQuant Implementation
[░░░░░░░░░░░░░░░░░░░░] v1.5 Linux Support (Ubuntu/Debian)
[░░░░░░░░░░░░░░░░░░░░] v1.6 AMD GPU Support (ROCm)
[░░░░░░░░░░░░░░░░░░░░] v1.7 macOS Support (Apple Silicon)
[░░░░░░░░░░░░░░░░░░░░] v2.0 WebUI (FastAPI + React)
Current Focus (v1.3):
RAG engine for local document analysis with FAISS vector storage and PyMuPDF ingestion pipeline.
Peridot exists because the AI industry's default assumption is that your data belongs to them.
It does not.
Every design decision reflects a single principle: the user is sovereign.
That means:
- No telemetry without explicit consent
- No autonomous action without permission
- No ethical guardrails that cannot be modified or removed by the person running the software
The constitution.json system ships with sensible defaults. You can make them stricter. You can make them looser. You can delete the file entirely.
That choice belongs to you, not the developer.
This is what AI should look like.
For full philosophical reasoning, see PHILOSOPHY.md.
License: MIT — free for personal and commercial use. Fork it, break it, build on it.
Disclaimer: Peridot is experimental software. The user assumes full responsibility for all commands executed, content generated, and hardware usage. Provided as-is, without warranty of any kind.
PERIDOT · SOVEREIGN AI KERNEL · v1.3 BETA
Engineered by uncoalesced
Your hardware. Your model. Your rules.