Deploy intelligence. Open-source infrastructure for AI agents in production.
Updated Mar 26, 2026
Deploy AI agents to dedicated VMs in 90 seconds. Interactive TUI. Automatic DNS and TLS. You own the infrastructure.
Mechanistic analysis of a GPT-2-style model exploring the compositionality gap in transformers. Using Logit Lens and Causal Tracing, the study identifies a deep-layer bottleneck and overcomes it through dataset enhancement, addressing the compositionality gap (NeurIPS 2024).
A mock Azure OpenAI API for seamless testing and development, supporting both streaming and non-streaming responses. Easily emulate OpenAI completions with token-based streaming in a local or Dockerized environment.
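The token-based streaming this mock emulates can be sketched as a plain generator yielding chunks in the OpenAI-style server-sent-events format. This is a minimal illustrative sketch, not the repo's actual implementation; the model name and payload fields are assumptions.

```python
import json

def stream_completion(text, model="mock-gpt"):
    """Yield OpenAI-style SSE chunks, one token per chunk (illustrative sketch)."""
    for token in text.split():
        payload = {
            "object": "chat.completion.chunk",
            "model": model,
            "choices": [{"delta": {"content": token + " "}}],
        }
        yield f"data: {json.dumps(payload)}\n\n"
    # OpenAI-style streams terminate with a [DONE] sentinel.
    yield "data: [DONE]\n\n"

chunks = list(stream_completion("hello world"))
```

A real mock server would write these chunks to the response body incrementally; a non-streaming endpoint would instead join the tokens and return one JSON body.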
DeepSeek-V3 Windows Deployment Fixer: Resolves CUDA_ERROR_OUT_OF_MEMORY, missing cublas64_12.dll, and Triton compiler errors. Optimized for RTX 30/40 series GPUs.
My Digital Twin Companion
Complete guide to deploying private, on-premise AI and LLMs: hardware selection, model comparison (ollama vs vLLM vs llama.cpp), security hardening, and AI governance policy templates. By Petronella Technology Group.
Comprehensive guide to FastAPI, Pydantic, and SQLAlchemy for AI engineers. Learn API design, validation, and ORM workflows with practical examples and setup 🐙
A minimal, high-performance starter kit for running AI model inference on NVIDIA GPUs using CUDA. Includes environment setup, sample kernels, and guidance for integrating ONNX/TensorRT pipelines for fast, optimized inference on modern GPU hardware.
Production-oriented GPU inference stack with FastAPI, Docker, Redis, Prometheus, and Grafana for AI workload serving
A Streamlit-based spam classifier that predicts whether a message is spam or not spam using machine learning.
Skeleton Slack bot that lets developers trigger deployments, rollbacks, and status checks via simple chat commands.
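A chat-command bot like this typically reduces to a small parse-and-dispatch loop. A hedged stdlib sketch of that pattern follows; the command names, arguments, and replies are illustrative assumptions, not the repo's actual interface.

```python
def handle_command(text):
    """Parse a chat message like 'deploy api-service staging' and dispatch it."""
    parts = text.strip().split()
    if not parts:
        return "usage: deploy|rollback|status <service> [env]"
    cmd, args = parts[0].lower(), parts[1:]
    # Map each chat command to a handler (stubs standing in for real actions).
    handlers = {
        "deploy":   lambda svc, env="prod": f"deploying {svc} to {env}",
        "rollback": lambda svc, env="prod": f"rolling back {svc} in {env}",
        "status":   lambda svc: f"{svc}: healthy",
    }
    handler = handlers.get(cmd)
    if handler is None:
        return f"unknown command: {cmd}"
    try:
        return handler(*args)
    except TypeError:  # wrong argument count for this command
        return f"usage: {cmd} <service> [env]"
```

In a real Slack bot the returned string would be posted back to the channel by the events or slash-command callback.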
Deployment of a self-hosted LLM infrastructure using Ollama and Open WebUI on Linux, including custom model creation, API integration, and system-level troubleshooting.
Multi-agent researcher (RAG)
Full-stack web application (React + Flask) for Multimodal Video Captioning. Deploys the MixCap model (BLIP-2 + Wav2Vec2) to generate video descriptions for end-users.
End-to-end pipeline for deploying deep learning models on edge devices: model conversion, quantization, hardware acceleration, and Android integration.
🧠 Deployment infrastructure for Zeroclaw, dockerized and Portainer-compatible
🛠 Fix DeepSeek-V3 Windows setup issues by resolving CUDA memory errors, missing DLLs, and PyTorch-CUDA conflicts for smooth local deployment.
Compare PyTorch vs Triton inference latency with CLI tools, benchmarks, and performance plots.
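Backend-agnostic latency comparison of this kind usually follows a warmup-then-measure pattern. A stdlib sketch of that harness is below, with a dummy workload standing in for the actual PyTorch or Triton inference call; all names are illustrative, not the repo's CLI.

```python
import statistics
import time

def benchmark(fn, warmup=3, iters=20):
    """Time fn() over several iterations, returning latency stats in milliseconds."""
    for _ in range(warmup):  # warmup runs are discarded (JIT, caches, allocator)
        fn()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1e3)
    return {
        "mean_ms": statistics.mean(samples),
        "p50_ms": statistics.median(samples),
        # 19th of 19 cut points for n=20 is the 95th percentile.
        "p95_ms": statistics.quantiles(samples, n=20)[-1],
    }

# Dummy CPU workload in place of a real inference call.
stats = benchmark(lambda: sum(i * i for i in range(10_000)))
```

For GPU backends, each timed call would also need a device synchronization before reading the clock, since kernel launches are asynchronous.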