Internal Safety Collapse: Turning the LLM or an AI Agent into a sensitive data generator.
Updated May 14, 2026 · Python
Deterministic safety solutions for probabilistic AI agents
Introducing XSafeClaw: The Open-Source Agent Safety Platform from Fudan University
An open taxonomy and scoring framework for evaluating AI agent sandboxes: 7 defense layers, 7 threat categories, 3 evaluation dimensions, 27 "sandboxes" scored.
Ethicore Engine™ is an AI safety, ethics, and compliance platform. This repo contains the open-source components of Ethicore Engine™ (the Guardian SDK), designed to protect your AI applications from prompt injection, jailbreaks, role hijacking, system-prompt extraction, and 100+ additional threat categories through a multi-layer analysis pipeline.
Practices, protocols, and skills for AI-driven software development. 18 skills + 1 Bash safety hook for Claude Code, Codex CLI, OpenCode, Cursor, Gemini CLI, Antigravity, and any agent supporting the Agent Skills standard.
Trust nothing. Ship safely. — Skeptical-reading and prompt-injection defense skill for AI agents. Provenance tagging, red-flag patterns, refusal templates, and a read-only injection auditor. MIT.
Human-in-the-loop execution for LLM agents
Fast local Rust scanner for AI-agent prompt injection, credential leaks, exfiltration, and risky tool calls
OpenClaw-compatible MASL safety gate with public RAG packs for memory-aware AI agents
Guardrails service for AI agents. Default-deny tool call evaluation with LLM safety analysis, priority-ordered decision matrix, and human-in-the-loop escalations. Session recording, behavioral analysis, MCP proxy, secret redaction, and real-time audit.
Claude Code agent-in-container orchestration and automation
Security scanner for AI agent tool definitions
The open-source safety layer for AI agents — block unsafe tool calls, require approval, enforce budgets, audit, replay.
Runtime detector for reward hacking and misalignment in LLM agents (89.7% F1 on 5,391 trajectories).
🛡️ Safe AI Agents through Action Classifier
Open Threat Classification (OTC) — 10 threat patterns for AI agent skills, MCP servers, and plugins. CC-BY-4.0.
Deterministic execution authorization for AI agents
tethered — Runtime network egress control for Python. One function call to restrict which hosts your code can connect to.
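Several entries above describe default-deny evaluation of tool calls with priority-ordered rules and human-in-the-loop escalation. The following is a minimal Python sketch of that general pattern, not the implementation of any listed project. Every name in it (`Decision`, `Rule`, `evaluate`, `RULES`, and the tool names) is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from fnmatch import fnmatchcase


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    ESCALATE = "escalate"  # hand the call to a human reviewer


@dataclass(frozen=True)
class Rule:
    priority: int      # lower number is checked first
    tool_pattern: str  # glob pattern matched against the tool name
    decision: Decision


def evaluate(tool_name: str, rules: list[Rule]) -> Decision:
    """Return the decision of the first matching rule, else DENY."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if fnmatchcase(tool_name, rule.tool_pattern):
            return rule.decision
    # Default-deny: a tool that no rule mentions is blocked.
    return Decision.DENY


# Hypothetical policy: the specific deny rule outranks the broad escalation rule.
RULES = [
    Rule(priority=10, tool_pattern="shell.*", decision=Decision.ESCALATE),
    Rule(priority=20, tool_pattern="fs.read", decision=Decision.ALLOW),
    Rule(priority=5, tool_pattern="shell.rm", decision=Decision.DENY),
]
```

With this policy, `evaluate("shell.rm", RULES)` is denied by the priority-5 rule before the broader `shell.*` escalation is considered, and a tool that matches no rule, such as `net.post`, falls through to the default deny.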