agent-safety

Here are 213 public repositories matching this topic...

wuyoscar / ISC-Bench

Internal Safety Collapse: Turning the LLM or an AI Agent into a sensitive data generator.

benchmark jailbreak ai-safety red-teaming large-language-models llm-safety safety-evaluation agent-safety

Updated May 14, 2026
Python

SponsioLabs / Sponsio

Deterministic safety solutions for probabilistic AI agents

Updated May 22, 2026
Python

XSafeAI / XSafeClaw

Introducing XSafeClaw: The Open-Source Agent Safety Platform from Fudan University

ai-safety red-teaming prompt-injection llm-security agentic-ai agent-safety openclaw safe-claw

Updated May 21, 2026
Python

Hyperion-GPU / ProofFlow-v0.1

mcp audit developer-tools code-review codex ai-agents agent-safety

Updated May 20, 2026
Python

kajogo777 / the-agent-sandbox-taxonomy

An open taxonomy and scoring framework for evaluating AI agent sandboxes: 7 defense layers, 7 threat categories, 3 evaluation dimensions, 27 "sandboxes" scored.

security devops taxonomy sandbox threat-modeling ai-agents container-security microvm defense-in-depth infrastructure-security llm-agents agent-safety scoring-framework compute-isolation

Updated May 19, 2026
Go

Ethicore Engine™ is an AI safety, ethics, and compliance platform. This repo contains the open-source components of Ethicore Engine™ (the Guardian SDK), designed to protect your AI applications from prompt injection, jailbreaks, role hijacking, system-prompt extraction, and 100+ additional threat categories through a multi-layer analysis pipeline.

multilingual api detection ai-safety threat-intelligence ai-agents adversarial-machine-learning continuous-learning multimodal ai-security pypi-package proactive-security llm-security agent-security agent-safety agentic-loop closed-loop-learning

Updated May 21, 2026
Python

CodeAlive-AI / ai-driven-development

Practices, protocols, and skills for AI-driven software development. 18 skills + 1 Bash safety hook for Claude Code, Codex CLI, OpenCode, Cursor, Gemini CLI, Antigravity, and any agent supporting the Agent Skills standard.

Updated May 12, 2026
Python

bridge-mind / BridgeWard

Trust nothing. Ship safely. — Skeptical-reading and prompt-injection defense skill for AI agents. Provenance tagging, red-flag patterns, refusal templates, and a read-only injection auditor. MIT.

plugin skill mcp ai-agents ai-security prompt-injection llm-security vibe-coding claude-code mcp-security agent-safety claude-code-skill bridgemind

Updated Apr 30, 2026
Shell

corv89 / shannot

Human-in-the-loop execution for LLM agents

python linux cli security devops automation mcp sandbox sysadmin python3 developer-tools human-in-the-loop llm llm-agents agent-safety supervised-execution

Updated Apr 13, 2026
Python

ArmorerLabs / Armorer-Guard

Fast local Rust scanner for AI-agent prompt injection, credential leaks, exfiltration, and risky tool calls

rust mcp cybersecurity security-scanner ai-agents ai-security guardrails secrets-detection local-first prompt-injection llm-security tool-calling ai-security-tool agent-security mcp-security agent-safety vulnify

Updated May 15, 2026
Rust

norika1207-lab / afu-brain

OpenClaw-compatible MASL safety gate with public RAG packs for memory-aware AI agents

open-source ai-agents rag local-first personal-ai masl agent-memory memory-agents agent-safety openclaw

Updated May 8, 2026
Python

fpytloun / intaris

Guardrails service for AI agents. Default-deny tool call evaluation with LLM safety analysis, priority-ordered decision matrix, and human-in-the-loop escalations. Session recording, behavioral analysis, MCP proxy, secret redaction, and real-time audit.

automation mcp human-in-the-loop policy-engine ai-agents guardrails tool-use agent-safety