Protect every action your agent takes.
Review and moderation, your way: an online safety dashboard, queues, routing, automatic enforcement rules, and integrations.
An intelligent task management assistant built with .NET, Next.js, the Microsoft Agent Framework, the AG-UI protocol, and Azure OpenAI, demonstrating Clean Architecture and autonomous AI agent capabilities.
NudeDetect is a Python-based tool for detecting nudity and adult content in images. It combines the NudeNet library for nudity detection, EasyOCR for extracting text from images, and the Better Profanity library for flagging offensive language in that text.
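For orientation, a minimal sketch of how such a pipeline might chain the three libraries; the class labels follow NudeNet v3's detector output and vary by version, and the thresholds here are illustrative:

```python
from nudenet import NudeDetector
from better_profanity import profanity
import easyocr

detector = NudeDetector()
reader = easyocr.Reader(["en"])
profanity.load_censor_words()

def scan_image(path: str) -> dict:
    # Nudity detection: NudeNet returns {"class", "score", "box"} dicts.
    nudity = [d for d in detector.detect(path)
              if "EXPOSED" in d["class"] and d["score"] > 0.5]
    # OCR any text rendered inside the image, then profanity-check it.
    text = " ".join(t for _, t, conf in reader.readtext(path) if conf > 0.4)
    return {
        "nudity_detected": bool(nudity),
        "offensive_text": bool(text) and profanity.contains_profanity(text),
    }
```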
A JavaScript-based content safety system designed to detect and filter sensitive media in real-time, ensuring platform compliance and user protection.
Step-by-step tutorial that teaches you how to use Azure AI Content Safety, the prebuilt AI service that filters content shown to users, safeguarding them from risky or undesirable outcomes.
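For context, a minimal sketch of a text check against the service using the azure-ai-contentsafety Python SDK; the endpoint, key, and severity threshold are placeholders:

```python
from azure.ai.contentsafety import ContentSafetyClient
from azure.ai.contentsafety.models import AnalyzeTextOptions
from azure.core.credentials import AzureKeyCredential

client = ContentSafetyClient(
    endpoint="https://<your-resource>.cognitiveservices.azure.com",
    credential=AzureKeyCredential("<your-key>"),
)

def is_safe(text: str, max_severity: int = 2) -> bool:
    # The service scores text across categories (Hate, SelfHarm,
    # Sexual, Violence) on a 0-7 severity scale.
    response = client.analyze_text(AnalyzeTextOptions(text=text))
    return all(c.severity <= max_severity for c in response.categories_analysis)
```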
Transform uncertainty into absolute confidence.
🔍 Benchmark jailbreak resilience in LLMs with JailBench for clear insights and stronger model defenses.
Real-time NSFW & harmful content detection as a service
Benchmark LLM jailbreak resilience across providers with standardized tests, adversarial mode, rich analytics, and a clean Web UI.
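As a rough illustration, a sketch of the scoring loop such a harness might run; `send_prompt` and the refusal markers are hypothetical stand-ins for real provider calls and a judge model:

```python
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "as an ai")

def looks_jailbroken(reply: str) -> bool:
    # Crude refusal check; a production harness would use a judge model.
    return not any(m in reply.lower() for m in REFUSAL_MARKERS)

def resilience_score(send_prompt, attacks: list[str]) -> float:
    # Fraction of adversarial prompts the model resists.
    resisted = sum(not looks_jailbroken(send_prompt(a)) for a in attacks)
    return resisted / len(attacks)
```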
AI application firewall for LLM-powered apps — multi-layered detection (heuristic, ML classifier, semantic, LLM-judge) against prompt injection, jailbreaks, and data leakage
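To make the layering concrete, a sketch of what the first, heuristic layer could look like; the patterns and threshold are illustrative, not the project's own:

```python
import re

INJECTION_PATTERNS = [
    r"ignore (all|any|previous) (instructions|rules)",
    r"you are now (dan|developer mode)",
    r"reveal (your|the) (system|hidden) prompt",
]

def heuristic_score(prompt: str) -> float:
    # Fraction of known-injection patterns matched; later layers
    # (ML classifier, semantic similarity, LLM judge) refine this signal.
    hits = sum(bool(re.search(p, prompt, re.IGNORECASE))
               for p in INJECTION_PATTERNS)
    return hits / len(INJECTION_PATTERNS)

def should_block(prompt: str, threshold: float = 0.3) -> bool:
    return heuristic_score(prompt) >= threshold
```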
Production-Grade LLM Alignment Engine (TruthProbe + ADT)
Technical presentations with hands-on demos
A Chrome extension that uses Claude AI to protect users under 18 from inappropriate content by analyzing webpage content in real-time.
The open-source safety stack for AI agents. Policy engine, content scanner, approval workflows, audit trails. 924+ tests. MIT licensed.
Content moderation (text and image) in a social network demo
Responsible AI toolkit for LLM applications: PII/PHI redaction, prompt injection detection, bias scoring, content safety filters, and output validation. Framework-agnostic Python library with FastAPI demo.
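As one example of such a filter, a minimal regex-based redaction sketch; the patterns are illustrative and far from exhaustive:

```python
import re

PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
}

def redact(text: str) -> str:
    # Replace each match with a typed placeholder so downstream
    # prompts never see the raw value.
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(redact("Reach me at jane@example.com or 555-867-5309."))
# -> "Reach me at [EMAIL] or [PHONE]."
```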
Pre-Publish Security Gate - Scan and redact sensitive information before sharing
Study Buddy is a user-friendly AI-powered web app that helps students generate safe, factual study notes and Q&A on any topic. It features user accounts, study history, and strong content safety filters—making learning interactive and secure.
Content safety evaluator built on OpenAI's gpt-oss-safeguard-20b — zero dependencies, streamed verdicts, editable policies