ai-quality

AI model health monitor for LLM apps – runtime checks for drift, hallucination risk, latency, and JSON/format quality on any OpenAI, Anthropic, or local client.

ai runtime-metrics ai-safety drift-detection llm ai-quality ai-code-review hallucination-detection llm-monitoring ai-observalibility

Updated Mar 30, 2026
TypeScript

Ryo-Hunter / suzaku

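Suzaku (朱雀): an AI generation-quality module. Sycophancy suppression, constructive challenge, output adaptation, context anchoring, and consistency guarding. Designed on LDRIT.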

ai-safety claude ai-assistant ai-tools ai-agent llm prompt-engineering ai-quality claude-code anti-sycophancy ldrit

Updated Apr 11, 2026

ivycheck / ivycheck-python-sdk

Python SDK for IvyCheck

ai gpt ai-security generative-ai ai-quality generative-ai-security-assurance

Updated Apr 17, 2024
Jupyter Notebook

HubWizard / second-pass

Universal skill enhancement layer for Claude Code. Sees what your skill was trying to do, grades the gap, drives the rewrite.

ai-quality anthropic skill-enhancement claude-code claude-skills meta-skill

Updated Apr 29, 2026

syncreus / syncreus-eval

Evaluate your LLM apps with one function call. Hallucination detection, RAG scoring, and agent evals for OpenAI, Anthropic, and more. 14 evaluators, pytest plugin, composite trust scores.

Updated Apr 3, 2026
Python

jkorzeniowski / safeagentguard

Open-source AI agent security testing framework. Test for prompt injection, data leakage, and privilege escalation before production.

python ai-safety ai-agents security-testing red-teaming ai-security ai-quality prompt-injection llm-testing

Updated Feb 23, 2026
Python

nshkrdotcom / Assessor

The definitive CI/CD platform for AI Quality.

testing elixir otp ai functional-programming continuous-integration beam erlang-vm quality-assurance cicd ai-testing ai-quality ml-quality quality-platform nshkr-archive

Updated Apr 9, 2026
Elixir

Rofi7777 / ratchet-review

A 5-layer adversarial quality gate for Claude Code. Catches factual errors, score inflation, and buried conclusions before your AI output ships.

quality-assurance ratchet ai-quality llm-as-judge claude-code claude-code-skill adversarial-review ai-hallucination llm-quality-gate ai-output-review

Updated Apr 9, 2026
Shell

josephsenior / agent-evaluation-platform

🚀 Professional-grade AI Agent Evaluation Platform. Multi-provider LLM-as-a-Judge (OpenAI, Anthropic, Gemini), automated testing, A/B benchmarking, and safety auditing.

Updated Dec 26, 2025
Python

moranbickel / russian-judge

Adversarial AI review with structured verdicts — C/I/M defect taxonomy, numerical pass floor, single- and cross-model audit modes.

multi-agent code-review ai-agents legal-ai ai-evaluation prompt-engineering ai-quality llm-evaluation claude-code adversarial-review cross-model-audit

Updated May 20, 2026

adrianlol7 / evaldriven.org

Define, measure, and enforce code correctness with Eval-Driven Development, ensuring every probabilistic system ships with automated proof of quality.

testing devops benchmarking machine-learning automation best-practices evaluation manifesto software-engineering methodology quality-assurance ai-safety continuous-evaluation ai-engineering ai-evaluation ai-testing ai-quality llm-evaluation eval-driven-development

Updated May 21, 2026
Nunjucks

kothakota-bindu / finsight-ai-testing

Production-grade LLM evaluation pipeline for a RAG chatbot: DeepEval + RAGAS + Garak + CI/CD | Financial domain | 7 metrics | Adversarial testing

python pytest fintech llama rag github-actions groq langchain ai-quality llm-evaluation ragas llm-testing deepeval garak

Updated May 6, 2026
Python

TeamSPWK / nova

AI Agent Ops framework for Claude Code — independent evaluator, adversarial review, and pre-commit quality gate for AI-generated code.

developer-tools code-review ai-agents llm prompt-engineering ai-quality anthropic mcp-server agent-ops claude-code claude-code-plugin harness-engineering

Updated May 21, 2026
Shell

VictorVVedtion / tcell

A cognitive immune system for AI agents. Self-evolving critics detect thinking pattern biases through context isolation. Inspired by Karpathy's autoresearch.

python open-source ai-safety cognitive-bias context-isolation ai-agent ai-monitoring ai-testing ai-quality llm-evaluation self-evolving claude-code autoresearch critic-evolution

Updated Mar 28, 2026
Python

Here are 43 public repositories matching this topic...

Giskard-AI / awesome-ai-safety

greynewell / evaldriven.org

greynewell / matchspec

vishwanathakuthota / openvals

converra / agent-triage

DUBSOpenHub / shadow-score-spec

subodhkc / llmverify-npm
