Human-in-the-loop adversarial workflows for high-stakes research audit: from ChatGPT-Gemini duels to 4-model MAD.
Updated May 29, 2026
RevealVLLMSafetyEval is a comprehensive pipeline for evaluating Vision-Language Models (VLMs) on their compliance with harm-related policies. It automates the creation of adversarial multi-turn datasets and the evaluation of model responses, supporting responsible AI development and red-teaming efforts.
Claude Code plugin implementing Anthropic's 3-agent harness (Planner, Generator, Evaluator) for long-running app development, with pluggable rubrics and adversarial evaluation.
Scientific QA robustness evaluation pipeline for evidence-missing RAG scenarios on PeerQA, with EM/F1 reliability analysis.
Multi-agent deep research engine with SIA (Semantic Intelligence Architecture): thermodynamic entropy control, adversarial critique, and multi-reactor swarm orchestration.
GuardMCP: Deterministic Runtime Semantic Enforcement for Agentic Tool Execution using Directional Intent–Action Alignment.
Three Claude Code skills for working with Codex CLI: codex-bridge (one-shot Codex calls), mad-build (Claude+Codex collaboration with cross-review), and mad-research (three-stream adversarial audit of papers, grants, reports with anonymized cross-critique and fresh-Codex synthesis).