The Self-Evolving Agent Ecosystem — Trading agents that evolve through Darwinian selection and adversarial self-play
Updated Apr 13, 2026 - Python
AI Robustness Evaluation System
Open-source AI agent red-team engine, SDK, and CLI. Run offline or against the Humanbound Platform.
Open-source framework for building and testing LLM-powered applications: IRIS (single-agent orchestration), AETHER (declarative multi-agent systems), and AEGIS (adversarial security testing). Developed at MSU Denver's Community-Centered Computing (C3) Lab.
Red-team your AI agents from any coding IDE. Adversarial security testing skills for Claude Code, Cursor, Codex, and 40+ agents.
MCP server that wraps the xAI Grok CLI. Lets Claude Code, Cursor, Cline, and any MCP host use Grok as a peer code reviewer, adversary, and second-opinion consultant.
Elenchus MCP Server - Adversarial verification system for code review
A marketplace of Claude Code plugins for adversarial security and architectural code review.
Claude Code skill that stress-tests startup ideas with adversarial AI agents: 68 animals, elimination rounds, blind scoring. Your idea either survives or you get 3 pivots.
Mechanism-grounded taxonomy of 40 LLM jailbreak patterns across 10 categories. 8,000-trial bootstrap evaluation for the June 2026 frontier (Claude Opus 4-8, GPT-5.5, Gemini 3.5, DeepSeek V4). Every citation direct-WebFetch verified; refuted claims documented.
AI safety evaluation framework testing LLM epistemic robustness under adversarial self-history manipulation
Adversarial testing of LLMs on constraint satisfaction deadlocks
Context engineering toolkit for LLMs — pack, cache, debug, red-team, and orchestrate context windows. Council of Experts, adversarial testing, immune system, context compiler, drift detection, multi-agent entanglement. TypeScript + Python.
API for generating LLM bot/agent personalities based on the Big Five personality model.
Benchmark LLM jailbreak resilience across providers with standardized tests, adversarial mode, rich analytics, and a clean Web UI.
Agent-driven adversarial paper audit framework
Multi-perspective code review council for Claude Code. 3 advisors by default, 10 agents in deep mode (Opus + Codex). Evidence chains, adversarial self-test, dual-path verdict. Based on Karpathy's LLM Council.
Cross-model orchestration for Claude Code — Claude builds, Codex validates. Blind TDD, adversarial stress testing, mixed-model teams, and automatic fallback. Two AI models enter, better code leaves.
CLI for Audn.ai — CI/CD security gate and developer workflows for AI agent red-teaming