From 26c7d4f16a37138068c11f81a030615a32a7f48a Mon Sep 17 00:00:00 2001
From: GAP Promoter
Date: Mon, 25 May 2026 22:20:00 +0000
Subject: [PATCH] Add fajarsajid/agent-redteam to the registry

---
 agents/fajarsajid__agent-redteam/README.md    | 73 +++++++++++++++++++
 .../fajarsajid__agent-redteam/metadata.json   | 15 ++++
 2 files changed, 88 insertions(+)
 create mode 100644 agents/fajarsajid__agent-redteam/README.md
 create mode 100644 agents/fajarsajid__agent-redteam/metadata.json

diff --git a/agents/fajarsajid__agent-redteam/README.md b/agents/fajarsajid__agent-redteam/README.md
new file mode 100644
index 0000000..54f6ea5
--- /dev/null
+++ b/agents/fajarsajid__agent-redteam/README.md
@@ -0,0 +1,73 @@
+# agent-redteam
+
+> *Agentic LLM Red Team Harness — systematically probe AI agent system prompts for adversarial vulnerabilities.*
+
+**Author:** [Fajar Sajid](https://github.com/fajarsajid) · Purdue University
+**Category:** Security
+**Model:** Claude (Anthropic)
+
+---
+
+## What It Does
+
+`agent-redteam` is a CLI evaluation tool that uses Claude as an adversarial engine to red-team any AI agent's system prompt. You provide a system prompt; the harness generates realistic attack probes across eight vulnerability categories and returns structured findings with CVSS-like scores, MITRE ATT&CK mappings, and actionable remediation advice.
+
+It's designed for security researchers, AI engineers, and teams that need to validate agent safety constraints before deployment.
+
+---
+
+## Key Capabilities
+
+| Capability | Detail |
+|---|---|
+| **8 attack categories** | Prompt injection (direct + indirect), identity spoofing, credential exfiltration, privilege escalation, goal hijacking, data exfiltration, safety boundary bypass |
+| **MITRE ATT&CK mapping** | Every finding linked to a MITRE technique ID |
+| **CVSS-like scoring** | Float 0.0–10.0 per finding, severity bucket: critical / high / medium / low |
+| **CI/CD integration** | Exits `1` on critical/high; exits `0` on pass — drop straight into a pipeline |
+| **Dual output** | `--output report.md` (human-readable) + `--json findings.json` (SIEM/tooling) |
+| **Zero extra deps** | Only `requests` required — supply-chain security by design |
+| **Reproducible research** | All empirical results and experiment configs included in the repo |
+
+---
+
+## Example Usage
+
+```bash
+git clone https://github.com/fajarsajid/agent-redteam
+cd agent-redteam
+pip install requests
+export ANTHROPIC_API_KEY=sk-ant-...
+
+# Full red team scan
+python redteam.py --prompt examples/orderbot_prompt.txt \
+  --output report.md --json findings.json
+
+# Quick scan (CI mode — exits 1 on critical/high)
+python redteam.py --prompt examples/orderbot_prompt.txt --quiet
+
+# List all attack categories
+python redteam.py --list-categories
+```
+
+---
+
+## Research Findings
+
+This harness was used in a Purdue University study (2025) evaluating LLM agent safety:
+
+- **49.5%** mean violation rate across attack categories
+- **Indirect injection** caused violations at 70.8% vs 54.2% for direct injection
+- **Multi-turn interactions** increased violation rate from 45.8% (1-turn) to 77.1% (7-turn)
+- **Context drift** is the most dangerous failure mode in production agentic systems
+
+---
+
+## Compliance / Safety
+
+- `human_in_the_loop: destructive` — findings are advisory; human operators decide remediation
+- `audit_logging: true` — all probe/finding data is structured for downstream audit
+- Never autonomously modifies the target agent or its deployment
+
+---
+
+*See the [full research paper](https://github.com/fajarsajid/agent-redteam/blob/main/paper.pdf) for methodology, results tables, and implications.*
diff --git a/agents/fajarsajid__agent-redteam/metadata.json b/agents/fajarsajid__agent-redteam/metadata.json
new file mode 100644
index 0000000..b33b04f
--- /dev/null
+++ b/agents/fajarsajid__agent-redteam/metadata.json
@@ -0,0 +1,15 @@
+{
+  "name": "agent-redteam",
+  "author": "fajarsajid",
+  "description": "CLI red team harness that probes AI agent system prompts for vulnerabilities: prompt injection, identity spoofing, credential exfiltration, and safety bypass — powered by Claude.",
+  "repository": "https://github.com/fajarsajid/agent-redteam",
+  "path": "",
+  "version": "1.0.0",
+  "category": "security",
+  "tags": ["red-team", "llm-security", "adversarial", "prompt-injection", "ai-safety", "claude", "vulnerability-assessment", "mitre-attack", "ci-cd", "research"],
+  "license": "MIT",
+  "model": "claude-sonnet-4-5-20250929",
+  "adapters": ["claude-code", "system-prompt"],
+  "icon": false,
+  "banner": false
+}