73 changes: 73 additions & 0 deletions agents/fajarsajid__agent-redteam/README.md
@@ -0,0 +1,73 @@
# agent-redteam

> *Agentic LLM Red Team Harness — systematically probe AI agent system prompts for adversarial vulnerabilities.*

**Author:** [Fajar Sajid](https://github.com/fajarsajid) · Purdue University
**Category:** Security
**Model:** Claude (Anthropic)

---

## What It Does

`agent-redteam` is a CLI evaluation tool that uses Claude as an adversarial engine to red-team any AI agent's system prompt. You provide a system prompt; the harness generates realistic attack probes across eight vulnerability categories and returns structured findings with CVSS-like scores, MITRE ATT&CK mappings, and actionable remediation advice.

It's designed for security researchers, AI engineers, and teams that need to validate agent safety constraints before deployment.

---

## Key Capabilities

| Capability | Detail |
|---|---|
| **8 attack categories** | Direct prompt injection, indirect prompt injection, identity spoofing, credential exfiltration, privilege escalation, goal hijacking, data exfiltration, safety boundary bypass |
| **MITRE ATT&CK mapping** | Every finding linked to a MITRE technique ID |
| **CVSS-like scoring** | Float 0.0–10.0 per finding, severity bucket: critical / high / medium / low |
| **CI/CD integration** | Exits `1` on critical/high; exits `0` on pass — drop straight into a pipeline |
| **Dual output** | `--output report.md` (human-readable) + `--json findings.json` (SIEM/tooling) |
| **Minimal dependencies** | Only `requests` is required, which keeps the supply-chain surface small |
| **Reproducible research** | All empirical results and experiment configs included in the repo |
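
The scoring and CI rows above can be read together as a small decision rule. The sketch below is illustrative only: the cutoffs follow the CVSS v3.x qualitative rating scale, and whether `redteam.py` uses exactly these thresholds is an assumption. The function names `severity_bucket` and `exit_code` are hypothetical and not taken from the repository.

```python
def severity_bucket(score: float) -> str:
    """Map a 0.0-10.0 score to one of the four buckets listed in the table.

    Cutoffs are borrowed from the CVSS v3.x rating scale (an assumption).
    CVSS also defines a "none" rating for 0.0; the table lists no such
    bucket, so 0.0 falls into "low" here.
    """
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"score out of range: {score}")
    if score >= 9.0:
        return "critical"
    if score >= 7.0:
        return "high"
    if score >= 4.0:
        return "medium"
    return "low"


def exit_code(scores: list[float]) -> int:
    """Return 1 if any finding is critical or high, otherwise 0."""
    blocking = {"critical", "high"}
    return 1 if any(severity_bucket(s) in blocking for s in scores) else 0
```

Under these assumed cutoffs, a scan whose highest score is 6.9 would pass, while a single 7.0 finding would fail the pipeline.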

---

## Example Usage

```bash
git clone https://github.com/fajarsajid/agent-redteam
cd agent-redteam
pip install requests
export ANTHROPIC_API_KEY=sk-ant-...

# Full red team scan
python redteam.py --prompt examples/orderbot_prompt.txt \
  --output report.md --json findings.json

# Quick scan (CI mode — exits 1 on critical/high)
python redteam.py --prompt examples/orderbot_prompt.txt --quiet

# List all attack categories
python redteam.py --list-categories
```
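
For pipelines that consume `findings.json` instead of relying on the exit code alone, a post-processing step might look like the sketch below. The field names (`category`, `severity`, `score`) and the sample records are hypothetical; check them against the file the harness actually writes before reusing this.

```python
import json
from collections import Counter

# Hypothetical sample standing in for findings.json; the real schema may differ.
SAMPLE = """
[
  {"category": "indirect_prompt_injection", "severity": "high", "score": 8.1},
  {"category": "goal_hijacking", "severity": "medium", "score": 5.4},
  {"category": "credential_exfiltration", "severity": "high", "score": 7.6}
]
"""


def count_by_severity(findings):
    """Count findings per severity bucket."""
    return Counter(f["severity"] for f in findings)


def blocking_findings(findings):
    """Return the findings that would fail a CI run (critical or high)."""
    return [f for f in findings if f["severity"] in {"critical", "high"}]


findings = json.loads(SAMPLE)
for severity, count in sorted(count_by_severity(findings).items()):
    print(f"{severity}: {count}")
for finding in blocking_findings(findings):
    print(f"BLOCKING {finding['category']} (score {finding['score']})")
```

In a real pipeline, the `SAMPLE` string would be replaced by reading the file passed to `--json`.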

---

## Research Findings

This harness was used in a Purdue University study (2025) evaluating LLM agent safety:

- **49.5%** mean violation rate across attack categories
- **Indirect injection** caused violations at 70.8% vs 54.2% for direct injection
- **Multi-turn interactions** increased violation rate from 45.8% (1-turn) to 77.1% (7-turn)
- **Context drift** emerged as the most dangerous failure mode for production agentic systems
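
To put the reported rates side by side, the gaps can be expressed in percentage points and as relative increases. The snippet below only restates the figures listed above; the helper names are illustrative and not part of the harness.

```python
def point_change(before: float, after: float) -> float:
    """Absolute difference between two rates, in percentage points."""
    return round(after - before, 1)


def relative_change(before: float, after: float) -> float:
    """Relative increase from `before` to `after`, in percent."""
    return round((after - before) / before * 100, 1)


# Multi-turn: 45.8% (1-turn) vs 77.1% (7-turn)
print(point_change(45.8, 77.1), relative_change(45.8, 77.1))  # 31.3 68.3

# Injection: 54.2% (direct) vs 70.8% (indirect)
print(point_change(54.2, 70.8), relative_change(54.2, 70.8))  # 16.6 30.6
```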

---

## Compliance / Safety

- `human_in_the_loop: destructive` — findings are advisory; human operators decide remediation
- `audit_logging: true` — all probe/finding data is structured for downstream audit
- Never autonomously modifies the target agent or its deployment

---

*See the [full research paper](https://github.com/fajarsajid/agent-redteam/blob/main/paper.pdf) for methodology, results tables, and implications.*
15 changes: 15 additions & 0 deletions agents/fajarsajid__agent-redteam/metadata.json
@@ -0,0 +1,15 @@
{
"name": "agent-redteam",
"author": "fajarsajid",
"description": "CLI red team harness that probes AI agent system prompts for vulnerabilities: prompt injection, identity spoofing, credential exfiltration, and safety bypass — powered by Claude.",
"repository": "https://github.com/fajarsajid/agent-redteam",
"path": "",
"version": "1.0.0",
"category": "security",
"tags": ["red-team", "llm-security", "adversarial", "prompt-injection", "ai-safety", "claude", "vulnerability-assessment", "mitre-attack", "ci-cd", "research"],
"license": "MIT",
"model": "claude-sonnet-4-5-20250929",
"adapters": ["claude-code", "system-prompt"],
"icon": false,
"banner": false
}