Self-improving agent framework powered by LangChain and LangGraph.
Inspired by HyperAgents (Meta Research, 2026) -- ported to TypeScript with a generic, pluggable architecture.
HyperAgents runs an evolutionary self-improvement loop where a MetaAgent rewrites a TaskAgent's code to make it better at solving tasks. Each generation:
- Select a parent agent from the archive
- MetaAgent reads past evaluation scores and edits the source code
- The modified TaskAgent is evaluated on domain tasks
- Score + code diff are saved to the archive
- Repeat
The TaskAgent gets better over generations without manual intervention.
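In code, one generation is a select, modify, evaluate, record cycle. The sketch below is illustrative only: `evolve`, `improve`, `evaluateAgent`, and `selectParent` are stand-in names, not the framework's exports (the real loop lives in src/core/generate_loop.ts).

```ts
// Illustrative sketch of the evolutionary loop; helper names are hypothetical.
type Generation = { id: number; code: string; score: number };

async function evolve(
  seed: Generation,
  improve: (parent: Generation) => Promise<string>,    // MetaAgent rewrites the TaskAgent's code
  evaluateAgent: (code: string) => Promise<number>,    // run the modified TaskAgent on domain tasks
  selectParent: (archive: Generation[]) => Generation, // parent selection strategy
  maxGenerations = 10,
): Promise<Generation[]> {
  const archive: Generation[] = [seed];
  for (let id = 1; id <= maxGenerations; id++) {
    const parent = selectParent(archive);    // 1. pick an ancestor from the archive
    const code = await improve(parent);      // 2. MetaAgent edits the source code
    const score = await evaluateAgent(code); // 3. evaluate the modified TaskAgent on domain tasks
    archive.push({ id, code, score });       // 4. save score + code to the archive
    if (score >= 1.0) break;                 // early termination once the score is perfect
  }
  return archive;
}
```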
New here? Read docs/concepts.md for a detailed explanation of every concept with examples. For a browsable site with workflow diagrams, run:

```bash
cd docs_site && pnpm install && pnpm docs:dev
```

(or `pnpm docs:dev` from the repo root).
Install from npm:

```bash
npm install @lablnet/hyperagents
```

Or run from source:

```bash
git clone https://github.com/Framework-Island/hyperagents.git
cd hyperagents
pnpm install
cp .env.example .env    # Set your OPENAI_API_KEY
pnpm demo:scoring       # Watch the score go from 0.42 to 1.00
```

Project layout:

```
docs/
└── concepts.md              Detailed concepts guide (archive, strategies, self-modification, etc.)

src/
├── agent/                   Agents
│   ├── base_agent.ts        Abstract base class
│   ├── llm.ts               Multi-provider LLM factory (OpenAI, Anthropic, Gemini, Ollama)
│   ├── llm_with_tools.ts    LangGraph ReAct agentic loop
│   ├── meta_agent.ts        Modifies code to improve the TaskAgent
│   ├── task_agent.ts        Solves domain tasks
│   └── tool_registry.ts     Generic tool registry
├── prompts/                 Prompt templates (separated from logic)
│   ├── task_agent.ts        TaskAgent instruction prompt
│   ├── meta_agent.ts        MetaAgent improvement prompt
│   └── llm_judge.ts         LLM judge scoring prompt
├── tools/                   Framework tools (used by MetaAgent)
│   ├── bash.ts              Shell command execution
│   └── editor.ts            File viewing and editing
├── core/                    Evolutionary loop
│   ├── generate_loop.ts     Self-improvement loop
│   ├── select_parent.ts     Parent selection strategies
│   └── ensemble.ts          Best-of-archive ensemble
├── domains/                 Evaluation framework
│   ├── base.ts              Domain interface
│   ├── harness.ts           Generic evaluation harness
│   ├── report.ts            Score reporting
│   └── evaluators.ts        Pluggable evaluators (static, LLM judge, human feedback)
└── utils/                   Infrastructure
    ├── archive.ts           JSONL archive management
    ├── executor.ts          Local + Docker execution
    ├── docker.ts            Docker container management
    ├── git.ts               Git diff/patch operations
    └── common.ts            Shared utilities
```
|   | TaskAgent | MetaAgent |
|---|---|---|
| Role | Solves tasks | Rewrites the TaskAgent's code |
| Input | A task description | Repo path + past eval scores |
| Output | A prediction | Modified source code on disk |
| Tools | Domain-specific (optional) | bash + editor (built-in) |
Three pluggable evaluators ship with the framework:

```ts
import { staticEvaluator, llmJudgeEvaluator, humanFeedbackEvaluator } from "@lablnet/hyperagents";

// 1. Static: exact string match (free, for tasks with one right answer)
staticEvaluator("42", "42") // => 1.0
// 2. LLM Judge: ask an LLM to score (for subjective tasks)
await llmJudgeEvaluator(prediction, {
  description: "Generate tasks from this email",
  rubric: "Score based on relevance and actionability",
}) // => 0.85
// 3. Human Feedback: pass in user ratings (for production apps)
humanFeedbackEvaluator(4 / 5) // => 0.8
```

The archive stores every agent generation. Parent selection picks which ancestor to improve next (not necessarily the previous one -- it picks from all valid generations):
- `random` -- any valid parent
- `latest` -- most recent generation
- `best` -- highest scoring
- `score_prop` -- probability proportional to score (sketched below)
- `score_child_prop` -- score-weighted, penalizes over-explored parents (default)
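As an illustration of what a strategy does, here is a minimal sketch of score-proportional selection (`score_prop`), assuming each archive entry carries a numeric score; the actual strategies live in src/core/select_parent.ts.

```ts
// Sketch of "score_prop": pick a parent with probability proportional to its score.
// Assumes archive entries expose a numeric `score`; not the library's actual code.
type Entry = { id: number; score: number };

function selectScoreProp(archive: Entry[]): Entry {
  const total = archive.reduce((sum, e) => sum + e.score, 0);
  if (total <= 0) return archive[Math.floor(Math.random() * archive.length)]; // all zeros: fall back to random
  let r = Math.random() * total;
  for (const entry of archive) {
    r -= entry.score;
    if (r <= 0) return entry; // higher-scoring parents are chosen more often
  }
  return archive[archive.length - 1]; // guard against floating-point drift
}
```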
The loop also includes early termination: if the best score in the archive reaches 1.0 (100%), the loop stops automatically to avoid wasting compute.
Both agents can load prompts from editable files instead of hardcoded defaults. This enables the MetaAgent to modify its own instructions across generations:
```ts
// Per-agent prompt file
const metaAgent = new MetaAgent({ model, promptFile: "./prompts/meta_agent.txt" });

// Or auto-scaffold via the generate loop
const config: GenerateLoopConfig = {
  // ...
  promptsDir: "./prompts", // creates meta_agent.txt + task_agent.txt
};
```

When `promptsDir` is set, the MetaAgent can edit meta_agent.txt to improve how it approaches future generations -- the improver improves itself.
See docs/concepts.md for full details.
Two execution environments are supported:

- Local (default): runs in a temp directory, fast for development
- Docker: a container per generation, safe for untrusted LLM-generated code
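Conceptually the two modes differ as in the generic sketch below; this is not the framework's executor.ts API, just the underlying idea, and the image and command names are placeholders.

```ts
// Generic illustration of local vs. Docker execution (placeholder names throughout).
import { execFileSync } from "node:child_process";

// Local mode: execute directly on the host in a working directory (fast, no isolation).
function runLocal(cmd: string, args: string[], cwd: string): string {
  return execFileSync(cmd, args, { cwd, encoding: "utf8" });
}

// Docker mode: one disposable container per run with the working tree mounted,
// so untrusted LLM-generated code never touches the host.
function runInDocker(image: string, cmd: string, args: string[], hostDir: string): string {
  return execFileSync(
    "docker",
    ["run", "--rm", "-v", `${hostDir}:/workspace`, "-w", "/workspace", image, cmd, ...args],
    { encoding: "utf8" },
  );
}
```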
The repo ships several runnable examples.

```bash
pnpm demo:scoring
```

A math grading domain where the TaskAgent starts with a bad prompt (strict string matching). The MetaAgent reads the failures and rewrites the prompt to handle mathematical equivalence. Score jumps from 0.42 to 1.00 in one generation.
```bash
pnpm example:bash                       # single evaluation
npx tsx examples/bash/run.ts evolve     # evolutionary loop
```

TaskAgent generates bash commands from descriptions. Supports both single eval and full evolutionary self-improvement.
```bash
pnpm example:calculator
```

The TaskAgent has a deliberately buggy calculator tool (only supports +, -, *, /). The MetaAgent reads the failures and edits calc_tool.ts to add missing operations (power, modulo, sqrt, abs).
```bash
npx tsx examples/factcheck/run.ts          # single evaluation
npx tsx examples/factcheck/run.ts evolve   # evolutionary loop
```

TaskAgent classifies statements as true/false. Includes tricky common myths (e.g., "The Great Wall is visible from space"). Uses `runGenerateLoop` for full evolutionary self-improvement.
```bash
pnpm example:paper-review
```

TaskAgent predicts accept/reject for research papers.
To add your own evaluation tasks, implement the `Domain` interface:

```ts
import type { Domain, DomainConfig, DomainTask, EvalResult, ReportSummary } from "@lablnet/hyperagents";

class MyDomain implements Domain {
  config: DomainConfig = {
    name: "my_domain",
    evalSubsets: ["train"],
    splits: ["train"],
    stagedEvalSamples: 5,
    scoreKey: "accuracy",
  };

  async loadTasks(subset: string, numSamples?: number): Promise<DomainTask[]> {
    // Load from JSON, database, API, etc.
  }

  async evaluate(prediction: string, task: DomainTask): Promise<number> {
    // Use staticEvaluator, llmJudgeEvaluator, or humanFeedbackEvaluator
  }

  formatInput(task: DomainTask): string {
    // Format the task as a prompt for the TaskAgent
  }

  async report(results: EvalResult[]): Promise<ReportSummary> {
    // Aggregate scores
  }
}
```
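Before plugging the domain into the loop, you can exercise it directly. The snippet below is a hypothetical smoke test that calls only the methods declared above and assumes the stub bodies have been filled in.

```ts
// Hypothetical smoke test for MyDomain; uses only the methods shown above.
const domain = new MyDomain();

const tasks = await domain.loadTasks("train", domain.config.stagedEvalSamples);
for (const task of tasks) {
  const prompt = domain.formatInput(task); // what the TaskAgent would be asked
  const prediction = "..."; // placeholder: run your TaskAgent on `prompt` here
  const score = await domain.evaluate(prediction, task);
  console.log({ prompt, score });
}
```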
createLLM({ model: "openai/gpt-4o" })
createLLM({ model: "anthropic/claude-sonnet-4-5-20250929" })
createLLM({ model: "gemini/gemini-2.5-pro" })
createLLM({ model: "ollama/llama3" }) // free, runs locallyBuild and run without installing anything locally (except Docker):
```bash
# Build the image
docker build -t hyperagents .

# Run the scoring demo
docker run --rm -e OPENAI_API_KEY=sk-... hyperagents examples/scoring/run.ts

# Run the bash example
docker run --rm -e OPENAI_API_KEY=sk-... hyperagents examples/bash/run.ts

# Run the evolutionary loop
docker run --rm -e OPENAI_API_KEY=sk-... hyperagents examples/bash/run.ts evolve

# Use a different model
docker run --rm \
  -e OPENAI_API_KEY=sk-... \
  -e HYPERAGENTS_MODEL=openai/gpt-4o-mini \
  hyperagents examples/scoring/run.ts

# Mount a volume to persist outputs
docker run --rm \
  -e OPENAI_API_KEY=sk-... \
  -v $(pwd)/outputs:/hyperagents/outputs \
  hyperagents examples/scoring/run.ts
```

For Anthropic or Gemini models, pass the corresponding API key:
```bash
docker run --rm \
  -e ANTHROPIC_API_KEY=sk-ant-... \
  -e HYPERAGENTS_MODEL=anthropic/claude-sonnet-4-5-20250929 \
  hyperagents examples/scoring/run.ts
```

Built on and inspired by:

- HyperAgents -- Self-referential self-improving agents (Meta Research, 2026)
- LangChain -- LLM framework
- LangGraph -- Agentic state machines
License: MIT
