Self-improving agent framework powered by LangChain and LangGraph.
Inspired by HyperAgents (Meta Research, 2026) -- ported to TypeScript with a generic, pluggable architecture.
HyperAgents runs an evolutionary self-improvement loop where a MetaAgent rewrites a TaskAgent's code to make it better at solving tasks. Each generation:
- Select a parent agent from the archive
- MetaAgent reads past evaluation scores and edits the source code
- The modified TaskAgent is evaluated on domain tasks
- Score + code diff are saved to the archive
- Repeat
The TaskAgent gets better over generations without manual intervention.
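In code, one generation is a select, modify, evaluate, record cycle. The sketch below is illustrative only: `evolve`, `improve`, `evaluateAgent`, and `selectParent` are stand-in names, not the framework's exports (the real loop lives in src/core/generate_loop.ts).

```ts
// Illustrative sketch of the evolutionary loop; helper names are hypothetical.
type Generation = { id: number; code: string; score: number };

async function evolve(
  seed: Generation,
  improve: (parent: Generation) => Promise<string>,    // MetaAgent rewrites the TaskAgent's code
  evaluateAgent: (code: string) => Promise<number>,    // run the modified TaskAgent on domain tasks
  selectParent: (archive: Generation[]) => Generation, // parent selection strategy
  maxGenerations = 10,
): Promise<Generation[]> {
  const archive: Generation[] = [seed];
  for (let id = 1; id <= maxGenerations; id++) {
    const parent = selectParent(archive);    // 1. pick an ancestor from the archive
    const code = await improve(parent);      // 2. MetaAgent edits the source code
    const score = await evaluateAgent(code); // 3. evaluate the modified TaskAgent on domain tasks
    archive.push({ id, code, score });       // 4. save score + code to the archive
    if (score >= 1.0) break;                 // early termination once the score is perfect
  }
  return archive;
}
```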
New here? Read docs/concepts.md for a detailed explanation of every concept with examples. For a browsable site with workflow diagrams, run:

```bash
cd docs_site && pnpm install && pnpm docs:dev
```

(or `pnpm docs:dev` from the repo root).
Install from npm:

```bash
npm install @lablnet/hyperagents
```

Or run from source:

```bash
git clone https://github.com/Framework-Island/hyperagents.git
cd hyperagents
pnpm install
cp .env.example .env    # Set your OPENAI_API_KEY
pnpm demo:scoring       # Watch the score go from 0.42 to 1.00
```

Project layout:

```
docs/
└── concepts.md              Detailed concepts guide (archive, strategies, self-modification, etc.)

src/
├── agent/                   Agents
│   ├── base_agent.ts        Abstract base class
│   ├── llm.ts               Multi-provider LLM factory (OpenAI, Anthropic, Gemini, Ollama)
│   ├── llm_with_tools.ts    LangGraph ReAct agentic loop
│   ├── meta_agent.ts        Modifies code to improve the TaskAgent
│   ├── task_agent.ts        Solves domain tasks
│   └── tool_registry.ts     Generic tool registry
├── prompts/                 Prompt templates (separated from logic)
│   ├── task_agent.ts        TaskAgent instruction prompt
│   ├── meta_agent.ts        MetaAgent improvement prompt
│   └── llm_judge.ts         LLM judge scoring prompt
├── tools/                   Framework tools (used by MetaAgent)
│   ├── bash.ts              Shell command execution
│   └── editor.ts            File viewing and editing
├── core/                    Evolutionary loop
│   ├── generate_loop.ts     Self-improvement loop
│   ├── select_parent.ts     Parent selection strategies
│   └── ensemble.ts          Best-of-archive ensemble
├── domains/                 Evaluation framework
│   ├── base.ts              Domain interface
│   ├── harness.ts           Generic evaluation harness
│   ├── report.ts            Score reporting
│   └── evaluators.ts        Pluggable evaluators (static, LLM judge, human feedback)
└── utils/                   Infrastructure
    ├── archive.ts           JSONL archive management
    ├── executor.ts          Local + Docker execution
    ├── docker.ts            Docker container management
    ├── git.ts               Git diff/patch operations
    └── common.ts            Shared utilities
```
|   | TaskAgent | MetaAgent |
|---|---|---|
| Role | Solves tasks | Rewrites the TaskAgent's code |
| Input | A task description | Repo path + past eval scores |
| Output | A prediction | Modified source code on disk |
| Tools | Domain-specific (optional) | bash + editor (built-in) |
Three pluggable evaluators ship with the framework:

```ts
import { staticEvaluator, llmJudgeEvaluator, humanFeedbackEvaluator } from "@lablnet/hyperagents";

// 1. Static: exact string match (free, for tasks with one right answer)
staticEvaluator("42", "42") // => 1.0
// 2. LLM Judge: ask an LLM to score (for subjective tasks)
await llmJudgeEvaluator(prediction, {
  description: "Generate tasks from this email",
  rubric: "Score based on relevance and actionability",
}) // => 0.85
// 3. Human Feedback: pass in user ratings (for production apps)
humanFeedbackEvaluator(4 / 5) // => 0.8
```

The archive stores every agent generation. Parent selection picks which ancestor to improve next (not necessarily the previous one -- it picks from all valid generations):
- `random` -- any valid parent
- `latest` -- most recent generation
- `best` -- highest scoring
- `score_prop` -- probability proportional to score (sketched below)
- `score_child_prop` -- score-weighted, penalizes over-explored parents (default)
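As an illustration of what a strategy does, here is a minimal sketch of score-proportional selection (`score_prop`), assuming each archive entry carries a numeric score; the actual strategies live in src/core/select_parent.ts.

```ts
// Sketch of "score_prop": pick a parent with probability proportional to its score.
// Assumes archive entries expose a numeric `score`; not the library's actual code.
type Entry = { id: number; score: number };

function selectScoreProp(archive: Entry[]): Entry {
  const total = archive.reduce((sum, e) => sum + e.score, 0);
  if (total <= 0) return archive[Math.floor(Math.random() * archive.length)]; // all zeros: fall back to random
  let r = Math.random() * total;
  for (const entry of archive) {
    r -= entry.score;
    if (r <= 0) return entry; // higher-scoring parents are chosen more often
  }
  return archive[archive.length - 1]; // guard against floating-point drift
}
```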
The loop also includes early termination: if the best score in the archive reaches 1.0 (100%), the loop stops automatically to avoid wasting compute.
Both agents can load prompts from editable files instead of hardcoded defaults. This enables the MetaAgent to modify its own instructions across generations:
```ts
// Per-agent prompt file
const metaAgent = new MetaAgent({ model, promptFile: "./prompts/meta_agent.txt" });

// Or auto-scaffold via the generate loop
const config: GenerateLoopConfig = {
  // ...
  promptsDir: "./prompts", // creates meta_agent.txt + task_agent.txt
};
```

When `promptsDir` is set, the MetaAgent can edit meta_agent.txt to improve how it approaches future generations -- the improver improves itself.
See docs/concepts.md for full details.
Two execution environments are supported:

- Local (default): runs in a temp directory, fast for development
- Docker: a container per generation, safe for untrusted LLM-generated code
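Conceptually the two modes differ as in the generic sketch below; this is not the framework's executor.ts API, just the underlying idea, and the image and command names are placeholders.

```ts
// Generic illustration of local vs. Docker execution (placeholder names throughout).
import { execFileSync } from "node:child_process";

// Local mode: execute directly on the host in a working directory (fast, no isolation).
function runLocal(cmd: string, args: string[], cwd: string): string {
  return execFileSync(cmd, args, { cwd, encoding: "utf8" });
}

// Docker mode: one disposable container per run with the working tree mounted,
// so untrusted LLM-generated code never touches the host.
function runInDocker(image: string, cmd: string, args: string[], hostDir: string): string {
  return execFileSync(
    "docker",
    ["run", "--rm", "-v", `${hostDir}:/workspace`, "-w", "/workspace", image, cmd, ...args],
    { encoding: "utf8" },
  );
}
```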
The repo ships several runnable examples.

```bash
pnpm demo:scoring
```

A math grading domain where the TaskAgent starts with a bad prompt (strict string matching). The MetaAgent reads the failures and rewrites the prompt to handle mathematical equivalence. Score jumps from 0.42 to 1.00 in one generation.
```bash
pnpm example:bash                       # single evaluation
npx tsx examples/bash/run.ts evolve     # evolutionary loop
```

TaskAgent generates bash commands from descriptions. Supports both single eval and full evolutionary self-improvement.
```bash
pnpm example:calculator
```

The TaskAgent has a deliberately buggy calculator tool (only supports +, -, *, /). The MetaAgent reads the failures and edits calc_tool.ts to add missing operations (power, modulo, sqrt, abs).
```bash
npx tsx examples/factcheck/run.ts          # single evaluation
npx tsx examples/factcheck/run.ts evolve   # evolutionary loop
```

TaskAgent classifies statements as true/false. Includes tricky common myths (e.g., "The Great Wall is visible from space"). Uses `runGenerateLoop` for full evolutionary self-improvement.
```bash
pnpm example:paper-review
```

TaskAgent predicts accept/reject for research papers.
To add your own evaluation tasks, implement the `Domain` interface:

```ts
import type { Domain, DomainConfig, DomainTask, EvalResult, ReportSummary } from "@lablnet/hyperagents";

class MyDomain implements Domain {
  config: DomainConfig = {
    name: "my_domain",
    evalSubsets: ["train"],
    splits: ["train"],
    stagedEvalSamples: 5,
    scoreKey: "accuracy",
  };

  async loadTasks(subset: string, numSamples?: number): Promise<DomainTask[]> {
    // Load from JSON, database, API, etc.
  }

  async evaluate(prediction: string, task: DomainTask): Promise<number> {
    // Use staticEvaluator, llmJudgeEvaluator, or humanFeedbackEvaluator
  }

  formatInput(task: DomainTask): string {
    // Format the task as a prompt for the TaskAgent
  }

  async report(results: EvalResult[]): Promise<ReportSummary> {
    // Aggregate scores
  }
}
```
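Before plugging the domain into the loop, you can exercise it directly. The snippet below is a hypothetical smoke test that calls only the methods declared above and assumes the stub bodies have been filled in.

```ts
// Hypothetical smoke test for MyDomain; uses only the methods shown above.
const domain = new MyDomain();

const tasks = await domain.loadTasks("train", domain.config.stagedEvalSamples);
for (const task of tasks) {
  const prompt = domain.formatInput(task); // what the TaskAgent would be asked
  const prediction = "..."; // placeholder: run your TaskAgent on `prompt` here
  const score = await domain.evaluate(prediction, task);
  console.log({ prompt, score });
}
```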
createLLM({ model: "openai/gpt-4o" })
createLLM({ model: "anthropic/claude-sonnet-4-5-20250929" })
createLLM({ model: "gemini/gemini-2.5-pro" })
createLLM({ model: "ollama/llama3" }) // free, runs locallyBuild and run without installing anything locally (except Docker):
```bash
# Build the image
docker build -t hyperagents .

# Run the scoring demo
docker run --rm -e OPENAI_API_KEY=sk-... hyperagents examples/scoring/run.ts

# Run the bash example
docker run --rm -e OPENAI_API_KEY=sk-... hyperagents examples/bash/run.ts

# Run the evolutionary loop
docker run --rm -e OPENAI_API_KEY=sk-... hyperagents examples/bash/run.ts evolve

# Use a different model
docker run --rm \
  -e OPENAI_API_KEY=sk-... \
  -e HYPERAGENTS_MODEL=openai/gpt-4o-mini \
  hyperagents examples/scoring/run.ts

# Mount a volume to persist outputs
docker run --rm \
  -e OPENAI_API_KEY=sk-... \
  -v $(pwd)/outputs:/hyperagents/outputs \
  hyperagents examples/scoring/run.ts
```

For Anthropic or Gemini models, pass the corresponding API key:
```bash
docker run --rm \
  -e ANTHROPIC_API_KEY=sk-ant-... \
  -e HYPERAGENTS_MODEL=anthropic/claude-sonnet-4-5-20250929 \
  hyperagents examples/scoring/run.ts
```

Built on and inspired by:

- HyperAgents -- Self-referential self-improving agents (Meta Research, 2026)
- LangChain -- LLM framework
- LangGraph -- Agentic state machines
License: MIT
