Add 6 public-baseline forecasting bots by CodexVeritas · Pull Request #293 · Metaculus/forecasting-tools

CodexVeritas · 2026-06-25T17:01:58Z

Summary

Adds 6 cheap, agentic public-baseline bots so the Metaculus Community Prediction can be benchmarked against "what various groups of people would forecast". Each bot estimates the forecast a randomized, representative sample of a target group would give — it is explicitly a group-belief proxy, not a best-guess of the true outcome.

The 6 bots (in forecasting_tools/forecast_bots/public_baselines/):

PublicSentimentBaselineBot — the general public
ExpertOpinionBaselineBot — domain experts on the question
CredibleNewsBaselineBot — credible news outlets' implied forecasts
LeftLeaningBaselineBot / CenterLeaningBaselineBot / RightLeaningBaselineBot — left/center/right public figures & outlets

How they work

Built on PydanticAI agents wrapped in the existing ForecastBot (SummerTemplateBot2026) interface so they run identically to the other open-source bots. Research is skipped (run_research returns ""); the agent does its own evidence gathering.
New ExaQuoteSearcher tool gives each agent article summaries + verbatim highlight quotes (with an AskNews fallback) so it can find and cite what the target group actually says. ExaSearcher/ExaSource extended with optional summary support.
Each PopulationBaselineBot subclass only defines a PopulationSpec (who is sampled + how to find/interpret their views). All agentic machinery, prediction conversion, and comment formatting lives in the base class.
Every question is forecast by 3 independent model branches — Claude Sonnet 4.5, Grok 4.20, GLM 5.1 (via OpenRouter) — round-robined per prediction and aggregated by the framework (mirroring the research-only bot's multi-sample aggregation, one model per branch). Each branch returns a structured object with a reasoning scratchpad, the sources it sampled, and each source's implied forecast.
The Metaculus comment summarizes, per model branch, which sources were found and the forecast each implies, plus the aggregate headline figure.
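
As a rough illustration of the points above, the per-bot spec, the round-robin branch dispatch, and the per-branch comment summary might be sketched as follows. This is not the code in this PR: `PopulationSpecSketch`, `SampledSource`, `BranchResult`, `pick_branch_model`, and `format_comment` are hypothetical names, the field layout is an assumption, and the aggregate is passed in because the framework computes it. Only the three model slugs come from the test plan.

```python
from dataclasses import dataclass, field

# Model slugs are taken from the test plan; every other name here is hypothetical.
BRANCH_MODELS = [
    "anthropic/claude-sonnet-4.5",
    "x-ai/grok-4.20",
    "z-ai/glm-5.1",
]


@dataclass(frozen=True)
class PopulationSpecSketch:
    """Who is sampled and how to find their views (illustrative fields only)."""
    name: str
    who: str
    search_hint: str


@dataclass
class SampledSource:
    """One source a branch found, with the forecast it is read as implying."""
    title: str
    quote: str
    implied_probability: float


@dataclass
class BranchResult:
    """Structured output of one model branch."""
    model: str
    scratchpad: str
    sources: list[SampledSource] = field(default_factory=list)


def pick_branch_model(prediction_index: int) -> str:
    """Round-robin dispatch: the index wraps around the list of branch models."""
    return BRANCH_MODELS[prediction_index % len(BRANCH_MODELS)]


def format_comment(spec: PopulationSpecSketch,
                   results: list[BranchResult],
                   aggregate: float) -> str:
    """Headline figure first, then each branch's sources and implied forecasts."""
    lines = [f"{spec.name}: aggregate {aggregate:.0%}"]
    for result in results:
        lines.append(f"[{result.model}]")
        for source in result.sources:
            lines.append(f"  {source.title}: {source.implied_probability:.0%}")
    return "\n".join(lines)
```

Under these assumptions, a subclass would only need to supply its own spec instance, and predictions 0, 1, 2, 3 would be served by Claude, Grok, GLM, and Claude again.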

Wiring

run_bots.py: registers the 6 bots and passes the 3 branch LLMs.
.github/workflows/run-bot-aib-tournament.yaml: 6 baseline jobs, each now also receiving EXA_API_KEY.

Test plan

pytest code_tests/unit_tests/test_forecast_bots/test_population_baseline_bot.py (6 passed) — covers binary/MC/numeric conversion + clamping, option normalization, round-robin branch-model dispatch, and Exa quote formatting.
Verified all 3 OpenRouter model slugs resolve (anthropic/claude-sonnet-4.5, x-ai/grok-4.20, z-ai/glm-5.1).
Live tournament smoke run (requires keys; not run in CI).
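
For context on what the conversion tests exercise, here is a minimal sketch of probability clamping and multiple-choice option normalization. It is an assumption-laden illustration, not the base-class code: `clamp_probability` and `normalize_options` are hypothetical names, and the bounds (0.01 to 0.99) and the per-option floor (0.001) are placeholder values, not ones confirmed by this PR.

```python
def clamp_probability(p: float, low: float = 0.01, high: float = 0.99) -> float:
    """Keep a binary forecast inside [low, high]; the bounds are assumed values."""
    return max(low, min(high, p))


def normalize_options(raw: dict[str, float], floor: float = 0.001) -> dict[str, float]:
    """Floor each option's weight, then rescale so the weights sum to 1."""
    if not raw:
        raise ValueError("at least one option is required")
    floored = {option: max(floor, weight) for option, weight in raw.items()}
    total = sum(floored.values())
    return {option: weight / total for option, weight in floored.items()}
```

Flooring before rescaling means an option the agent scored at zero still receives a small positive probability.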

Adds agentic PydanticAI bots that estimate what a sampled group of people would forecast (public sentiment, expert opinion, credible news outlets, left, center, and right), as public baselines to compare against the Community Prediction. Each bot is wrapped in the ForecastBot interface, searches for evidence of its group's views, and reports the sampled sources and their implied forecasts in the final comment. Wires the bots into run_bots.py, bot_lists.py, and the AIB tournament workflow, and adds pydantic-ai-slim as a dependency plus unit tests.

Co-authored-by: Cursor <cursoragent@cursor.com>

…ratchpad

- Add ExaQuoteSearcher tool returning article summaries + verbatim highlight quotes (with AskNews fallback) so each bot can cite the sources it sampled.
- Extend ExaSearcher/ExaSource with optional summary support.
- Forecast each question with 3 independent model branches (Claude Sonnet 4.5, Grok 4.20, GLM 5.1 via OpenRouter), round-robined per prediction and aggregated by the framework; add a reasoning scratchpad per branch.
- Surface per-model source breakdowns and implied forecasts in the comment.
- Wire branch LLMs through run_bots.py and pass EXA_API_KEY to the 6 baseline workflow jobs.

Co-authored-by: Cursor <cursoragent@cursor.com>

CodexVeritas and others added 3 commits June 25, 2026 17:01

Updates

511c9fc

CodexVeritas wants to merge 3 commits into main from public-baseline-bots
