Add 6 public-baseline forecasting bots #293

Open
CodexVeritas wants to merge 3 commits into main from public-baseline-bots

CodexVeritas (Collaborator) commented Jun 25, 2026

Summary

Adds 6 cheap, agentic public-baseline bots so that the Metaculus Community Prediction can be benchmarked against "what various groups of people would forecast". Each bot estimates the forecast that a randomized, representative sample of a target group would give. It is explicitly a group-belief proxy, not a best guess of the true outcome.

The 6 bots (in forecasting_tools/forecast_bots/public_baselines/):

  • PublicSentimentBaselineBot — the general public
  • ExpertOpinionBaselineBot — domain experts on the question
  • CredibleNewsBaselineBot — credible news outlets' implied forecasts
  • LeftLeaningBaselineBot / CenterLeaningBaselineBot / RightLeaningBaselineBot — left/center/right public figures & outlets
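The six bots above differ only in who they sample; below, the PR describes each as a PopulationBaselineBot subclass that defines nothing but a PopulationSpec. A minimal sketch of that pattern, assuming a plain dataclass for the spec: the fields who_is_sampled and how_to_find_views, the build_instructions method, and the search hints are hypothetical and not the PR's actual definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PopulationSpec:
    """Hypothetical description of the group a baseline bot samples."""
    name: str
    who_is_sampled: str
    how_to_find_views: str


class PopulationBaselineBot:
    """Base class: all shared machinery lives here (sketch only)."""
    spec: PopulationSpec  # each subclass must set this

    def build_instructions(self, question: str) -> str:
        # Every subclass reuses this prompt; only the spec changes.
        return (
            "Estimate the forecast that a randomized, representative sample of "
            f"{self.spec.who_is_sampled} would give for: {question}\n"
            f"How to find their views: {self.spec.how_to_find_views}\n"
            "This is a group-belief proxy, not your own best guess."
        )


class PublicSentimentBaselineBot(PopulationBaselineBot):
    spec = PopulationSpec(
        name="public_sentiment",
        who_is_sampled="the general public",
        how_to_find_views="polls and widely shared public commentary (hypothetical hint)",
    )


class ExpertOpinionBaselineBot(PopulationBaselineBot):
    spec = PopulationSpec(
        name="expert_opinion",
        who_is_sampled="domain experts on the question",
        how_to_find_views="attributed quotes from researchers (hypothetical hint)",
    )
```

Adding a seventh population would then mean writing one more subclass with a new spec, with no changes to the base class.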

How they work

  • Built on PydanticAI agents wrapped in the existing ForecastBot (SummerTemplateBot2026) interface so they run identically to the other open-source bots. Research is skipped (run_research returns ""); the agent does its own evidence gathering.
  • New ExaQuoteSearcher tool gives each agent article summaries + verbatim highlight quotes (with an AskNews fallback) so it can find and cite what the target group actually says. ExaSearcher/ExaSource extended with optional summary support.
  • Each PopulationBaselineBot subclass only defines a PopulationSpec (who is sampled + how to find/interpret their views). All agentic machinery, prediction conversion, and comment formatting lives in the base class.
  • Every question is forecast by 3 independent model branches (Claude Sonnet 4.5, Grok 4.20, and GLM 5.1, all via OpenRouter). Branches are assigned round-robin per prediction and their outputs are aggregated by the framework, mirroring the research-only bot's multi-sample aggregation with one model per branch. Each branch returns a structured object with a reasoning scratchpad, the sources it sampled, and each source's implied forecast.
  • The Metaculus comment summarizes, per model branch, which sources were found and the forecast each implies, plus the aggregate headline figure.
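The round-robin dispatch and aggregation described above can be sketched as follows. This assumes a binary question and uses the median as a stand-in aggregation rule, which may not match what the framework actually does; RoundRobinDispatcher, forecast_binary, and run_branch are hypothetical names, and only the OpenRouter model slugs come from this PR.

```python
import itertools
import statistics
from typing import Callable


class RoundRobinDispatcher:
    """Hands each successive prediction to the next model branch in turn."""

    def __init__(self, branch_models: list[str]) -> None:
        if not branch_models:
            raise ValueError("need at least one branch model")
        self._cycle = itertools.cycle(branch_models)

    def next_model(self) -> str:
        return next(self._cycle)


def forecast_binary(
    question: str,
    dispatcher: RoundRobinDispatcher,
    run_branch: Callable[[str, str], float],
    predictions_per_question: int = 3,
) -> tuple[float, list[tuple[str, float]]]:
    """Run one independent prediction per slot, then aggregate.

    run_branch(model, question) is a hypothetical stand-in for one agent run.
    The median is an assumed aggregation rule, not necessarily the framework's.
    """
    per_branch: list[tuple[str, float]] = []
    for _ in range(predictions_per_question):
        model = dispatcher.next_model()
        per_branch.append((model, run_branch(model, question)))
    aggregate = statistics.median(p for _, p in per_branch)
    return aggregate, per_branch


BRANCH_MODELS = ["anthropic/claude-sonnet-4.5", "x-ai/grok-4.20", "z-ai/glm-5.1"]
```

With three predictions per question and three branch models, each model gets exactly one prediction per question, which is the "one model per branch" setup; the per-branch list is what a comment formatter would break down by model.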

Wiring

  • run_bots.py: registers the 6 bots and passes the 3 branch LLMs.
  • .github/workflows/run-bot-aib-tournament.yaml: 6 baseline jobs, each now also receiving EXA_API_KEY.

Test plan

  • pytest code_tests/unit_tests/test_forecast_bots/test_population_baseline_bot.py (6 passed) — covers binary/MC/numeric conversion + clamping, option normalization, round-robin branch-model dispatch, and Exa quote formatting.
  • Verified all 3 OpenRouter model slugs resolve (anthropic/claude-sonnet-4.5, x-ai/grok-4.20, z-ai/glm-5.1).
  • Live tournament smoke run (requires keys; not run in CI).
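The unit tests cover prediction conversion with clamping and option normalization. A sketch of what those two steps could look like, assuming clamp bounds of 0.01 and 0.99, case-insensitive option matching, and a small probability floor; the bounds, floor, and the helpers clamp_probability and normalize_options are hypothetical rather than the values used in the PR.

```python
def clamp_probability(p: float, low: float = 0.01, high: float = 0.99) -> float:
    """Clamp a binary forecast into [low, high] (bounds are assumptions)."""
    if not low < high:
        raise ValueError("low must be below high")
    return min(high, max(low, p))


def normalize_options(
    raw: dict[str, float], options: list[str], floor: float = 0.001
) -> dict[str, float]:
    """Map agent-reported option names onto the question's options and renormalize."""
    lookup = {opt.strip().lower(): opt for opt in options}
    probs = {opt: 0.0 for opt in options}
    for name, p in raw.items():
        key = name.strip().lower()
        if key in lookup:  # unmatched names are dropped
            probs[lookup[key]] += max(p, 0.0)
    total = sum(probs.values())
    if total <= 0:
        # Nothing usable matched: fall back to a uniform distribution.
        return {opt: 1.0 / len(options) for opt in options}
    # Scale to sum to 1, floor every option so none is exactly zero, rescale.
    probs = {opt: max(p / total, floor) for opt, p in probs.items()}
    total = sum(probs.values())
    return {opt: p / total for opt, p in probs.items()}
```

Flooring after the first rescale keeps the floor meaningful whether the agent reports probabilities or percentages.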

CodexVeritas and others added 3 commits June 25, 2026 17:01
Adds agentic PydanticAI bots that estimate what a sampled group of people
would forecast (public sentiment, expert opinion, credible news outlets,
left, center, and right), as public baselines to compare against the
Community Prediction. Each bot is wrapped in the ForecastBot interface,
searches for evidence of its group's views, and reports the sampled sources
and their implied forecasts in the final comment.

Wires the bots into run_bots.py, bot_lists.py, and the AIB tournament
workflow, and adds pydantic-ai-slim as a dependency plus unit tests.

Co-authored-by: Cursor <cursoragent@cursor.com>
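Each branch returns a structured object (scratchpad, sampled sources, implied forecasts), and the final comment reports the sources per model. A minimal sketch of that shape, assuming plain dataclasses in place of the Pydantic models a PydanticAI agent would actually return; SampledSource, BranchResult, format_comment, and the comment layout are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SampledSource:
    name: str
    quote: str
    implied_forecast: float  # probability this source implies


@dataclass
class BranchResult:
    model: str
    scratchpad: str  # free-form reasoning; not shown in the comment
    sources: list[SampledSource] = field(default_factory=list)
    forecast: float = 0.5


def format_comment(results: list[BranchResult], aggregate: float) -> str:
    """Headline aggregate first, then one section per model branch."""
    lines = [f"Aggregate forecast: {aggregate:.0%}", ""]
    for result in results:
        lines.append(f"## {result.model}: {result.forecast:.0%}")
        if not result.sources:
            lines.append("- (no sources found)")
        for src in result.sources:
            lines.append(f'- {src.name}: {src.implied_forecast:.0%} ("{src.quote}")')
        lines.append("")
    return "\n".join(lines).rstrip()
```

Keeping the scratchpad out of the comment leaves room for messy reasoning while the comment shows only sources and their implied forecasts.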
…ratchpad

- Add ExaQuoteSearcher tool returning article summaries + verbatim highlight
  quotes (with AskNews fallback) so each bot can cite the sources it sampled.
- Extend ExaSearcher/ExaSource with optional summary support.
- Forecast each question with 3 independent model branches (Claude Sonnet 4.5,
  Grok 4.20, GLM 5.1 via OpenRouter), round-robined per prediction and
  aggregated by the framework; add a reasoning scratchpad per branch.
- Surface per-model source breakdowns and implied forecasts in the comment.
- Wire branch LLMs through run_bots.py and pass EXA_API_KEY to the 6 baseline
  workflow jobs.

Co-authored-by: Cursor <cursoragent@cursor.com>
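ExaQuoteSearcher is described as returning article summaries plus verbatim highlight quotes, with an AskNews fallback. The fallback control flow might look like this sketch. The searchers are injected as plain callables because the real Exa and AskNews client calls are not shown in this PR; QuoteResult, search_with_fallback, and format_quotes are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class QuoteResult:
    title: str
    url: str
    summary: str
    highlights: list[str]  # verbatim quotes from the article


Searcher = Callable[[str], list[QuoteResult]]


def search_with_fallback(query: str, primary: Searcher, fallback: Searcher) -> list[QuoteResult]:
    """Try the primary search; use the fallback if it errors or finds nothing."""
    try:
        results = primary(query)
    except Exception:
        # Broad catch for the sketch; a real tool would likely catch narrower errors.
        results = []
    return results if results else fallback(query)


def format_quotes(results: list[QuoteResult]) -> str:
    """Render each result as title, URL, summary, then quoted highlights."""
    blocks = []
    for r in results:
        quotes = "\n".join(f'  > "{h}"' for h in r.highlights)
        blocks.append(f"{r.title} ({r.url})\nSummary: {r.summary}\n{quotes}".rstrip())
    return "\n\n".join(blocks)
```

Returning verbatim highlights next to the summary is what lets an agent cite what a source actually said instead of paraphrasing it.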