FastAPI service that collects and serves League of Legends professional match data. Fetches schedules from LoL Esports API, enriches with per-player stats from Leaguepedia, and stores everything in Elasticsearch for fast REST queries.
- Features
- Architecture
- API Endpoints
- Quick Start
- Production Deployment
- Stack
- File Structure
- Environment Variables
- Troubleshooting
- License
- FastAPI REST API — serve professional match data via HTTP endpoints
- Two-phase live pipeline — sync (LoL Esports) + background enrichment (Leaguepedia)
- Full player stats — champion, KDA, gold, CS, items (names), runes (names), summoner spells
- Leaguepedia integration — only public source for competitive game data (Riot Match-V5 does not expose tournament server games)
- Enrichment daemon — background job processes pending games every 30 minutes, respects rate limits
- Deduplication —
riot_enrichedflag prevents re-processing;enrichment_attemptscounter abandons after 3 failures - Historical backfill — imports ALL editions of a league from Leaguepedia (CBLOL since 2013, resumable)
- Oracle's Elixir ingest — bulk-indexes OE CSV exports (97K+ games, all major leagues); idempotent (
op_type=create) - Oracle's Elixir backfill — fills missing stats (damage_taken, wards) on Leaguepedia docs using OE CSV as join source
- Multi-league — CBLOL, CBLOL Academy, Circuito Desafiante, LCS, LEC, LCK, LPL, and more
- Production ready — Docker Compose with Traefik/SSL for Coolify deployment
The system has four independent pipelines that all write to the same ES index (lol_pro_matches):
Phase 1 — Live Sync (scraper-cron, every 1h)
LoL Esports API
└─ getCompletedEvents → series with games + YouTube VOD IDs
└─ competitive_pipeline.py
└─ bulk_index → ES (riot_enriched: false)
Phase 2 — Enrichment (enrichment-daemon, every 30min)
query_unenriched(ES) → pending games
└─ For each game (2 Leaguepedia requests + 9s sleep each):
1. ScoreboardGames → page_name, winner, patch, gamelength
2. ScoreboardPlayers → 10 players with champion/KDA/items/runes
└─ update_document(ES, riot_enriched: true, participants: [...])
Phase 3 — Historical Backfill (one-off / daily cron via Rails Sidekiq)
Leaguepedia Tournaments table → all OverviewPages for a league
└─ historical_backfill.py (resumable — persists progress to JSON)
└─ For each tournament not yet completed:
└─ leaguepedia_pipeline.py → bulk_index → ES
Phase 4 — Oracle's Elixir Ingest (one-off / manual re-run)
OE CSV files (annual download, 97K+ games, all major leagues)
└─ oracles_elixir_ingest.py
└─ op_type='create' → ES (idempotent, skips duplicates)
└─ oracles_elixir_backfill.py → fills damage_taken/wards on existing docs
Why Leaguepedia instead of Riot Match-V5: competitive games run on Riot's internal tournament servers and do not appear in the public Match-V5 API. Leaguepedia receives official data from Riot's esports disclosure program and is the only public source for these stats.
Why Oracle's Elixir: broader league coverage and additional stat columns (damage_taken, wards_placed, wards_killed) not always available via Leaguepedia. OE CSVs are a login-gated annual download — re-run ingest when a new year's CSV is available.
For the full architecture diagram and detailed flow, see docs/Arquitetura.md.
GET /health # Health check (Elasticsearch connectivity)
GET / # Service info
GET /api/v1/leagues # List leagues from LoL Esports
GET /api/v1/matches?league=CBLOL # Query matches (paginated)
GET /api/v1/matches/{match_id} # Single match with full participant stats
GET /api/v1/stats/leagues # Match count per leaguePOST /api/v1/sync?league=CBLOL&limit=50 # Trigger manual sync
POST /api/v1/enrich?batch=10 # Trigger background enrichment
GET /api/v1/enrich/status # Enrichment progress (pending/enriched counts)GET /api/v1/matches/115565621821672075_2
{
"match_id": "115565621821672075",
"game_number": 2,
"league": "CBLOL",
"patch": "26.02",
"win_team": "Leviatan",
"gamelength": "32:43",
"game_duration_seconds": 1963,
"riot_enriched": true,
"participants": [
{
"summoner_name": "tinowns",
"team_name": "paiN Gaming",
"champion_name": "Ahri",
"role": "Mid",
"kills": 4, "deaths": 1, "assists": 3,
"gold": 14320, "cs": 245, "damage": 22100,
"win": false,
"items": ["Rabadon's Deathcap", "Shadowflame", "Void Staff"],
"keystone": "Electrocute",
"primary_runes": ["Cheap Shot", "Eyeball Collection", "Treasure Hunter"],
"secondary_runes": ["Presence of Mind", "Cut Down"],
"stat_shards": ["Adaptive Force", "Adaptive Force", "Health"],
"summoner_spells": ["Flash", "Ignite"]
}
]
}See full Swagger UI at https://scraper.prostaff.gg/docs
# 1. Copy and configure environment
cp .env.example .env
# Edit .env: add RIOT_API_KEY, ESPORTS_API_KEY, SCRAPER_API_KEY
# 2. Start services (Elasticsearch + API + enrichment daemon)
docker compose up -d
# 3. Verify health
curl http://localhost:8000/health
# 4. Sync CBLOL matches
curl -X POST "http://localhost:8000/api/v1/sync?league=CBLOL&limit=20" \
-H "X-API-Key: your-key"
# 5. Check enrichment progress (daemon runs automatically every 30min)
curl "http://localhost:8000/api/v1/enrich/status" \
-H "X-API-Key: your-key"
# 6. Query enriched matches
curl "http://localhost:8000/api/v1/matches?league=CBLOL&limit=5"Deploy to Coolify: see DEPLOYMENT.md for full guide.
- Create Docker Compose application in Coolify
- Point to repository with
docker-compose.production.yml - Configure environment variables (see Environment Variables)
- Set domain:
scraper.prostaff.gg - Deploy and verify:
curl https://scraper.prostaff.gg/health
The lol_pro_matches Elasticsearch index is created automatically on first sync.
If deploying over an existing installation with the old schema (pre-Leaguepedia),
delete the index first so it is recreated with the updated mapping:
curl -X DELETE https://your-elasticsearch-host:9200/lol_pro_matches| Component | Technology |
|---|---|
| Framework | FastAPI 0.115 (async REST API) |
| Server | Uvicorn (ASGI) |
| Language | Python 3.11 |
| HTTP client | httpx + tenacity (retry/backoff) |
| Data validation | Pydantic 2.9 |
| Storage | Elasticsearch 8.x |
| Deployment | Docker Compose + Traefik (Coolify) |
| Data sources | LoL Esports Persisted Gateway, Leaguepedia Cargo API, Oracle's Elixir CSV |
ProStaff-Scraper/
├── api/
│ └── main.py # FastAPI: all endpoints
├── providers/
│ ├── esports.py # LoL Esports Gateway API client
│ ├── leaguepedia.py # Leaguepedia Cargo API client
│ │ # get_game_scoreboard() + get_game_players()
│ ├── riot.py # Riot Account/Match V5 client
│ └── riot_rate_limited.py # Riot client with rate limit tiers
├── etl/
│ ├── competitive_pipeline.py # Phase 1: live sync from LoL Esports
│ ├── enrichment_pipeline.py # Phase 2: enrich from Leaguepedia (daemon)
│ ├── historical_backfill.py # Phase 3: full league history from Leaguepedia
│ ├── leaguepedia_pipeline.py # Leaguepedia game + player indexer (used by phases 2 & 3)
│ ├── oracles_elixir_ingest.py # Phase 4: bulk ingest from Oracle's Elixir CSVs
│ ├── oracles_elixir_backfill.py # Phase 4b: fill missing stats using OE CSV join
│ └── historical_data_migration.py # One-off migration helper (legacy)
├── indexers/
│ ├── elasticsearch_client.py # ES helpers (bulk, update, query_unenriched)
│ └── mappings.py # Index mappings (participant fields are strings)
├── docs/
│ └── Arquitetura.md # Full architecture documentation
├── docker-compose.yml # Development (ES + Kibana + API + enrichment)
├── docker-compose.production.yml # Production (Coolify + Traefik, 3 services)
├── Dockerfile.production # Production Docker image
├── DEPLOYMENT.md # Coolify deployment guide
├── QUICKSTART.md # 5-minute setup guide
├── requirements.txt # Python dependencies (elasticsearch==8.13.1)
└── .env.example # Environment variables template
See .env.example for the full template.
| Variable | Description |
|---|---|
ESPORTS_API_KEY |
LoL Esports Persisted Gateway key (for sync) |
RIOT_API_KEY |
Riot Games API key (for sync, not needed for enrichment) |
SCRAPER_API_KEY |
Secret key to protect write endpoints (sync, enrich) |
| Variable | Default | Description |
|---|---|---|
ELASTICSEARCH_URL |
http://elasticsearch:9200 |
ES connection URL |
DEFAULT_PLATFORM_REGION |
BR1 |
Default Riot platform region |
API_PORT |
8000 |
FastAPI server port |
CORS_ALLOWED_ORIGINS |
https://api.prostaff.gg,... |
Comma-separated allowed origins |
| Variable | Default | Description |
|---|---|---|
SYNC_LEAGUES |
CBLOL |
Space-separated leagues to sync |
SYNC_INTERVAL_HOURS |
1 |
Sync interval in hours |
SYNC_LIMIT |
100 |
Match limit per league per run |
Note:
RIOT_API_KEYis only used by the sync pipeline to call LoL Esports endpoints. The enrichment daemon uses Leaguepedia anonymously — no API key required.
ETL scripts (
oracles_elixir_ingest.py,historical_backfill.py) must be run with the project's.venv(Python 3.11,elasticsearch==8.13.1). The system-wideelasticsearchpackage may be v9 and is incompatible with an ES 8.x server. Create the venv once:python3.11 -m venv .venv && .venv/bin/pip install -r requirements.txt
GET /health returns 503
Elasticsearch is still starting. Wait 30s and retry.
docker logs prostaff-scraper-elasticsearch-1 | tail -20GET /api/v1/matches returns empty
Run a sync first:
curl -X POST "http://localhost:8000/api/v1/sync?league=CBLOL&limit=20" \
-H "X-API-Key: your-key"Enrichment stuck — all games at enrichment_attempts: 3
Leaguepedia may not have data for these games yet (common for very recent matches). They will be picked up automatically on the next daemon run after Leaguepedia updates. To reset attempts and force retry:
# Reset attempts for all games (use with care)
curl -X POST http://localhost:9200/lol_pro_matches/_update_by_query \
-H "Content-Type: application/json" \
-d '{"query":{"range":{"enrichment_attempts":{"gte":3}}},"script":{"source":"ctx._source.enrichment_attempts=0"}}'Leaguepedia rate limit errors in logs
Expected behavior during rapid testing. The enrichment daemon respects 9s between
requests. Errors automatically retry up to 3 times before incrementing enrichment_attempts.
401 Unauthorized on sync/enrich endpoints
Ensure X-API-Key header matches SCRAPER_API_KEY in your .env.
Elasticsearch mapping conflict after upgrading from old schema
The participant fields changed from integer IDs to string names. Delete and recreate:
curl -X DELETE http://localhost:9200/lol_pro_matches
# Restart API and run sync — index is recreated automaticallyThe Rails API (prostaff-api) talks to this scraper in two ways:
Live match sync — ProStaffScraperService calls the scraper's REST API:
POST /api/v1/sync— trigger a sync runGET /api/v1/enrich/status— poll enrichment progress- Used by
SyncScraperMatchesJobandHistoricalBackfillJob
Direct ES queries — ElasticsearchClient queries the shared lol_pro_matches index directly:
GET /competitive/pro-matches/match-preview— per-game picks + stats for a recent seriesGET /competitive/pro-matches/es-series— H2H history between two teams- The data lake (97K+ games) is populated by all four pipelines above
Setup:
- Set
SCRAPER_API_URL=https://scraper.prostaff.ggin the Rails API environment - Set
ELASTICSEARCH_URLto the same ES instance in both repos - See
PROSTAFF_SCRAPER_INTEGRATION_ANALYSIS.mdfor the full integration guide
Running Oracle's Elixir ingest (requires .venv — ES client v8):
cd /path/to/ProStaff-Scraper
ELASTICSEARCH_URL=https://user:pass@elastic.example.com \
.venv/bin/python etl/oracles_elixir_ingest.py --years 2026
# Re-run is safe — duplicate gameids are skipped (op_type=create)
# To add a new year's CSV: download from oracleselixir.com and re-run with --years <year>- Full deployment guide:
DEPLOYMENT.md - Quick start:
QUICKSTART.md - Architecture:
docs/Arquitetura.md - API docs (Swagger):
https://scraper.prostaff.gg/docs
CC BY-NC-SA 4.0 — Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International