diff --git a/README.md b/README.md
index 6f17bfd..4286952 100644
--- a/README.md
+++ b/README.md
@@ -1,244 +1,64 @@
 # PoliticalEventTrackingResearch
-
+[Chinese README](README.zh-CN.md)
-> ⚠️ 投资有风险,不构成投资建议,仅供学习交流用途。
 > ⚠️ Investing involves risk. This project does not provide investment advice and is for educational and research purposes only.
-## Open-source overview / 开源项目入口
+## What this project does
-| Item | Description |
-| --- | --- |
-| Project type | research pipeline |
-| What it does | Tracks public political/policy event evidence and transforms it into auditable source items/events for research. |
-| 中文说明 | 公开政治/政策事件研究管线,产出可审计 source_items/source_events,不做 AI 决策或交易执行。 |
-| Current status | Research-only evidence pipeline. |
+PoliticalEventTrackingResearch is a **research evidence pipeline** in the QuantStrategyLab ecosystem. It tracks public political and policy events from public sources and RSS feeds and records them as auditable evidence for US equity research.
-### Quick start
+## Who this is for
-- `python -m pip install -e '.[test]'`
-- `python -m pytest -q`
+- Engineers and researchers who want to inspect, reproduce, or extend this part of the QuantStrategyLab stack.
+- Operators who need a clear entry point before reading the deeper runbooks or workflow files.
+- Reviewers who need to understand the repository purpose, safety boundary, and evidence requirements before enabling automation.
-### Deploy / operate safely
+## Current status
-Run source collection and publication workflows only after checking rate limits, source terms and output paths.
+Research-only pipeline; no AI trading agent and no order execution.
-### Strategy performance / evidence boundary
+## Repository layout
-The research plan covers post-event 1/5/20 trading-day returns and benchmark-relative analysis; see `docs/research_plan.zh-CN.md`.
+- `src/`: main library and runtime code.
+- `tests/`: unit and contract tests.
+- `docs/`: detailed design notes, runbooks, and evidence docs.
+- `.github/workflows/`: CI, scheduled jobs, and deployment workflows.
+- `scripts/`: operator scripts and local helpers.
-> Detailed runbooks, migration notes, workflow internals, and historical decisions are kept below. Start with this overview before using the lower-level operational sections.
+## Quick start
-
-
-> ⚠️ 投资有风险,不构成投资建议,仅供学习交流用途。
-
-
-## 中文摘要
-
-- 完整中文版见 [`README.zh-CN.md`](README.zh-CN.md);本节保留在英文文件顶部,方便从当前文件直接找到中文入口。
-- 用途:本文档围绕 `PoliticalEventTrackingResearch`,用于理解 `PoliticalEventTrackingResearch` 的配置、运行、部署、研究或验收边界。
-- 主要覆盖:`Repository Role`、`Current Status`、`Local Validation`、`Data Contracts`、`Live Pipeline Notes`。
-- 阅读顺序:先确认边界、输入输出和权限要求,再执行文档里的命令、CI、dry-run、发布或切换步骤。
-- 风险提示:涉及实盘、密钥、权限、Cloud Run、交易所或券商 API 的变更,必须先在测试环境或 dry-run 验证;不要只凭示例直接修改生产。
-- 英文正文保留更完整的命令、字段名和配置键;如果摘要和正文不一致,以正文中的实际命令和配置为准。
-[English](README.md) | [简体中文](README.zh-CN.md)
-
-Research-only political event and public disclosure tracking for US equities.
-
-This repository asks whether public disclosure, official remarks, policy capital,
-procurement, and other public events can be tracked in a repeatable,
-point-in-time way.
-
-## Repository Role
-
-This is a deterministic research artifact repository. It does not place trades,
-store broker credentials, scrape private accounts, call AI models, or own live
- -The stable release scope is: - -- collect public disclosure, official remark, policy funding, issuer release, - financial-media lead, and market reaction events into a consistent CSV schema -- build a candidate tracker from watchlists and event timelines -- run small event studies against local daily close price files -- preserve source links and confidence levels for later manual review - -Out of scope for this release: - -- X / Twitter ingestion -- Truth Social ingestion -- Longbridge community, profile, or following-list ingestion -- logged-in browser scraping or cookie-based collectors - -## Current Status - -The committed CSV files under `examples/` are synthetic schema fixtures only. -They are not investment evidence and are not derived from any article. - -Tracked event families: - -- `disclosure_buy`: public financial disclosure or transaction filing -- `public_mention`: official remarks, issuer statements, or media leads -- `policy_capital`: government capital, procurement, or industrial-policy - support -- `market_reaction`: earnings, contract, analyst, or price reaction marker - -## Local Validation - -Build the seed tracker: - -```bash -python scripts/build_tracker.py \ - --watchlist examples/political_watchlist.example.csv \ - --events examples/political_events.example.csv \ - --output data/output/political_tracker.example.csv -``` - -Normalize official, issuer, and media-lead records into the event schema: - -```bash -python scripts/import_source_events.py \ - --input examples/official_records.example.csv \ - --output data/output/official_events.example.csv -``` - -Extract mention events from raw official remarks / RSS / financial-media exports: - -```bash -python scripts/extract_source_mentions.py \ - --raw-items examples/source_items.example.csv \ - --aliases examples/symbol_aliases.example.csv \ - --output data/output/source_events.example.csv -``` - -Fetch RSS/Atom sources into the same raw item schema: - -```bash -python 
scripts/fetch_rss_sources.py \ - --feeds examples/rss_feeds.example.csv \ - --output data/output/rss_source_items.example.csv \ - --max-items-per-feed 10 -``` - -`.github/workflows/rss_source_pipeline.yml` fetches configured RSS/Atom feeds, -extracts mentions, builds a tracker, and uploads the results as an artifact. On -scheduled runs it also commits public live CSV outputs under `data/live/`: - -```text -data/live/source_items.csv -data/live/source_events.csv -data/live/political_events.csv -data/live/source_tracker.csv -data/live/source_fetch_status.json -data/live/source_manifest.json -``` - -`data/live/political_events.csv` is the stable input consumed by -`QuantAdvisorResearch`. This repository still only publishes source evidence; it -does not generate investment recommendations. - -`.github/workflows/source_event_pipeline.yml` runs the same extraction for an -operator-provided `source_items.csv`. Scheduled runs use `data/live/source_items.csv` -after the RSS pipeline and can refresh `data/live/source_events.csv`, -`data/live/political_events.csv`, and `data/live/source_tracker.csv`. - -Free-source setup notes are in -[`docs/free_source_setup.zh-CN.md`](docs/free_source_setup.zh-CN.md). - -Run the synthetic event study: - -```bash -python scripts/run_event_study.py \ - --events examples/political_events.example.csv \ - --prices examples/price_history.example.csv \ - --windows 1,2 \ - --output data/output/event_study.example.csv -``` - -Run tests: +From a fresh clone: ```bash +python -m pip install -e . python -m pytest -q ``` -## Data Contracts - -Event input schema: - -```text -event_id,event_date,symbol,event_type,direction,confidence,source_url,notes -``` - -Watchlist input schema: - -```text -symbol,name,bucket,research_status,thesis,source_url -``` - -Price input schema: - -```text -date,symbol,close -``` - -The price loader also accepts `as_of` instead of `date`. 
-
-## Live Pipeline Notes
-
-The RSS fetcher records per-feed health in `data/live/source_fetch_status.json` and keeps a small hash/row-count plus data-quality manifest in `data/live/source_manifest.json`. The manifest summarizes feed health, covered symbols, confidence counts, and event-type counts. A single blocked or unavailable feed should not stop the rest of the official-source refresh.
-
-If the RSS workflow downloads articles but `source_events.csv` is empty, the
-usual cause is deterministic alias coverage rather than a fetch failure: broad
-policy items often mention sectors such as grid infrastructure, HBM, foundry,
-AI servers, or crypto assets instead of company names. Alias updates should
-remain targeted, documented, and reviewable to avoid turning every broad policy
-article into a false positive.
-
-Local macOS Python installations can also fail HTTPS RSS fetches with a local CA
-certificate error. GitHub-hosted runners use a normal CA bundle; local operators
-can either repair the Python certificate store or validate extraction from a
-downloaded `source_items.csv` artifact.
+If a command requires credentials, run it only after reading the relevant workflow or runbook and configuring secrets outside Git.
-## Short-Horizon Event Boundary
+## Deployment and operation
-This repository is the short-horizon event evidence layer for Advisor `1-10 trading days` catalyst checks. It emits source URLs, dates, event types, and confidence only. It does not directly produce short-term buy/sell recommendations; final short/medium/long recommendations remain in `QuantAdvisorResearch`.
+Run the collection pipeline with the configured source lists, review `source_items.csv` and `source_events.csv`, and only then publish artifacts for downstream research.
-## Boundary
+Prefer manual or dry-run execution first. Enable schedules or live execution only after logs, artifacts, permissions, and rollback steps have been reviewed.
-This repo owns:
+## Strategy performance and evidence
-
-- research schemas for political/disclosure/mention event tracking
-- seed watchlists and event timelines
-- deterministic event-study utilities over local daily prices
-- source registry and promotion notes
+Not a trading strategy repository. Evidence quality is measured by source traceability, event normalization, freshness, and reviewability.
-This repo does not own:
+This README intentionally makes no dated performance claims. Re-run the relevant tests, backtests, or pipeline jobs before relying on any result.
-
-- broker API access or order placement
-- Telegram or runtime notifications
-- paid market-data redistribution
-- legal claims about conflicts of interest
-- AI-generated shadow signals; those belong in `ResearchSignalContextPipelines`
-- live strategy promotion into `UsEquityStrategies` or broker platforms
+## Safety notes
-## Next Work
+- Never commit API keys, broker credentials, OAuth tokens, cookies, or account identifiers.
+- Run new strategies and platform changes in dry-run or paper mode before any live execution.
+- Review generated artifacts and logs manually before enabling schedules.
-1. Add more templates for official filings, official remarks, issuer releases,
-   government procurement, and financial-media leads.
-2. Add source adapters for OGE disclosure PDFs or normalized public datasets.
-3. Add public-remarks ingestion from stable government or issuer pages.
-4. Backfill enough point-in-time events to evaluate hit rate, lag, and false
-   positives before considering any downstream strategy contract.
+## Contributing
-## Cross-Sector Source Principle
+Keep changes small, reproducible, and covered by the narrowest useful tests. For strategy-facing changes, include the evidence artifact or command used to validate behavior.
-Stable source ingestion is not limited to AI. Semiconductors, data-center power,
-cybersecurity, defense, energy, financials, healthcare, consumer platforms,
-industrials, and EV/auto themes can all enter the same `source_items.csv` /
-`source_events.csv` structure when durable primary sources exist.
+## License
-Theme membership and long-horizon semantic bias belong in
-`ResearchSignalContextPipelines`; this repository only preserves point-in-time
-factual evidence so the source boundary is not changed just because a symbol is
-currently popular.
+See [LICENSE](LICENSE) if present in this repository.
diff --git a/README.zh-CN.md b/README.zh-CN.md
index 88a9ab2..b7e3c0a 100644
--- a/README.zh-CN.md
+++ b/README.zh-CN.md
@@ -1,153 +1,64 @@
 # PoliticalEventTrackingResearch
-> ⚠️ 投资有风险,不构成投资建议,仅供学习交流用途。
-
-
-## English summary
-
-- Full English version: [`README.md`](README.md). This summary keeps an English entry point in the Chinese file.
-- Purpose: this document covers `PoliticalEventTrackingResearch` for `PoliticalEventTrackingResearch`.
-- Main topics: `仓库定位`, `当前状态`, `本地验证`, `Live pipeline 说明`, `研究判断`.
-- Read the boundaries, inputs, outputs, and permission requirements before running commands, CI jobs, dry-runs, releases, or runtime switches.
-- For live trading, secrets, Cloud Run, exchange, or broker API changes, validate in test or dry-run mode first and do not change production only from examples.
-- If this summary differs from the detailed Chinese body, follow the concrete commands, configuration keys, and constraints in the body.
-
-[English](README.md) | [简体中文](README.zh-CN.md)
-
-QuantStrategyLab 的确定性研究仓库,用来验证“公开持仓/交易披露 + 官方讲话/公开材料 + 政策资金事件”能否形成可追踪的美股事件线索。
-
-## 仓库定位
-
-这是研究证据仓库,不是 AI 仓库,也不是交易执行仓库。
-
-它负责:
+[English README](README.md)
-- 把公开披露、官方讲话、政策资金、发行人公告、财经媒体 lead、市场反应事件整理成统一 CSV 结构
-- 从观察池和事件时间线生成候选追踪表
-- 用本地日线收盘价做轻量事件研究
-- 保留来源链接、置信度和人工复核入口
+> ⚠️ 投资有风险,不构成投资建议,仅供学习交流用途。
-本次稳定发布版先不包含:
+## 这个项目做什么
-- X / Twitter 采集
-- Truth Social 采集
-- Longbridge 社区、用户主页、关注列表采集
-- 登录态页面抓取或 Cookie 型采集器
+PoliticalEventTrackingResearch 是 QuantStrategyLab 体系中的**研究证据流水线**。从公开来源和 RSS 证据追踪政治与政策事件,为美股研究提供上下文。
-它不负责:
+## 适合谁使用
-- 券商 API、下单、账户同步
-- Telegram 或实盘通知
-- 受版权限制的行情数据分发
-- 对利益冲突作法律结论
-- AI 生成的长期影子信号;这类产物继续归 `ResearchSignalContextPipelines`
-- 直接把信号推广到实盘策略
+- 希望阅读、复现或扩展 QuantStrategyLab 相关模块的工程师和研究人员。
+- 在阅读详细 runbook 或 workflow 前,需要先理解项目入口的运维人员。
+- 在启用自动化前,需要确认项目职责、安全边界和证据要求的 reviewer。
 ## 当前状态
-当前提交的 `examples/` 数据是完全合成的 schema fixture,只用于跑通工具链,不是投资证据,也不是从任何文章抽取出来的样本。
-
-事件类型:
-
-- `disclosure_buy`:公开财务披露或交易披露中的买入
-- `public_mention`:官方讲话、发行人声明或财经媒体 lead 中的公开点名
-- `policy_capital`:政府入股、采购、产业政策资金支持
-- `market_reaction`:财报、合同、分析师评级或价格反应标记
-
-## 本地验证
-
-生成合成示例追踪表:
-
-```bash
-python scripts/build_tracker.py \
-  --watchlist examples/political_watchlist.example.csv \
-  --events examples/political_events.example.csv \
-  --output data/output/political_tracker.example.csv
-```
-
-把官方来源、发行人公告和财经媒体线索归一化为事件 schema:
+只用于研究;不包含 AI 交易代理,也不执行订单。
-```bash
-python scripts/import_source_events.py \
-  --input examples/official_records.example.csv \
-  --output data/output/official_events.example.csv
-```
-
-从官方讲话 / RSS / 财经媒体导出的原始文本 CSV 抽取 mention 事件:
-
-```bash
-python scripts/extract_source_mentions.py \
-  --raw-items examples/source_items.example.csv \
-  --aliases examples/symbol_aliases.example.csv \
-  --output data/output/source_events.example.csv
-```
-
-把 RSS/Atom 拉取为同一个原始文本 schema:
-
-```bash
-python scripts/fetch_rss_sources.py \
-  --feeds examples/rss_feeds.example.csv \
-  --output data/output/rss_source_items.example.csv \
-  --max-items-per-feed 10
-```
-
-`.github/workflows/rss_source_pipeline.yml` 会拉取配置的 RSS/Atom,抽取 mention,生成 tracker,并上传为 artifact。定时运行时还会把公开 live CSV 提交回 `data/live/`:
+## 仓库结构
-```text
-data/live/source_items.csv
-data/live/source_events.csv
-data/live/political_events.csv
-data/live/source_tracker.csv
-data/live/source_fetch_status.json
-data/live/source_manifest.json
-```
+- `src/`:主要库代码和运行时代码。
+- `tests/`:单元测试和契约测试。
+- `docs/`:详细设计说明、运行手册和证据文档。
+- `.github/workflows/`:CI、定时任务和部署 workflow。
+- `scripts/`:运维脚本和本地辅助工具。
-其中 `data/live/political_events.csv` 是 `QuantAdvisorResearch` 读取的稳定事件输入。本仓库仍然只发布来源证据,不生成投资建议。
+## 快速开始
-`.github/workflows/source_event_pipeline.yml` 可处理人工提供的 `source_items.csv`。定时运行会在 RSS pipeline 之后使用 `data/live/source_items.csv`,刷新 `data/live/source_events.csv`、`data/live/political_events.csv` 和 `data/live/source_tracker.csv`。
-
-用合成价格样本跑事件研究:
-
-```bash
-python scripts/run_event_study.py \
-  --events examples/political_events.example.csv \
-  --prices examples/price_history.example.csv \
-  --windows 1,2 \
-  --output data/output/event_study.example.csv
-```
-
-运行测试:
+从全新 clone 开始:
 ```bash
+python -m pip install -e '.[test]'
 python -m pytest -q
 ```
-## Live pipeline 说明
-
-RSS fetcher 会把每个源的成功/失败写入 `data/live/source_fetch_status.json`,并在 `data/live/source_manifest.json` 保留 live CSV 的 hash、行数和数据质量摘要。manifest 会记录 feed 健康度、覆盖 symbol、置信度分布和事件类型分布。单个源被屏蔽或临时不可用时,不应阻断其他官方源刷新。
+如果命令需要凭据,请先阅读相关 workflow 或 runbook,并把密钥配置在 Git 之外。
-如果 RSS workflow 能拉到 `source_items.csv`,但 `source_events.csv` 为空,通常不是新闻没跑,而是确定性 alias 覆盖不足:很多政策文件只写“grid infrastructure”“HBM”“foundry”“AI server”“crypto assets”这类主题词,不直接写公司名。后续补 alias 要保持克制、可审计,避免把所有宽泛政策新闻都误映射成公司事件。
+## 部署和运行
-本机 macOS Python 也可能因为本地 CA 证书问题导致 HTTPS RSS 拉取失败;GitHub-hosted runner 使用正常 CA bundle。本地排查时可以先修 Python 证书,也可以直接从 GitHub Actions 下载 `source_items.csv` artifact 后验证抽取链路。
+使用配置好的来源列表运行采集流水线,检查 `source_items.csv` 和 `source_events.csv` 后,仅发布给下游研究使用。
-## 研究判断
+建议先手工运行或 dry-run。只有在日志、产物、权限和回滚步骤都检查过之后,才启用定时任务或 live 执行。
-这类“追踪效果”可以拆成三个可验证问题:
+## 策略表现与证据边界
-1. **能不能第一时间知道谁进入观察池**:需要结构化公开披露和政策/持仓来源。
-2. **能不能捕捉公开点名**:需要按时间记录官方讲话、公告、新闻稿和媒体 lead。
-3. **点名后是否有可交易的统计优势**:需要事件研究和样本外验证,不能只看少数轶事案例。
+这不是交易策略仓库。质量主要看来源可追溯、事件归一化、时效性和可审阅性。
-本仓库先解决前两步的数据结构和复盘框架;第三步需要更多点位和真实行情输入。后续如果需要 LLM 处理长文本,只能作为可替换的抽取工具,不能把模型判断结果写成核心信号合同。
+README 不应该承诺固定收益或过期指标。实际使用前,请重新运行对应测试、回测或流水线任务。
-免费数据源配置见 [docs/free_source_setup.zh-CN.md](docs/free_source_setup.zh-CN.md)。
+## 安全注意事项
-## 短线事件层边界
+- 不要把 API key、券商凭据、OAuth token、Cookie 或账户标识提交到 Git。
+- 新策略或平台变更在 live 前必须先跑 dry-run 或 paper 流程。
+- 启用定时任务前,需要人工检查生成的产物和日志。
-本仓库是短线事实事件输入层,服务 Advisor 的 `1-10个交易日` 事件催化判断。它只输出来源、日期、事件类型和置信度,不直接输出短线买卖推荐;最终短/中/长线推荐仍由 `QuantAdvisorResearch` 合成。
+## 参与贡献
-## 跨板块来源原则
+请保持改动小、可复现,并用最小必要测试覆盖。涉及策略的改动,需要附上验证行为的证据产物或命令。
-稳定源不局限于 AI 板块。半导体、数据中心电力、网络安全、国防、能源、金融、医疗、消费平台、工业和 EV/汽车等方向,只要有 SEC、官方政策、发行人公告、政府采购或其他一手来源,都可以进入同一套 `source_items.csv` / `source_events.csv` 结构。
+## 许可证
-主题归属和长期语义判断由 `ResearchSignalContextPipelines` 维护;本仓库只负责点时事实证据,避免因为短期热点临时改变采集边界。
+如仓库包含 [LICENSE](LICENSE),请以该文件为准。
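The `Data Contracts` section removed from `README.md` in this diff defined the event CSV header and noted that the price loader also accepts `as_of` instead of `date`. A minimal Python sketch of both checks follows. It assumes ISO `YYYY-MM-DD` dates, accepts only the four event families from the removed `Current Status` section, and leaves `direction` and `confidence` unchecked because their allowed values are not documented. `validate_events` and `load_prices` are hypothetical names, not the repository's own loaders.

```python
from __future__ import annotations

import csv
import io
from datetime import date

# Header from the removed "Data Contracts" section of README.md.
EVENT_COLUMNS = [
    "event_id", "event_date", "symbol", "event_type",
    "direction", "confidence", "source_url", "notes",
]
# Event families listed in the removed "Current Status" section.
EVENT_TYPES = {"disclosure_buy", "public_mention", "policy_capital", "market_reaction"}


def validate_events(text: str) -> list[str]:
    """Return a list of problems found in event CSV text (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != EVENT_COLUMNS:
        return [f"unexpected header: {reader.fieldnames}"]
    problems = []
    # Count from 2 because line 1 is the header (assumes no multi-line fields).
    for line_no, row in enumerate(reader, start=2):
        event_type = row.get("event_type") or ""
        if event_type not in EVENT_TYPES:
            problems.append(f"line {line_no}: unknown event_type {event_type!r}")
        event_date = row.get("event_date") or ""
        try:
            date.fromisoformat(event_date)  # assumes ISO YYYY-MM-DD dates
        except ValueError:
            problems.append(f"line {line_no}: bad event_date {event_date!r}")
        if not (row.get("source_url") or "").strip():
            problems.append(f"line {line_no}: empty source_url")
    return problems


def load_prices(text: str) -> list[tuple[date, str, float]]:
    """Parse date,symbol,close rows, accepting `as_of` in place of `date`."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        day = row.get("date") or row.get("as_of")
        if not day:
            raise ValueError("price row needs a 'date' or 'as_of' value")
        rows.append((date.fromisoformat(day), row["symbol"], float(row["close"])))
    return rows
```

Returning a list of problems instead of raising on the first one lets a reviewer see every bad row of a fetched file in one pass.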
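The `Live Pipeline Notes` removed from `README.md` in this diff attribute an empty `source_events.csv` to deterministic alias coverage: policy items that name a sector (grid infrastructure, HBM, foundry) instead of a company produce no match. The sketch below reproduces that behavior with case-insensitive whole-word matching over a hypothetical alias table. The real matching rule in `scripts/extract_source_mentions.py` and the columns of `examples/symbol_aliases.example.csv` are not described in the README, so both the rule and the data here are assumptions.

```python
from __future__ import annotations

import re

# Hypothetical alias table (symbol -> names). The real aliases live in a CSV such as
# examples/symbol_aliases.example.csv, whose columns are not documented in the README.
ALIASES = {
    "NVDA": ["Nvidia", "NVIDIA Corporation"],
    "INTC": ["Intel"],
}


def compile_aliases(aliases: dict[str, list[str]]) -> list[tuple[str, re.Pattern]]:
    """Build one case-insensitive, whole-word pattern per symbol."""
    compiled = []
    for symbol, names in aliases.items():
        alternatives = "|".join(re.escape(name) for name in names)
        # \b keeps "Intel" from matching inside "Intelligence".
        compiled.append((symbol, re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)))
    return compiled


def extract_mentions(text: str, patterns: list[tuple[str, re.Pattern]]) -> list[str]:
    """Return the sorted symbols whose aliases appear in `text`."""
    return sorted(symbol for symbol, pattern in patterns if pattern.search(text))


patterns = compile_aliases(ALIASES)
print(extract_mentions("Commerce awards funding to Intel and NVIDIA.", patterns))
print(extract_mentions("New grid infrastructure and HBM funding announced.", patterns))
```

The second call returns an empty list even though the item is policy-relevant. That is the coverage gap the removed notes describe, and closing it means adding reviewed aliases rather than loosening the match.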
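The removed `Local Validation` section ran `scripts/run_event_study.py` with `--windows 1,2`, and the removed overview mentioned post-event 1/5/20 trading-day returns. A minimal sketch of a close-to-close forward return over N trading days follows. It assumes that prices are daily closes for one symbol sorted by date and that the anchor is the first trading day on or after `event_date`; the repository's script may anchor or align differently, and the benchmark-relative analysis is not shown. `forward_returns` is a hypothetical name.

```python
from __future__ import annotations

from bisect import bisect_left
from datetime import date
from typing import Optional


def forward_returns(
    event_day: date,
    days: list[date],
    closes: list[float],
    windows: list[int],
) -> dict[int, Optional[float]]:
    """Close-to-close return from an anchor close to the close N trading days later.

    `days` must be sorted ascending and aligned with `closes`. The anchor is the
    first trading day on or after `event_day` (an assumption for this sketch).
    Windows that run past the end of the price history map to None.
    """
    anchor = bisect_left(days, event_day)
    results: dict[int, Optional[float]] = {}
    for n in windows:
        target = anchor + n
        if target >= len(days):
            # Also covers events after the last price (anchor == len(days)).
            results[n] = None
        else:
            results[n] = closes[target] / closes[anchor] - 1.0
    return results
```

Because `bisect_left` finds the first date not earlier than `event_day`, an event dated on a weekend or holiday is anchored to the next session rather than dropped.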