This document records the current external-data validation state after multiple real reruns and profile sweeps.
Current top-line conclusion:
- `Binance-only` is still the formal production default
- the best external-data variant is now much closer to production quality than earlier versions
- but it still does not win clearly enough across `30 / 60 / 90` horizon quality metrics to replace `Binance-only` as the default live mode
- the best current experimental profile is `external_data_core_only_no_doge`
The external backfill policy now has two tiers.
Core backfill whitelist:
- `BTCUSDT`
- `ETHUSDT`
- `XRPUSDT`
- `LTCUSDT`
- `BCHUSDT`
- `TRXUSDT`
- `ADAUSDT`
- `SOLUSDT`
Cautious backfill whitelist:
- `DOGEUSDT`
Current default behavior:
- core symbols are eligible for merge
- cautious symbols are tracked and reported
- cautious symbols are not merged by default
This is controlled by:
`external_data.merge_cautious_symbols: false`
So the current default experimental version is effectively:
`core_only_no_doge`
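The tier policy above can be sketched as a small eligibility function. This is a minimal illustration, not the repository's implementation: the names `CORE_WHITELIST`, `CAUTIOUS_WHITELIST`, `mergeable_symbols`, and `whitelist_tier` are hypothetical, and the `"binance_only"` tier label is an assumption. Only the symbol lists and the `merge_cautious_symbols` flag come from this document.

```python
# Symbol lists as stated in this document; the constant names are hypothetical.
CORE_WHITELIST = (
    "BTCUSDT", "ETHUSDT", "XRPUSDT", "LTCUSDT",
    "BCHUSDT", "TRXUSDT", "ADAUSDT", "SOLUSDT",
)
CAUTIOUS_WHITELIST = ("DOGEUSDT",)


def whitelist_tier(symbol: str) -> str:
    """Return the backfill tier for a symbol ("binance_only" is an assumed label)."""
    if symbol in CORE_WHITELIST:
        return "core"
    if symbol in CAUTIOUS_WHITELIST:
        return "cautious"
    return "binance_only"


def mergeable_symbols(merge_cautious_symbols: bool = False) -> list:
    """Return the symbols eligible for external-history merge.

    Cautious symbols are tracked and reported, but only merged when the
    flag is explicitly enabled.
    """
    symbols = list(CORE_WHITELIST)
    if merge_cautious_symbols:
        symbols.extend(CAUTIOUS_WHITELIST)
    return symbols


if __name__ == "__main__":
    print(mergeable_symbols())              # default: core only, no DOGEUSDT
    print(whitelist_tier("DOGEUSDT"))       # cautious
```

With the default flag value, `DOGEUSDT` keeps its tier for reporting but never appears in the merge list.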
DOGEUSDT still passes the raw mechanical quality checks:
- no duplicate-date issue
- monotonic time order is clean
- no large calendar gaps
- overlap consistency vs Binance is acceptable
- second-provider cross-check is acceptable
But it is now treated as a cautious_holdout, not an approved merge symbol.
Reason:
- `DOGE` is materially more exposed to theme rotation, social amplification, and event-driven repricing than long-cycle majors such as `ETH`, `XRP`, or `LTC`
- even when its historical series is usable, it is less representative of a stable "core major" backfill asset
- the profile sweep showed that removing DOGE from active merge materially improved the external-data variant
Current report fields for DOGE:
- `whitelist_tier = cautious`
- `caution_reason = theme_and_sentiment_sensitive_meme_major`
- `quality_status = cautious_holdout`
- `final_decision = cautious_holdout`
Primary backfill provider:
- `CryptoCompare histoday`
- role: `pre_binance`
- purpose: extend history backward before Binance coverage starts
Secondary cross-check provider:
- `CryptoDataDownload` exchange-archive daily CSVs
- role: `crosscheck_history`
- purpose: cross-check the primary provider on whitelist majors only
Why this second provider is used instead of Yahoo:
- exchange-style daily CSV archives are more stable for long crypto history checks
- coverage is better for the current whitelist majors
- alignment is better with exchange OHLCV conventions
- it works better as a provider-divergence sanity check for this use case
Current exchange-archive mapping:
- Bitstamp daily CSVs for `BTC`, `ETH`, `XRP`, `LTC`, `BCH`, `DOGE`
- Bitfinex daily CSVs for `TRX`, `ADA`, `SOL`
Every external candidate is evaluated with:
- whitelist membership
- duplicate-date check
- monotonic time check
- gap count
- max gap size
- overlap-period consistency vs Binance
- missing core field check
- suspicious jump count
- minimum pre-Binance extension length
- second-provider cross-check summary
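Several of the checks above are purely mechanical and can be sketched with the standard library. The function below covers the duplicate-date, monotonic-time, gap-count, and max-gap checks for a daily series. The function name and the returned field names are hypothetical, and the definition of a gap (any step longer than one calendar day) is an assumption, not the repository's rule.

```python
from datetime import date


def mechanical_checks(dates):
    """Summarize duplicate, ordering, and calendar-gap checks for a daily series.

    `dates` is a list of datetime.date values in the order they appear in the file.
    """
    duplicate_count = len(dates) - len(set(dates))

    # Strictly increasing order; a repeated date also fails this check.
    is_monotonic = all(a < b for a, b in zip(dates, dates[1:]))

    # Assumed gap definition: missing calendar days between consecutive
    # unique dates, measured on the sorted series.
    unique_sorted = sorted(set(dates))
    gaps = [
        (b - a).days - 1
        for a, b in zip(unique_sorted, unique_sorted[1:])
        if (b - a).days > 1
    ]

    return {
        "duplicate_dates": duplicate_count,
        "monotonic": is_monotonic,
        "gap_count": len(gaps),
        "max_gap_days": max(gaps, default=0),
    }


if __name__ == "__main__":
    sample = [date(2020, 1, 1), date(2020, 1, 2), date(2020, 1, 5), date(2020, 1, 5)]
    print(mechanical_checks(sample))
```

The sample series has one repeated date and two missing days (January 3 and 4), so it would fail both the duplicate and the monotonic checks and report a single gap.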
The second provider is an enhanced quality signal.
It can reject only when the anomaly is clearly severe. Normal daily-close differences are tolerated so acceptable symbols are not falsely rejected for session-boundary reasons.
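One way to express a tolerant cross-check is a three-state result based on daily-return correlation between the two providers. This is a sketch under stated assumptions: the correlation approach is inferred from the `crosscheck_return_corr_warn` note reported for `BCHUSDT`, the thresholds `WARN_BELOW` and `REJECT_BELOW` are invented for illustration, and both close series are assumed to be already aligned on the same dates.

```python
import math

# Hypothetical thresholds; the real cutoffs are not stated in this document.
WARN_BELOW = 0.90
REJECT_BELOW = 0.50


def daily_returns(closes):
    """Simple close-to-close returns."""
    return [b / a - 1.0 for a, b in zip(closes, closes[1:])]


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if degenerate)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def crosscheck_status(primary_closes, secondary_closes):
    """Return "pass", "warn", or "reject" for date-aligned close series."""
    corr = pearson(daily_returns(primary_closes), daily_returns(secondary_closes))
    if corr < REJECT_BELOW:
        return "reject"  # only a clearly severe divergence rejects
    if corr < WARN_BELOW:
        return "warn"    # reported, but the symbol can stay approved
    return "pass"
```

Comparing returns rather than price levels means a small constant offset between providers, such as one caused by a different session boundary, does not by itself trigger a warning.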
Artifact:
data/reports/external_data_quality_report.csv
Current decisions:
- `approved_core`: 8 symbols
- `approved_cautious`: 0 symbols
- `cautious_holdout`: 1 symbol
- `rejected`: 0 symbols
Approved core symbols:
- `BTCUSDT`
- `ETHUSDT`
- `XRPUSDT`
- `LTCUSDT`
- `BCHUSDT`
- `TRXUSDT`
- `ADAUSDT`
- `SOLUSDT`
Cautious holdout symbols:
- `DOGEUSDT`
Rejected symbols:
- none in this round
Representative pre-Binance extension results:
| Symbol | Tier | Pre-Binance rows added | Cross-check status | Final decision | Notes |
|---|---|---|---|---|---|
| BTCUSDT | core | 1826 | pass | approved_core | |
| ETHUSDT | core | 1243 | pass | approved_core | |
| XRPUSDT | core | 1433 | pass | approved_core | |
| LTCUSDT | core | 1826 | pass | approved_core | |
| BCHUSDT | core | 849 | warn | approved_core | crosscheck_return_corr_warn |
| TRXUSDT | core | 484 | pass | approved_core | |
| ADAUSDT | core | 457 | pass | approved_core | |
| SOLUSDT | core | 123 | pass | approved_core | |
| DOGEUSDT | cautious | 0 merged | informational | cautious_holdout | theme_and_sentiment_sensitive_meme_major |
Interpretation:
- the eight core symbols now have approved pre-Binance extensions
- `DOGEUSDT` remains visible in governance and reporting, but it no longer alters the active merged history set
- `BCHUSDT` stays approved, but the second provider still raises a warning rather than a reject
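A pre-Binance extension of the kind counted in the table can be sketched as a simple prepend. This is an illustration only: the function name and the `(date, close)` row shape are hypothetical, and the rule that Binance rows always win on overlapping dates is an assumption consistent with the provider's `pre_binance` role.

```python
from datetime import date


def prepend_pre_binance(binance_rows, external_rows):
    """Prepend external rows dated strictly before the first Binance date.

    Each row is a (date, close) tuple and both inputs are assumed sorted by date.
    Rows on or after the first Binance date are dropped from the external series.
    """
    if not binance_rows:
        return []  # nothing to extend without a Binance anchor date
    first_binance_date = binance_rows[0][0]
    extension = [row for row in external_rows if row[0] < first_binance_date]
    return extension + list(binance_rows)


if __name__ == "__main__":
    binance = [(date(2020, 1, 3), 10.0), (date(2020, 1, 4), 11.0)]
    external = [
        (date(2020, 1, 1), 8.0),
        (date(2020, 1, 2), 9.0),
        (date(2020, 1, 3), 9.9),  # overlaps Binance, so it is discarded
    ]
    merged = prepend_pre_binance(binance, external)
    print(len(merged) - len(binance), "pre-Binance rows added")
```

In this shape, the "Pre-Binance rows added" column corresponds to the length of the extension, and a holdout symbol simply skips the call.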
Everything outside the core whitelist remains Binance-only.
That includes:
- `NEARUSDT`
- `HBARUSDT`
- `SUIUSDT`
- `ETCUSDT`
- `XLMUSDT`
- and the rest of the non-whitelist universe
In addition, the cautious tier currently stays Binance-only by default:
- `DOGEUSDT`
The comparison uses a common evaluation start date of 2020-12-21.
Artifacts:
- `data/reports/binance_only_vs_external_data_summary.csv`
- `data/reports/external_data_quality_report.csv`
- `data/reports/external_data_symbol_coverage.csv`
- `data/reports/binance_only/`
- `data/reports/external_data/`
- `data/output/binance_only/`
- `data/output/external_data/`
final_score strategy:
| Metric | Binance-only | Current external-data | Delta |
|---|---|---|---|
| CAGR | 0.5334 | 0.6145 | +0.0811 |
| Sharpe | 0.9465 | 1.0115 | +0.0650 |
| Max Drawdown | -0.7677 | -0.7392 | +0.0285 |
| Turnover | 16.5716 | 16.4948 | -0.0768 |
Leader metrics:
| Metric | Binance-only | Current external-data | Delta |
|---|---|---|---|
| H30 Precision@N | 0.2131 | 0.2144 | +0.0012 |
| H30 Leader Capture | 0.1784 | 0.2119 | +0.0335 |
| H60 Precision@N | 0.2214 | 0.1975 | -0.0239 |
| H60 Leader Capture | 0.1774 | 0.1962 | +0.0189 |
| H90 Precision@N | 0.2179 | 0.1923 | -0.0256 |
| H90 Leader Capture | 0.1731 | 0.1577 | -0.0154 |
Interpretation:
- research CAGR improved
- research Sharpe improved
- drawdown improved
- turnover improved slightly
- H30 precision and capture improved
- H60 capture improved, but H60 precision fell
- H90 precision and H90 capture are still worse
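The sign of a delta does not mean the same thing for every metric: a less negative drawdown and a lower turnover are both improvements. The sketch below recomputes the `final_score` strategy deltas from the table values and applies a per-metric direction. The function name and the `higher_is_better` flags are assumptions chosen to match the interpretation above.

```python
# (binance_only, external_data, higher_is_better), values from the final_score table.
STRATEGY_METRICS = {
    "CAGR": (0.5334, 0.6145, True),
    "Sharpe": (0.9465, 1.0115, True),
    "Max Drawdown": (-0.7677, -0.7392, True),  # less negative is better
    "Turnover": (16.5716, 16.4948, False),     # lower is better (assumed)
}


def compare_strategy(metrics):
    """Return {metric: (delta, improved)} with delta = external - binance_only."""
    result = {}
    for name, (baseline, candidate, higher_is_better) in metrics.items():
        delta = round(candidate - baseline, 4)
        improved = delta > 0 if higher_is_better else delta < 0
        result[name] = (delta, improved)
    return result


if __name__ == "__main__":
    for name, (delta, improved) in compare_strategy(STRATEGY_METRICS).items():
        print(f"{name}: {delta:+.4f} ({'improved' if improved else 'worse'})")
```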
Mean window metrics:
| Metric | Binance-only | Current external-data | Delta |
|---|---|---|---|
| H30 Precision | 0.2167 | 0.2183 | +0.0015 |
| H30 Leader Capture | 0.1843 | 0.2114 | +0.0271 |
| H60 Precision | 0.2217 | 0.2040 | -0.0178 |
| H60 Leader Capture | 0.1833 | 0.1991 | +0.0158 |
| H90 Precision | 0.2231 | 0.1977 | -0.0255 |
| H90 Leader Capture | 0.1935 | 0.1555 | -0.0380 |
| Mean Window Sharpe | 0.8810 | 0.9123 | +0.0312 |
| Mean Window Turnover | 16.9076 | 17.0003 | +0.0927 |
Interpretation:
- H30 precision improved
- H30 capture improved materially
- H60 capture improved, but H60 precision is still weaker
- H90 precision and H90 capture remain clearly worse
- mean walk-forward Sharpe improved
- turnover is now nearly flat vs Binance-only
Latest snapshot date: 2026-03-13
Latest live pools:
- Binance-only: `TRXUSDT`, `ETHUSDT`, `BCHUSDT`, `NEARUSDT`, `LTCUSDT`
- Current external-data: `TRXUSDT`, `ETHUSDT`, `BCHUSDT`, `LTCUSDT`, `SOLUSDT`
Interpretation:
- the current external-data live pool still differs only modestly from Binance-only
- the swap is now `SOL` in and `NEAR` out, which is a cleaner major-for-major change than earlier noisy variants
- `DOGEUSDT` does not enter the current live pool and does not alter the active merged-history set
Sweep artifact:
data/reports/external_profile_sweep_venv/profile_summary.csv
Tested profiles:
- `current_core_plus_doge`
- `core_only_no_doge`
- `core_no_doge_no_bch`
- `core_no_doge_no_sol`
Effective attempt:
- disabling DOGE active merge while keeping it as a cautious holdout
- this was the strongest external-data profile across research CAGR, research Sharpe, max drawdown, and mean walk-forward Sharpe
Ineffective attempts:
- removing `BCHUSDT` from core
- removing `SOLUSDT` from core
Those stricter variants reduced the gains too much and did not fix the long-horizon precision weakness.
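For reference, the four swept profiles can be written as merge sets derived from the core list. The membership of each profile is inferred from its name and is an assumption. The actual sweep configuration is not shown in this document.

```python
CORE = [
    "BTCUSDT", "ETHUSDT", "XRPUSDT", "LTCUSDT",
    "BCHUSDT", "TRXUSDT", "ADAUSDT", "SOLUSDT",
]


def without(symbols, excluded):
    """Return symbols with the excluded one removed, preserving order."""
    return [s for s in symbols if s != excluded]


# Profile names from the sweep; merge sets inferred from those names.
PROFILES = {
    "current_core_plus_doge": CORE + ["DOGEUSDT"],
    "core_only_no_doge": list(CORE),
    "core_no_doge_no_bch": without(CORE, "BCHUSDT"),
    "core_no_doge_no_sol": without(CORE, "SOLUSDT"),
}

if __name__ == "__main__":
    for name, symbols in PROFILES.items():
        print(f"{name}: {len(symbols)} merged symbols")
```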
Current recommendation: still keep Binance-only as the formal production default.
Why:
- the current external-data version is now materially better than earlier external-data runs
- it improves research CAGR, research Sharpe, drawdown, and mean walk-forward Sharpe
- it improves H30 capture and H60 capture
- turnover is now nearly flat vs Binance-only
- but it still underperforms on H60 precision
- it still underperforms on H90 precision
- it still underperforms on H90 capture
That means the current external-data branch is now a serious experimental candidate, but it is not yet a clear enough winner across the full 30 / 60 / 90 objective to replace the default baseline.
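The reasoning above can be made concrete by listing which horizon metrics regress. The sketch below uses the leader-metric table values. The promotion rule it applies (no regression on any 30 / 60 / 90 metric) is a hypothetical stand-in, since this document does not define a numeric threshold for "clear enough".

```python
# (binance_only, external_data) values from the leader metrics table.
LEADER_METRICS = {
    ("H30", "Precision@N"): (0.2131, 0.2144),
    ("H30", "Leader Capture"): (0.1784, 0.2119),
    ("H60", "Precision@N"): (0.2214, 0.1975),
    ("H60", "Leader Capture"): (0.1774, 0.1962),
    ("H90", "Precision@N"): (0.2179, 0.1923),
    ("H90", "Leader Capture"): (0.1731, 0.1577),
}


def horizon_regressions(metrics, tolerance=0.0):
    """Return the (horizon, metric) keys where the candidate is worse than baseline."""
    return [
        key
        for key, (baseline, candidate) in metrics.items()
        if candidate - baseline < -tolerance
    ]


def can_replace_default(metrics, tolerance=0.0):
    """Hypothetical gate: promote only if no horizon metric regresses."""
    return not horizon_regressions(metrics, tolerance)


if __name__ == "__main__":
    print(horizon_regressions(LEADER_METRICS))
    print("replace default:", can_replace_default(LEADER_METRICS))
```

Under this rule the three regressions are H60 precision, H90 precision, and H90 capture, which matches the list of remaining weaknesses.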
Production decision:
- keep `Production v1 = Binance-only + core_major + monthly publish`
- keep `external-data` in the repository as `experimental only`
- do not enable `external_data.enabled` in the default production path
Recommended current usage:
- keep `Binance-only` as the formal default production mode
- keep the current external-data branch as the best experimental candidate
- allow merges only for the approved core list
- keep `DOGEUSDT` visible as a cautious holdout, not as an active merged symbol
Next improvement targets:
- recover H60 precision without losing the recent H30/H60 capture gains
- recover H90 precision and H90 capture
- keep turnover near-flat while preserving the research / walk-forward Sharpe gains
- continue watching `BCHUSDT` because its cross-check warning still persists
- optionally add one more curated second-history source for the approved core majors before reconsidering default enablement
The external-data branch is now:
- tiered
- whitelist-controlled
- quality-gated
- second-provider cross-checked
- DOGE-safe by default
- and much closer to production quality than before
But the current best conclusion is still:
- `Binance-only` remains the formal default production version
- `external-data` remains the best experimental version, but not yet the default
- the current best experimental profile is `external_data_core_only_no_doge`