
Data Overview

What happens here

Descriptive overview of the 1m BTC history: data quality, return distributions, vol regimes, time-of-day, funding, drawdowns. Every hypothesis that stands out here is tested in its own walk-forward-validated experiment under Experiments.

BTCUSDT — Data Overview

🎯 May 2026 wave COMPLETED (Roadmap · Synthesis Report): 9 experiments, 4 promoted (BOCPD, ETF-Flow, DVOL/VRP, Master-LGBM, R²+10.6pp), 2 pursue, 3 dropped.

Generated by ml/overview.py on 2026-05-17 09:32 UTC. Source: Binance USDM 1m futures via backtesting/data/.

This report is the entry point for the ml/ pattern-discovery module. It is descriptive, not prescriptive — every conditional edge spotted here must be re-validated walk-forward in ml/experiments/ before feeding a strategy.

1. Data coverage

Range: 2020-01-01T00:00:00+00:00 → 2026-04-28T23:59:00+00:00 (2,309 days, 6.3 years)
Bars: 3,326,400 (expected 3,326,400, completeness 100.000%)
Gaps > 60s: 0 (max gap 60s)
Zero-volume bars: 369 (0.0111%)
Price range: $3,707 → $126,087
Funding: 2023-05-12 → 2026-05-13 (25,773 settlements)

Per-year coverage

year	bars	zero_vol_bars	first	last	close_lo	close_hi
2020	527040	2	2020-01-01	2020-12-31	3706.96	29336
2021	525600	59	2021-01-01	2021-12-31	28180	69154.9
2022	525600	64	2022-01-01	2022-12-31	15502	48143
2023	525600	118	2023-01-01	2023-12-31	16497.1	44745
2024	527040	89	2024-01-01	2024-12-31	38560.6	108225
2025	525600	37	2025-01-01	2025-12-31	74585.8	126087
2026	169920	0	2026-01-01	2026-04-28	60003.9	97794.3

2. Return distributions across horizons

Log returns at multiple horizons. vol_annual is the per-bar std scaled by sqrt(periods_per_year). High excess kurtosis indicates fat tails; negative skew means the left tail is heavier (the largest negative shocks are more extreme than the largest positive ones), positive skew the reverse.

horizon	n_obs	mean	std	vol_annual	skew	kurt_excess	p01	p05	p50	p95	p99	tail_3sigma_pct
1m	3,326,399	0	0.0009	0.6741	-0.2411	193.04	-0.0025	-0.0012	0	0.0012	0.0025	1.4976
5m	665,279	0	0.002	0.6579	-1.0969	140.5	-0.0056	-0.0026	0	0.0026	0.0056	1.6115
15m	221,759	0	0.0034	0.6451	-0.3687	92.5615	-0.0097	-0.0045	0	0.0045	0.0096	1.6784
1h	55,439	0	0.0067	0.6274	-0.8929	63.0225	-0.0196	-0.0091	0.0001	0.0091	0.0191	1.7966
4h	13,859	0.0002	0.0131	0.6113	-0.7419	18.2072	-0.0392	-0.0193	0.0002	0.0193	0.038	1.9193
1d	2,309	0.001	0.0326	0.6229	-1.4664	24.825	-0.088	-0.0481	0.0005	0.05	0.0923	1.5158
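
The annualisation described above can be sketched as follows. This is a minimal illustration, not the code in ml/overview.py: the function names are hypothetical, a 365-day 24/7 year is assumed (consistent with the 1d row, where 0.0326 × sqrt(365) ≈ 0.623), and tail_3sigma_pct is assumed to mean the share of observations more than three standard deviations from the mean. For reference, a Gaussian would put about 0.27% of observations beyond 3σ.

```python
import math
import statistics

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; assumes a 365-day, 24/7 market


def annualised_vol(returns, bar_minutes):
    """Per-bar sample std scaled by sqrt(periods_per_year)."""
    periods_per_year = MINUTES_PER_YEAR / bar_minutes
    return statistics.stdev(returns) * math.sqrt(periods_per_year)


def tail_3sigma_pct(returns):
    """Percentage of observations more than 3 std away from the mean.

    Assumed definition; the report does not spell it out.
    """
    mu = statistics.fmean(returns)
    sigma = statistics.stdev(returns)
    hits = sum(1 for r in returns if abs(r - mu) > 3 * sigma)
    return 100.0 * hits / len(returns)


# Sanity check against the 1d row: std 0.0326, 365 periods per year.
check_1d = 0.0326 * math.sqrt(365)  # about 0.623, reported 0.6229
```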

1h return histogram

3. Volatility regimes over time

Annualised realized vol from 1m close-to-close returns, rolling over different windows. Look for regime breaks and clustering.
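
A rolling realized-vol series of this kind can be sketched as below. The function name and the zero-mean convention (root of the mean squared return) are assumptions for illustration; the report does not state which estimator or window lengths ml/overview.py uses.

```python
import math
from collections import deque

MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a 365-day, 24/7 market


def rolling_realized_vol(closes, window):
    """Annualised realized vol over a trailing window of 1m log returns.

    Returns a list aligned with `closes`; entries are None until the
    window is full. Uses the zero-mean estimator sqrt(mean(r^2)).
    """
    out = [None] * len(closes)
    buf = deque()
    sum_sq = 0.0
    for i in range(1, len(closes)):
        r = math.log(closes[i] / closes[i - 1])
        buf.append(r * r)
        sum_sq += r * r
        if len(buf) > window:
            sum_sq -= buf.popleft()  # drop the oldest squared return
        if len(buf) == window:
            # max() guards against tiny negative values from float drift
            out[i] = math.sqrt(max(sum_sq, 0.0) / window * MINUTES_PER_YEAR)
    return out
```

With window=1440 this gives a one-day trailing estimate sampled every minute; other window lengths reuse the same loop.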

Annualised vol per year

year	ann_vol_1d_avg	ann_vol_1d_min	ann_vol_1d_max	biggest_1d_move_pct
2020	0.642	0.116	7.254	72.185
2021	0.869	0.38	4.521	38.705
2022	0.595	0.076	2.467	21.653
2023	0.402	0.049	1.593	18.62
2024	0.511	0.106	2.142	20.889
2025	0.418	0.086	1.594	15.936
2026	0.486	0.103	1.637	19.467

realized vol

4. Autocorrelation: returns vs |returns|

Returns themselves should have near-zero autocorrelation (efficient market). |returns| typically show strong positive autocorrelation (volatility clustering). The gap between them is the signature of GARCH-style dynamics.

ACF

Lag-1 acf(return) = -0.0453 (mean-reversion)
Lag-1 acf(|return|) = +0.3851 (strong vol clustering if > 0.1)
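
The contrast between the two lag-1 figures can be reproduced on synthetic data. The sketch below uses a hypothetical acf helper and an artificial series (random signs, volatility switching every 50 bars), not the BTC data: signed returns show almost no autocorrelation while absolute returns show a lot.

```python
import random
import statistics


def acf(x, lag=1):
    """Sample autocorrelation of x at the given lag."""
    mu = statistics.fmean(x)
    denom = sum((v - mu) ** 2 for v in x)
    if denom == 0:
        return 0.0  # constant series: autocorrelation undefined, report 0
    num = sum((x[t] - mu) * (x[t + lag] - mu) for t in range(len(x) - lag))
    return num / denom


# Synthetic returns: independent random signs, vol alternates in 50-bar blocks.
rng = random.Random(0)
returns = [
    rng.choice((-1, 1)) * (0.01 if (i // 50) % 2 else 0.001)
    for i in range(2000)
]
acf_ret = acf(returns)                    # near zero
acf_abs = acf([abs(r) for r in returns])  # strongly positive
```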

5. Time-of-day & day-of-week effects

Mean and std of forward 1h log returns grouped by UTC hour and by weekday. t-stat = mean / (std / sqrt(n)) — rough significance check.

⚠️ Caveat: consecutive 1m bars produce overlapping 60-bar forward windows, so observations within the same hour are heavily correlated. The reported n overstates the effective sample size by roughly 60×, which inflates t-stats by roughly sqrt(60) ≈ 7.7×. Treat large t-stats as a screening signal only, and confirm them walk-forward in ml/experiments/ with non-overlapping samples before trusting them.
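
As a rough illustration of the caveat, the sketch below recomputes the t-stat for the hour-21 row of the By UTC hour table (mean 3.58 bps, std 0.00647, n 138,600) and then deflates it by treating n / 60 as the effective sample size. The helper names are hypothetical and the n / 60 rule is a crude approximation, not a replacement for non-overlapping walk-forward samples.

```python
import math


def t_stat(mean, std, n):
    """t = mean / (std / sqrt(n)), as defined in the report."""
    return mean / (std / math.sqrt(n))


def overlap_adjusted_t(mean, std, n, overlap=60):
    """Crude deflation: use n / overlap as the effective sample size."""
    return t_stat(mean, std, n / overlap)


# Hour 21: mean 3.58 bps = 3.58e-4 in return units.
raw = t_stat(3.58e-4, 0.00647, 138_600)              # roughly 20.6
adj = overlap_adjusted_t(3.58e-4, 0.00647, 138_600)  # roughly 2.7
```

Under this approximation only the very largest raw t-stats in the table remain above 2.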

By UTC hour

hour	mean_bps	std	count	t_stat
0	0.4	0.00701	138600	2.1
1	-1.36	0.00659	138600	-7.68
2	-1.29	0.00709	138600	-6.77
3	-0.78	0.00539	138600	-5.39
4	-0.43	0.00532	138600	-3.02
5	1.21	0.00519	138600	8.65
6	1.01	0.00556	138600	6.77
7	2	0.00588	138600	12.68
8	-0.04	0.00611	138600	-0.24
9	0.11	0.0063	138600	0.64
10	0.41	0.00612	138600	2.51
11	1.14	0.00641	138600	6.65
12	1.85	0.00745	138600	9.25
13	-0.65	0.00807	138600	-3.02
14	-0.08	0.00854	138600	-0.33
15	0.68	0.00748	138600	3.36
16	-1.01	0.00685	138600	-5.51
17	-0.06	0.00646	138600	-0.34
18	0.56	0.0065	138600	3.2
19	0.54	0.00699	138600	2.85
20	3.51	0.00667	138600	19.6
21	3.58	0.00647	138600	20.61
22	0.39	0.00691	138600	2.11
23	-1.46	0.00731	138540	-7.42

By day-of-week (Mon-Sun)

weekday	mean_bps	std	count	t_stat
Mon	1.62	0.00733	475200	15.25
Tue	0.24	0.00666	475140	2.48
Wed	2.05	0.00707	475200	20
Thu	-1.55	0.00724	475200	-14.78
Fri	0.51	0.00743	475200	4.71
Sat	-0.03	0.0049	475200	-0.45
Sun	0.15	0.00556	475200	1.83

hour of day

6. Volume & activity patterns

Where does the action concentrate? Volume rhythms hint at when liquidity providers vs. takers dominate. Also a sanity check for time-of-day return effects.

volume

Avg volume per 1m bar by UTC hour

hour	avg_vol
0	223.77
1	193.69
2	176.29
3	160.65
4	155.61
5	155.74
6	168.51
7	186.4
8	211.37
9	202.09
10	207.19
11	210.28
12	280.38
13	329.89
14	389
15	359.16
16	330.08
17	268.79
18	253.77
19	237.39
20	225.27
21	182.8
22	181.35
23	179.59

7. Funding rate analysis

Coverage: 2023-05-12 → 2026-04-28 (1,559,519 bars with funding attached)
Mean per 8h: +0.153 bps
Std per 8h: 0.417 bps
Annualised (×1095): mean 1.68% / std 4.56%
Min / Max single settlement: -8.18 bps / +5.98 bps
% bars with negative funding: 13.81%
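
The ×1095 factor is three 8h settlements per day over 365 days. A minimal sketch of the conversion from bps per settlement to percent per year follows; the function name is hypothetical. It applies the same linear scaling the report uses for both mean and std. Under a square-root-of-time convention for independent settlements, the std would instead scale by sqrt(1095).

```python
SETTLEMENTS_PER_YEAR = 3 * 365  # 8h settlements -> 1095 per year


def annualise_funding_pct(per_settlement_bps):
    """Linearly scale a per-8h figure in bps to percent per year."""
    return per_settlement_bps * SETTLEMENTS_PER_YEAR / 100.0


mean_annual_pct = annualise_funding_pct(0.153)
print(round(mean_annual_pct, 2))  # 1.68
```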

Forward-24h return by funding quintile (sampled at 8h settlements)

Quintile assignment uses rank-based tie handling when funding-rate buckets collide on the default value. |t-stat| > 2 suggests the bucket's mean is unlikely to be zero, but the 24h forward windows of settlements 8h apart overlap, so treat it as a hint, not proof.

bucket	mean_bps	std	count	t_stat
Q1 (-0.001818, 5.86e-06]	29.8	0.021	649	3.62
Q2 (5.86e-06, 1.25e-05]	-3.1	0.0238	1809	-0.56
Q3 (1.25e-05, 2.41e-05]	-16.2	0.0248	138	-0.77
Q4 (2.41e-05, 0.000453]	31.5	0.0249	649	3.22
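
One way to bucket by funding rate when many settlements share the same value is to rank first and cut on ranks. The sketch below is an assumption for illustration, with hypothetical function names. It breaks ties by original order so buckets stay equal-sized. The uneven counts in the table above suggest ml/overview.py handles ties differently.

```python
import math
import statistics


def rank_quantile_buckets(values, q=5):
    """Assign each value to one of q buckets by rank.

    Ties are broken by original order (sorted() is stable), so bucket
    sizes stay equal even when many values are identical.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    buckets = [0] * len(values)
    for rank, i in enumerate(order):
        buckets[i] = rank * q // len(values)
    return buckets


def bucket_stats(buckets, fwd_returns):
    """Per bucket: (mean in bps, std, count, naive t-stat)."""
    out = {}
    for b in sorted(set(buckets)):
        rs = [r for k, r in zip(buckets, fwd_returns) if k == b]
        mean, std = statistics.fmean(rs), statistics.stdev(rs)
        out[b] = (mean * 1e4, std, len(rs), mean / (std / math.sqrt(len(rs))))
    return out


# Toy input: six settlements at one funding value, four at another.
funding = [1e-5] * 6 + [2e-5] * 4
fwd_24h = [0.01, 0.03] * 5
stats = bucket_stats(rank_quantile_buckets(funding), fwd_24h)
```

The t-stat here is the same naive figure as in the table and carries the same overlap caveat.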

funding

8. Bull / bear regimes & drawdowns

Top-10 drawdown episodes (peak-to-trough within sample)

start	end	length_days	max_dd_pct	peak_price	trough_price
2021-11-10	2024-03-08	849	-77.6%	$69,155	$15,502
2020-02-13	2020-07-27	165	-64.8%	$10,535	$3,707
2021-04-14	2021-10-20	189	-55.6%	$64,945	$28,860
2025-10-06	2026-04-28	204	-52.4%	$126,087	$60,004
2024-03-14	2024-11-06	236	-33.2%	$73,859	$49,353
2025-01-20	2025-05-21	121	-31.9%	$109,533	$74,586
2021-01-08	2021-02-08	30	-31.2%	$42,048	$28,908
2021-02-21	2021-03-13	19	-26.2%	$58,460	$43,159
2020-08-17	2020-10-21	64	-20.8%	$12,474	$9,882
2021-01-03	2021-01-06	2	-19.1%	$34,822	$28,180
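
Peak-to-trough drawdown can be computed in one pass by tracking the running peak. The sketch below uses a hypothetical helper and returns only the single deepest episode, not the full top-10 segmentation. The last line checks the first table row: $15,502 against a $69,155 peak is a decline of about 77.6%.

```python
def max_drawdown(prices):
    """Return (max_dd, peak_index, trough_index) for a price series.

    max_dd is a negative fraction, e.g. -0.5 for a 50% decline.
    """
    peak_i = 0
    best = (0.0, 0, 0)
    for i, p in enumerate(prices):
        if p > prices[peak_i]:
            peak_i = i  # new running high
        dd = p / prices[peak_i] - 1.0
        if dd < best[0]:
            best = (dd, peak_i, i)
    return best


dd_pct_2021 = round(100 * (15502 / 69155 - 1), 1)  # -77.6
```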

10 biggest daily moves (non-overlapping 1440m windows)

Rank	Up date	Up %	Down date	Down %
1	2021-02-09	+18.22%	2020-03-13	-48.96%
2	2020-03-20	+13.72%	2022-06-14	-17.59%
3	2022-03-01	+13.65%	2022-11-10	-15.67%
4	2020-03-14	+12.66%	2026-02-06	-14.91%
5	2020-04-30	+12.38%	2021-01-22	-14.86%
6	2020-03-24	+12.37%	2021-05-20	-14.52%
7	2024-08-09	+11.22%	2021-05-13	-13.84%
8	2020-07-28	+11.14%	2022-05-10	-12.25%
9	2021-06-10	+11.10%	2021-09-08	-11.63%
10	2026-02-07	+10.96%	2021-06-22	-11.56%

drawdown

9. Where to dig next

Concrete hypotheses worth testing in ml/experiments/ after reading this:

Vol-clustering exploit: ACF(|returns|) is the strongest non-zero autocorrelation we have. Test: predict realized vol over the next h hours from features, use that to size positions or filter entries.
Time-of-day conditional return: if any hour shows |t-stat| > 3 (see §5), check whether the effect is stable across walk-forward windows — overlapping samples make raw t-stats optimistic.
Funding extreme reversal: most-negative funding quintile vs. forward 24h return (see §7). Classical 'long when shorts are paying' thesis; test whether it holds walk-forward.
Regime clustering: HMM or k-means on (rv_1d, ret_24h, vol_z_1d) to find 3–5 distinct market states. Then look at conditional forward returns per state.
Volume-shock mean reversion: bars with vol_z_1h > 3σ — do they predict short-term reversal or continuation?
Range-compression breakout: low hl_range over N bars followed by directional move — quantify base rate and expectancy.

Each of these gets its own folder under ml/experiments/ with a README.md, the code, IC + bootstrap CI numbers, and a verdict.