ML — Pattern Discovery

Inverted workflow: find conditional edges in BTC data first, build strategies second.
55 experiments

Vol-forecast calibration check

Done
2026-05-19 · calibration · forecast · production-gate
Hypothesis
The walk-forward GBM forecasts at the 4h horizon are well-calibrated: low mean bias, reliability close to the y=x line across deciles, ~50% of realisations exceeding the forecast (median calibration), and sensible tail behaviour.
Verdict
**SHIP WITH CORRECTION**: 3/4 gates pass. Failing gate: median calibration (over_rate is 44.1%, below the 0.45-0.55 target band). Before live use, apply a simple bias correction inside the sizing helper: multiply each forecast by the scalar mean_act / mean_pred = 1.0442.
| metric | value |
| --- | --- |
| n_obs | 45,984 |
| gates_total | 4 |
| gates_passed | 3 |
| rel_bias_pct | -4.2284 |
| mean_pred_ann | +0.5036 |
| over_rate_pct | +44.1023 |
| max_decile_err | +0.0611 |
| mean_actual_ann | +0.5258 |
| top_decile_hit_pct | +60.0348 |
| bottom_decile_hit_pct | +64.0139 |

Vol-forecast calibration check

2026-05-19 · status: done · 7.3s

Approach

Diagnostics on 45,984 out-of-sample (OOS) forecasts (GBM, 4h horizon; walk-forward with 12-month train / 3-month test, 21 windows). All numbers below are annualised σ (so 0.50 = 50% annual vol).
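The walk-forward layout described above can be sketched as follows. This is a minimal illustration, not the experiment's actual splitter: the function names, the start date, and the assumption that each window rolls forward by one 3-month test block are all hypothetical.

```python
from datetime import date

def add_months(d: date, months: int) -> date:
    """Shift a first-of-month date by a whole number of months."""
    idx = d.year * 12 + (d.month - 1) + months
    return date(idx // 12, idx % 12 + 1, 1)

def walk_forward_windows(start: date, n_windows: int = 21,
                         train_months: int = 12, test_months: int = 3):
    """Yield (train_start, train_end, test_end) boundaries.

    Train covers [train_start, train_end) and test covers [train_end, test_end).
    Assumption: the window advances by one test block per step, so test
    periods are contiguous and never overlap.
    """
    for k in range(n_windows):
        train_start = add_months(start, k * test_months)
        train_end = add_months(train_start, train_months)
        test_end = add_months(train_end, test_months)
        yield train_start, train_end, test_end

# The start date is hypothetical; the source does not state it.
windows = list(walk_forward_windows(date(2020, 1, 1)))
```

Under these assumptions every forecast is scored only on data that comes after its training window, which is what makes the 45,984 observations out-of-sample.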

1. Bias

  • Mean forecast: 0.5036 (= 50.36% ann)
  • Mean actual: 0.5258 (= 52.58% ann)
  • Absolute bias (forecast − actual): -0.0222
  • Relative bias: -4.23%
  • Median forecast vs median actual: 0.4591 vs 0.4421

Interpretation: |relative bias| under 5% is acceptable for sizing use; above 10% would require recalibration before going live. The observed -4.23% falls inside the 5% band.
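The bias figures and the correction scalar can be reproduced from the two means, as a rough sketch. The function names are hypothetical, and the choice to divide by the mean actual is an assumption inferred from the reported numbers. Because the inputs here are the rounded means, the results differ slightly in the last digits from the reported -4.2284% and 1.0442.

```python
def bias_summary(mean_pred: float, mean_act: float) -> dict:
    """Mean-bias diagnostics for a volatility forecast."""
    abs_bias = mean_pred - mean_act            # forecast minus actual
    return {
        "abs_bias": abs_bias,
        # Assumption: relative bias is measured against the mean actual.
        "rel_bias_pct": 100.0 * abs_bias / mean_act,
        # Multiplicative scalar that matches the mean forecast to the mean actual.
        "scale": mean_act / mean_pred,
    }

def apply_correction(forecast: float, scale: float) -> float:
    """Scale a single forecast; after scaling, the mean forecast equals the mean actual."""
    return forecast * scale

summary = bias_summary(0.5036, 0.5258)
corrected_mean = apply_correction(0.5036, summary["scale"])
```

Because the scalar is above 1 it raises every forecast, so on its own it targets the mean bias rather than the median-calibration gate.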

2. Reliability — decile bins

Sorted by forecast decile (decile 0 = lowest forecasts, decile 9 = highest). Within each decile we compare mean forecast vs mean realised. Perfect calibration: pred_mean = act_mean in every bin.

| bin | n | pred_mean | act_mean | pred_p50 | act_p50 | abs_err | rel_err_pct |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 4599 | 0.1815 | 0.1974 | 0.1893 | 0.1797 | -0.0160 | -8.08 |
| 1 | 4598 | 0.2638 | 0.2784 | 0.2643 | 0.2539 | -0.0146 | -5.24 |
| 2 | 4598 | 0.3217 | 0.3366 | 0.3222 | 0.3075 | -0.0148 | -4.41 |
| 3 | 4599 | 0.3752 | 0.3917 | 0.3751 | 0.3607 | -0.0165 | -4.21 |
| 4 | 4598 | 0.4315 | 0.4397 | 0.4319 | 0.4037 | -0.0082 | -1.86 |
| 5 | 4598 | 0.4864 | 0.4971 | 0.4863 | 0.4567 | -0.0106 | -2.14 |
| 6 | 4599 | 0.5452 | 0.5596 | 0.5446 | 0.5161 | -0.0144 | -2.58 |
| 7 | 4598 | 0.6200 | 0.6427 | 0.6191 | 0.5968 | -0.0227 | -3.53 |
| 8 | 4598 | 0.7294 | 0.7728 | 0.7261 | 0.7171 | -0.0434 | -5.62 |
| 9 | 4599 | 1.0809 | 1.1419 | 0.9742 | 0.9963 | -0.0611 | -5.35 |
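A reliability table of this shape could be built roughly as below. This is a sketch under assumptions: the function name is hypothetical, bins are equal-count splits by forecast rank, and the relative error is taken against the bin's mean actual. The toy data is synthetic, not the experiment's.

```python
from statistics import fmean

def decile_reliability(pred, act, n_bins=10):
    """Bin observations by forecast rank and compare mean forecast vs mean realised."""
    order = sorted(range(len(pred)), key=lambda i: pred[i])
    rows = []
    for b in range(n_bins):
        lo = b * len(order) // n_bins
        hi = (b + 1) * len(order) // n_bins
        idx = order[lo:hi]
        p = fmean([pred[i] for i in idx])
        a = fmean([act[i] for i in idx])
        rows.append({
            "bin": b,
            "n": len(idx),
            "pred_mean": p,
            "act_mean": a,
            "abs_err": p - a,                      # forecast minus actual
            "rel_err_pct": 100.0 * (p - a) / a,    # assumption: relative to actual
        })
    return rows

# Synthetic example: realised vol is always 10% above the forecast.
toy_pred = [0.1 * (i + 1) for i in range(20)]
toy_act = [1.1 * p for p in toy_pred]
rows = decile_reliability(toy_pred, toy_act)
max_decile_err = max(abs(r["abs_err"]) for r in rows)
```

In the toy data every bin shows the same relative error, while the absolute error grows with the forecast level. The table above shows a similar widening of abs_err in the top deciles.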

3. Coverage

  • P(actual > forecast): 44.10% (target ≈ 50% if the forecast is an unbiased median)
  • P(actual > 1.5× forecast), i.e. the model badly under-forecasts a volatile period: 8.90%
  • P(actual < 0.67× forecast), i.e. the model badly over-forecasts a calm period: 8.17%
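The three coverage rates are simple exceedance frequencies. A minimal sketch, with a hypothetical function name and synthetic inputs:

```python
def coverage(pred, act, up=1.5, down=0.67):
    """Fractions of observations where the realised value exceeds or undercuts the forecast."""
    n = len(pred)
    pairs = list(zip(pred, act))
    return {
        # Share of cases where realised vol came in above the forecast.
        "over_rate": sum(a > p for p, a in pairs) / n,
        # Realised vol more than `up` times the forecast (large under-forecast).
        "big_under_forecast": sum(a > up * p for p, a in pairs) / n,
        # Realised vol less than `down` times the forecast (large over-forecast).
        "big_over_forecast": sum(a < down * p for p, a in pairs) / n,
    }

# Synthetic example with a constant forecast of 1.0.
cov = coverage([1.0, 1.0, 1.0, 1.0], [0.5, 0.9, 1.2, 2.0])
```

An over_rate near 0.5 is what a median-calibrated forecast would produce; the 44.10% above means realised vol landed below the forecast more often than above it.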

4. Tail behaviour

High-vol tail (forecast top-10%, threshold = 0.810)

  • Mean actual when forecast in top-10%: 1.142 (global mean 0.526, top-10% threshold of actuals 0.914)
  • Hit rate — fraction of high-forecast cases where actual was also in the top-10%: 60.0% (random baseline = 10%)
  • Reverse: mean forecast when actual is in top-10%: 0.964

Low-vol tail (forecast bottom-10%, threshold = 0.231)

  • Mean actual when forecast in bottom-10%: 0.197
  • Hit rate for bottom decile: 64.0% (random baseline = 10%)
  • Reverse: mean forecast when actual is in bottom-10%: 0.223
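The tail hit rate can be read as the overlap between the forecast's extreme decile and the realised extreme decile. The sketch below uses rank-based sets instead of explicit thresholds, which is an assumption about the method; the function name and data are hypothetical.

```python
def tail_hit_rate(pred, act, frac=0.10, upper=True):
    """Share of extreme-forecast cases whose realised value is also extreme.

    upper=True looks at the highest `frac` of values, upper=False at the lowest.
    """
    k = max(1, round(len(pred) * frac))
    by_pred = sorted(range(len(pred)), key=lambda i: pred[i], reverse=upper)[:k]
    by_act = set(sorted(range(len(act)), key=lambda i: act[i], reverse=upper)[:k])
    return sum(i in by_act for i in by_pred) / k

# Synthetic example: 20 points, so each tail holds 2 observations.
toy_pred = [float(i) for i in range(20)]
toy_act = list(toy_pred)
toy_act[18] = 0.5     # a high forecast that realised low
toy_act[0] = 100.0    # a low forecast that realised very high

top_hit = tail_hit_rate(toy_pred, toy_act, upper=True)
bottom_hit = tail_hit_rate(toy_pred, toy_act, upper=False)
```

With a 10% tail, a forecast with no skill would hit about 10% of the time, which is the random baseline quoted above.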

[Figures: reliability diagram; ratio distribution; time-series residual]

Production gates

| gate | pass? | actual |
| --- | --- | --- |
| bias <5% | OK | -4.23% |
| median calibration (over_rate 0.45-0.55) | FAIL | 44.1% |
| max decile error <0.10 | OK | 0.0611 |
| tail hit-rate >30% | OK | 60.0% |

Passed: 3/4
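The gate logic can be expressed as a handful of threshold checks. This is an illustrative sketch: the function name and argument names are hypothetical, the thresholds are the ones listed in the table, and the tail gate is assumed to use the top-decile hit rate.

```python
def evaluate_gates(rel_bias_pct, over_rate, max_decile_err, tail_hit_pct):
    """Return per-gate pass flags and the number of gates passed."""
    gates = {
        "bias <5%": abs(rel_bias_pct) < 5.0,
        "median calibration (over_rate 0.45-0.55)": 0.45 <= over_rate <= 0.55,
        "max decile error <0.10": abs(max_decile_err) < 0.10,
        "tail hit-rate >30%": tail_hit_pct > 30.0,
    }
    return gates, sum(gates.values())

# Inputs are the reported metrics for this experiment.
gates, passed = evaluate_gates(
    rel_bias_pct=-4.2284,
    over_rate=0.441023,
    max_decile_err=0.0611,
    tail_hit_pct=60.0348,
)
failed = [name for name, ok in gates.items() if not ok]
```

Run on the reported metrics, only the median-calibration check fails, matching the 3/4 result.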