Master-LightGBM — kitchen-sink 4h vol forecast
2026-05-20 · status: promoted · 39.1s
Hypothesis: A LightGBM regressor on a unified causal feature panel (lagged returns, multi-window RV, funding, BOCPD p_short, HMM p_state1, VRP, stablecoin Δ7d, ETF flow, DXY z-score, time) beats HAR-RV on OOS log-vol R² at 4h by ≥ 3 pp.
Verdict: PROMOTE. LightGBM lifts 4h vol-forecast R² by +10.96 pp over the HAR-RV baseline (0.5541 → 0.6636), clearing the ≥ 3 pp bar set in the hypothesis. Top features by importance: log_rv_7d_ann, iv_ann, hour_cos. Replace HAR-RV with LGBM in ml/forecast/ after a clean production run.
Key metrics
| metric | value |
|---|---|
| pooled_R2_persistence | +0.4061 |
| pooled_R2_HAR_RV | +0.5541 |
| pooled_R2_LGBM_all | +0.6636 |
| pooled_R2_LGBM_new | +0.5009 |
| pooled_IC_HAR_RV | +0.7258 |
| pooled_IC_LGBM_all | +0.8142 |
| lift_pp_lgbm_vs_har | +10.9555 |
| n_windows | 13 |
| n_features_full | 27 |
| n_features_new | 7 |
| top_3_features | ['log_rv_7d_ann', 'iv_ann', 'hour_cos'] |
Approach
Build a unified 1h causal panel with lagged returns (1/4/24/168h), multi-window realized vol (1h/4h/1d/7d, annualised), funding (rate, z-score, cum-1d), BOCPD p_short (from 15m, causal forward filter), HMM p_state1 (from 1h, re-fit per walk-forward split), VRP (DVOL annualised IV − trailing 4h RV), stablecoin Δ7d (1d shift), ETF flow (1d shift), and DXY 4h z-score.
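A minimal sketch of how a few of these causal features could be computed from an hourly close series, using only the standard library. The function names, the 8760-hours-per-year annualisation, and the toy price path are assumptions for illustration, not the pipeline's actual code; the feature names follow the importance table.

```python
import math

HOURS_PER_YEAR = 24 * 365  # assumption: 24/7 market, 8760 hourly bars per year

def log_returns(close):
    """Hourly log returns; entry t depends only on closes t-1 and t."""
    return [math.nan] + [math.log(b / a) for a, b in zip(close, close[1:])]

def trailing_return(close, t, k):
    """Log return over the k bars ending at t (NaN if history is too short)."""
    return math.log(close[t] / close[t - k]) if t >= k else math.nan

def trailing_rv_ann(rets, t, window):
    """Annualised realized vol from the `window` hourly returns ending at t."""
    if t < window:  # rets[0] is NaN, so a full window needs t >= window
        return math.nan
    sum_sq = sum(r * r for r in rets[t - window + 1 : t + 1])
    return math.sqrt(sum_sq * HOURS_PER_YEAR / window)

def feature_row(close, iv_ann, t):
    """Features known at the close of bar t; nothing after t is read."""
    rets = log_returns(close[: t + 1])  # hard cut-off enforces causality
    rv_4h = trailing_rv_ann(rets, t, 4)
    rv_1d = trailing_rv_ann(rets, t, 24)
    return {
        "ret_1h": trailing_return(close, t, 1),
        "ret_4h": trailing_return(close, t, 4),
        "ret_24h": trailing_return(close, t, 24),
        "log_rv_4h_ann": math.log(rv_4h) if rv_4h > 0 else math.nan,
        "log_rv_1d_ann": math.log(rv_1d) if rv_1d > 0 else math.nan,
        "vrp": iv_ann[t] - rv_4h,  # IV minus trailing 4h RV, as described above
    }

# Toy inputs (hypothetical): steady 1% hourly drift, flat 100% implied vol
close = [100.0 * math.exp(0.01 * i) for i in range(48)]
iv_ann = [1.0] * 48
row = feature_row(close, iv_ann, t=30)
```

Slicing the input at `t + 1` before computing anything is one simple way to make look-ahead impossible by construction: altering any bar after `t` cannot change the row.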
Walk-forward 12mo train / 3mo test, embargo = 1440 min, starting 2022-01. Target = log of forward 4h annualised RV. Models: Persistence, HAR-RV (baseline), LightGBM-all (kitchen sink, native NaN handling), LightGBM-new-only (new features only).
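The split schedule can be sketched as follows. This is an illustration under assumptions, not the run's actual splitter: it assumes a rolling (not expanding) 12-month train window, places the 1440-minute embargo between train end and test start, and uses a hypothetical helper `add_months`. With a 2022-01-01 start it reproduces the 13 test windows listed in the per-window table.

```python
from datetime import datetime, timedelta

def add_months(d, months):
    """Shift a datetime by whole months (day-of-month must exist in the target)."""
    idx = d.year * 12 + (d.month - 1) + months
    return d.replace(year=idx // 12, month=idx % 12 + 1)

def walk_forward_splits(start, end, train_months=12, test_months=3,
                        embargo=timedelta(minutes=1440)):
    """Return (train_start, train_end, test_start, test_end) tuples.

    Intervals are half-open [start, end). The embargo gap sits between
    train_end and test_start so no training label overlaps the test window.
    """
    splits = []
    test_start = add_months(start, train_months) + embargo
    while True:
        test_end = add_months(test_start, test_months)
        if test_end > end:
            break
        train_end = test_start - embargo
        train_start = add_months(train_end, -train_months)  # rolling window (assumption)
        splits.append((train_start, train_end, test_start, test_end))
        test_start = test_end
    return splits

splits = walk_forward_splits(datetime(2022, 1, 1), datetime(2026, 4, 2))
test_hours = [int((te - ts).total_seconds() // 3600) for _, _, ts, te in splits]
for (_, _, ts, te), h in zip(splits, test_hours):
    print(f"{ts:%Y-%m-%d} -> {te:%Y-%m-%d}  n={h}")
```

Under these assumptions the hourly row counts per test window sum to 28,464, matching n_oos in the pooled table.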
Pooled OOS metrics
| model | R2_log | IC_spearman | n_oos |
|---|---|---|---|
| persistence | 0.4061 | 0.6751 | 28,464 |
| har_rv | 0.5541 | 0.7258 | 28,464 |
| lgbm_all | 0.6636 | 0.8142 | 28,464 |
| lgbm_new | 0.5009 | 0.6860 | 28,464 |
Lift over HAR-RV baseline
- LightGBM-all R² lift: +10.96 pp (0.5541 → 0.6636)
- LightGBM-new-only R²: 0.5009, which is 5.32 pp below HAR-RV (0.5541)
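The promotion check is plain arithmetic on the pooled R² values. A small sketch follows, with the 3 pp bar taken from the hypothesis; the names `PROMOTION_BAR_PP` and `lift_pp` are hypothetical.

```python
PROMOTION_BAR_PP = 3.0  # from the hypothesis: beat HAR-RV by >= 3 pp

# Pooled OOS R² on log-vol, copied from the table (rounded to 4 decimals)
pooled_r2 = {
    "persistence": 0.4061,
    "har_rv": 0.5541,
    "lgbm_all": 0.6636,
    "lgbm_new": 0.5009,
}

def lift_pp(model, baseline="har_rv"):
    """R² difference versus the baseline, in percentage points."""
    return 100.0 * (pooled_r2[model] - pooled_r2[baseline])

for name in ("lgbm_all", "lgbm_new"):
    lift = lift_pp(name)
    print(f"{name}: {lift:+.2f} pp, clears bar: {lift >= PROMOTION_BAR_PP}")
```

From the rounded table values this gives about +10.95 pp for LightGBM-all; the reported +10.9555 presumably comes from the unrounded R² values.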
Per-window R² (13 windows)
| window | n | pers_r2 | har_r2 | lgbm_all_r2 | lgbm_new_r2 |
|---|---|---|---|---|---|
| 2023-01-02 → 2023-04-02 | 2160 | 0.3337 | 0.5331 | 0.6242 | 0.3803 |
| 2023-04-02 → 2023-07-02 | 2184 | -0.0239 | 0.2852 | 0.3932 | 0.1875 |
| 2023-07-02 → 2023-10-02 | 2208 | 0.3143 | 0.4665 | 0.5786 | 0.4058 |
| 2023-10-02 → 2024-01-02 | 2208 | 0.1651 | 0.4091 | 0.5383 | 0.2894 |
| 2024-01-02 → 2024-04-02 | 2184 | 0.3579 | 0.5275 | 0.6297 | 0.4504 |
| 2024-04-02 → 2024-07-02 | 2184 | 0.4061 | 0.5500 | 0.6379 | 0.5249 |
| 2024-07-02 → 2024-10-02 | 2208 | 0.2171 | 0.4311 | 0.5577 | 0.3789 |
| 2024-10-02 → 2025-01-02 | 2208 | 0.3484 | 0.4802 | 0.6892 | 0.4954 |
| 2025-01-02 → 2025-04-02 | 2160 | 0.4690 | 0.5895 | 0.7042 | 0.5417 |
| 2025-04-02 → 2025-07-02 | 2184 | 0.3325 | 0.5167 | 0.6393 | 0.4447 |
| 2025-07-02 → 2025-10-02 | 2208 | 0.2737 | 0.4577 | 0.6344 | 0.4500 |
| 2025-10-02 → 2026-01-02 | 2208 | 0.3028 | 0.4820 | 0.6550 | 0.4616 |
| 2026-01-02 → 2026-04-02 | 2160 | 0.3788 | 0.5446 | 0.7147 | 0.5373 |
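A quick consistency check on the per-window figures: the sketch below recomputes the LightGBM-all lift over HAR-RV for each window from the table values and counts how many windows clear the 3 pp bar. Variable names are illustrative.

```python
# (n, har_r2, lgbm_all_r2) per test window, copied from the table above
windows = [
    (2160, 0.5331, 0.6242),
    (2184, 0.2852, 0.3932),
    (2208, 0.4665, 0.5786),
    (2208, 0.4091, 0.5383),
    (2184, 0.5275, 0.6297),
    (2184, 0.5500, 0.6379),
    (2208, 0.4311, 0.5577),
    (2208, 0.4802, 0.6892),
    (2160, 0.5895, 0.7042),
    (2184, 0.5167, 0.6393),
    (2208, 0.4577, 0.6344),
    (2208, 0.4820, 0.6550),
    (2160, 0.5446, 0.7147),
]

lifts_pp = [100.0 * (lgbm - har) for _, har, lgbm in windows]
n_total = sum(n for n, _, _ in windows)
wins = sum(lift > 0 for lift in lifts_pp)
clears_bar = sum(lift >= 3.0 for lift in lifts_pp)

print(f"windows won: {wins}/{len(windows)}, clearing 3 pp: {clears_bar}")
print(f"lift range: {min(lifts_pp):.2f} to {max(lifts_pp):.2f} pp, n_total={n_total}")
```

On the table's figures, LightGBM-all beats HAR-RV in all 13 windows, with per-window lifts between roughly 8.8 and 20.9 pp, so the pooled result is not driven by a single period.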
Feature importance (LightGBM-all, mean gain across folds)
| feature | mean_gain | max_gain |
|---|---|---|
| log_rv_7d_ann | 418.8 | 543 |
| iv_ann | 362.4 | 423 |
| hour_cos | 355.5 | 400 |
| log_rv_1d_ann | 317.2 | 416 |
| dow | 317.2 | 351 |
| ret_7d | 314.6 | 378 |
| stablecoin_d7 | 314.5 | 367 |
| ret_24h | 268.8 | 305 |
| hour_sin | 249.7 | 286 |
| funding_cum_1d | 198.0 | 306 |
| cvd_divergence_4h | 193.2 | 258 |
| funding_z_30d | 169.6 | 280 |
| log_rv_1h_ann | 163.3 | 201 |
| n_whale_trades_4h | 155.2 | 190 |
| ret_4h | 153.1 | 204 |
| etf_flow | 152.5 | 339 |
| vrp | 145.3 | 252 |
| hmm_p_state1 | 141.4 | 236 |
| range_4h | 138.5 | 190 |
| n_large_trades_4h | 136.2 | 174 |
| bocpd_p_short | 118.0 | 143 |
| log_rv_4h_ann | 109.7 | 138 |
| ret_1h | 99.5 | 116 |
| vol_z_1d | 97.1 | 133 |
| log_vol | 89.7 | 139 |
| dxy_z_4h | 88.9 | 172 |
| funding_rate | 73.2 | 139 |
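The mean and max columns are per-feature aggregates over the walk-forward folds. A minimal sketch of that aggregation follows, using hypothetical per-fold values rather than numbers from this run. In practice each fold's dictionary might be filled from LightGBM's `Booster.feature_importance(importance_type="gain")`, but the pipeline's actual call is not shown in the report.

```python
from statistics import mean

# Hypothetical per-fold importances (feature -> importance); not from this run
fold_importances = [
    {"log_rv_7d_ann": 400.0, "iv_ann": 350.0, "funding_rate": 60.0},
    {"log_rv_7d_ann": 450.0, "iv_ann": 380.0, "funding_rate": 90.0},
    {"log_rv_7d_ann": 410.0, "iv_ann": 360.0},  # feature unused in this fold
]

def aggregate_importance(folds):
    """Return (feature, mean, max) rows sorted by mean importance, descending."""
    features = sorted({name for fold in folds for name in fold})
    rows = []
    for name in features:
        # Assumption: a feature missing from a fold contributes zero
        values = [fold.get(name, 0.0) for fold in folds]
        rows.append((name, mean(values), max(values)))
    return sorted(rows, key=lambda row: row[1], reverse=True)

importance_table = aggregate_importance(fold_importances)
for name, mean_imp, max_imp in importance_table:
    print(f"| {name} | {mean_imp:.1f} | {max_imp:.0f} |")
```

Reporting the max alongside the mean helps flag features such as etf_flow, whose importance is concentrated in a few folds.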

