Regime clustering (k-means)
2026-05-17 · status: inconclusive · 4.0s
Hypothesis: Unsupervised k-means clustering on (rv_1d, ret_24h, vol_z_1d, dd_7d) yields market regimes whose forward 24h returns differ materially when evaluated walk-forward (out of sample).
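Verdict: INCONCLUSIVE. The spread between the highest-mean and lowest-mean regimes is 32.0 bps (regime 3 at +13.45 bps vs. regime 2 at -18.56 bps), but the evidence is too weak to act on: the flagged regime 2 has a t-stat of only -0.68, and no regime's pooled t-stat exceeds 1 in absolute value. Note that the regime reported as best_regime (2) has the lowest pooled mean, not the highest. The clusters separate the feature data, but the out-of-sample (OOS) return differences are noisier than they look in-sample.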
Key metrics
| metric | value |
|---|---|
| k_clusters | 4 |
| best_regime | 2 |
| best_mean_bps | -18.5632 |
| best_t_stat | -0.6781 |
| regime_spread_bps | +32.0167 |
| n_windows | 21 |
Approach
Features: rv_1d_ann (written rv_1d in the hypothesis; presumably the annualized form), ret_24h, vol_z_1d, dd_7d. Observations are daily, sampled at 00:00 UTC (2,308 obs). For each walk-forward window we fit a StandardScaler and KMeans (k=4) on the training segment only, then predict cluster labels on the held-out test segment.
Walk-forward windows: 21
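The per-window procedure described above can be sketched roughly as follows. This is a minimal illustration, not the report's actual code: the function names `label_window` and `walk_forward`, the rolling (non-expanding) window scheme, the `train_len`/`test_len` parameters, and the assumption that realized volatility is feature column 0 are all hypothetical. Only StandardScaler and KMeans with k=4, fit on the training segment, come from the report.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def label_window(train_X, test_X, k=4, rv_col=0, seed=0):
    """Fit scaler + k-means on train only, then label the test segment.

    Raw cluster ids are arbitrary, so they are remapped to a rank by
    centroid volatility (0 = lowest rv), which makes labels comparable
    across windows. rv_col=0 is an assumption about the column layout.
    """
    scaler = StandardScaler().fit(train_X)
    km = KMeans(n_clusters=k, n_init=10, random_state=seed)
    km.fit(scaler.transform(train_X))
    raw = km.predict(scaler.transform(test_X))

    # Standardizing is monotonic per column, so the rv ordering of the
    # centroids is the same in scaled and unscaled space.
    order = np.argsort(km.cluster_centers_[:, rv_col])
    rank = np.empty(k, dtype=int)
    rank[order] = np.arange(k)
    return rank[raw]


def walk_forward(X, fwd_ret_bps, train_len, test_len, k=4):
    """Roll a fixed-length train/test split forward by test_len each step.

    Returns a list of (window_index, sorted_labels, test_forward_returns).
    """
    results = []
    start, window = 0, 0
    while start + train_len + test_len <= len(X):
        train = slice(start, start + train_len)
        test = slice(start + train_len, start + train_len + test_len)
        labels = label_window(X[train], X[test], k=k)
        results.append((window, labels, fwd_ret_bps[test]))
        start += test_len
        window += 1
    return results
```

Fitting the scaler on the training segment alone keeps test-period information out of the standardization, which is the point of the walk-forward design.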
Pooled OOS per regime (sorted by trailing vol, low → high)
| k_sorted | mean_bps | se_bps | t_stat | n_windows | total_obs | avg_centroid_rv | avg_centroid_ret | avg_centroid_dd |
|---|---|---|---|---|---|---|---|---|
| 0 | -7.43 | 16.86 | -0.44 | 21 | 590 | 0.424 | 0.0038 | -0.037 |
| 1 | 10.72 | 11.33 | 0.95 | 21 | 831 | 0.502 | 0.0017 | -0.04 |
| 2 | -18.56 | 27.37 | -0.68 | 21 | 272 | 0.812 | 0.0198 | -0.074 |
| 3 | 13.45 | 21.81 | 0.62 | 18 | 223 | 1.097 | -0.0238 | -0.142 |
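As a consistency check on the table above, the t-stats and the regime spread can be recomputed from the published mean_bps and se_bps columns. The sketch below assumes t_stat = mean_bps / se_bps, which reproduces the table to two decimals. The `pooled_stats` helper shows one plausible estimator (standard error of the mean over pooled observations); the report does not state how se_bps was actually computed, so treat that function as an assumption.

```python
import math
from statistics import mean, stdev

# (mean_bps, se_bps) per k_sorted regime, copied from the pooled OOS table.
POOLED = {
    0: (-7.43, 16.86),
    1: (10.72, 11.33),
    2: (-18.56, 27.37),
    3: (13.45, 21.81),
}


def t_stats(pooled):
    """t-stat per regime as mean divided by standard error."""
    return {k: m / se for k, (m, se) in pooled.items()}


def regime_spread(pooled):
    """Difference between the highest and lowest regime means, in bps."""
    means = [m for m, _ in pooled.values()]
    return max(means) - min(means)


def pooled_stats(returns_bps):
    """Assumed estimator: mean, SE of the mean, and t-stat for one regime.

    Treats observations as independent; the report may instead aggregate
    per window or adjust for autocorrelation.
    """
    n = len(returns_bps)
    m = mean(returns_bps)
    se = stdev(returns_bps) / math.sqrt(n)
    return m, se, m / se
```

From the rounded table values the spread comes out at about 32.01 bps, against the reported 32.0167 bps; the small gap is consistent with the report using unrounded means.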


