BOCPD Filter — Definitive Shuffle Test
2026-05-19 · status: inconclusive · 100.0s
Hypothesis: If the BOCPD p_short signal carries real information about trade quality, the +6.9–17.0% PnL lift from bocpd_filter@0.3 should disappear when we shuffle the p_short → timestamp mapping. If shuffled runs show a similar lift, the C7 'edge' came from trade-count reduction plus the re-routing benefit, not from BOCPD.
Verdict: MIXED. 2 of 3 strategies show real BOCPD signal (z > 1.5), 1 looks like noise, and 0 show anti-signal. Consider deploying selectively, or pausing until more data is available.
Key metrics
| metric | value |
|---|---|
| n_strategies | 3 |
| n_shuffle_seeds | 5 |
| n_signal | 2 |
| n_neutral | 1 |
| n_anti | 0 |
| shuf_also_lifts | 0 |
| BB_EXTREME_3_real_lift_pct | +6.8836 |
| BB_EXTREME_3_shuf_mean_lift_pct | -4.9936 |
| BB_EXTREME_3_z_score | +1.0550 |
| BB_EXTREME_1_real_lift_pct | +12.6457 |
| BB_EXTREME_1_shuf_mean_lift_pct | +0.1612 |
| BB_EXTREME_1_z_score | +7.2970 |
| BB_EXTREME_2_real_lift_pct | +17.0414 |
| BB_EXTREME_2_shuf_mean_lift_pct | -1.7855 |
| BB_EXTREME_2_z_score | +6.4999 |
Approach
For each of the 3 BB-Extreme strategies we run the FULL backtest (with re-routing) over 2023-01-01 → 2026-04-01 with three source configurations:
- baseline_no_filter: bocpd_filter not active (θ=1.0)
- real: bocpd_filter active at θ=0.3, real BOCPD parquet
- shuffled (×5): bocpd_filter active at θ=0.3, p_short values randomly permuted across timestamps (same value distribution)
If the lift from the real filter reflects genuine BOCPD information, shuffled runs should show lift ≈ 0, i.e. PnL close to baseline_pnl. If shuffled runs lift PnL by a similar amount, the lift comes from trade-count reduction plus re-routing, not from BOCPD.
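The shuffled control can be illustrated with a minimal sketch. It is not the backtest's actual code: the function names `permute_p_short` and `allows_trade`, the toy timestamps, and the p_short values are all hypothetical. The rule "block a trade when p_short > θ" is an assumption, chosen because it is consistent with θ=1.0 meaning "filter not active".

```python
import random

def permute_p_short(timestamps, p_short, seed):
    """Reassign the same p_short values to timestamps in a random order.

    The value distribution is unchanged; only the value -> timestamp
    mapping is destroyed.
    """
    rng = random.Random(seed)      # seeded so each shuffle run is reproducible
    shuffled = list(p_short)       # copy, so the real series stays intact
    rng.shuffle(shuffled)
    return dict(zip(timestamps, shuffled))

def allows_trade(p_short_value, theta):
    # Assumption: the filter blocks a trade when p_short exceeds theta.
    # With theta = 1.0 nothing is blocked, matching baseline_no_filter.
    return p_short_value <= theta

# Toy data, made up for illustration only.
timestamps = ["2023-01-01T00:00", "2023-01-01T01:00",
              "2023-01-01T02:00", "2023-01-01T03:00"]
p_short = [0.05, 0.10, 0.45, 0.80]

real = dict(zip(timestamps, p_short))
control = permute_p_short(timestamps, p_short, seed=0)

# Same number of timestamps pass the filter in both configurations;
# what differs is which timestamps they are.
n_real = sum(allows_trade(v, 0.3) for v in real.values())
n_control = sum(allows_trade(v, 0.3) for v in control.values())
print(n_real, n_control)
```

Because the permutation keeps the value distribution, the fraction of timestamps that pass the filter is identical in the real and shuffled configurations. The number of trades that pass can still differ, since trades are not spread evenly over timestamps.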
Results
| strategy | baseline_pnl | real_filter_pnl | shuf_filter_mean_pnl | shuf_filter_min_pnl | shuf_filter_max_pnl | real_lift_pct | shuf_mean_lift_pct | real_minus_shuf_pnl | z_score |
|---|---|---|---|---|---|---|---|---|---|
| BB_EXTREME_3 | 11.42 | 12.21 | 10.85 | 8.57 | 11.56 | 6.88 | -4.99 | 1.36 | 1.06 |
| BB_EXTREME_1 | 15.30 | 17.23 | 15.32 | 14.94 | 15.59 | 12.65 | 0.16 | 1.91 | 7.30 |
| BB_EXTREME_2 | 17.66 | 20.67 | 17.34 | 16.55 | 17.84 | 17.04 | -1.79 | 3.32 | 6.50 |
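Reading the table: real_lift_pct is the lift of the real-source filter over baseline. shuf_mean_lift_pct is the average lift of the shuffled-source filter over baseline, across the 5 permutations. z_score = (real_filter_pnl − shuf_filter_mean_pnl) / shuf_std, where shuf_std is the standard deviation of the 5 shuffled PnLs (the reported values are consistent with the sample standard deviation, n − 1 in the denominator). As a rule of thumb, |z| > 2 suggests the real signal differs from shuffled; the verdict applies a looser cutoff of z > 1.5. With only 5 shuffles per strategy, shuf_std is a rough estimate, so the z-scores are indicative rather than exact.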
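As a worked check, the sketch below recomputes the BB_EXTREME_1 row from the rounded per-run PnLs listed in this report. The variable names are illustrative, and the use of the sample standard deviation (`statistics.stdev`) is an assumption that happens to reproduce the reported z-score to within rounding.

```python
from statistics import mean, stdev

# Rounded per-run PnLs for BB_EXTREME_1, copied from this report.
baseline_pnl = 15.30
real_pnl = 17.23
shuffled_pnls = [14.94, 15.51, 15.39, 15.20, 15.59]

def lift_pct(pnl, baseline):
    """Percentage lift of a filtered run over the unfiltered baseline."""
    return 100.0 * (pnl - baseline) / baseline

shuf_mean = mean(shuffled_pnls)
shuf_std = stdev(shuffled_pnls)   # sample std (n - 1); an assumption
z_score = (real_pnl - shuf_mean) / shuf_std

print(f"real_lift_pct      = {lift_pct(real_pnl, baseline_pnl):.2f}")
print(f"shuf_mean_lift_pct = {lift_pct(shuf_mean, baseline_pnl):.2f}")
print(f"z_score            = {z_score:.2f}")
```

The results differ from the table in the second decimal place because the inputs here are rounded to two decimals, whereas the report was presumably computed from unrounded PnLs.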
Per-run raw PnL
BB_EXTREME_3
| source | shuffle_seed | n_trades | total_pnl |
|---|---|---|---|
| baseline_no_filter | -1 | 124 | 11.42 |
| real | -1 | 95 | 12.21 |
| shuffled | 0 | 121 | 11.56 |
| shuffled | 1 | 123 | 11.47 |
| shuffled | 2 | 123 | 11.5 |
| shuffled | 3 | 121 | 11.18 |
| shuffled | 4 | 125 | 8.57 |
BB_EXTREME_1
| source | shuffle_seed | n_trades | total_pnl |
|---|---|---|---|
| baseline_no_filter | -1 | 165 | 15.3 |
| real | -1 | 146 | 17.23 |
| shuffled | 0 | 165 | 14.94 |
| shuffled | 1 | 162 | 15.51 |
| shuffled | 2 | 165 | 15.39 |
| shuffled | 3 | 166 | 15.2 |
| shuffled | 4 | 162 | 15.59 |
BB_EXTREME_2
| source | shuffle_seed | n_trades | total_pnl |
|---|---|---|---|
| baseline_no_filter | -1 | 139 | 17.66 |
| real | -1 | 115 | 20.67 |
| shuffled | 0 | 138 | 17.55 |
| shuffled | 1 | 137 | 17.84 |
| shuffled | 2 | 138 | 17.63 |
| shuffled | 3 | 137 | 17.14 |
| shuffled | 4 | 137 | 16.55 |

Verdict
MIXED. 2 of 3 strategies show real BOCPD signal (z > 1.5), 1 looks like noise, and 0 show anti-signal. Consider deploying selectively, or pausing until more data is available.
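The counts in the verdict can be reproduced from the reported z-scores with a small sketch. The report states only the z > 1.5 cutoff for "signal"; the symmetric cutoff z < −1.5 for "anti-signal" and the `classify` helper itself are assumptions made for illustration.

```python
from collections import Counter

# z-scores as reported in the key metrics table.
z_scores = {
    "BB_EXTREME_3": 1.0550,
    "BB_EXTREME_1": 7.2970,
    "BB_EXTREME_2": 6.4999,
}

def classify(z, threshold=1.5):
    """Label a strategy by how its real-filter PnL compares with shuffled runs."""
    if z > threshold:
        return "signal"
    if z < -threshold:      # assumption: anti-signal mirrors the signal cutoff
        return "anti"
    return "neutral"

labels = {name: classify(z) for name, z in z_scores.items()}
counts = Counter(labels.values())

print(labels)
print(counts["signal"], counts["neutral"], counts["anti"])
```

Under these assumed cutoffs the counts agree with n_signal, n_neutral, and n_anti in the key metrics table.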