BOCPD Filter — Definitive Shuffle Test

Inconclusive

2026-05-19 · tags: validation, shuffle, sanity, filter, production-gate

Hypothesis

If the BOCPD p_short signal carries real information about trade quality, the +6.9–17.0% PnL lift from bocpd_filter@0.3 should disappear when the p_short → timestamp mapping is shuffled. If shuffled runs show a similar lift, the C7 'edge' was only the effect of a reduced trade count plus the re-routing benefit.

Verdict

**MIXED**: 2 of 3 strategies show real BOCPD signal (z > 1.5), 1 looks like noise, and 0 show anti-signal. Consider deploying selectively, or pause until more data is available.

2026-05-19 · status: inconclusive · 100.0s

Key metrics

metric	value
n_strategies	`3`
n_shuffle_seeds	`5`
n_signal	`2`
n_neutral	`1`
n_anti	`0`
shuf_also_lifts	`0`
BB_EXTREME_3_real_lift_pct	`+6.8836`
BB_EXTREME_3_shuf_mean_lift_pct	`-4.9936`
BB_EXTREME_3_z_score	`+1.0550`
BB_EXTREME_1_real_lift_pct	`+12.6457`
BB_EXTREME_1_shuf_mean_lift_pct	`+0.1612`
BB_EXTREME_1_z_score	`+7.2970`
BB_EXTREME_2_real_lift_pct	`+17.0414`
BB_EXTREME_2_shuf_mean_lift_pct	`-1.7855`
BB_EXTREME_2_z_score	`+6.4999`
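
The counts n_signal, n_neutral and n_anti follow from applying the verdict's z > 1.5 cutoff to the three z-scores. A minimal sketch of that bucketing, assuming a symmetric cutoff of z < -1.5 for anti-signal (the report states only the positive side); the function name `classify` is hypothetical:

```python
from collections import Counter

def classify(z, threshold=1.5):
    """Bucket a shuffle-test z-score into signal / neutral / anti."""
    if z > threshold:
        return "signal"
    if z < -threshold:  # assumption: anti-signal uses the mirrored cutoff
        return "anti"
    return "neutral"

# z-scores copied from the key metrics table
z_scores = {
    "BB_EXTREME_1": 7.2970,
    "BB_EXTREME_2": 6.4999,
    "BB_EXTREME_3": 1.0550,
}

labels = {name: classify(z) for name, z in z_scores.items()}
counts = Counter(labels.values())
print(labels)
print(counts["signal"], counts["neutral"], counts["anti"])
```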

Approach

For each of the 3 BB-Extreme strategies we run the FULL backtest (with re-routing) over 2023-01-01 → 2026-04-01 with three source configurations:

baseline_no_filter: bocpd_filter not active (θ=1.0)
real: bocpd_filter active at θ=0.3, real BOCPD parquet
shuffled (×5): bocpd_filter active at θ=0.3, p_short values randomly permuted across timestamps (same value distribution)

If the lift from the real filter reflects genuine BOCPD information, shuffled runs should show lift ≈ 0, i.e. PnL close to baseline_pnl. If shuffled runs show a similar lift, the lift comes from trade-count reduction plus re-routing, not from BOCPD.
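
The shuffled configuration can be sketched as below. This is an illustration only, not the pipeline's code: the function name `shuffle_p_short` and the list-of-pairs layout are assumptions, and the real runs read p_short from the BOCPD parquet file.

```python
import random

def shuffle_p_short(rows, seed):
    """Permute p_short values across timestamps.

    rows: list of (timestamp, p_short) pairs (hypothetical layout).
    The timestamps keep their order; only the values move, so the
    value distribution is unchanged while the timing is destroyed.
    """
    timestamps = [ts for ts, _ in rows]
    values = [p for _, p in rows]
    rng = random.Random(seed)  # seeded so each shuffled run is reproducible
    rng.shuffle(values)        # in-place permutation of the values only
    return list(zip(timestamps, values))

# Toy input, not real BOCPD output
rows = [
    ("2023-01-01T00:00", 0.05),
    ("2023-01-01T01:00", 0.42),
    ("2023-01-01T02:00", 0.10),
    ("2023-01-01T03:00", 0.31),
]
shuffled = shuffle_p_short(rows, seed=0)
```

Using one seed per shuffled run (0 to 4 in the per-run tables) keeps each permutation repeatable.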

Results

strategy	baseline_pnl	real_filter_pnl	shuf_filter_mean_pnl	shuf_filter_min_pnl	shuf_filter_max_pnl	real_lift_pct	shuf_mean_lift_pct	real_minus_shuf_pnl	z_score
BB_EXTREME_3	11.42	12.21	10.85	8.57	11.56	6.88	-4.99	1.36	1.06
BB_EXTREME_1	15.30	17.23	15.32	14.94	15.59	12.65	0.16	1.91	7.30
BB_EXTREME_2	17.66	20.67	17.34	16.55	17.84	17.04	-1.79	3.32	6.50

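Reading the table: real_lift_pct is the lift of the real-source filter over baseline. shuf_mean_lift_pct is the average lift of the shuffled-source filter over baseline, across the 5 permutations. real_minus_shuf_pnl = real_filter_pnl − shuf_filter_mean_pnl. z_score = (real_filter_pnl − shuf_filter_mean_pnl) / shuf_std, where shuf_std is the standard deviation of PnL across the 5 shuffled runs. |z| > 2 indicates that the real-filter PnL differs from the shuffled runs by more than shuffle-to-shuffle variation would explain; with only 5 permutations this is a rough guide rather than a formal test. The verdict uses a looser cutoff of z > 1.5, which gives the same classification for all three strategies here.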
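
As a worked check, the lift and z-score definitions can be applied to the BB_EXTREME_1 per-run values. This sketch assumes shuf_std is the sample standard deviation (n − 1 denominator), which the report does not state. With the rounded two-decimal inputs, that choice lands close to the reported +7.2970, whereas the population form gives a z above 8. The function names are hypothetical.

```python
from statistics import mean, stdev

def lift_pct(pnl, baseline_pnl):
    """Percentage lift of a PnL figure over the unfiltered baseline."""
    return 100.0 * (pnl - baseline_pnl) / baseline_pnl

def shuffle_z_score(real_pnl, shuffled_pnls):
    """Distance of the real-filter PnL from the shuffled mean, in shuffled std units."""
    # stdev() is the sample standard deviation (assumed, see lead-in)
    return (real_pnl - mean(shuffled_pnls)) / stdev(shuffled_pnls)

# BB_EXTREME_1, rounded values from the per-run table
baseline = 15.3
real = 17.23
shuffled = [14.94, 15.51, 15.39, 15.2, 15.59]

real_lift = lift_pct(real, baseline)
shuf_mean_lift = lift_pct(mean(shuffled), baseline)
z = shuffle_z_score(real, shuffled)
print(f"real_lift_pct={real_lift:.2f} shuf_mean_lift_pct={shuf_mean_lift:.2f} z={z:.2f}")
```

Small differences from the reported figures are expected because the inputs here are rounded to two decimals.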

Per-run raw PnL

BB_EXTREME_3

source	shuffle_seed	n_trades	total_pnl
baseline_no_filter	-1	124	11.42
real	-1	95	12.21
shuffled	0	121	11.56
shuffled	1	123	11.47
shuffled	2	123	11.5
shuffled	3	121	11.18
shuffled	4	125	8.57

BB_EXTREME_1

source	shuffle_seed	n_trades	total_pnl
baseline_no_filter	-1	165	15.3
real	-1	146	17.23
shuffled	0	165	14.94
shuffled	1	162	15.51
shuffled	2	165	15.39
shuffled	3	166	15.2
shuffled	4	162	15.59

BB_EXTREME_2

source	shuffle_seed	n_trades	total_pnl
baseline_no_filter	-1	139	17.66
real	-1	115	20.67
shuffled	0	138	17.55
shuffled	1	137	17.84
shuffled	2	138	17.63
shuffled	3	137	17.14
shuffled	4	137	16.55
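
Because the hypothesis contrasts BOCPD information with plain trade-count reduction, it may be worth tabulating how many trades each configuration removes. The sketch below does this from the n_trades columns above; the dictionary layout is an assumption made for illustration.

```python
from statistics import mean

# n_trades copied from the per-run tables above
n_trades = {
    "BB_EXTREME_3": {"baseline": 124, "real": 95, "shuffled": [121, 123, 123, 121, 125]},
    "BB_EXTREME_1": {"baseline": 165, "real": 146, "shuffled": [165, 162, 165, 166, 162]},
    "BB_EXTREME_2": {"baseline": 139, "real": 115, "shuffled": [138, 137, 138, 137, 137]},
}

removed = {}
for name, r in n_trades.items():
    real_removed = r["baseline"] - r["real"]
    # Average reduction across the 5 shuffled seeds (can be negative for a single seed)
    shuf_removed = mean([r["baseline"] - n for n in r["shuffled"]])
    removed[name] = (real_removed, shuf_removed)
    print(f"{name}: real filter removes {real_removed} trades, "
          f"shuffled filter removes {shuf_removed:.1f} on average")
```

On these numbers the real filter removes 19 to 29 trades per strategy, while the shuffled filter removes between 1 and 2 on average. The shuffled runs may therefore not reproduce the trade-count reduction they are meant to control for. This is an observation from the tables, not a conclusion of the original run.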

[Figure: shuffle comparison]
