BOCPD Filter Validation — Walk-Forward, Threshold Grid, Size, Sanity

Dropped

2026-05-19 · tags: validation, walk-forward, threshold, sanity, filter

Hypothesis

The +6.9–17.0% PnL lift on BB_EXTREME strategies from `bocpd_filter` (Exp C7) survives (a) walk-forward partitioning into 4 windows, (b) a finer threshold grid, and (c) a shuffled-p_short sanity check.

Verdict

**NOT ROBUST**: no BB_EXTREME strategy passes both the walk-forward and sanity gates. All three pass walk-forward (failed WF: []), but none clears the shuffle sanity check (failed sanity: ['BB_EXTREME_3', 'BB_EXTREME_1', 'BB_EXTREME_2']). The C7 lift is likely period-specific or a lucky cut. Don't deploy.

2026-05-19 · status: dropped · 178.6s

Key metrics

metric	value
n_strategies_checked	`6`
n_wf_windows	`4`
n_threshold_levels	`8`
bb_promoted	`[]`
bb_failed_wf	`[]`
bb_failed_sanity	`['BB_EXTREME_3', 'BB_EXTREME_1', 'BB_EXTREME_2']`
BB_EXTREME_3_sanity_z	`+0.8112`
BB_EXTREME_1_sanity_z	`-1.1602`
BB_EXTREME_2_sanity_z	`-0.8764`
EMA_PURE_sanity_z	`-1.4410`
EMA_MACD_sanity_z	`-0.3334`
HOLY_GRAIL_sanity_z	`+1.7957`

Approach

Five validations of the bocpd_filter promote-signal from Exp C7. Strategies: ['EMA_PURE', 'EMA_MACD', 'HOLY_GRAIL', 'BB_EXTREME_3', 'BB_EXTREME_1', 'BB_EXTREME_2']. Window for full-period runs: 2023-01-01 → 2026-04-01. Walk-forward partitions: 4 non-overlapping ~9-mo windows.

Sub 1 — Walk-Forward Stability

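Per-window PnL with the filter on (θ=0.3) vs off. delta = th0.3 - off, and lift_pct = 100 × delta / |off|, so a smaller loss counts as a positive lift. Stability = fraction of windows where the filter improves PnL (delta > 0); HOLY_GRAIL scores 0 because its delta is 0 in every window.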

(strategy, window)	off	th0.3	delta	lift_pct
('BB_EXTREME_1', 'WF1_2023H1H2')	2.02	2.27	0.25	12.4
('BB_EXTREME_1', 'WF2_2023Q4_2024H1')	6.52	7.44	0.92	14.1
('BB_EXTREME_1', 'WF3_2024H2_2025Q1')	3.24	3.48	0.24	7.4
('BB_EXTREME_1', 'WF4_2025Q2_2026Q1')	3.79	4.31	0.52	13.7
('BB_EXTREME_2', 'WF1_2023H1H2')	0.23	0.73	0.5	217.4
('BB_EXTREME_2', 'WF2_2023Q4_2024H1')	7.9	8.77	0.87	11
('BB_EXTREME_2', 'WF3_2024H2_2025Q1')	4.26	4.78	0.52	12.2
('BB_EXTREME_2', 'WF4_2025Q2_2026Q1')	5.08	6.2	1.12	22
('BB_EXTREME_3', 'WF1_2023H1H2')	3.5	3.36	-0.14	-4
('BB_EXTREME_3', 'WF2_2023Q4_2024H1')	-1.04	-0.76	0.28	26.9
('BB_EXTREME_3', 'WF3_2024H2_2025Q1')	3.2	3.28	0.08	2.5
('BB_EXTREME_3', 'WF4_2025Q2_2026Q1')	6.02	6.59	0.57	9.5
('EMA_MACD', 'WF1_2023H1H2')	-5.63	-5.9	-0.27	-4.8
('EMA_MACD', 'WF2_2023Q4_2024H1')	-4.74	-4.63	0.11	2.3
('EMA_MACD', 'WF3_2024H2_2025Q1')	-1.21	-1.08	0.13	10.7
('EMA_MACD', 'WF4_2025Q2_2026Q1')	-7.46	-7.28	0.18	2.4
('EMA_PURE', 'WF1_2023H1H2')	-5.38	-5.51	-0.13	-2.4
('EMA_PURE', 'WF2_2023Q4_2024H1')	-6.17	-5.87	0.3	4.9
('EMA_PURE', 'WF3_2024H2_2025Q1')	-1.3	-0.93	0.37	28.5
('EMA_PURE', 'WF4_2025Q2_2026Q1')	-4.71	-5.18	-0.47	-10
('HOLY_GRAIL', 'WF1_2023H1H2')	-0.43	-0.43	0	0
('HOLY_GRAIL', 'WF2_2023Q4_2024H1')	0.26	0.26	0	0
('HOLY_GRAIL', 'WF3_2024H2_2025Q1')	0.35	0.35	0	0
('HOLY_GRAIL', 'WF4_2025Q2_2026Q1')	-1.01	-1.01	0	0

Stability summary

strategy	n_windows	n_positive	stability	mean_delta	min_delta	max_delta
EMA_PURE	4	2	0.5	0.0175	-0.47	0.37
EMA_MACD	4	3	0.75	0.0375	-0.27	0.18
HOLY_GRAIL	4	0	0	0	0	0
BB_EXTREME_3	4	3	0.75	0.1975	-0.14	0.57
BB_EXTREME_1	4	4	1	0.4825	0.24	0.92
BB_EXTREME_2	4	4	1	0.7525	0.5	1.12

[Figure: wf delta]
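The columns of the two walk-forward tables can be rebuilt from the per-window PnL figures alone. The sketch below is a minimal reconstruction, not the experiment's own code: the function name `stability_summary` is hypothetical, and the formulas (strict `delta > 0` for a positive window, lift relative to `|off|`) are inferred from the numbers in the tables above.

```python
from statistics import mean

def stability_summary(off, on):
    """Summarise per-window PnL with the filter off vs on (hypothetical helper)."""
    deltas = [round(b - a, 2) for a, b in zip(off, on)]
    # Lift is taken relative to |off| so that a smaller loss is a positive lift.
    lifts = [round(100 * d / abs(a), 1) if a != 0 else float("nan")
             for d, a in zip(deltas, off)]
    n_positive = sum(d > 0 for d in deltas)  # a zero delta does not count
    return {
        "n_windows": len(deltas),
        "n_positive": n_positive,
        "stability": n_positive / len(deltas),
        "mean_delta": round(mean(deltas), 4),
        "min_delta": min(deltas),
        "max_delta": max(deltas),
        "lift_pct": lifts,
    }

# BB_EXTREME_3 rows from the walk-forward table, windows WF1..WF4.
off = [3.5, -1.04, 3.2, 6.02]
on = [3.36, -0.76, 3.28, 6.59]
summary = stability_summary(off, on)
print(summary)
```

For BB_EXTREME_3 this gives 3 positive windows out of 4 (stability 0.75) and a mean delta of 0.1975, in line with the stability summary row.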

Sub 2 — Threshold Grid (BB-Extreme only)

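Full-period backtest at θ ∈ [1.0, 0.7, 0.5, 0.4, 0.3, 0.2, 0.15, 0.1]; the tables list θ in ascending order. θ=1.0 reproduces the unfiltered baseline (its PnL matches the size_factor=1.0 column in Sub 4). A smooth PnL-vs-θ curve indicates a robust signal; an isolated spike at one θ suggests a lucky threshold.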

Total PnL ($) by threshold

strategy	0.1	0.15	0.2	0.3	0.4	0.5	0.7	1.0
BB_EXTREME_1	18.14	17.67	17.07	17.23	17.23	16.76	16.2	15.3
BB_EXTREME_2	20.08	20.39	20.39	20.67	20.47	19.76	18.9	17.66
BB_EXTREME_3	13.7	12.32	12.31	12.21	12.16	12.12	12.12	11.42

Sharpe (trade-level) by threshold

strategy	0.1	0.15	0.2	0.3	0.4	0.5	0.7	1.0
BB_EXTREME_1	2.378	2.242	2.146	2.148	2.148	2.066	1.953	1.735
BB_EXTREME_2	2.955	2.891	2.891	2.914	2.878	2.724	2.521	2.182
BB_EXTREME_3	3.355	3.145	3.141	3.08	3.051	3.023	2.669	2.362

[Figure: threshold grid]
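One rough way to put a number on "smooth vs spiky" is to ask how much of the total PnL range comes from the single largest step between adjacent thresholds. This metric and the name `largest_step_share` are illustrative assumptions, not part of the experiment; note also that it ignores the uneven spacing of the θ grid.

```python
def largest_step_share(pnl_by_theta):
    """Share of the PnL range contributed by the largest adjacent-threshold step.

    Hypothetical smoothness metric: values near 1.0 mean one step explains
    almost the whole range (spiky); lower values mean a more gradual curve.
    """
    thetas = sorted(pnl_by_theta)
    values = [pnl_by_theta[t] for t in thetas]
    steps = [abs(b - a) for a, b in zip(values, values[1:])]
    total_range = max(values) - min(values)
    if total_range == 0:
        return 0.0
    return max(steps) / total_range

# Total PnL ($) rows copied from the threshold table.
bb2 = {0.1: 20.08, 0.15: 20.39, 0.2: 20.39, 0.3: 20.67,
       0.4: 20.47, 0.5: 19.76, 0.7: 18.9, 1.0: 17.66}
bb3 = {0.1: 13.7, 0.15: 12.32, 0.2: 12.31, 0.3: 12.21,
       0.4: 12.16, 0.5: 12.12, 0.7: 12.12, 1.0: 11.42}

share_bb2 = largest_step_share(bb2)
share_bb3 = largest_step_share(bb3)
print(f"BB_EXTREME_2: {share_bb2:.2f}, BB_EXTREME_3: {share_bb3:.2f}")
```

On these rows the share is higher for BB_EXTREME_3, whose largest step is the jump from 12.32 to 13.7 between θ=0.15 and θ=0.1.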

Sub 3 — LONG/SHORT split at θ=0.3

strategy	side	n_total	n_blocked	n_kept	baseline_pnl	kept_pnl	blocked_pnl	blocked_mean_pnl	kept_mean_pnl
BB_EXTREME_3	long	58	6	52	7.35	7.64	-0.28	-0.05	0.15
BB_EXTREME_3	short	66	6	60	4.07	4.34	-0.27	-0.04	0.07
BB_EXTREME_1	long	74	8	66	17.32	11.59	5.74	+0.72	0.18
BB_EXTREME_1	short	91	6	85	-2.03	-1.07	-0.96	-0.16	-0.01
BB_EXTREME_2	long	61	9	52	17.21	11.47	5.73	+0.64	0.22
BB_EXTREME_2	short	78	7	71	0.45	1.55	-1.1	-0.16	0.02
EMA_PURE	long	559	21	538	-8.07	-7.77	-0.3	-0.01	-0.01
EMA_PURE	short	480	34	446	-9.59	-10.12	0.52	+0.02	-0.02
EMA_MACD	long	603	15	588	-4.38	-3.88	-0.49	-0.03	-0.01
EMA_MACD	short	631	22	609	-14.59	-14.77	0.18	+0.01	-0.02
HOLY_GRAIL	long	10	0	10	-0.22	-0.22	0	—	-0.02
HOLY_GRAIL	short	17	2	15	-0.61	-0.18	-0.43	-0.22	-0.01

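Interpretation: baseline_pnl = kept_pnl + blocked_pnl (up to rounding), so a negative blocked_pnl means the filter saves money on that side and a positive one means it removes profitable trades (BB_EXTREME_1 and BB_EXTREME_2 longs: +5.74 and +5.73). If one side benefits much more than the other, the edge is direction-specific.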
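The split table can be produced from a flat trade list in a few lines. The sketch below is a minimal version under two assumptions that the report does not state explicitly: a trade is blocked when its `p_short` exceeds θ, and each trade is a `(side, pnl, p_short)` tuple. The trades shown are made-up toy values, not experiment data.

```python
from collections import defaultdict

def side_split(trades, theta=0.3):
    """Aggregate kept vs blocked PnL per side.

    Assumption: a trade is blocked when p_short > theta.
    """
    out = defaultdict(lambda: {"n_total": 0, "n_blocked": 0,
                               "kept_pnl": 0.0, "blocked_pnl": 0.0})
    for side, pnl, p_short in trades:
        row = out[side]
        row["n_total"] += 1
        if p_short > theta:
            row["n_blocked"] += 1
            row["blocked_pnl"] += pnl
        else:
            row["kept_pnl"] += pnl
    for row in out.values():
        row["n_kept"] = row["n_total"] - row["n_blocked"]
        row["baseline_pnl"] = row["kept_pnl"] + row["blocked_pnl"]
        # No mean is defined when nothing was blocked (shown as a dash in the table).
        row["blocked_mean_pnl"] = (row["blocked_pnl"] / row["n_blocked"]
                                   if row["n_blocked"] else None)
    return dict(out)

# Toy trades for illustration only: (side, pnl, p_short).
trades = [("long", 0.5, 0.1), ("long", -0.2, 0.6),
          ("short", 0.1, 0.2), ("short", -0.3, 0.9), ("short", 0.4, 0.05)]
split = side_split(trades)
for side, row in split.items():
    print(side, row)
```

In the toy data the blocked PnL is negative on both sides, which is the "filter saves money" case described above.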

Sub 4 — Size-reduction variant at θ=0.3

Instead of blocking high-p_short trades outright, scale their position size by size_factor: 0.0 = full block, 0.5 = half size, 1.0 = no change.

strategy	0.0	0.25	0.5	0.75	1.0
BB_EXTREME_1	10.52	11.71	12.91	14.1	15.3
BB_EXTREME_2	13.02	14.18	15.34	16.5	17.66
BB_EXTREME_3	11.98	11.84	11.7	11.56	11.42
EMA_MACD	-18.66	-18.73	-18.81	-18.89	-18.97
EMA_PURE	-17.89	-17.83	-17.78	-17.72	-17.67
HOLY_GRAIL	-0.4	-0.51	-0.62	-0.72	-0.83

Δ PnL vs no-change baseline (size_factor=1.0)

strategy	0.0	0.25	0.5	0.75
BB_EXTREME_1	-4.78	-3.59	-2.39	-1.2
BB_EXTREME_2	-4.64	-3.48	-2.32	-1.16
BB_EXTREME_3	0.56	0.42	0.28	0.14
EMA_MACD	0.31	0.24	0.16	0.08
EMA_PURE	-0.22	-0.16	-0.11	-0.05
HOLY_GRAIL	0.43	0.32	0.21	0.11

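Reading: positive Δ = size-reduction beats no-change. The size_factor=0.0 column is a full block applied to the baseline trade list, so it equals real_kept_pnl in Sub 5 and the summed kept_pnl in Sub 3 (BB_EXTREME_1: 10.52). It does not match the full-period backtest at θ=0.3 in Sub 2 (BB_EXTREME_1: 17.23), and for BB_EXTREME_1 and BB_EXTREME_2 the two disagree even on the sign of the filter's effect. Each row is linear in size_factor: PnL = kept_pnl + size_factor × blocked_pnl.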
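The size-reduction rows are consistent with a simple linear model in which kept trades run at full size and flagged trades are scaled. The sketch below checks that against the BB_EXTREME_1 row; the function name `scaled_pnl` is hypothetical and the linear form is inferred from the table, not taken from the experiment code.

```python
def scaled_pnl(kept_pnl, blocked_pnl, size_factor):
    """PnL when flagged trades are taken at size_factor of full size (assumed model)."""
    return kept_pnl + size_factor * blocked_pnl

size_factors = (0.0, 0.25, 0.5, 0.75, 1.0)
table_row = (10.52, 11.71, 12.91, 14.1, 15.3)  # BB_EXTREME_1 row from the table

kept_pnl = table_row[0]                     # size_factor = 0.0: flagged trades dropped
blocked_pnl = table_row[-1] - table_row[0]  # baseline minus kept

rebuilt = [scaled_pnl(kept_pnl, blocked_pnl, s) for s in size_factors]
max_gap = max(abs(r - t) for r, t in zip(rebuilt, table_row))
print(f"largest gap vs table: {max_gap:.3f}")
```

The rebuilt values agree with the table to within rounding of the published two-decimal figures, which means the intermediate size factors carry no information beyond kept_pnl and blocked_pnl.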

Sub 5 — Shuffle Sanity Check

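Shuffle p_short values across the baseline trades 200 times, applying the θ=0.3 filter to each shuffle. If the real lift reflects signal rather than luck, real_kept_pnl should sit far in the right tail of the shuffled distribution; a z-score above 1.5 is taken to mean the lift is unlikely to be chance. A negative z-score means the real filter kept less PnL than a random filter blocking the same number of trades.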

strategy	real_kept_pnl	shuffled_mean_kept_pnl	shuffled_p5	shuffled_p95	z_score
BB_EXTREME_3	11.98	10.47	7.25	12	0.81
BB_EXTREME_1	10.52	14.06	8.44	17.21	-1.16
BB_EXTREME_2	13.02	15.95	9.56	19.61	-0.88
EMA_PURE	-17.89	-16.73	-17.99	-15.52	-1.44
EMA_MACD	-18.66	-18.44	-19.52	-17.41	-0.33
HOLY_GRAIL	-0.4	-0.8	-1.11	-0.39	1.8
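A shuffle test of this kind can be sketched with the standard library alone. The version below is an illustration under stated assumptions, not the experiment's implementation: trades are kept when `p_short <= theta`, the z-score is `(real - shuffled mean) / shuffled standard deviation`, and the function name, seed, and toy data are all hypothetical.

```python
import random
from statistics import mean, pstdev, quantiles

def shuffle_sanity(pnls, p_shorts, theta=0.3, n_shuffles=200, seed=0):
    """Compare real kept PnL with kept PnL under shuffled p_short labels."""
    rng = random.Random(seed)

    def kept_pnl(ps):
        # Assumption: a trade is kept when its p_short is at or below theta.
        return sum(pnl for pnl, p in zip(pnls, ps) if p <= theta)

    real = kept_pnl(p_shorts)
    ps = list(p_shorts)
    shuffled = []
    for _ in range(n_shuffles):
        rng.shuffle(ps)  # breaks any link between p_short and trade outcome
        shuffled.append(kept_pnl(ps))

    mu, sigma = mean(shuffled), pstdev(shuffled)
    cuts = quantiles(shuffled, n=20)  # 19 cut points: 5%, 10%, ..., 95%
    return {
        "real_kept_pnl": real,
        "shuffled_mean_kept_pnl": mu,
        "shuffled_p5": cuts[0],
        "shuffled_p95": cuts[-1],
        "z_score": (real - mu) / sigma if sigma > 0 else 0.0,
    }

# Toy data: every high-p_short trade is a loser, so the filter is informative.
pnls = [1.0] * 40 + [-1.0] * 10
p_shorts = [0.1] * 40 + [0.9] * 10
result = shuffle_sanity(pnls, p_shorts)
print(result)
```

In the toy case the real kept PnL sits at the extreme right of the shuffled distribution, which is the pattern the BB_EXTREME strategies fail to show in the table above.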

Overall verdict

NOT ROBUST: no BB_EXTREME strategy passes both the walk-forward and sanity gates. All three pass walk-forward (failed WF: []), but none clears the shuffle sanity check (failed sanity: ['BB_EXTREME_3', 'BB_EXTREME_1', 'BB_EXTREME_2']). The C7 lift is likely period-specific or a lucky cut. Don't deploy.