BOCPD Filter Validation — Walk-Forward, Threshold Grid, Size, Sanity
2026-05-19 · status: dropped · 178.6s
Hypothesis: The +6.9–17.0% PnL lift on BB_EXTREME strategies from bocpd_filter (Exp C7) survives (a) walk-forward partitioning into 4 windows, (b) a finer threshold grid, and (c) a shuffled-p_short sanity check.
Verdict: NOT ROBUST. All three BB-Extreme strategies pass the walk-forward gate, but none clears the shuffle sanity gate (z > 1.5), so no strategy passes both gates. The C7 lift is likely period-specific or a lucky cut. Don't deploy. Failed WF: [], failed sanity: ['BB_EXTREME_3', 'BB_EXTREME_1', 'BB_EXTREME_2'].
Key metrics
| metric | value |
|---|---|
| n_strategies_checked | 6 |
| n_wf_windows | 4 |
| n_threshold_levels | 8 |
| bb_promoted | [] |
| bb_failed_wf | [] |
| bb_failed_sanity | ['BB_EXTREME_3', 'BB_EXTREME_1', 'BB_EXTREME_2'] |
| BB_EXTREME_3_sanity_z | +0.8112 |
| BB_EXTREME_1_sanity_z | -1.1602 |
| BB_EXTREME_2_sanity_z | -0.8764 |
| EMA_PURE_sanity_z | -1.4410 |
| EMA_MACD_sanity_z | -0.3334 |
| HOLY_GRAIL_sanity_z | +1.7957 |
Approach
Five validations of the bocpd_filter promote-signal from Exp C7. Strategies: ['EMA_PURE', 'EMA_MACD', 'HOLY_GRAIL', 'BB_EXTREME_3', 'BB_EXTREME_1', 'BB_EXTREME_2']. Window for full-period runs: 2023-01-01 → 2026-04-01. Walk-forward partitions: 4 non-overlapping ~9-mo windows.
Sub 1 — Walk-Forward Stability
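Per-window PnL with the filter on (θ=0.3) vs off. delta = th0.3 − off, and lift_pct = 100 · delta / |off|, so a smaller loss on a negative baseline shows as a positive lift. Stability = fraction of windows where the filter strictly improves PnL; a window with zero delta counts as not improved (see HOLY_GRAIL in the summary).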
| strategy | window | off | th0.3 | delta | lift_pct |
|---|---|---|---|---|---|
| BB_EXTREME_1 | WF1_2023H1H2 | 2.02 | 2.27 | 0.25 | 12.4 |
| BB_EXTREME_1 | WF2_2023Q4_2024H1 | 6.52 | 7.44 | 0.92 | 14.1 |
| BB_EXTREME_1 | WF3_2024H2_2025Q1 | 3.24 | 3.48 | 0.24 | 7.4 |
| BB_EXTREME_1 | WF4_2025Q2_2026Q1 | 3.79 | 4.31 | 0.52 | 13.7 |
| BB_EXTREME_2 | WF1_2023H1H2 | 0.23 | 0.73 | 0.5 | 217.4 |
| BB_EXTREME_2 | WF2_2023Q4_2024H1 | 7.9 | 8.77 | 0.87 | 11 |
| BB_EXTREME_2 | WF3_2024H2_2025Q1 | 4.26 | 4.78 | 0.52 | 12.2 |
| BB_EXTREME_2 | WF4_2025Q2_2026Q1 | 5.08 | 6.2 | 1.12 | 22 |
| BB_EXTREME_3 | WF1_2023H1H2 | 3.5 | 3.36 | -0.14 | -4 |
| BB_EXTREME_3 | WF2_2023Q4_2024H1 | -1.04 | -0.76 | 0.28 | 26.9 |
| BB_EXTREME_3 | WF3_2024H2_2025Q1 | 3.2 | 3.28 | 0.08 | 2.5 |
| BB_EXTREME_3 | WF4_2025Q2_2026Q1 | 6.02 | 6.59 | 0.57 | 9.5 |
| EMA_MACD | WF1_2023H1H2 | -5.63 | -5.9 | -0.27 | -4.8 |
| EMA_MACD | WF2_2023Q4_2024H1 | -4.74 | -4.63 | 0.11 | 2.3 |
| EMA_MACD | WF3_2024H2_2025Q1 | -1.21 | -1.08 | 0.13 | 10.7 |
| EMA_MACD | WF4_2025Q2_2026Q1 | -7.46 | -7.28 | 0.18 | 2.4 |
| EMA_PURE | WF1_2023H1H2 | -5.38 | -5.51 | -0.13 | -2.4 |
| EMA_PURE | WF2_2023Q4_2024H1 | -6.17 | -5.87 | 0.3 | 4.9 |
| EMA_PURE | WF3_2024H2_2025Q1 | -1.3 | -0.93 | 0.37 | 28.5 |
| EMA_PURE | WF4_2025Q2_2026Q1 | -4.71 | -5.18 | -0.47 | -10 |
| HOLY_GRAIL | WF1_2023H1H2 | -0.43 | -0.43 | 0 | 0 |
| HOLY_GRAIL | WF2_2023Q4_2024H1 | 0.26 | 0.26 | 0 | 0 |
| HOLY_GRAIL | WF3_2024H2_2025Q1 | 0.35 | 0.35 | 0 | 0 |
| HOLY_GRAIL | WF4_2025Q2_2026Q1 | -1.01 | -1.01 | 0 | 0 |
Stability summary
| strategy | n_windows | n_positive | stability | mean_delta | min_delta | max_delta |
|---|---|---|---|---|---|---|
| EMA_PURE | 4 | 2 | 0.5 | 0.0175 | -0.47 | 0.37 |
| EMA_MACD | 4 | 3 | 0.75 | 0.0375 | -0.27 | 0.18 |
| HOLY_GRAIL | 4 | 0 | 0 | 0 | 0 | 0 |
| BB_EXTREME_3 | 4 | 3 | 0.75 | 0.1975 | -0.14 | 0.57 |
| BB_EXTREME_1 | 4 | 4 | 1 | 0.4825 | 0.24 | 0.92 |
| BB_EXTREME_2 | 4 | 4 | 1 | 0.7525 | 0.5 | 1.12 |
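The summary columns can be reproduced from the per-window deltas. The sketch below is a minimal illustration, not the experiment's own code: the function names `stability_summary` and `lift_pct` are hypothetical, the strict `> 0` test is inferred from the HOLY_GRAIL row (zero deltas, stability 0), and the zero-baseline branch is an assumption that never arises in the table.

```python
from statistics import mean

def stability_summary(deltas):
    """Summarise per-window PnL deltas (filter on minus filter off)."""
    # Strict inequality: a zero delta does not count as an improvement.
    n_positive = sum(1 for d in deltas if d > 0)
    return {
        "n_windows": len(deltas),
        "n_positive": n_positive,
        "stability": n_positive / len(deltas),
        "mean_delta": round(mean(deltas), 4),
        "min_delta": min(deltas),
        "max_delta": max(deltas),
    }

def lift_pct(off, on):
    """Percentage lift relative to the magnitude of the unfiltered PnL."""
    if off == 0:
        return 0.0  # assumption: report 0 when the baseline is exactly zero
    return round((on - off) / abs(off) * 100, 1)

# BB_EXTREME_3 deltas taken from the per-window table above.
bb3 = stability_summary([-0.14, 0.28, 0.08, 0.57])
# BB_EXTREME_3 in WF2 has a negative baseline that becomes less negative.
bb3_wf2_lift = lift_pct(-1.04, -0.76)
```

Dividing by the absolute value of the baseline is what keeps the sign of lift_pct aligned with the sign of delta when the unfiltered PnL is negative.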

Sub 2 — Threshold Grid (BB-Extreme only)
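Full-period backtest at θ ∈ [1.0, 0.7, 0.5, 0.4, 0.3, 0.2, 0.15, 0.1]. θ = 1.0 reproduces the unfiltered baseline (its PnL equals the size_factor = 1.0 column in Sub 4); lower θ blocks more trades. A smooth PnL-vs-θ curve indicates a robust signal; isolated spikes suggest a lucky θ.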
Total PnL ($) by threshold
| strategy | 0.1 | 0.15 | 0.2 | 0.3 | 0.4 | 0.5 | 0.7 | 1.0 |
|---|---|---|---|---|---|---|---|---|
| BB_EXTREME_1 | 18.14 | 17.67 | 17.07 | 17.23 | 17.23 | 16.76 | 16.2 | 15.3 |
| BB_EXTREME_2 | 20.08 | 20.39 | 20.39 | 20.67 | 20.47 | 19.76 | 18.9 | 17.66 |
| BB_EXTREME_3 | 13.7 | 12.32 | 12.31 | 12.21 | 12.16 | 12.12 | 12.12 | 11.42 |
Sharpe (trade-level) by threshold
| strategy | 0.1 | 0.15 | 0.2 | 0.3 | 0.4 | 0.5 | 0.7 | 1.0 |
|---|---|---|---|---|---|---|---|---|
| BB_EXTREME_1 | 2.378 | 2.242 | 2.146 | 2.148 | 2.148 | 2.066 | 1.953 | 1.735 |
| BB_EXTREME_2 | 2.955 | 2.891 | 2.891 | 2.914 | 2.878 | 2.724 | 2.521 | 2.182 |
| BB_EXTREME_3 | 3.355 | 3.145 | 3.141 | 3.08 | 3.051 | 3.023 | 2.669 | 2.362 |
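One rough way to look for spikes in a PnL-vs-θ curve is to find the largest change between adjacent grid points. The sketch below is only an illustration: the helper `largest_step` and the choice of adjacent differences as a spike measure are assumptions, not part of the experiment.

```python
def largest_step(thresholds, pnl):
    """Return (theta_a, theta_b, step) for the largest adjacent PnL change."""
    pairs = sorted(zip(thresholds, pnl))  # order grid points by theta
    steps = [
        (a[0], b[0], b[1] - a[1])
        for a, b in zip(pairs, pairs[1:])
    ]
    # Pick the step with the largest magnitude, keeping its sign.
    return max(steps, key=lambda s: abs(s[2]))

thetas = [0.1, 0.15, 0.2, 0.3, 0.4, 0.5, 0.7, 1.0]
# BB_EXTREME_3 total PnL row from the threshold table above.
bb3_pnl = [13.7, 12.32, 12.31, 12.21, 12.16, 12.12, 12.12, 11.42]
step = largest_step(thetas, bb3_pnl)
```

For BB_EXTREME_3 the largest step falls between θ = 0.1 and θ = 0.15, where PnL drops from 13.7 to 12.32. The rest of that row is nearly flat until θ = 1.0.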

Sub 3 — LONG/SHORT split at θ=0.3
| strategy | side | n_total | n_blocked | n_kept | baseline_pnl | kept_pnl | blocked_pnl | blocked_mean_pnl | kept_mean_pnl |
|---|---|---|---|---|---|---|---|---|---|
| BB_EXTREME_3 | long | 58 | 6 | 52 | 7.35 | 7.64 | -0.28 | -0.05 | 0.15 |
| BB_EXTREME_3 | short | 66 | 6 | 60 | 4.07 | 4.34 | -0.27 | -0.04 | 0.07 |
| BB_EXTREME_1 | long | 74 | 8 | 66 | 17.32 | 11.59 | 5.74 | +0.72 | 0.18 |
| BB_EXTREME_1 | short | 91 | 6 | 85 | -2.03 | -1.07 | -0.96 | -0.16 | -0.01 |
| BB_EXTREME_2 | long | 61 | 9 | 52 | 17.21 | 11.47 | 5.73 | +0.64 | 0.22 |
| BB_EXTREME_2 | short | 78 | 7 | 71 | 0.45 | 1.55 | -1.1 | -0.16 | 0.02 |
| EMA_PURE | long | 559 | 21 | 538 | -8.07 | -7.77 | -0.3 | -0.01 | -0.01 |
| EMA_PURE | short | 480 | 34 | 446 | -9.59 | -10.12 | 0.52 | +0.02 | -0.02 |
| EMA_MACD | long | 603 | 15 | 588 | -4.38 | -3.88 | -0.49 | -0.03 | -0.01 |
| EMA_MACD | short | 631 | 22 | 609 | -14.59 | -14.77 | 0.18 | +0.01 | -0.02 |
| HOLY_GRAIL | long | 10 | 0 | 10 | -0.22 | -0.22 | 0 | — | -0.02 |
| HOLY_GRAIL | short | 17 | 2 | 15 | -0.61 | -0.18 | -0.43 | -0.22 | -0.01 |
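Interpretation: a negative blocked_pnl means the blocked trades lost money in aggregate, so the filter saves money on that side; a positive blocked_pnl means the filter removed profitable trades. If one side benefits much more than the other, the edge is direction-specific. For BB_EXTREME_1 and BB_EXTREME_2 the filter helps on the short side but blocks profitable longs (blocked_pnl of 5.74 and 5.73).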
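The split can be sketched as a single pass over the baseline trades. This is a minimal illustration under stated assumptions: the trade record layout (`side`, `pnl`, `p_short`), the function name `split_by_side`, and the rule that a trade is blocked when `p_short > theta` are all hypothetical, and the sample trades are made up for the example rather than taken from the experiment.

```python
from collections import defaultdict

def split_by_side(trades, theta=0.3):
    """Aggregate kept and blocked PnL per trade side."""
    out = defaultdict(lambda: {"n_total": 0, "n_blocked": 0,
                               "kept_pnl": 0.0, "blocked_pnl": 0.0})
    for t in trades:
        row = out[t["side"]]
        row["n_total"] += 1
        if t["p_short"] > theta:  # assumed blocking rule
            row["n_blocked"] += 1
            row["blocked_pnl"] += t["pnl"]
        else:
            row["kept_pnl"] += t["pnl"]
    return dict(out)

# Synthetic trades, for illustration only.
trades = [
    {"side": "long", "pnl": 0.40, "p_short": 0.10},
    {"side": "long", "pnl": -0.20, "p_short": 0.55},
    {"side": "short", "pnl": 0.10, "p_short": 0.05},
    {"side": "short", "pnl": -0.30, "p_short": 0.80},
    {"side": "short", "pnl": 0.25, "p_short": 0.35},
]
split = split_by_side(trades)
```

By construction, kept_pnl + blocked_pnl equals the baseline PnL for each side, which matches the rows of the table up to rounding.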
Sub 4 — Size-reduction variant at θ=0.3
Instead of blocking high-p_short trades, scale their position size by size_factor: 0.0 = full block, 0.5 = half size, 1.0 = no change. Columns are size_factor values; cells are total PnL ($).
| strategy | 0.0 | 0.25 | 0.5 | 0.75 | 1.0 |
|---|---|---|---|---|---|
| BB_EXTREME_1 | 10.52 | 11.71 | 12.91 | 14.1 | 15.3 |
| BB_EXTREME_2 | 13.02 | 14.18 | 15.34 | 16.5 | 17.66 |
| BB_EXTREME_3 | 11.98 | 11.84 | 11.7 | 11.56 | 11.42 |
| EMA_MACD | -18.66 | -18.73 | -18.81 | -18.89 | -18.97 |
| EMA_PURE | -17.89 | -17.83 | -17.78 | -17.72 | -17.67 |
| HOLY_GRAIL | -0.4 | -0.51 | -0.62 | -0.72 | -0.83 |
Δ PnL vs no-change baseline (size_factor=1.0)
| strategy | 0.0 | 0.25 | 0.5 | 0.75 | 1.0 |
|---|---|---|---|---|---|
| BB_EXTREME_1 | -4.78 | -3.59 | -2.39 | -1.2 | 0 |
| BB_EXTREME_2 | -4.64 | -3.48 | -2.32 | -1.16 | 0 |
| BB_EXTREME_3 | 0.56 | 0.42 | 0.28 | 0.14 | 0 |
| EMA_MACD | 0.31 | 0.24 | 0.16 | 0.08 | 0 |
| EMA_PURE | -0.22 | -0.16 | -0.11 | -0.05 | 0 |
| HOLY_GRAIL | 0.43 | 0.32 | 0.21 | 0.11 | 0 |
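Reading: positive Δ = size reduction beats no-change. The size_factor = 0.0 column is a full block of the flagged trades, nominally the same as bocpd_filter@0.3, and it matches real_kept_pnl in Sub 5. It does not match the Sub 2 backtest PnL at θ = 0.3 (BB_EXTREME_1: 10.52 here vs 17.23 there), so the two numbers come from different computations and should not be compared directly.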
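The size-reduction rows are consistent with a linear blend of the kept and blocked PnL from the LONG/SHORT table: PnL(size_factor) = kept_pnl + size_factor · blocked_pnl. The sketch below checks this for BB_EXTREME_1; the function name `scaled_pnl` is hypothetical, and the linear form is inferred from the tables rather than confirmed by the experiment code.

```python
def scaled_pnl(kept_pnl, blocked_pnl, size_factor):
    """PnL when flagged trades are taken at size_factor of full size."""
    return kept_pnl + size_factor * blocked_pnl

# BB_EXTREME_1 at theta=0.3, long + short rows of the LONG/SHORT table.
kept = 11.59 + (-1.07)     # kept_pnl, long plus short
blocked = 5.74 + (-0.96)   # blocked_pnl, long plus short

factors = (0.0, 0.25, 0.5, 0.75, 1.0)
row = [scaled_pnl(kept, blocked, f) for f in factors]

# Values reported in the size-reduction table for BB_EXTREME_1.
reported = [10.52, 11.71, 12.91, 14.1, 15.3]
max_gap = max(abs(r - t) for r, t in zip(row, reported))
```

The computed row agrees with the reported one to within table rounding. Because blocked_pnl is positive for BB_EXTREME_1, every reduction in size_factor lowers total PnL.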
Sub 5 — Shuffle Sanity Check
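Shuffle p_short values across baseline trades 200 times and apply the θ=0.3 filter to each shuffle. If the real lift reflects signal rather than luck, the real kept-PnL should sit far in the right tail of the shuffled distribution. The gate is z-score > 1.5, roughly a one-sided p of 0.07 under a normal approximation. A z-score near zero or negative means the real filter keeps no more PnL than a random one that blocks the same number of trades.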
| strategy | real_kept_pnl | shuffled_mean_kept_pnl | shuffled_p5 | shuffled_p95 | z_score |
|---|---|---|---|---|---|
| BB_EXTREME_3 | 11.98 | 10.47 | 7.25 | 12 | 0.81 |
| BB_EXTREME_1 | 10.52 | 14.06 | 8.44 | 17.21 | -1.16 |
| BB_EXTREME_2 | 13.02 | 15.95 | 9.56 | 19.61 | -0.88 |
| EMA_PURE | -17.89 | -16.73 | -17.99 | -15.52 | -1.44 |
| EMA_MACD | -18.66 | -18.44 | -19.52 | -17.41 | -0.33 |
| HOLY_GRAIL | -0.4 | -0.8 | -1.11 | -0.39 | 1.8 |
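A permutation check of this kind can be sketched as follows. This is a minimal illustration, not the experiment's implementation: the function name `shuffle_z`, the `p_short <= theta` keep rule, the use of the population standard deviation, and the z-score definition (real minus shuffled mean, divided by shuffled standard deviation) are assumptions, and the sample data is synthetic.

```python
import random
from statistics import mean, pstdev

def shuffle_z(pnl, p_short, theta=0.3, n_shuffles=200, seed=0):
    """Compare real kept-PnL with kept-PnL under shuffled p_short."""
    rng = random.Random(seed)

    def kept_pnl(probs):
        # Keep a trade when its (possibly shuffled) p_short is at most theta.
        return sum(x for x, p in zip(pnl, probs) if p <= theta)

    real = kept_pnl(p_short)
    probs = list(p_short)  # copy so the caller's list is not mutated
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(probs)  # breaks the link between p_short and trade PnL
        null.append(kept_pnl(probs))
    sd = pstdev(null)
    z = (real - mean(null)) / sd if sd > 0 else 0.0
    return real, mean(null), z

# Synthetic example: the two losing trades carry the high p_short values.
pnl = [1.0] * 8 + [-1.0] * 2
p_short = [0.1] * 8 + [0.9] * 2
real, null_mean, z = shuffle_z(pnl, p_short)
```

Shuffling keeps the number of blocked trades fixed, so the null distribution isolates whether the filter blocks the right trades rather than merely the right number of them.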
Overall verdict
NOT ROBUST. All three BB-Extreme strategies pass the walk-forward gate, but none clears the shuffle sanity gate (z > 1.5), so no strategy passes both gates. The C7 lift is likely period-specific or a lucky cut. Don't deploy. Failed WF: [], failed sanity: ['BB_EXTREME_3', 'BB_EXTREME_1', 'BB_EXTREME_2'].