ML — Pattern Discovery

Inverted workflow: find conditional edges in BTC data first, build strategies second.
55 experiments

BOCPD Filter Validation — Walk-Forward, Threshold Grid, Size, Sanity

Status: Dropped
2026-05-19 · tags: validation, walk-forward, threshold, sanity, filter
Hypothesis
The +6.9–17.0% PnL lift on BB_EXTREME strategies from `bocpd_filter` (Exp C7) survives (a) walk-forward partitioning into 4 windows, (b) a finer threshold grid, and (c) a shuffled-p_short sanity check.
Verdict
**NOT ROBUST.** No BB-Extreme strategy passes both the walk-forward gate and the sanity gate. All three pass walk-forward (failed WF: []), but all three fail the shuffled-p_short sanity check (failed sanity: ['BB_EXTREME_3', 'BB_EXTREME_1', 'BB_EXTREME_2'], with z-scores of +0.81, -1.16 and -0.88). The C7 lift is likely period-specific or a lucky cut. Don't deploy.

2026-05-19 · status: dropped · 178.6s

Key metrics

| metric | value |
|---|---|
| n_strategies_checked | 6 |
| n_wf_windows | 4 |
| n_threshold_levels | 8 |
| bb_promoted | [] |
| bb_failed_wf | [] |
| bb_failed_sanity | ['BB_EXTREME_3', 'BB_EXTREME_1', 'BB_EXTREME_2'] |
| BB_EXTREME_3_sanity_z | +0.8112 |
| BB_EXTREME_1_sanity_z | -1.1602 |
| BB_EXTREME_2_sanity_z | -0.8764 |
| EMA_PURE_sanity_z | -1.4410 |
| EMA_MACD_sanity_z | -0.3334 |
| HOLY_GRAIL_sanity_z | +1.7957 |

Approach

Five validations of the `bocpd_filter` promote signal from Exp C7, run on six strategies: EMA_PURE, EMA_MACD, HOLY_GRAIL, BB_EXTREME_3, BB_EXTREME_1, BB_EXTREME_2. Full-period runs cover 2023-01-01 → 2026-04-01. Walk-forward uses 4 non-overlapping windows of roughly 9 months each.

Sub 1 — Walk-Forward Stability

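Per-window PnL with the filter on (θ=0.3, column th0.3) vs. off, where delta = th0.3 minus off. Stability is the fraction of windows where the filter improves PnL (delta > 0), so the four zero-delta HOLY_GRAIL windows count as no improvement.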

| strategy | window | off | th0.3 | delta | lift_pct |
|---|---|---|---|---|---|
| BB_EXTREME_1 | WF1_2023H1H2 | 2.02 | 2.27 | 0.25 | 12.4 |
| BB_EXTREME_1 | WF2_2023Q4_2024H1 | 6.52 | 7.44 | 0.92 | 14.1 |
| BB_EXTREME_1 | WF3_2024H2_2025Q1 | 3.24 | 3.48 | 0.24 | 7.4 |
| BB_EXTREME_1 | WF4_2025Q2_2026Q1 | 3.79 | 4.31 | 0.52 | 13.7 |
| BB_EXTREME_2 | WF1_2023H1H2 | 0.23 | 0.73 | 0.5 | 217.4 |
| BB_EXTREME_2 | WF2_2023Q4_2024H1 | 7.9 | 8.77 | 0.87 | 11 |
| BB_EXTREME_2 | WF3_2024H2_2025Q1 | 4.26 | 4.78 | 0.52 | 12.2 |
| BB_EXTREME_2 | WF4_2025Q2_2026Q1 | 5.08 | 6.2 | 1.12 | 22 |
| BB_EXTREME_3 | WF1_2023H1H2 | 3.5 | 3.36 | -0.14 | -4 |
| BB_EXTREME_3 | WF2_2023Q4_2024H1 | -1.04 | -0.76 | 0.28 | 26.9 |
| BB_EXTREME_3 | WF3_2024H2_2025Q1 | 3.2 | 3.28 | 0.08 | 2.5 |
| BB_EXTREME_3 | WF4_2025Q2_2026Q1 | 6.02 | 6.59 | 0.57 | 9.5 |
| EMA_MACD | WF1_2023H1H2 | -5.63 | -5.9 | -0.27 | -4.8 |
| EMA_MACD | WF2_2023Q4_2024H1 | -4.74 | -4.63 | 0.11 | 2.3 |
| EMA_MACD | WF3_2024H2_2025Q1 | -1.21 | -1.08 | 0.13 | 10.7 |
| EMA_MACD | WF4_2025Q2_2026Q1 | -7.46 | -7.28 | 0.18 | 2.4 |
| EMA_PURE | WF1_2023H1H2 | -5.38 | -5.51 | -0.13 | -2.4 |
| EMA_PURE | WF2_2023Q4_2024H1 | -6.17 | -5.87 | 0.3 | 4.9 |
| EMA_PURE | WF3_2024H2_2025Q1 | -1.3 | -0.93 | 0.37 | 28.5 |
| EMA_PURE | WF4_2025Q2_2026Q1 | -4.71 | -5.18 | -0.47 | -10 |
| HOLY_GRAIL | WF1_2023H1H2 | -0.43 | -0.43 | 0 | 0 |
| HOLY_GRAIL | WF2_2023Q4_2024H1 | 0.26 | 0.26 | 0 | 0 |
| HOLY_GRAIL | WF3_2024H2_2025Q1 | 0.35 | 0.35 | 0 | 0 |
| HOLY_GRAIL | WF4_2025Q2_2026Q1 | -1.01 | -1.01 | 0 | 0 |
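
The report does not define lift_pct, but the rows above appear consistent with delta divided by the absolute value of the unfiltered PnL, which keeps the sign meaningful when the baseline is negative. The sketch below is a guess at that formula, not the experiment's code; the function name `lift_pct` and the NaN handling for a zero baseline are assumptions.

```python
def lift_pct(off, on):
    """Percentage lift of filtered PnL over unfiltered PnL.

    Assumed formula: 100 * (on - off) / |off|. Dividing by the absolute
    value keeps a positive delta positive even when `off` is negative.
    """
    if off == 0:
        return float("nan")  # lift is undefined against a zero baseline
    return 100.0 * (on - off) / abs(off)

# Three rows copied from the walk-forward table.
examples = {
    "BB_EXTREME_1 WF1": round(lift_pct(2.02, 2.27), 1),
    "BB_EXTREME_3 WF2": round(lift_pct(-1.04, -0.76), 1),
    "EMA_PURE WF4": round(lift_pct(-4.71, -5.18), 1),
}
```

Note that a small baseline inflates the percentage: BB_EXTREME_2 in WF1 shows a 217.4% lift on a delta of only 0.5, so the delta column is the safer one to compare across windows.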

Stability summary

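| strategy | n_windows | n_positive | stability | mean_delta | min_delta | max_delta |
|---|---|---|---|---|---|---|
| EMA_PURE | 4 | 2 | 0.5 | 0.0175 | -0.47 | 0.37 |
| EMA_MACD | 4 | 3 | 0.75 | 0.0375 | -0.27 | 0.18 |
| HOLY_GRAIL | 4 | 0 | 0 | 0 | 0 | 0 |
| BB_EXTREME_3 | 4 | 3 | 0.75 | 0.1975 | -0.14 | 0.57 |
| BB_EXTREME_1 | 4 | 4 | 1 | 0.4825 | 0.24 | 0.92 |
| BB_EXTREME_2 | 4 | 4 | 1 | 0.7525 | 0.5 | 1.12 |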
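
As a rough illustration of how the stability summary follows from the per-window deltas, here is a minimal sketch. The function name `stability_summary` is hypothetical, and the strict `> 0` comparison is an assumption inferred from HOLY_GRAIL scoring 0 positive windows with all-zero deltas.

```python
from statistics import mean

def stability_summary(deltas):
    """Summarise per-window PnL deltas (filter on minus filter off)."""
    # Assumption: a window counts as positive only if delta is strictly > 0.
    n_positive = sum(1 for d in deltas if d > 0)
    return {
        "n_windows": len(deltas),
        "n_positive": n_positive,
        "stability": n_positive / len(deltas),
        "mean_delta": round(mean(deltas), 4),
        "min_delta": min(deltas),
        "max_delta": max(deltas),
    }

# Deltas copied from the walk-forward table, in window order WF1..WF4.
wf_deltas = {
    "BB_EXTREME_3": [-0.14, 0.28, 0.08, 0.57],
    "BB_EXTREME_1": [0.25, 0.92, 0.24, 0.52],
    "HOLY_GRAIL": [0, 0, 0, 0],
}
summary = {name: stability_summary(d) for name, d in wf_deltas.items()}
```

With only four windows, stability can take just five values (0, 0.25, 0.5, 0.75, 1), so it is a coarse gate and is best read alongside min_delta.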

[Figure: walk-forward delta]

Sub 2 — Threshold Grid (BB-Extreme only)

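Full-period backtest at θ ∈ {0.1, 0.15, 0.2, 0.3, 0.4, 0.5, 0.7, 1.0}. The θ=1.0 column reproduces the unfiltered baseline (its PnL matches the size_factor=1.0 column in Sub 4). A smooth PnL-vs-θ curve indicates a robust signal; isolated spikes suggest a lucky θ.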

Total PnL ($) by threshold

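| strategy | θ=0.1 | θ=0.15 | θ=0.2 | θ=0.3 | θ=0.4 | θ=0.5 | θ=0.7 | θ=1.0 |
|---|---|---|---|---|---|---|---|---|
| BB_EXTREME_1 | 18.14 | 17.67 | 17.07 | 17.23 | 17.23 | 16.76 | 16.2 | 15.3 |
| BB_EXTREME_2 | 20.08 | 20.39 | 20.39 | 20.67 | 20.47 | 19.76 | 18.9 | 17.66 |
| BB_EXTREME_3 | 13.7 | 12.32 | 12.31 | 12.21 | 12.16 | 12.12 | 12.12 | 11.42 |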

Sharpe (trade-level) by threshold

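| strategy | θ=0.1 | θ=0.15 | θ=0.2 | θ=0.3 | θ=0.4 | θ=0.5 | θ=0.7 | θ=1.0 |
|---|---|---|---|---|---|---|---|---|
| BB_EXTREME_1 | 2.378 | 2.242 | 2.146 | 2.148 | 2.148 | 2.066 | 1.953 | 1.735 |
| BB_EXTREME_2 | 2.955 | 2.891 | 2.891 | 2.914 | 2.878 | 2.724 | 2.521 | 2.182 |
| BB_EXTREME_3 | 3.355 | 3.145 | 3.141 | 3.08 | 3.051 | 3.023 | 2.669 | 2.362 |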
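
The report judges smoothness by eye. One crude way to put a number on "smooth vs. spiky" is to count how often the PnL curve changes direction as θ increases. This heuristic, and the name `direction_reversals`, are illustrative assumptions and not a criterion the experiment used.

```python
def direction_reversals(values):
    """Count sign flips between consecutive non-zero steps of a curve."""
    steps = [b - a for a, b in zip(values, values[1:])]
    # Flat steps carry no direction, so drop them before comparing signs.
    signs = [1 if s > 0 else -1 for s in steps if s != 0]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)

# Total PnL by threshold, copied from the table above (θ = 0.1 ... 1.0).
pnl_by_theta = {
    "BB_EXTREME_1": [18.14, 17.67, 17.07, 17.23, 17.23, 16.76, 16.2, 15.3],
    "BB_EXTREME_2": [20.08, 20.39, 20.39, 20.67, 20.47, 19.76, 18.9, 17.66],
    "BB_EXTREME_3": [13.7, 12.32, 12.31, 12.21, 12.16, 12.12, 12.12, 11.42],
}
reversals = {name: direction_reversals(v) for name, v in pnl_by_theta.items()}
```

A low count suggests a monotone or single-peaked curve. The measure ignores step size, so it would not register the jump from 12.32 to 13.7 for BB_EXTREME_3 at θ=0.1; a magnitude-aware check would be needed for that.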

[Figure: threshold grid]

Sub 3 — LONG/SHORT split at θ=0.3

| strategy | side | n_total | n_blocked | n_kept | baseline_pnl | kept_pnl | blocked_pnl | blocked_mean_pnl | kept_mean_pnl |
|---|---|---|---|---|---|---|---|---|---|
| BB_EXTREME_3 | long | 58 | 6 | 52 | 7.35 | 7.64 | -0.28 | -0.05 | 0.15 |
| BB_EXTREME_3 | short | 66 | 6 | 60 | 4.07 | 4.34 | -0.27 | -0.04 | 0.07 |
| BB_EXTREME_1 | long | 74 | 8 | 66 | 17.32 | 11.59 | 5.74 | +0.72 | 0.18 |
| BB_EXTREME_1 | short | 91 | 6 | 85 | -2.03 | -1.07 | -0.96 | -0.16 | -0.01 |
| BB_EXTREME_2 | long | 61 | 9 | 52 | 17.21 | 11.47 | 5.73 | +0.64 | 0.22 |
| BB_EXTREME_2 | short | 78 | 7 | 71 | 0.45 | 1.55 | -1.1 | -0.16 | 0.02 |
| EMA_PURE | long | 559 | 21 | 538 | -8.07 | -7.77 | -0.3 | -0.01 | -0.01 |
| EMA_PURE | short | 480 | 34 | 446 | -9.59 | -10.12 | 0.52 | +0.02 | -0.02 |
| EMA_MACD | long | 603 | 15 | 588 | -4.38 | -3.88 | -0.49 | -0.03 | -0.01 |
| EMA_MACD | short | 631 | 22 | 609 | -14.59 | -14.77 | 0.18 | +0.01 | -0.02 |
| HOLY_GRAIL | long | 10 | 0 | 10 | -0.22 | -0.22 | 0 | n/a | -0.02 |
| HOLY_GRAIL | short | 17 | 2 | 15 | -0.61 | -0.18 | -0.43 | -0.22 | -0.01 |

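Interpretation: a negative blocked_pnl means the blocked trades lost money in aggregate, so the filter saves money on that side; a positive blocked_pnl means the filter removed profitable trades. If one side benefits much more than the other, the edge is direction-specific. For BB_EXTREME_1 and BB_EXTREME_2 the filter removes +5.74 and +5.73 of profitable long-side PnL while saving only about 1.0 on the short side.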
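
The per-side split can be sketched as follows. This is a minimal illustration on made-up trades, not the experiment's code: the function name `split_by_side`, the `(side, pnl, p_short)` tuple layout, and the rule that a trade is blocked when `p_short > theta` are all assumptions.

```python
from collections import defaultdict

def split_by_side(trades, theta=0.3):
    """Aggregate kept vs. blocked PnL per side for (side, pnl, p_short) trades."""
    rows = defaultdict(lambda: {"n_total": 0, "n_blocked": 0,
                                "kept_pnl": 0.0, "blocked_pnl": 0.0})
    for side, pnl, p_short in trades:
        row = rows[side]
        row["n_total"] += 1
        if p_short > theta:  # assumed blocking rule
            row["n_blocked"] += 1
            row["blocked_pnl"] += pnl
        else:
            row["kept_pnl"] += pnl
    return dict(rows)

# Hypothetical trades, for illustration only.
toy_trades = [
    ("long", 0.5, 0.1),
    ("long", -0.2, 0.6),   # blocked loser: the filter saves 0.2
    ("short", 0.3, 0.2),
    ("short", 0.4, 0.9),   # blocked winner: the filter gives up 0.4
]
split = split_by_side(toy_trades)
```

By construction kept_pnl + blocked_pnl equals the side's baseline PnL, which is a useful consistency check on any row of the table.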

Sub 4 — Size-reduction variant at θ=0.3

Instead of blocking high-p_short trades, scale their position size by size_factor: 0.0 = full block, 0.5 = half size, 1.0 = no change. The first table gives total PnL at each size_factor.

| strategy | 0.0 | 0.25 | 0.5 | 0.75 | 1.0 |
|---|---|---|---|---|---|
| BB_EXTREME_1 | 10.52 | 11.71 | 12.91 | 14.1 | 15.3 |
| BB_EXTREME_2 | 13.02 | 14.18 | 15.34 | 16.5 | 17.66 |
| BB_EXTREME_3 | 11.98 | 11.84 | 11.7 | 11.56 | 11.42 |
| EMA_MACD | -18.66 | -18.73 | -18.81 | -18.89 | -18.97 |
| EMA_PURE | -17.89 | -17.83 | -17.78 | -17.72 | -17.67 |
| HOLY_GRAIL | -0.4 | -0.51 | -0.62 | -0.72 | -0.83 |

Δ PnL vs no-change baseline (size_factor=1.0)

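| strategy | 0.0 | 0.25 | 0.5 | 0.75 | 1.0 |
|---|---|---|---|---|---|
| BB_EXTREME_1 | -4.78 | -3.59 | -2.39 | -1.2 | 0 |
| BB_EXTREME_2 | -4.64 | -3.48 | -2.32 | -1.16 | 0 |
| BB_EXTREME_3 | 0.56 | 0.42 | 0.28 | 0.14 | 0 |
| EMA_MACD | 0.31 | 0.24 | 0.16 | 0.08 | 0 |
| EMA_PURE | -0.22 | -0.16 | -0.11 | -0.05 | 0 |
| HOLY_GRAIL | 0.43 | 0.32 | 0.21 | 0.11 | 0 |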

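Reading: a positive Δ means size reduction beats no change. The size_factor=0.0 column is a full block, nominally the same as bocpd_filter@0.3. Note that its values (e.g. 10.52 for BB_EXTREME_1) match the summed kept_pnl in Sub 3 and real_kept_pnl in Sub 5, not the full-period backtest PnL at θ=0.3 in Sub 2 (17.23).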
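
The table rows are evenly spaced, which is what one would expect if PnL scales linearly with position size on the flagged trades. A minimal sketch under that assumption (fees and slippage that do not scale with size are ignored, and the name `scaled_pnl` is hypothetical):

```python
def scaled_pnl(kept_pnl, blocked_pnl, size_factor):
    """Total PnL when flagged trades run at `size_factor` of normal size.

    Assumes each flagged trade's PnL scales linearly with its size.
    """
    return kept_pnl + size_factor * blocked_pnl

# BB_EXTREME_1: PnL at size_factor 0.0 and 1.0, copied from the table.
kept, baseline = 10.52, 15.3
blocked = baseline - kept  # PnL carried by the flagged trades
curve = {f: round(scaled_pnl(kept, blocked, f), 2) for f in (0.0, 0.5, 1.0)}
```

Under this model the sign of blocked_pnl alone decides whether any amount of size reduction helps, so the intermediate size factors add no information beyond the full-block column.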

Sub 5 — Shuffle Sanity Check

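Shuffle p_short values across baseline trades 200 times and apply the θ=0.3 filter to each shuffle. If the real lift is signal rather than luck, the real kept PnL should sit far in the right tail of the shuffled distribution. z_score measures how many shuffled standard deviations the real kept PnL lies above the shuffled mean; the gate is z > 1.5, roughly a 7% one-sided tail under a normal approximation.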

| strategy | real_kept_pnl | shuffled_mean_kept_pnl | shuffled_p5 | shuffled_p95 | z_score |
|---|---|---|---|---|---|
| BB_EXTREME_3 | 11.98 | 10.47 | 7.25 | 12 | 0.81 |
| BB_EXTREME_1 | 10.52 | 14.06 | 8.44 | 17.21 | -1.16 |
| BB_EXTREME_2 | 13.02 | 15.95 | 9.56 | 19.61 | -0.88 |
| EMA_PURE | -17.89 | -16.73 | -17.99 | -15.52 | -1.44 |
| EMA_MACD | -18.66 | -18.44 | -19.52 | -17.41 | -0.33 |
| HOLY_GRAIL | -0.4 | -0.8 | -1.11 | -0.39 | 1.8 |
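
The shuffle check described above can be sketched as a simple permutation test. This is an illustrative reconstruction, not the experiment's code: the function name `shuffle_sanity`, the blocking rule `p_short > theta`, the use of the population standard deviation, and the toy data are all assumptions.

```python
import random
from statistics import mean, pstdev

def shuffle_sanity(pnls, p_short, theta=0.3, n_shuffles=200, seed=0):
    """Compare real kept PnL with kept PnL under shuffled p_short values."""
    rng = random.Random(seed)  # fixed seed so the check is reproducible

    def kept_pnl(scores):
        # Assumed rule: a trade is kept when its p_short is at most theta.
        return sum(pnl for pnl, p in zip(pnls, scores) if p <= theta)

    real = kept_pnl(p_short)
    shuffled = list(p_short)
    draws = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # breaks any link between p_short and outcome
        draws.append(kept_pnl(shuffled))
    mu, sigma = mean(draws), pstdev(draws)
    z = (real - mu) / sigma if sigma > 0 else float("nan")
    return real, mu, z

# Toy data: the filter flags 5 of the 10 losing trades and no winners.
toy_pnls = [1.0] * 10 + [-1.0] * 10
toy_p_short = [0.1] * 10 + [0.9] * 5 + [0.1] * 5
real, mu, z = shuffle_sanity(toy_pnls, toy_p_short)
```

Shuffling keeps the number of blocked trades fixed, so the test isolates whether the filter picks the right trades rather than whether trading less helps. In the toy case the filter is informative by construction and the z-score clears the 1.5 gate; in the table above, none of the BB-Extreme strategies do.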

Overall verdict

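NOT ROBUST. No BB-Extreme strategy passes both the walk-forward gate and the sanity gate. All three pass walk-forward in Sub 1 (failed WF: []), but all three fail the shuffle check in Sub 5 (failed sanity: ['BB_EXTREME_3', 'BB_EXTREME_1', 'BB_EXTREME_2']), with z-scores of 0.81, -1.16 and -0.88 against the 1.5 gate. The C7 lift is likely period-specific or a lucky cut. Don't deploy.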