ML — Pattern Discovery

Inverted workflow: find conditional edges in BTC data first, build strategies second.
55 experiments

BOCPD Filter — Definitive Shuffle Test

Inconclusive
2026-05-19 · tags: validation, shuffle, sanity, filter, production-gate

BOCPD Filter — Definitive Shuffle Test

2026-05-19 · status: inconclusive · 100.0s

Hypothesis: If the BOCPD p_short signal carries real information about trade quality, the +6.9% to +17.0% PnL lift from bocpd_filter@0.3 should disappear when we shuffle the p_short → timestamp mapping. If shuffled runs show a similar lift, the C7 'edge' was only an artifact of trade-count reduction plus re-routing.

Verdict: MIXED. 2 of 3 strategies show real BOCPD signal (z > 1.5), 1 looks like noise, and 0 show anti-signal. Consider deploying selectively, or pausing until more data is available.

Key metrics

| metric | value |
|---|---|
| n_strategies | 3 |
| n_shuffle_seeds | 5 |
| n_signal | 2 |
| n_neutral | 1 |
| n_anti | 0 |
| shuf_also_lifts | 0 |
| BB_EXTREME_3_real_lift_pct | +6.8836 |
| BB_EXTREME_3_shuf_mean_lift_pct | -4.9936 |
| BB_EXTREME_3_z_score | +1.0550 |
| BB_EXTREME_1_real_lift_pct | +12.6457 |
| BB_EXTREME_1_shuf_mean_lift_pct | +0.1612 |
| BB_EXTREME_1_z_score | +7.2970 |
| BB_EXTREME_2_real_lift_pct | +17.0414 |
| BB_EXTREME_2_shuf_mean_lift_pct | -1.7855 |
| BB_EXTREME_2_z_score | +6.4999 |

Approach

For each of the 3 BB-Extreme strategies, we run the full backtest (with re-routing) over 2023-01-01 to 2026-04-01 with three source configurations:

  1. baseline_no_filter: bocpd_filter not active (θ=1.0)
  2. real: bocpd_filter active at θ=0.3, real BOCPD parquet
  3. shuffled (×5): bocpd_filter active at θ=0.3, p_short values randomly permuted across timestamps (same value distribution)

If the lift from the real filter reflects genuine BOCPD information, shuffled runs should show lift ≈ 0, i.e. PnL ≈ baseline PnL. If shuffled runs lift similarly, the lift comes from trade-count reduction plus re-routing, not from BOCPD.
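The shuffled control can be illustrated with a minimal sketch. It assumes p_short is available as a timestamp → value mapping; the function name `shuffle_p_short` and the toy values are hypothetical and not taken from the experiment code. The point is that only the pairing between timestamps and values changes, while the set of values (and hence their distribution) stays identical.

```python
import random


def shuffle_p_short(series, seed):
    """Return a copy of {timestamp: p_short} with the values permuted
    across timestamps. Hypothetical helper, not the experiment's code."""
    timestamps = list(series.keys())
    values = list(series.values())
    rng = random.Random(seed)  # seeded so each shuffled run is reproducible
    rng.shuffle(values)        # permute the values only; timestamps stay put
    return dict(zip(timestamps, values))


# Toy p_short series (illustrative values only).
real = {
    "2023-01-01T00:00": 0.05,
    "2023-01-01T01:00": 0.10,
    "2023-01-01T02:00": 0.45,
    "2023-01-01T03:00": 0.80,
    "2023-01-01T04:00": 0.20,
    "2023-01-01T05:00": 0.35,
}

# One shuffled source per seed, mirroring the 5 shuffled runs per strategy.
shuffled_runs = [shuffle_p_short(real, seed) for seed in range(5)]
```

Each shuffled mapping would then be fed to the same backtest with the filter at θ=0.3, so any remaining lift cannot come from the timing of the signal.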

Results

| strategy | baseline_pnl | real_filter_pnl | shuf_filter_mean_pnl | shuf_filter_min_pnl | shuf_filter_max_pnl | real_lift_pct | shuf_mean_lift_pct | real_minus_shuf_pnl | z_score |
|---|---|---|---|---|---|---|---|---|---|
| BB_EXTREME_3 | 11.42 | 12.21 | 10.85 | 8.57 | 11.56 | 6.88 | -4.99 | 1.36 | 1.06 |
| BB_EXTREME_1 | 15.30 | 17.23 | 15.32 | 14.94 | 15.59 | 12.65 | 0.16 | 1.91 | 7.30 |
| BB_EXTREME_2 | 17.66 | 20.67 | 17.34 | 16.55 | 17.84 | 17.04 | -1.79 | 3.32 | 6.50 |

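Reading the table: real_lift_pct is the lift of the real-source filter over baseline, in percent. shuf_mean_lift_pct is the average lift of the shuffled-source filter over baseline across the 5 permutations. real_minus_shuf_pnl = real_filter_pnl − shuf_filter_mean_pnl. z_score = (real_filter_pnl − shuf_filter_mean_pnl) / shuf_std, where shuf_std is the standard deviation of PnL across the 5 shuffled runs. |z| > 2 is read as the real filter differing from the shuffled runs by more than shuffle-to-shuffle noise; the verdict applies a looser cutoff of z > 1.5. With only 5 seeds, shuf_std is itself a rough estimate, so z should be treated as indicative.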
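The derived columns can be recomputed from the per-run PnL values. The sketch below is an illustration, not the experiment's code: the function name `shuffle_stats` is hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption, chosen because it is consistent with the reported z-scores.

```python
from statistics import mean, stdev


def shuffle_stats(baseline_pnl, real_pnl, shuffled_pnls):
    """Recompute lift and z-score columns for one strategy.
    Hypothetical helper; assumes sample std (n - 1) for shuf_std."""
    shuf_mean = mean(shuffled_pnls)
    shuf_std = stdev(shuffled_pnls)  # sample standard deviation (assumption)
    return {
        "real_lift_pct": 100.0 * (real_pnl - baseline_pnl) / baseline_pnl,
        "shuf_mean_lift_pct": 100.0 * (shuf_mean - baseline_pnl) / baseline_pnl,
        "real_minus_shuf_pnl": real_pnl - shuf_mean,
        "z_score": (real_pnl - shuf_mean) / shuf_std,
    }


# BB_EXTREME_1 inputs, as rounded in the per-run raw PnL table.
stats = shuffle_stats(
    baseline_pnl=15.3,
    real_pnl=17.23,
    shuffled_pnls=[14.94, 15.51, 15.39, 15.2, 15.59],
)
print(stats)
```

With these rounded inputs the z-score comes out at about 7.3, in line with the reported +7.2970. Small differences are expected because the table values are rounded to two decimals.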

Per-run raw PnL

BB_EXTREME_3

| source | shuffle_seed | n_trades | total_pnl |
|---|---|---|---|
| baseline_no_filter | -1 | 124 | 11.42 |
| real | -1 | 95 | 12.21 |
| shuffled | 0 | 121 | 11.56 |
| shuffled | 1 | 123 | 11.47 |
| shuffled | 2 | 123 | 11.50 |
| shuffled | 3 | 121 | 11.18 |
| shuffled | 4 | 125 | 8.57 |

BB_EXTREME_1

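| source | shuffle_seed | n_trades | total_pnl |
|---|---|---|---|
| baseline_no_filter | -1 | 165 | 15.30 |
| real | -1 | 146 | 17.23 |
| shuffled | 0 | 165 | 14.94 |
| shuffled | 1 | 162 | 15.51 |
| shuffled | 2 | 165 | 15.39 |
| shuffled | 3 | 166 | 15.20 |
| shuffled | 4 | 162 | 15.59 |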

BB_EXTREME_2

| source | shuffle_seed | n_trades | total_pnl |
|---|---|---|---|
| baseline_no_filter | -1 | 139 | 17.66 |
| real | -1 | 115 | 20.67 |
| shuffled | 0 | 138 | 17.55 |
| shuffled | 1 | 137 | 17.84 |
| shuffled | 2 | 138 | 17.63 |
| shuffled | 3 | 137 | 17.14 |
| shuffled | 4 | 137 | 16.55 |

[Figure: shuffle comparison]

Verdict

MIXED. 2 of 3 strategies show real BOCPD signal (z > 1.5), 1 looks like noise, and 0 show anti-signal. Consider deploying selectively, or pausing until more data is available.
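The signal / neutral / anti counts follow from thresholding each strategy's z-score. A minimal sketch is given here: the z > 1.5 cutoff for "signal" comes from the verdict, while the symmetric z < −1.5 cutoff for "anti" and the function name `classify` are assumptions made for illustration.

```python
from collections import Counter


def classify(z, threshold=1.5):
    """Label one strategy by its z-score.
    The 'anti' cutoff at -threshold is an assumption, not from the report."""
    if z > threshold:
        return "signal"
    if z < -threshold:
        return "anti"
    return "neutral"


# z-scores as reported in the key metrics.
z_scores = {
    "BB_EXTREME_1": 7.2970,
    "BB_EXTREME_2": 6.4999,
    "BB_EXTREME_3": 1.0550,
}

labels = {name: classify(z) for name, z in z_scores.items()}
counts = Counter(labels.values())
print(labels)
print(counts["signal"], counts["neutral"], counts["anti"])
```

Under these assumptions the counts match the reported n_signal = 2, n_neutral = 1, n_anti = 0, with BB_EXTREME_3 as the neutral case.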