- UPDATE 2026-06-08 (counter-check MS_20260606_162121, min_trades=30, training 2024+2025, OOS incl. bear+LUNA/FTX): NONE of the three candidates passes. Aggregate PBO 0.77 (overfit). A (sfp): 1 weak survivor, ann avg -0.89%, loses in both crashes. B (bb_extreme): 6 survivors but mediocre (best MC prob_profit 0.61 vs earlier 0.97-0.99), bear -4.23%. regime_switch: 0 survivors.
- UPDATE 2026-06-08: regime_switch cleanly diagnosed - real in-sample edge (Phase 1: ema200+volume+fixed_stop @4h, +10.9%, PF 3.45) BUT Phase-2 collapse (score 73.98 -> 33.68, cutoff 89.7). Works in ONE regime period, not in 2024 AND 2025. The hybrid inherits the fragility of its parts instead of robustly collecting both edges.
- UPDATE 2026-06-08 - the lesson that explains everything: the formerly 'best' sweep MS_0429 (PBO 0.043) trained on ONE window (2024), and its SFP winner rode BULL_2021H1 (+18-22% in ONE window). Single-window training feigns robustness. Under 2-window training (profitability required in 2024 AND 2025) + held-out crash the 'edge' does not reproduce. RULE: from now on always >=2 regime windows in training.
- The PBO filter is merciless: of 22 sweeps only 4 have a trustworthy PBO < 0.20 - the broad sweep MS_20260429 (0.043), the ADX-regime solo MS_20260606 (0.055), the broad sweep MS_20260505 (0.133) and bb_extreme MS_20260507 (0.196).
- Broadly confirmed overfit/dead (PBO > 0.5 unless noted): ema_crossover (0.56-0.82 across several sweeps), donchian_breakout, macd_crossover, funding_rate_extreme, rsi_mean_reversion (borderline at 0.42) and divergences (separately debunked in 3 walk-forwards). These entries do not survive selection.
- Meta-pattern across the 4 good sweeps: EVERY robust winner carries a trend/regime filter (min_adx>=25, ema200_filter, hurst>=0.55) or IS the regime (adx_dmi_regime). The entry signal is interchangeable; the gate does the work. Confirmed by the EMA sweeps, whose raw crossover only survives OOS with adx_rising+vrp/bocpd stacking.
- Timeframe law: the edge lives at 30m-4h. Everything <=15m overfits reliably - the 5m hurst+macd trap reached in-sample score 112, but MC prob_profit 0.0015 (1/11 OOS windows). Phase-1 raw run of the ADX sweep: 15m 0/144 profitable (average -25%), 4h 55/144.
- Exit rule: a tight fixed_stop (1.0-1.5%) or swing_stop dominates. Signal-based exits (ema_cross_exit, macd_cross_exit) consistently hurt - they cut off the few large trend winners that the low-win-rate profile (15-25% WR, PF ~2) lives on.
- Two survivor archetypes: A) Trend harvest = sfp + adx_filter + ema200_filter + fixed_stop @ 30m (from the best-PBO sweep 0.043; 8/11 OOS windows, DD ~1.2%, MC pp 0.97). B) Mean reversion = bb_extreme + ema200_filter + volume_filter + swing_stop @ 1h (PBO 0.133; ann +14%, p50 +55%, MC pp 0.99).
- Portfolio finding: A and B are regime-ANTI-correlated. The A winner's bucket_stats show a bull bucket 40-50% p.a. vs. a range bucket ~3%; B harvests exactly the range/MR phases in which A sleeps. This motivates a regime-switched portfolio instead of the search for the one single strategy.
- Honesty caveat: even the best-PBO sfp winner draws almost all its return from ONE window (BULL_2021H1 +18-22%); the remaining 10 windows hover around zero. Low PBO here means 'honestly flat in chop/bear', not 'earns everywhere'. wf_avg 11 is inflated by a single trend leg.
- Two methodological lessons: (1) min_trades >= 30 - sweeps with min_trades 8-12 (MS_0601/0603) produced 'winners' whose edge was 3 lucky trades (+21.7% on 3 trades). (2) The OOS set MUST contain bear+crash - MS_0601 tested only bull/recent windows and thereby produced a hollow '3/3 profitable'.
⚠️ UPDATE 2026-06-08 - clean counter-check debunks the optimism thesis
The comparison sweep MS_20260606_162121 (sfp, bb_extreme, regime_switch; min_trades=30; training 2024+2025 with min_window_wins=2; held-out OOS incl. bear + LUNA/FTX crash) ran cleanly for ~26h (34,548 Phase-2 runs, 2 errors). Result: none of the three passes.
- Aggregate PBO 0.77 (overfit) - dominated by bb_extreme's huge filter-stack search space (24,348 Phase-2 runs).
- regime_switch: 0 walk-forward survivors. It had a real in-sample edge (Phase 1: ema200+volume+fixed_stop @4h, +10.9%, PF 3.45) but collapsed in Phase 2 (score 73.98 -> 33.68, cutoff 89.7). Works in one regime period, not in 2024 AND 2025. The hybrid inherits the fragility of its parts.
- A (sfp) and B (bb_extreme) are 'one-window wonders': sfp harvests only Y2023 (+3.4%), loses in both crashes, ann avg -0.89%. bb_extreme harvests only Y2021 (+8.8%), bear -4.23%. Best walk-forward MC prob_profit 0.61 - far from the 0.97-0.99 of the earlier 'winners'.
Why the contradiction with the earlier good PBO values? MS_0429 (PBO 0.043) trained on a SINGLE window (2024), and its SFP winner drew almost all its return from BULL_2021H1 (+18-22% in one window) - the window concentration that was already flagged as a caveat back then. Single-window training feigns robustness; 2-window training + held-out crash exposes it. Rule for all future sweeps: at least 2 regime windows in training, require profitability in both.
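The two-window rule can be written down as a small gate. The following is a minimal sketch, not Botty's actual implementation: `WindowResult` and `passes_training_gate` are hypothetical names, and counting `min_trades` across all training windows combined (rather than per window) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class WindowResult:
    name: str          # training window label, e.g. "2024" or "2025"
    return_pct: float  # net return of the config in that window
    trades: int        # number of trades taken in that window


def passes_training_gate(windows, min_window_wins=2, min_trades=30):
    """Return True only if the config is profitable in enough training windows."""
    if len(windows) < 2:
        # Single-window training is rejected outright (the MS_0429 lesson).
        return False
    total_trades = sum(w.trades for w in windows)
    wins = sum(1 for w in windows if w.return_pct > 0)
    return total_trades >= min_trades and wins >= min_window_wins


# A config that only works in one of the two windows is filtered out.
one_window_wonder = [WindowResult("2024", 10.9, 25), WindowResult("2025", -2.0, 20)]
robust = [WindowResult("2024", 4.0, 25), WindowResult("2025", 3.0, 20)]
print(passes_training_gate(one_window_wonder))  # False
print(passes_training_gate(robust))             # True
```

The return figures in the usage example are illustrative only and are not taken from any sweep.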
This is not a failure of the methodology but its very purpose - PBO + multi-window walk-forward exist for exactly this. The result is consistent with earlier findings (ML: 5/6 died in walk-forward; divergences dead). The original text below (2026-06-06) remains as evidence of how convincing the thesis looked BEFORE the counter-check.
Starting point
Between 2026-04-25 and 2026-06-06 Botty ran a total of 22 megasweeps (3-phase optimizer: coarse -> fine grid -> walk-forward). For each, the PBO (Probability of Backtest Overfitting, via CSCV in backtesting/cpcv.py) was computed after the fact. PBO answers the decisive question: did we fool ourselves by selecting the in-sample winner? PBO ~ 0 = the winner generalizes; PBO ~ 0.5 = the selection is chance; PBO > 0.5 = actively counterproductive.
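For readers who have not seen CSCV (combinatorially symmetric cross-validation) before, here is a minimal sketch of how a PBO estimate can be computed. It is not the code in backtesting/cpcv.py: the function name `pbo_cscv`, the choice of mean return as the performance metric and the default of 8 blocks are all assumptions made for illustration.

```python
from itertools import combinations
import math

import numpy as np


def pbo_cscv(returns: np.ndarray, n_blocks: int = 8) -> float:
    """Estimate PBO from a (T x N) matrix: T periods, N configurations."""
    if n_blocks % 2:
        raise ValueError("n_blocks must be even")
    n_configs = returns.shape[1]
    blocks = np.array_split(np.arange(returns.shape[0]), n_blocks)
    logits = []
    # Every way of picking half of the blocks as in-sample (IS);
    # the complementary half is the out-of-sample (OOS) set.
    for in_sample in combinations(range(n_blocks), n_blocks // 2):
        is_rows = np.concatenate([blocks[b] for b in in_sample])
        oos_rows = np.concatenate(
            [blocks[b] for b in range(n_blocks) if b not in in_sample]
        )
        # Assumed metric: mean return per config (a Sharpe ratio works too).
        is_perf = returns[is_rows].mean(axis=0)
        oos_perf = returns[oos_rows].mean(axis=0)
        winner = int(np.argmax(is_perf))
        # OOS rank of the IS winner: 1 = worst, n_configs = best.
        rank = int((oos_perf <= oos_perf[winner]).sum())
        omega = rank / (n_configs + 1)
        logits.append(math.log(omega / (1.0 - omega)))
    # PBO = share of splits where the IS winner is at or below the OOS median.
    return float(np.mean(np.array(logits) <= 0.0))


# Anti-persistent toy data: whoever wins in-sample loses out-of-sample.
toy = np.array([[1, -1], [2, -2], [4, -4], [-7, 7]], dtype=float)
print(pbo_cscv(toy, n_blocks=4))  # 1.0
```

In the toy matrix both configurations sum to zero over the full period, so any in-sample winner is by construction the out-of-sample loser, which is the PBO = 1 extreme. A configuration that is best in every block gives the opposite extreme, PBO = 0.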
This research evaluates all sweeps together instead of celebrating individual winners.
The PBO landscape (all 22 sweeps, sorted)
| PBO | Sweep | Entry(s) | Verdict |
|---|---|---|---|
| 0.043 | MS_20260429_085540 | all (broad sweep) | Gold |
| 0.055 | MS_20260606_043928 | adx_dmi_regime | Gold |
| 0.133 | MS_20260505_172436 | all (broad sweep) | good |
| 0.196 | MS_20260507_073619 | bb_extreme | ok |
| 0.257 | MS_20260513_110446 | burj_khalifa | borderline |
| 0.42-0.47 | MS_0516/0517/0515 | rsi_mean_reversion, holy_grail | unreliable |
| 0.53-0.57 | MS_0525, MS_0601, MS_0425 | incl. ema_crossover | overfit |
| 0.66-0.92 | MS_0603, MS_0519, MS_0526, MS_0508, MS_0513x3, MS_0523 | ema_crossover, donchian, bb_extreme, macd, funding, sfp mix | dead |
Selection conclusion: Only 4 of 22 sweeps deliver a winner you can trust. ema_crossover, donchian_breakout, macd_crossover, funding_rate_extreme and rsi_mean_reversion are broadly exposed as overfit as stand-alone entries.
The meta-pattern: the edge is the gate
Important: the Phase-2 top scores are misleading - the highest-scored in-sample configs were consistently OOS total failures (textbook overfit, exactly what the PBO measures). Example from MS_0429: hurst+macd_crossover+volume @ 5m reached in-sample score 112, but Monte-Carlo prob_profit 0.0015 and 1/11 OOS windows. You have to look at the real walk-forward survivors.
When you do, the pattern is unambiguous across all 4 good sweeps:
- The edge sits in the trend/regime gate, not the trigger. Every robust winner carries min_adx>=25, ema200_filter, hurst>=0.55 - or is itself the regime (adx_dmi_regime). The entry signal (SFP, BB, EMA cross) is interchangeable. Confirmed by the two EMA sweeps: the raw crossover survives OOS only when you stack adx_rising + vrp/bocpd on top of it - the gate does the work, the trigger is an accessory.
- Timeframe 30m-4h. Everything <=15m overfits. 5m was the most reliable trap. Higher TF = less noise in the regime flip, less fee churn.
- Exit = tight stop, let winners run. fixed_stop 1.0-1.5% or swing_stop. Signal exits (ema_cross_exit, macd_cross_exit) consistently hurt.
- Profile: low win rate (15-25%), low frequency, tiny drawdown (<2%), flat-defensive in chop/bear, harvests trends. That is precisely why the PBO is low - the strategy doesn't pretend to earn money everywhere, so there is barely any IS-vs-OOS rank inversion.
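The arithmetic behind 'let winners run' is worth spelling out. Combining the 15-25% win rate above with the profit factor of about 2 quoted in the summary gives the average-win to average-loss ratio the profile needs. The sketch below uses hypothetical helper names, and the 3x payoff cap is an illustrative assumption, not a measured value.

```python
def required_payoff_ratio(win_rate: float, profit_factor: float) -> float:
    """Average win / average loss needed to reach a profit factor.

    From PF = (WR * avg_win) / ((1 - WR) * avg_loss).
    """
    return profit_factor * (1.0 - win_rate) / win_rate


def profit_factor(win_rate: float, payoff_ratio: float) -> float:
    """Profit factor implied by a win rate and a win/loss payoff ratio."""
    return win_rate * payoff_ratio / (1.0 - win_rate)


# At a 20% win rate, PF 2 needs winners about 8x the size of losers.
print(round(required_payoff_ratio(0.20, 2.0), 2))  # 8.0

# Hypothetical: a signal exit that caps winners at 3x the loss size
# drops the same 20% win rate below break-even (PF < 1).
print(round(profit_factor(0.20, 3.0), 2))  # 0.75
```

This is why an exit that trims the rare large winners is so costly for this profile: the win rate stays the same, but the payoff ratio it depends on disappears.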
The two survivor archetypes
A) Trend harvest (lowest PBO, most regime-robust)
Entry: sfp (Swing Failure) - or adx_dmi_regime onset · Filter: adx_filter (min_adx 25) + ema200_filter · Exit: fixed_stop 1.0-1.5% · TF 30m-4h
From MS_0429 (PBO 0.043): 8/11 OOS windows green, DD ~1.2%, MC prob_profit 0.97, max_safe_leverage 10, p50 +25%. Caveat: return heavily concentrated in BULL_2021H1 (+18-22%), remaining windows ~ zero.
B) Mean reversion in trend (best return at good PBO)
Entry: bb_extreme · Filter: ema200_filter + volume_filter · Exit: swing_stop · TF 1h
From MS_0505 (PBO 0.133): 7/9 OOS windows green, ann +14%, DD 1.65%, MC prob_profit 0.99, p50 +55%, max_safe_leverage 10. BB extreme is a range strategy (fade the band touch), direction-gated by EMA200 and confirmed by volume.
The portfolio argument (the actual conclusion)
Instead of searching for the 'best single strategy': A (trend) and B (mean reversion) are regime anti-correlated. The A winner's bucket_stats prove it - bull bucket 40-50% p.a., range bucket ~3%. B harvests exactly the range phases in which A sleeps. Run in parallel on Botty's three isolated wallets, they smooth the equity curve far beyond what a single strategy can achieve.
From this follows the regime-switch hypothesis to be tested: a single strategy that switches based on the ADX/DMI regime - if ADX says 'trend', we trade SFP; if ADX later flips to 'sideways', we switch to BB_Extreme. A self-adaptive hybrid instead of two separate wallets.
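The switching logic of the hypothesis is simple enough to sketch. This is only an illustration of the idea, not the regime_switch entry that was later swept: `select_entry` and `route` are hypothetical names, and reusing the min_adx value of 25 as the trend/sideways boundary is an assumption.

```python
def select_entry(adx: float, trend_threshold: float = 25.0) -> str:
    """Pick the entry archetype for the current bar from the ADX reading."""
    # ADX at or above the threshold is read as 'trend' -> archetype A (sfp);
    # anything below is read as 'sideways' -> archetype B (bb_extreme).
    return "sfp" if adx >= trend_threshold else "bb_extreme"


def route(adx_series):
    """Map a sequence of ADX readings to the active entry per bar."""
    return [select_entry(adx) for adx in adx_series]


# ADX fading from a trend into a range flips the active entry.
print(route([32.0, 27.5, 24.0, 18.0]))
# ['sfp', 'sfp', 'bb_extreme', 'bb_extreme']
```

A hard threshold like this flips on every crossing of the boundary. Whether the real entry needs hysteresis or a confirmation period is left open here.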
Position sizing
- Risk-based fixed-fractional, not fixed notional. The stop distance (1-1.5%) defines the size.
- Leverage: Monte-Carlo calls 9-10x 'safe', but that is optimistic given the window concentration. Cap live conservatively at 3-5x.
- Dynamic via vol-forecast: Botty's only walk-forward-surviving ML edge is ml/forecast (predict_vol_4h, IC +0.83). As a sizing input: smaller size / wider stop at high forecast vol, larger at low.
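The sizing rules above can be sketched as follows. Everything here is illustrative: the function names, the 0.5% risk per trade, the reference volatility and the clip bounds are assumptions. Only the stop distance (1-1.5%) and the conservative leverage cap (3-5x) come from the text.

```python
def position_notional(equity: float, risk_fraction: float,
                      stop_distance: float, max_leverage: float = 3.0) -> float:
    """Notional size such that a stop-out loses risk_fraction of equity."""
    if stop_distance <= 0:
        raise ValueError("stop_distance must be positive")
    notional = equity * risk_fraction / stop_distance
    # Hard leverage cap, independent of what Monte-Carlo calls 'safe'.
    return min(notional, equity * max_leverage)


def vol_scaled_risk(base_risk: float, forecast_vol: float,
                    reference_vol: float, lo: float = 0.5, hi: float = 1.5) -> float:
    """Shrink risk when forecast vol is high, grow it when low (clipped)."""
    scale = reference_vol / forecast_vol
    return base_risk * max(lo, min(hi, scale))


# 10,000 equity, 0.5% risk, 1.25% stop -> about 4,000 notional (0.4x leverage).
print(round(position_notional(10_000, 0.005, 0.0125), 2))  # 4000.0

# Forecast vol at twice the reference halves the risk per trade.
print(round(vol_scaled_risk(0.005, forecast_vol=0.04, reference_vol=0.02), 6))  # 0.0025
```

Because the stop distance sits in the denominator, a tighter stop produces a larger notional for the same risk, so the leverage cap is what bounds the position when stops are very tight.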
What does NOT work
ema_crossover/macd/donchian/funding as a stand-alone entry (PBO dead), signal-based exits, timeframes <=15m, more than ~3 stacked filters (every knob inflates the overfit surface - visible in the 0.66-PBO sweeps with 3-filter stacks). See also 'What provably does NOT work in retail trading (and why)' and 'Divergences: theory, practice and our walk-forward test on BTC'.
Next steps
- Implement the regime_switch entry (SFP in trend / BB in range, ADX-gated).
- ONE clean comparison sweep over [sfp, bb_extreme, regime_switch] with min_trades>=30, multi-regime training windows and bear+crash in the held-out OOS - so that A, B and the switch are directly comparable via PBO/walk-forward.
- Monte-Carlo + decision only AFTER the overfit gates.
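The ordering in the last step can be made explicit in code. This is a sketch under stated assumptions: how Botty actually computes MC prob_profit is not described here, so `mc_prob_profit` uses a plain bootstrap of per-trade returns as a stand-in. `decide` is a hypothetical name, and its thresholds reuse the PBO < 0.20 and min_trades >= 30 figures from the text.

```python
import numpy as np


def mc_prob_profit(trade_returns, n_paths: int = 2000, seed: int = 0) -> float:
    """Share of bootstrapped trade sequences that end with a net profit."""
    rng = np.random.default_rng(seed)
    trades = np.asarray(trade_returns, dtype=float)
    # Resample trades with replacement; each row is one simulated path.
    samples = rng.choice(trades, size=(n_paths, trades.size), replace=True)
    final_return = np.prod(1.0 + samples, axis=1) - 1.0
    return float(np.mean(final_return > 0.0))


def decide(pbo: float, trade_returns, max_pbo: float = 0.20, min_trades: int = 30):
    """Run Monte-Carlo only for configs that already passed the overfit gates."""
    if pbo >= max_pbo or len(trade_returns) < min_trades:
        return None  # rejected before any Monte-Carlo number can mislead
    return mc_prob_profit(trade_returns)


# An aggregate PBO of 0.77 never reaches the Monte-Carlo stage.
print(decide(0.77, [0.01] * 40))  # None
```

Returning None instead of a probability keeps a flattering prob_profit from an overfit configuration out of the decision entirely.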