Botty · Megasweep PBO synthesis: the edge sits in the regime gate, not the entry trigger

Strategy analysis · 2026-06-06 · 4 sources

Evaluation of all 22 archived megasweeps by their PBO values. Only 4 sweeps are trustworthy (PBO < 0.20). The meta-pattern: the robust edge lies in the trend/regime gate, not the entry. IMPORTANT UPDATE (2026-06-08): the clean counter-check sweep MS_20260606_162121 (min_trades=30, training on 2024+2025, held-out crash windows) has LARGELY DEBUNKED the archetype-optimism thesis (aggregate PBO 0.77): A (sfp) and B (bb_extreme) are 'one-window wonders', and regime_switch collapses in the cross-window test. Single-window sweeps feigned robustness. See the update chapter.

Key findings

UPDATE 2026-06-08 (counter-check MS_20260606_162121, min_trades=30, training 2024+2025, OOS incl. bear+LUNA/FTX): NONE of the three candidates passes. Aggregate PBO 0.77 (overfit). A (sfp): 1 weak survivor, ann avg -0.89%, loses in both crashes. B (bb_extreme): 6 survivors but mediocre (best MC prob_profit 0.61 vs earlier 0.97-0.99), bear -4.23%. regime_switch: 0 survivors.
UPDATE 2026-06-08: regime_switch cleanly diagnosed - real in-sample edge (Phase 1: ema200+volume+fixed_stop @4h, +10.9%, PF 3.45) BUT Phase-2 collapse (score 73.98 -> 33.68, cutoff 89.7). Works in ONE regime period, not in 2024 AND 2025. The hybrid inherits the fragility of its parts instead of robustly collecting both edges.
UPDATE 2026-06-08 - the lesson that explains everything: the formerly 'best' sweep MS_0429 (PBO 0.043) trained on ONE window (2024), and its SFP winner rode BULL_2021H1 (+18-22% in ONE window). Single-window training feigns robustness. Under 2-window training (profitability required in 2024 AND 2025) + held-out crash the 'edge' does not reproduce. RULE: from now on always >=2 regime windows in training.
The PBO filter is merciless: of 22 sweeps only 4 have a trustworthy PBO < 0.20 - the broad sweep MS_20260429 (0.043), the ADX-regime solo MS_20260606 (0.055), the broad sweep MS_20260505 (0.133) and bb_extreme MS_20260507 (0.196).
Broadly confirmed overfit/dead (PBO > 0.5): ema_crossover (0.56-0.82 across several sweeps), donchian_breakout, macd_crossover, funding_rate_extreme, rsi_mean_reversion (borderline 0.42) and divergences (separately debunked in 3 walk-forwards). These entries do not survive selection.
Meta-pattern across the 4 good sweeps: EVERY robust winner carries a trend/regime filter (min_adx>=25, ema200_filter, hurst>=0.55) or IS the regime (adx_dmi_regime). The entry signal is interchangeable; the gate does the work. Confirmed by the EMA sweeps, whose raw crossover only survives OOS with adx_rising+vrp/bocpd stacking.
Timeframe law: the edge lives at 30m-4h. Everything <=15m overfits reliably - the 5m hurst+macd trap reached an in-sample score of 112 but an MC prob_profit of 0.0015 (green in 1/11 OOS windows). In the Phase-1 raw run of the ADX sweep, 0/144 were profitable at 15m (average -25%) versus 55/144 at 4h.
Exit rule: a tight fixed_stop (1.0-1.5%) or swing_stop dominates. Signal-based exits (ema_cross_exit, macd_cross_exit) consistently hurt - they cut off the few large trend winners that the low-win-rate profile (15-25% WR, PF ~2) lives on.
Two survivor archetypes: A) Trend harvest = sfp + adx_filter + ema200_filter + fixed_stop @ 30m (from the best-PBO sweep 0.043; 8/11 OOS windows, DD ~1.2%, MC pp 0.97). B) Mean reversion = bb_extreme + ema200_filter + volume_filter + swing_stop @ 1h (PBO 0.133; ann +14%, p50 +55%, MC pp 0.99).
Portfolio finding: A and B are regime-ANTI-correlated. The A winner's bucket_stats show a bull bucket 40-50% p.a. vs. a range bucket ~3%; B harvests exactly the range/MR phases in which A sleeps. This motivates a regime-switched portfolio instead of the search for the one single strategy.
Honesty caveat: even the best-PBO sfp winner draws almost all its return from ONE window (BULL_2021H1 +18-22%); the remaining 10 windows hover around zero. Low PBO here means 'honestly flat in chop/bear', not 'earns everywhere'. The wf_avg of 11 is inflated by a single trend leg.
Two methodological lessons: (1) min_trades >= 30 - sweeps with min_trades 8-12 (MS_0601/0603) produced 'winners' whose edge was 3 lucky trades (+21.7% on 3 trades). (2) The OOS set MUST contain bear+crash - MS_0601 tested only bull/recent windows and so manufactured a hollow '3/3 profitable'.

Botty recommendations

P1 DONE + NEGATIVE (2026-06-08): regime_switch implemented and counter-checked - does not pass

The hypothesis was: A (trend-robust) + B (range-robust, anti-correlated) -> a self-adaptive hybrid collects both edges. Result MS_20260606_162121: regime_switch had a real in-sample edge (+10.9%, PF 3.45 @4h) but 0 walk-forward survivors - Phase-2 collapse, because the edge does not hold in 2024 AND 2025. A and B themselves also fail the clean test (PBO 0.77, one-window wonders). Do NOT take live.

Implementation: regime_switch is implemented in strategies/conditions/entries.py, with the regime_state precompute in data/indicator_cache.py (commit 9cfe2664). It remains as a building block should more robust sub-edges be found in the future - the switch mechanics themselves are correct (0/40 causality mismatch), it just lacks robust edges to switch between.

Evidence: MS_20260606_162121: regime_switch Phase-1 score 73.98 -> Phase-2 33.68 (cutoff 89.7); sfp ann avg -0.89%; bb_extreme best MC pp 0.61. Earlier optimism (MS_0429 PBO 0.043) was a single-window artifact.

P2 Harden the sweep methodology: min_trades >= 30 and bear+crash in the OOS set

The overfit sweeps (MS_0601/0603) suffered from min_trades 8-12 (winners with 3-trade samples) and from bull-only OOS windows that produce a hollow '3/3 profitable'.

Implementation: config['min_trades'] >= 30; training_windows across bear/range/bull; _robustness_windows() already provides BEAR_2022, CRASH_LUNA/FTX/YEN, RANGE_2023_mid as held-out OOS - ensure they are used and not leaked into training.

Evidence: MS_20260601 (PBO 0.56): +21.7% from 3 trades, OOS only bull/recent. MS_20260603 (PBO 0.66): in-sample top scores were OOS total failures.
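The P2 hardening rules can be expressed as a pre-flight check that runs before a sweep starts. The following is a minimal sketch, not Botty's actual code: the function names, the 'oos_windows' key and the prefix-based window matching (BEAR..., CRASH...) are assumptions made for illustration; only min_trades, training_windows and min_window_wins appear in this note.

```python
def validate_sweep_config(config: dict) -> list[str]:
    """Return a list of problems; an empty list means the config passes P2."""
    problems = []
    if config.get("min_trades", 0) < 30:
        problems.append("min_trades must be >= 30")

    training = set(config.get("training_windows", []))
    oos = set(config.get("oos_windows", []))  # hypothetical key name

    if len(training) < 2:
        problems.append("need at least 2 regime windows in training")
    leaked = sorted(training & oos)
    if leaked:
        problems.append(f"held-out windows leaked into training: {leaked}")
    # Assumed naming convention: window ids start with their regime label.
    if not any(w.startswith("BEAR") for w in oos):
        problems.append("OOS set contains no bear window")
    if not any(w.startswith("CRASH") for w in oos):
        problems.append("OOS set contains no crash window")
    return problems


def candidate_survives(window_returns: dict, n_trades: int, config: dict) -> bool:
    """Require enough trades AND profitability in enough training windows."""
    wins = sum(1 for w in config["training_windows"]
               if window_returns.get(w, 0.0) > 0.0)
    return (n_trades >= config["min_trades"]
            and wins >= config.get("min_window_wins", 2))


hardened = {
    "min_trades": 30,
    "min_window_wins": 2,
    "training_windows": ["Y2024", "Y2025"],
    "oos_windows": ["BEAR_2022", "CRASH_LUNA", "CRASH_FTX", "RANGE_2023_mid"],
}
weak = {"min_trades": 8, "training_windows": ["Y2024"],
        "oos_windows": ["BULL_2021H1"]}
```

Under this sketch a three-trade 'winner' is rejected on trade count alone, and a candidate that is profitable in only one of the two training windows is rejected regardless of how large that one window's return is.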

P3 Position sizing risk-based + coupled to vol-forecast, cap leverage live at 3-5x

The robust configs have tiny DD (<2%) but small raw returns; the value lies in the risk-adjusted profile + leverage headroom. MC 'safe leverage' 9-10x is optimistic because of window concentration.

Implementation: fixed-fractional sizing from the stop distance; ml/forecast.predict_vol_4h as a scaling factor (IC +0.83, the only WF survivor of the ML experiments); leverage cap 3-5x in execution/config.py.

Evidence: MC max_safe_leverage 9-10 (MS_0429/0505); project_vol_forecast_module (4h IC +0.83 walk-forward).

P4 Do NOT take ema_crossover/macd/donchian/funding live as a stand-alone entry

Broadly exposed as overfit (PBO > 0.5 across several sweeps). They survive OOS only as gated passengers of a trend filter, not as an edge of their own.

Implementation: If ema_crossover at all, then only as a trigger INSIDE an adx/ema200 gate - and even then the selection is unreliable (PBO 0.56-0.66). Better to use the trend-gate edge directly.

Evidence: ema_crossover PBO 0.56 (MS_0601), 0.66 (MS_0603), 0.78/0.82 (MS_0513); donchian 0.77; macd as an entry is set to sweep=False in the registry (Phase-3 verdict: all variants negative).

Full analysis

⚠️ UPDATE 2026-06-08 - clean counter-check debunks the optimism thesis

The comparison sweep MS_20260606_162121 (sfp, bb_extreme, regime_switch; min_trades=30; training 2024+2025 with min_window_wins=2; held-out OOS incl. bear + LUNA/FTX crash) ran cleanly for ~26h (34,548 Phase-2 runs, 2 errors). Result: none of the three passes.

Aggregate PBO 0.77 (overfit) - dominated by bb_extreme's huge filter-stack search space (24,348 Phase-2 runs).
regime_switch: 0 walk-forward survivors. It had a real in-sample edge (Phase 1: ema200+volume+fixed_stop @4h, +10.9%, PF 3.45) but collapsed in Phase 2 (score 73.98 -> 33.68, cutoff 89.7). Works in one regime period, not in 2024 AND 2025. The hybrid inherits the fragility of its parts.
A (sfp) and B (bb_extreme) are 'one-window wonders': sfp harvests only Y2023 (+3.4%), loses in both crashes, ann avg -0.89%. bb_extreme harvests only Y2021 (+8.8%), bear -4.23%. Best walk-forward MC prob_profit 0.61 - far from the 0.97-0.99 of the earlier 'winners'.

Why the contradiction with the earlier good PBO values? MS_0429 (PBO 0.043) trained on a SINGLE window (2024), and its SFP winner drew almost all its return from BULL_2021H1 (+18-22% in one window) - the window concentration that was already flagged as a caveat back then. Single-window training feigns robustness; 2-window training + held-out crash exposes it. Rule for all future sweeps: at least 2 regime windows in training, require profitability in both.

This is not a methodology error but its very point - PBO + multi-window walk-forward exist for exactly this. It is consistent with earlier results (5 of 6 ML experiments died in walk-forward; divergences are dead). The original text below (2026-06-06) remains as evidence of how convincing the thesis looked BEFORE the counter-check.

Starting point

Between 2026-04-25 and 2026-06-06 Botty ran a total of 22 megasweeps (3-phase optimizer: coarse -> fine grid -> walk-forward). For each, the PBO (Probability of Backtest Overfitting, via CSCV in backtesting/cpcv.py) was computed after the fact. PBO answers the decisive question: did we fool ourselves by selecting the in-sample winner? PBO ~ 0 = the winner generalizes; PBO ~ 0.5 = the selection is chance; PBO > 0.5 = actively counterproductive.

This research evaluates all sweeps together instead of celebrating individual winners.
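For readers unfamiliar with how a PBO value comes about, here is a minimal stdlib-only sketch of the CSCV idea (combinatorially symmetric cross-validation). It is not the code in backtesting/cpcv.py. It assumes a matrix of per-period returns with one column per config, and it uses the mean return as the performance metric where a real implementation would typically use a risk-adjusted score. The function name and signature are hypothetical.

```python
from itertools import combinations
from math import log
from statistics import fmean


def pbo_cscv(returns, n_blocks=8):
    """Estimate PBO. returns[t][n] is the return of config n in period t."""
    n_configs = len(returns[0])
    block_len = len(returns) // n_blocks
    if n_blocks < 2 or n_blocks % 2 or n_configs < 2 or block_len == 0:
        raise ValueError("need an even n_blocks >= 2, >= 2 configs, enough rows")
    # Contiguous, equally sized time blocks (trailing remainder rows are dropped).
    blocks = [range(b * block_len, (b + 1) * block_len) for b in range(n_blocks)]

    def perf(block_ids):
        rows = [returns[t] for b in block_ids for t in blocks[b]]
        return [fmean(row[n] for row in rows) for n in range(n_configs)]

    overfit = 0
    total = 0
    # Every way of choosing half the blocks as in-sample; the rest is OOS.
    for is_ids in combinations(range(n_blocks), n_blocks // 2):
        oos_ids = [b for b in range(n_blocks) if b not in is_ids]
        is_perf, oos_perf = perf(is_ids), perf(oos_ids)
        best = max(range(n_configs), key=is_perf.__getitem__)  # IS winner
        # OOS rank of the IS winner: 1 = worst, n_configs = best.
        rank = 1 + sum(p < oos_perf[best] for p in oos_perf)
        omega = rank / (n_configs + 1)
        logit = log(omega / (1 - omega))
        overfit += int(logit <= 0)  # IS winner at or below the OOS median
        total += 1
    return overfit / total


# A config that wins in every period generalizes: PBO 0.
persistent = [[1.0, 0.0, 0.0, 0.0] for _ in range(8)]
# Configs whose edge flips sign between halves: the IS winner is the OOS loser.
flipping = [[1.0, -1.0, 0.0], [1.0, -1.0, 0.0],
            [-1.0, 1.0, 0.0], [-1.0, 1.0, 0.0]]
```

The estimate is the share of in-sample/out-of-sample splits in which the in-sample winner lands at or below the out-of-sample median, which is why values above 0.5 mean that selecting by in-sample score is worse than picking at random.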

The PBO landscape (all 22 sweeps, sorted)

PBO	Sweep	Entry(s)	Verdict
0.043	MS_20260429_085540	all (broad sweep)	Gold
0.055	MS_20260606_043928	adx_dmi_regime	Gold
0.133	MS_20260505_172436	all (broad sweep)	good
0.196	MS_20260507_073619	bb_extreme	ok
0.257	MS_20260513_110446	burj_khalifa	borderline
0.42-0.47	MS_0516/0517/0515	rsi_mean_reversion, holy_grail	unreliable
0.53-0.57	MS_0525, MS_0601, MS_0425	incl. ema_crossover	overfit
0.66-0.92	MS_0603, MS_0519, MS_0526, MS_0508, MS_0513x3, MS_0523	ema_crossover, donchian, bb_extreme, macd, funding, sfp mix	dead

Selection conclusion: Only 4 of 22 sweeps deliver a winner you can trust. ema_crossover, donchian_breakout, macd_crossover, funding_rate_extreme and rsi_mean_reversion are broadly exposed as overfit as stand-alone entries.

The meta-pattern: the edge is the gate

Important: the Phase-2 top scores are misleading - the highest-scored in-sample configs were consistently total failures OOS (textbook overfit, exactly what the PBO measures). Example from MS_0429: hurst+macd_crossover+volume @ 5m reached an in-sample score of 112 but a Monte-Carlo prob_profit of 0.0015 and only 1/11 green OOS windows. You have to look at the real walk-forward survivors.

When you do, the pattern is unambiguous across all 4 good sweeps:

The edge sits in the trend/regime gate, not the trigger. Every robust winner carries min_adx>=25, ema200_filter, hurst>=0.55 - or is itself the regime (adx_dmi_regime). The entry signal (SFP, BB, EMA cross) is interchangeable. Confirmed by the two EMA sweeps: the raw crossover survives OOS only when you stack adx_rising + vrp/bocpd on top of it - the gate does the work, the trigger is an accessory.
Timeframe 30m-4h. Everything <=15m overfits. 5m was the most reliable trap. Higher TF = less noise in the regime flip, less fee churn.
Exit = tight stop, let winners run. fixed_stop 1.0-1.5% or swing_stop. Signal exits (ema_cross_exit, macd_cross_exit) consistently hurt.
Profile: low win rate (15-25%), low frequency, tiny drawdown (<2%), flat-defensive in chop/bear, harvests trends. That is precisely why the PBO is low - the strategy doesn't pretend to earn money everywhere, so there is barely any IS-vs-OOS rank inversion.
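The arithmetic behind 'let winners run' is worth making explicit. Profit factor is gross profit over gross loss, so PF = (WR * avg_win) / ((1 - WR) * avg_loss), which can be solved for the payoff ratio avg_win / avg_loss that a given win rate needs. The sketch below does that; the function names are illustrative, and the halved-payoff case is a hypothetical example, not a measured result.

```python
def required_payoff_ratio(win_rate: float, profit_factor: float) -> float:
    """avg_win / avg_loss needed to reach profit_factor at this win rate."""
    if not 0.0 < win_rate < 1.0:
        raise ValueError("win_rate must be strictly between 0 and 1")
    return profit_factor * (1.0 - win_rate) / win_rate


def expectancy_in_r(win_rate: float, payoff_ratio: float) -> float:
    """Expected result per trade in units of the average loss (R)."""
    return win_rate * payoff_ratio - (1.0 - win_rate)


# Profile from this note: 15-25% win rate at PF ~2.
for wr in (0.15, 0.20, 0.25):
    print(f"WR {wr:.0%}: winners must average "
          f"{required_payoff_ratio(wr, 2.0):.1f}x the average loss")
```

At a 20% win rate and PF 2 the average winner has to be about 8 times the average loss. If an exit that closes trades early were to halve that payoff to 4, expectancy at the same win rate would fall to roughly zero, which is consistent with the observation that signal exits hurt this profile.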

The two survivor archetypes

A) Trend harvest (lowest PBO, most regime-robust)

Entry sfp (Swing Failure) - or adx_dmi_regime onset · Filter adx_filter (min_adx 25) + ema200_filter · Exit fixed_stop 1.0-1.5% · TF 30m-4h

From MS_0429 (PBO 0.043): 8/11 OOS windows green, DD ~1.2%, MC prob_profit 0.97, max_safe_leverage 10, p50 +25%. Caveat: return heavily concentrated in BULL_2021H1 (+18-22%), remaining windows ~ zero.

B) Mean reversion in trend (best return at good PBO)

Entry bb_extreme · Filter ema200_filter + volume_filter · Exit swing_stop · TF 1h

From MS_0505 (PBO 0.133): 7/9 OOS windows green, ann +14%, DD 1.65%, MC prob_profit 0.99, p50 +55%, max_safe_leverage 10. BB extreme is a range strategy (fade the band touch), direction-gated by EMA200 and confirmed by volume.

The portfolio argument (the actual conclusion)

Instead of searching for the 'best single strategy': A (trend) and B (mean reversion) are regime anti-correlated. The A winner's bucket_stats prove it - bull bucket 40-50% p.a., range bucket ~3%. B harvests exactly the range phases in which A sleeps. Run in parallel on Botty's three isolated wallets, they smooth the equity curve far beyond what a single strategy can achieve.

From this follows the regime-switch hypothesis to be tested: a single strategy that switches based on the ADX/DMI regime - if ADX says 'trend', we trade SFP; if ADX later flips to 'sideways', we switch to BB_Extreme. A self-adaptive hybrid instead of two separate wallets.
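The switching logic of that hypothesis can be sketched in a few lines. This is an illustration under stated assumptions, not the regime_switch entry in strategies/conditions/entries.py: the function name, the signal encoding (+1 long, -1 short, 0 flat) and the choice to read the regime from the previous closed bar are assumptions made here so that the decision stays causal. The min_adx threshold of 25 is the one named in this note.

```python
def regime_switch_signal(adx, sfp_signal, bb_signal, min_adx=25.0):
    """Route each bar to the sfp or bb_extreme signal based on the ADX regime.

    adx, sfp_signal, bb_signal: equally long sequences, one value per bar.
    The regime for bar i is taken from bar i-1 (assumption), so bar i's own
    ADX value, which is only final at its close, is never used to trade bar i.
    """
    if not (len(adx) == len(sfp_signal) == len(bb_signal)):
        raise ValueError("all inputs must have the same length")
    out = []
    for i in range(len(adx)):
        if i == 0:
            out.append(0)  # no closed bar yet, so no regime and no trade
            continue
        trending = adx[i - 1] >= min_adx
        out.append(sfp_signal[i] if trending else bb_signal[i])
    return out


# Toy series: trend regime for two bars, then sideways.
adx = [30.0, 30.0, 10.0, 10.0]
sfp = [1, 1, 1, 1]      # pretend the trend entry fires long on every bar
bb = [-1, -1, -1, -1]   # pretend the mean-reversion entry fires short
print(regime_switch_signal(adx, sfp, bb))
```

The sketch only routes signals. It adds no edge of its own, so its output can be no more robust than the two inputs it switches between.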

Position sizing

Risk-based fixed-fractional, not fixed notional. The stop distance (1-1.5%) defines the size.
Leverage: Monte-Carlo calls 9-10x 'safe', but that is optimistic given the window concentration. Cap live conservatively at 3-5x.
Dynamic via vol-forecast: Botty's only walk-forward-surviving ML edge is ml/forecast (predict_vol_4h, IC +0.83). As a sizing input: smaller size / wider stop at high forecast vol, larger at low.
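The three sizing rules combine into one small function. The sketch below is an assumption-laden illustration, not the code in execution/config.py: the function name, the 0.5% risk_fraction default, the baseline_vol parameter and the 0.5x-1.5x clip on the vol scaling are all hypothetical choices. Only the stop distance, the 3-5x leverage cap and the direction of the vol adjustment come from this note.

```python
def position_notional(equity, stop_pct, risk_fraction=0.005, max_leverage=3.0,
                      forecast_vol=None, baseline_vol=None):
    """Fixed-fractional notional size derived from the stop distance.

    equity:        account equity in quote currency
    stop_pct:      stop distance as a fraction of entry price (0.01 = 1%)
    risk_fraction: share of equity lost if the stop is hit (hypothetical default)
    forecast_vol / baseline_vol: optional; scales risk down when the forecast
                   is above baseline and up when below (clip is an assumption)
    """
    if equity <= 0 or stop_pct <= 0:
        raise ValueError("equity and stop_pct must be positive")
    risk_amount = equity * risk_fraction
    if forecast_vol is not None and baseline_vol is not None:
        if forecast_vol <= 0 or baseline_vol <= 0:
            raise ValueError("volatilities must be positive")
        scale = min(1.5, max(0.5, baseline_vol / forecast_vol))
        risk_amount *= scale
    notional = risk_amount / stop_pct
    # Hard leverage cap, independent of what the risk math would allow.
    return min(notional, equity * max_leverage)


# 10,000 equity, 1% stop, 0.5% risk: about 5,000 notional (0.5x leverage).
print(position_notional(10_000, 0.01))
```

Because size is derived from the stop distance, tightening the stop raises the notional for the same risk, which is why the leverage cap is applied last as a separate limit.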

What does NOT work

ema_crossover/macd/donchian/funding as a stand-alone entry (PBO dead), signal-based exits, timeframes <=15m, more than ~3 stacked filters (every knob inflates the overfit surface - visible in the 0.66-PBO sweeps with 3-filter stacks). See also 'What provably does NOT work in retail trading (and why)' and 'Divergences: theory, practice and our walk-forward test on BTC'.

Next steps

Implement the regime_switch entry (SFP in trend / BB in range, ADX-gated).
ONE clean comparison sweep over [sfp, bb_extreme, regime_switch] with min_trades>=30, multi-regime training windows and bear+crash in the held-out OOS - so that A, B and the switch are directly comparable via PBO/walk-forward.
Monte-Carlo + decision only AFTER the overfit gates.