What provably does NOT work in retail trading (and why)

Strategy analysis · 2026-06-06 · 7 sources
A consolidated, evidence-based list of the things that, per the quant research of Thomas Skinner (Delta Trend Trading), do not work - and the mechanistic reason behind each. Drawn from 11 long-form videos + 22 shorts, many backed by programmatic backtests over thousands of trades. Matches Botty's own live_readiness pipeline and ml philosophy 1:1. Goal: a yardstick for weeding out your own ideas early, before capital is committed.
  • Public/sold strategies have no edge - alpha decay: once an edge becomes public and capital flows in, the market arbitrages it away (pairs trading 1987 -> gone within years; published edges -25%; real edges legally protected, FBI case Jan 2025).
  • Concrete guru setups don't hold up under testing: Justin Werlein's ICT model +107% vs +311% buy&hold (Sharpe 0.165, 3,500 trades); FVGs are eaten by transaction costs (break-even ~1.96 ticks < NQ costs of 2-4 ticks); 8AM ORB ~12% prop-firm challenge pass rate, poor risk-adjusted return.
  • Stop hunts are a myth: market makers are delta-neutral; the real wicks come from overcrowded strategies with clustered stops. 'There is no algorithm' - prices = supply/demand (axiom).
  • Win rate and average RR are the wrong metrics; a single backtest proves nothing (path dependence); 'discretion' is not a hidden edge but unquantified feature engineering.
  • Overfitting is pervasive: looking at a holdout repeatedly + re-tuning turns it into a training set; then even Sharpe 3 and tight MC CIs are meaningless (garbage in, garbage out).
  • 'Psychology/discipline = 80% of trading' is a scapegoat narrative; sampling on every candle forecasts noise; social-proof testimonials are a survivorship/variance illusion.
  • Positive counterpart (what counts instead): expected value + confidence intervals, Monte Carlo for drawdown/path dependence, event-based sampling, regime awareness, real out-of-sample testing - exactly Botty's live_readiness pipeline.
  • SCALPING (measured by Botty itself, not just the literature): splits into two variants with two causes of death. TAKER scalping dies at the fee floor (~7bps round-trip; liquidation_mr_scout: 56% hit rate, yet net -7..-10bps, 0/7 years). MAKER market-making dies at adverse selection (spread_capture_scout, BTC/ETH/SOL 6+ yrs: 0/7 years at the HL base maker fee, only marginally & unstably positive with a guaranteed rebate + ultra-wide quote). Full derivation + visualization: [[scalping_market_making]].
P1 Keep these 'does-not-work' criteria as a negative checklist alongside live_readiness
Most ideas die for exactly these reasons (alpha decay, overfitting, costs, noise sampling). An explicit negative list speeds up early culling before sweep/backtest time is invested.
Implementation: Cross-link from live_readiness to this entry (done); for new strategy ideas, run through the 5 categories as a gate.
Evidence: Matches Botty's history: divergences_dead, vol_regime_transitions DEAD, macd_crossover sweep:False.
P2 Ensure transaction-cost realism in feature/setup tests
The FVG finding (break-even ~1.96 ticks < 2-4 ticks cost) shows how many 'edges' die on costs alone. Feature tests without realistic fees/slippage are misleading.
Implementation: In ml/ event studies and backtests, set costs/slippage explicitly against the measured edge; use TAKER_FEE_RATE + FUNDING in backtesting/config.py consistently.
Evidence: FVG short: edge present, but below costs.
P3 Evaluate tick/volume bars for distribution-sensitive statistics
Time-based returns are not IID Gaussian, so statistics that assume this (z-scores, std-dev thresholds) are biased. Returns sampled on tick/volume bars come closer to normality (per a 1967 paper).
Implementation: Evaluate optional tick/volume-based resampling in ml/ for tests where Gaussian/IID assumptions enter (e.g. z-score features, std-dev thresholds).
Evidence: Short 'When to Use Tick Bars'.
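As a starting point for that evaluation, here is a minimal sketch of volume-bar resampling in plain Python. The function name volume_bars, the (price, size) tick format and the bar size are assumptions for illustration and are not taken from Botty's ml/ code. A tick is never split across bars, so a bar can slightly overshoot the target volume.

```python
from typing import Dict, List, Tuple


def volume_bars(ticks: List[Tuple[float, float]], bar_volume: float) -> List[Dict[str, float]]:
    """Aggregate (price, size) ticks into bars that close once bar_volume has traded."""
    bars: List[Dict[str, float]] = []
    cur = None  # bar currently under construction
    for price, size in ticks:
        if cur is None:
            cur = {"open": price, "high": price, "low": price, "close": price, "volume": 0.0}
        cur["high"] = max(cur["high"], price)
        cur["low"] = min(cur["low"], price)
        cur["close"] = price
        cur["volume"] += size
        if cur["volume"] >= bar_volume:
            bars.append(cur)  # bar is complete, start a new one on the next tick
            cur = None
    return bars  # a trailing incomplete bar is dropped


# Hypothetical ticks: (price, traded size)
ticks = [(100.0, 2), (100.5, 3), (99.5, 5), (101.0, 4), (102.0, 7), (101.5, 1)]
bars = volume_bars(ticks, bar_volume=10)
```

Returns computed from bars[i]["close"] are then sampled per unit of traded volume rather than per unit of clock time, which is the comparison the P3 item asks for.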

What this is about

This page bundles what, per the quant research of Thomas Skinner (Delta Trend Trading), provably does not work - and why (the mechanistic reason, not just 'it's bad'). Much of it is backed by programmatic backtests over thousands of trades. It is the negative flip side of Live Readiness: how to recognize early an idea you should discard. Full source base in the trader profile Delta Trend Trading.

1. Public or sold strategies (gurus, courses, signals, ICT/TJR)

Why not: Alpha decay is axiomatic - once an edge becomes public and capital flows in, traders crowd into the same trade until the inefficiency vanishes and the price converges to its efficient level.

Evidence: Pairs trading (Morgan Stanley 1987, ~$50M) -> arbitraged away within a few years. Published edges lose ~25% after publication. Real edges are protected with hard legal means (unsealed FBI indictment Jan 2025 against a quant who stole trade secrets). Rule of thumb: 'If you can find it published, it doesn't work. If somebody's selling it, it doesn't work.'

2. Concrete guru setups (tested programmatically)

  • Justin Werlein's ICT model (liquidity sweep -> manipulation leg -> FVG inversion): 2018-2025, 3,500 trades -> +107%, but buy&hold NASDAQ +311% (3x more by doing nothing), Sharpe 0.165 (below 0.5 = uninvestable). Werlein himself is net break-even.
  • Fair Value Gaps (FVGs): the proximal gap tap breaks even at ~1.96 ticks of cost per trade, but NQ all-in costs are 2-4 ticks, so the edge is eaten by costs (RTH, overnight and best hour are all dead). Only a deeper entry (displacement open) barely survives (+0.13R), and that is only a feature test, not a strategy.
  • 8AM opening-range breakout (RP Profits): ~415 trades, 16% win rate, profit factor 0.87, poor risk-adjusted return; ~12% prop-firm challenge pass rate.
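The FVG cost argument is plain arithmetic: a break-even of ~1.96 ticks against NQ all-in costs of 2-4 ticks. A minimal sketch follows; the function name and the three cost scenarios are illustrative assumptions, and only the 1.96 and the 2-4 range come from the text above.

```python
BREAK_EVEN_TICKS = 1.96  # from the FVG test: costs above this make the setup unprofitable


def net_edge_ticks(break_even_ticks: float, cost_ticks: float) -> float:
    """Edge per trade left over after all-in costs, in ticks."""
    return break_even_ticks - cost_ticks


# Assumed cost scenarios spanning the quoted 2-4 tick range
results = {cost: round(net_edge_ticks(BREAK_EVEN_TICKS, cost), 2) for cost in (2.0, 3.0, 4.0)}

for cost, net in results.items():
    print(f"cost {cost:.1f} ticks -> net {net:+.2f} ticks per trade")
```

Every scenario in the quoted cost range ends below zero, which is why the setup counts as dead before any strategy is built around it.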

3. Smart-money myths

  • Stop hunts are not real. Market makers are exchange-mandated, delta-neutral spread earners with no directional interest. Moving the most liquid assets in the world by a few pips just to clear out retail stops would be absurdly expensive. The 'stop hunt' wicks arise because an overcrowded strategy clusters all stops at the same level -> cascade. An overcrowded strategy is not an edge but a signal that smarter players fade.
  • 'There is no algorithm.' Prices are driven by supply/demand (axiom). The ICT 'algorithm' narrative is a cult/scam story.

4. False validation

  • Win rate & average RR answer neither 'am I making money?' (that is what expected value does) nor path dependence/drawdown (that is what Monte Carlo does).
  • A single backtest is just ONE realization of a stochastic process; different orderings of the same trades produce completely different equity paths from the same distribution. Often the 90% CI of the per-trade EV spans zero -> indistinguishable from an edgeless strategy.
  • Overfitting: looking at a holdout repeatedly and adjusting parameters turns it into a training set. Then even Sharpe 3 + tight MC CIs + a smooth equity curve are meaningless. Required: a real one-time holdout + forward/live test + a falsifiable idea with an economic rationale.
  • 'Discretion' is not a hidden edge - just unquantified feature engineering. The strategy IS the edge.
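The two replacements for win rate and average RR named above (expected value with a confidence interval, and Monte Carlo for path dependence) can be sketched with the standard library alone. The trade list, function names, resample counts and seeds below are hypothetical. The bootstrap and the shuffle are one simple way to do this and are not necessarily how the live_readiness pipeline does it.

```python
import random
import statistics


def bootstrap_ev_ci(trades, n_resamples=2000, level=0.90, seed=0):
    """Percentile bootstrap confidence interval for the per-trade expected value."""
    rng = random.Random(seed)
    n = len(trades)
    means = sorted(statistics.fmean(rng.choices(trades, k=n)) for _ in range(n_resamples))
    lo_idx = int((1 - level) / 2 * n_resamples)
    hi_idx = int((1 + level) / 2 * n_resamples) - 1
    return means[lo_idx], means[hi_idx]


def max_drawdown(trades):
    """Largest peak-to-trough drop of the cumulative equity curve (starting at 0)."""
    equity = peak = worst = 0.0
    for r in trades:
        equity += r
        peak = max(peak, equity)
        worst = max(worst, peak - equity)
    return worst


def shuffled_drawdowns(trades, n_paths=1000, seed=0):
    """Max drawdown of many random orderings of the SAME trades (path dependence)."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_paths):
        path = list(trades)
        rng.shuffle(path)
        out.append(max_drawdown(path))
    return sorted(out)


# Hypothetical backtest in R multiples: 36 winners at +2R, 64 losers at -1R
trades = [2.0] * 36 + [-1.0] * 64
ev = statistics.fmean(trades)      # per-trade expected value
lo, hi = bootstrap_ev_ci(trades)   # 90% CI of that expected value
dds = shuffled_drawdowns(trades)   # drawdown distribution over reorderings
print(f"EV {ev:+.2f}R, 90% CI [{lo:+.2f}, {hi:+.2f}]R")
print(f"max drawdown across orderings: {dds[0]:.0f}R to {dds[-1]:.0f}R")
```

With this sample the point estimate is positive, yet the interval reaches below zero, and the same 100 trades yield very different worst drawdowns depending only on their order. Neither win rate nor average RR shows either effect.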

5. Narratives & psychology

  • 'Psychology/discipline = 80-90% of trading' is a scapegoat: if it were true, quant firms wouldn't need PhDs to develop strategies that are executed by machines with no emotions to discipline. A long, inexplicable losing streak means you are trading something unvalidated.
  • Social proof/testimonials are a survivorship/variance illusion: variance produces short-term winners who post and get upvoted, then converge to baseline and disappear.
  • Sampling on every candle means forecasting noise (there is no catalyst). Define real events and sample on those.
  • Prediction markets (Polymarket/Kalshi) are not easy money: ~70% lose, 0.1% collect 67% of the winnings; the same quant/HFT firms trade there.
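For the point about sampling on events instead of on every candle, one common event definition is a symmetric CUSUM filter, which fires only when price has drifted by more than a threshold since the last event. This technique is brought in here as an illustration; the source does not say which event definition Skinner or Botty uses, and the prices and threshold are made up.

```python
def cusum_events(prices, threshold):
    """Return indices where cumulative up- or down-drift since the last reset exceeds threshold."""
    events = []
    s_pos = s_neg = 0.0
    for i in range(1, len(prices)):
        diff = prices[i] - prices[i - 1]
        s_pos = max(0.0, s_pos + diff)  # accumulated upward drift, floored at 0
        s_neg = min(0.0, s_neg + diff)  # accumulated downward drift, capped at 0
        if s_pos > threshold:
            events.append(i)
            s_pos = 0.0
        elif s_neg < -threshold:
            events.append(i)
            s_neg = 0.0
    return events


# Hypothetical closes: small chop, then a rally, then a sharp drop
prices = [100.0, 100.2, 99.9, 100.1, 100.0, 101.5, 103.0, 102.8, 100.5, 100.4]
events = cusum_events(prices, threshold=2.0)
print(events)  # indices of the bars that qualify as events
```

Out of nine candle-to-candle moves only two become samples (the end of the rally and the drop); the chop in between is never labeled or forecast.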

6. Scalping & market-making (measured by Botty itself)

Not from the literature but from our own scouts - and cleanly symmetric. Scalping (many micro-trades, each a few basis points) splits into two variants with two causes of death:

  • Taker scalping (market order, you take liquidity) dies at the fee floor: ~7bps round-trip (HL taker 3.5bps x 2). The liquidation_mr_scout (5,300 cascades, 7 years, fading liquidations) had a 56% hit rate and was still net -7 to -10bps, profitable in 0 of 7 years. The fee eats the entire micro-edge.
  • Maker market-making (limit quotes, you provide liquidity, paying only ~1bp maker fee/rebate) is the only conceivable way out - but dies at the adverse-selection floor: the spread_capture_scout (BTC/ETH/SOL, 6+ yrs, mark-to-close) is negative everywhere at the HL base fee (0/7 years), and even with a rebate it only crosses zero marginally & unstably. Mechanism: the clean 'both filled' bucket pays well (+2..+43bps) but is rare; the one-sided 'adverse selection' bucket (-1..-10bps) is 3-10x more frequent and dominates. Nice twist: the less liquid SOL is worse, not better - more spread, but correspondingly more adverse selection.

Lesson as a negative gate: every 'many small trades' idea must clear both floors BEFORE any infrastructure is built - the fee floor and the adverse-selection floor. High hit rate != edge. Full derivation + ASCII visualization: scalping_market_making. The structurally clean 'scalping cousin' that passes our gates is the delta-neutral funding carry (live).
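Both floors can be written down as a quick pre-check. The sketch below is a simplification under stated assumptions: the bucket PnLs are treated as gross, a single fee is charged per quote cycle (negative for a rebate), and unfilled quotes are ignored. The function names and the specific inputs (+20bps, -5bps, 5x ratio) are hypothetical picks from inside the ranges quoted above, not scout output.

```python
TAKER_FEE_BPS = 3.5  # per side, HL taker fee as quoted in the text


def taker_net_bps(gross_edge_bps: float, fee_per_side_bps: float = TAKER_FEE_BPS) -> float:
    """Fee floor: gross edge per trade minus the round-trip taker fee."""
    return gross_edge_bps - 2 * fee_per_side_bps


def maker_ev_bps(pnl_both_bps: float, pnl_adverse_bps: float,
                 adverse_ratio: float, fee_bps: float) -> float:
    """Adverse-selection floor: expected value per filled quote cycle.

    adverse_ratio = how many one-sided (adverse) fills occur per clean 'both filled' cycle.
    fee_bps = assumed average fee per cycle; negative values model a rebate.
    """
    p_both = 1.0 / (1.0 + adverse_ratio)
    p_adverse = 1.0 - p_both
    return p_both * pnl_both_bps + p_adverse * pnl_adverse_bps - fee_bps


# Taker: even a 5bps gross micro-edge is below the ~7bps round-trip fee
taker = taker_net_bps(5.0)

# Maker: hypothetical +20bps clean bucket, -5bps adverse bucket, adverse 5x more frequent
maker_base = maker_ev_bps(20.0, -5.0, adverse_ratio=5.0, fee_bps=1.0)     # paying 1bp
maker_rebate = maker_ev_bps(20.0, -5.0, adverse_ratio=5.0, fee_bps=-1.0)  # receiving 1bp

print(f"taker net {taker:+.2f}bps | maker at fee {maker_base:+.2f}bps | maker with rebate {maker_rebate:+.2f}bps")
```

With these illustrative inputs the maker side is negative at the base fee and only just above zero with a rebate, which mirrors the qualitative shape of the scout result. An idea that cannot beat both checks on paper does not justify building infrastructure.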

What this means for Botty

This list is the external validation of Botty's own discipline. Every 'doesn't work' has its counterpart in Botty's tools: alpha decay -> don't copy public setups; false validation -> the live_readiness pipeline (walk-forward, PBO, Monte Carlo); noise sampling -> event-based ml experiments; the stop-hunt myth -> focus on statistical rather than narrative explanations. Botty's own graveyards (divergences 3x debunked, vol_regime_transitions DEAD, macd_crossover) are lived examples of the same principles.