The problem it solves
If you test 1000 strategy variants, the best one will show a high Sharpe by pure chance, even if none has a real edge. The raw Sharpe of the winning strategy is then selection-biased and overstates its true performance. This is exactly the problem the Deflated Sharpe Ratio (DSR) addresses.
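A quick simulation makes the selection bias concrete. This is a sketch under assumed parameters (1000 variants, one year of daily returns, zero true edge for every variant); the helper names `sharpe` and `noise_sweep` are illustrative only:

```python
import math
import random
import statistics


def sharpe(returns):
    """Per-period Sharpe ratio: mean over sample std (no risk-free rate, not annualised)."""
    n = len(returns)
    mu = math.fsum(returns) / n
    var = math.fsum((r - mu) ** 2 for r in returns) / (n - 1)
    return mu / math.sqrt(var)


def noise_sweep(n_variants=1000, n_periods=252, seed=7):
    """Annualised Sharpe of each variant when every variant is pure noise (zero edge)."""
    rng = random.Random(seed)
    sharpes = []
    for _ in range(n_variants):
        # Zero-mean daily returns: no variant has a real edge.
        returns = [rng.gauss(0.0, 0.01) for _ in range(n_periods)]
        sharpes.append(sharpe(returns) * math.sqrt(252))  # annualise assuming daily data
    return sharpes


sharpes = noise_sweep()
print(f"median Sharpe: {statistics.median(sharpes):.2f}")
print(f"best Sharpe:   {max(sharpes):.2f}")
```

Every variant has a true Sharpe of exactly 0, yet the median sits near 0 while the best one typically shows an annualised Sharpe of around 3. That best value is the number a naive sweep would report.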
Definition
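Introduced by Bailey & López de Prado (2014). The DSR is the probability that the true Sharpe is greater than zero once selection bias and non-normal returns are accounted for. Formally, it is the Probabilistic Sharpe Ratio (PSR) evaluated against a benchmark SR_0, the expected maximum Sharpe of N zero-skill trials, instead of against zero. It depends on: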
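- Number of trials N: how many variants were tried (multiple-testing penalty), plus the variance of their Sharpe ratios, which scales the benchmark,
- Sample length T: short histories give a noisier Sharpe estimate,
- Skew and kurtosis of the returns: for a positive Sharpe, negative skew and fat tails (high kurtosis) make the estimate shakier.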
DSR = Prob( true SR > 0 | observed SR, N trials, variance of trial SRs, T, skew, kurtosis )
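Spelled out, the closed form in Bailey & López de Prado (2014) evaluates the PSR at the luck benchmark SR_0 rather than at zero. Here \widehat{SR} is the observed per-period (not annualised) Sharpe, T the number of return observations, \hat\gamma_3 the skewness, \hat\gamma_4 the non-excess kurtosis (3 for normal returns), V[\{\widehat{SR}_n\}] the variance of the N trial Sharpes, \gamma \approx 0.5772 the Euler-Mascheroni constant and \Phi the standard normal CDF:

```latex
\widehat{\mathrm{DSR}} = \Phi\!\left(
  \frac{\left(\widehat{SR} - SR_0\right)\sqrt{T-1}}
       {\sqrt{1 - \hat{\gamma}_3\,\widehat{SR} + \frac{\hat{\gamma}_4 - 1}{4}\,\widehat{SR}^{2}}}
\right),
\qquad
SR_0 = \sqrt{V\!\left[\{\widehat{SR}_n\}\right]}\,
\left( (1-\gamma)\,\Phi^{-1}\!\left[1 - \tfrac{1}{N}\right]
     + \gamma\,\Phi^{-1}\!\left[1 - \tfrac{1}{N e}\right] \right)
```

The denominator is where skew and kurtosis enter: for a positive Sharpe, negative skew (\hat\gamma_3 < 0) and fat tails (\hat\gamma_4 > 3) enlarge it and shrink the z-score. N and the trial variance enter only through SR_0.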
Core mechanic: from the N trials (their count and the variance of their Sharpe ratios) you compute the expected maximum Sharpe under pure chance, SR_0, and use it as the benchmark instead of zero. Only a Sharpe that clearly clears this bar survives. A Sharpe of 2.0 from a single test is strong; the same 2.0 as the best value out of 5000 sweeps is often indistinguishable from luck.
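The benchmark SR_0 can be sketched with the Python standard library alone. The function name `expected_max_sharpe` is hypothetical, and the cross-trial variance `var_sr = 1 / T` is an assumption for illustration (roughly the sampling variance of a per-period Sharpe near zero); in practice it would come from the actual trial results:

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def expected_max_sharpe(n_trials: int, var_sr: float) -> float:
    """SR_0: expected maximum Sharpe among n_trials zero-skill trials.

    var_sr is the variance of the Sharpe estimates across the trials, in the
    same per-period units as the Sharpe ratio itself.
    """
    if n_trials < 1:
        raise ValueError("n_trials must be >= 1")
    if n_trials == 1:
        # No selection happened; fall back to the plain zero benchmark
        # (a convention of this sketch: Phi^-1(1 - 1/N) is undefined at N = 1).
        return 0.0
    z = NormalDist().inv_cdf
    return math.sqrt(var_sr) * (
        (1 - EULER_GAMMA) * z(1 - 1 / n_trials)
        + EULER_GAMMA * z(1 - 1 / (n_trials * math.e))
    )


T = 252           # one year of daily returns (assumed)
var_sr = 1 / T    # assumed cross-trial variance of daily Sharpe estimates under pure noise
for n in (1, 10, 100, 5000):
    sr0_annual = expected_max_sharpe(n, var_sr) * math.sqrt(252)
    print(f"N = {n:>4}: luck benchmark = {sr0_annual:.2f} (annualised)")
```

Under these assumptions the annualised luck benchmark climbs from 0 for a single test to roughly 3.7 for N = 5000, so an annualised 2.0 picked as the best of 5000 sits well below what luck alone is expected to deliver.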
Interpretation
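The DSR is read as a probability (0..1). Above 0.95 (the usual 5% significance level), the Sharpe survives the multiple-testing correction. A DSR of 0.5 means the observed Sharpe merely equals what the luckiest of N zero-skill trials would be expected to reach; near or below that, the observed value is plausibly noise.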
How Botty uses it
backtesting/deflated_sharpe.py implements the DSR as part of the overfit defence, alongside its sister metric, the PBO (Probability of Backtest Overfitting). The PBO uses rankings within the sweep to estimate whether the selection step itself overfit; the DSR instead deflates the Sharpe value itself by the number of looks. Both target the same sin: looking many times and celebrating the lucky hit.
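A minimal sketch of how a sweep's results could feed the DSR. This is not the code in backtesting/deflated_sharpe.py: `deflated_sharpe_from_sweep`, `moments`, the one-variant-one-trial count and the synthetic sweep are all hypothetical, and the benchmark uses the sample variance of the sweep's Sharpe ratios as the cross-trial variance:

```python
import math
import random
import statistics
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def moments(returns):
    """Per-period Sharpe (sample std), skewness and non-excess kurtosis of one series."""
    n = len(returns)
    mu = math.fsum(returns) / n
    dev = [r - mu for r in returns]
    m2 = math.fsum(d * d for d in dev) / n
    m3 = math.fsum(d ** 3 for d in dev) / n
    m4 = math.fsum(d ** 4 for d in dev) / n
    sr = mu / math.sqrt(m2 * n / (n - 1))
    return sr, m3 / m2 ** 1.5, m4 / m2 ** 2


def deflated_sharpe_from_sweep(returns_by_variant):
    """DSR of the best variant in a sweep, counting every variant as one trial."""
    n_trials = len(returns_by_variant)
    if n_trials < 2:
        raise ValueError("need at least two trials to estimate the luck benchmark")
    stats = [moments(r) for r in returns_by_variant]
    sharpes = [s[0] for s in stats]
    best = max(range(n_trials), key=sharpes.__getitem__)
    sr, skew, kurt = stats[best]
    t = len(returns_by_variant[best])

    # SR_0: expected max Sharpe of n_trials zero-skill trials, scaled by the
    # (sample) variance of the Sharpe ratios observed across the sweep.
    z = NormalDist().inv_cdf
    sr0 = math.sqrt(statistics.variance(sharpes)) * (
        (1 - EULER_GAMMA) * z(1 - 1 / n_trials)
        + EULER_GAMMA * z(1 - 1 / (n_trials * math.e))
    )

    # PSR evaluated at SR_0; skew and kurtosis adjust the Sharpe's standard error.
    denom = 1 - skew * sr + (kurt - 1) / 4 * sr ** 2
    if denom <= 0:  # guard: in theory only reachable for degenerate return series
        raise ValueError("non-normality correction is undefined for these moments")
    dsr = NormalDist().cdf((sr - sr0) * math.sqrt(t - 1) / math.sqrt(denom))
    return best, sr, sr0, dsr


# Hypothetical sweep: 49 zero-edge variants plus one with a true per-period Sharpe of 0.3.
rng = random.Random(42)
T, sigma = 1000, 0.01
sweep = [[rng.gauss(0.0, sigma) for _ in range(T)] for _ in range(49)]
sweep.append([rng.gauss(0.3 * sigma, sigma) for _ in range(T)])
best, sr, sr0, dsr = deflated_sharpe_from_sweep(sweep)
print(f"best variant #{best}: SR = {sr:.3f}, SR_0 = {sr0:.3f}, DSR = {dsr:.3f}")
```

Because this sweep contains one variant with a real edge, its Sharpe clears the luck benchmark by a wide margin and the DSR comes out close to 1. On a sweep of pure noise, the best variant's DSR usually lands well short of 0.95.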
Limits
- N is often hard to count honestly. Every informal parameter peek is a trial, so the true trial count is usually higher than documented, and an undercounted N makes the DSR too generous.
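- Assumes a sensible trial distribution. The benchmark treats the trials as independent draws; strongly correlated variants (say, neighbouring parameter values) behave like fewer independent tests, so N should be the effective number of independent trials rather than the raw count of variants.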
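A small simulation illustrates why correlated variants do not count like independent tests. It is a sketch under assumed parameters: 100 zero-edge variants, one year of daily returns, and a shared noise component that gives every pair of variants the correlation `rho`. It compares how inflated the best Sharpe gets in each case:

```python
import math
import random


def sharpe(returns):
    """Per-period Sharpe ratio: mean over sample std (not annualised)."""
    n = len(returns)
    mu = math.fsum(returns) / n
    var = math.fsum((r - mu) ** 2 for r in returns) / (n - 1)
    return mu / math.sqrt(var)


def avg_best_sharpe(rho, n_variants=100, n_periods=252, reps=20, seed=1):
    """Average (over reps) of the best per-period Sharpe among zero-edge variants
    whose returns have pairwise correlation rho via a shared noise component."""
    rng = random.Random(seed)
    a, b = math.sqrt(rho), math.sqrt(1 - rho)  # unit variance, correlation a**2 = rho
    total = 0.0
    for _ in range(reps):
        common = [rng.gauss(0.0, 1.0) for _ in range(n_periods)]
        best = max(
            sharpe([a * c + b * rng.gauss(0.0, 1.0) for c in common])
            for _ in range(n_variants)
        )
        total += best
    return total / reps


independent = avg_best_sharpe(rho=0.0)
correlated = avg_best_sharpe(rho=0.9)
print(f"independent variants: {independent:.3f}")
print(f"rho = 0.9 variants:   {correlated:.3f}")
```

The best of 100 highly correlated variants is far less inflated than the best of 100 independent ones, so counting them as N = 100 over-penalises. Correlation does partly show up in the DSR through a smaller variance of Sharpe ratios across trials, but the trial count N still counts every variant.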