Botty · DSR

Deflated Sharpe Ratio

A Sharpe ratio corrected for the number of strategies tried, the sample length and the non-normality of returns. Answers: is this Sharpe real, or just the winner of many attempts?

The problem it solves

If you test 1000 strategy variants, the best one will almost certainly show a high Sharpe by pure chance, even if none of them has a real edge. The raw Sharpe of the winner is selection-biased and overstates what the strategy can deliver. This is the problem the Deflated Sharpe Ratio (DSR) addresses.
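The effect is easy to reproduce. The sketch below is an illustrative simulation, not part of Botty: the seed, the 1% daily volatility and the one-year history are arbitrary assumptions. It generates 1000 zero-edge strategies and compares the best Sharpe with the median.

```python
import random
import statistics

random.seed(7)  # arbitrary seed, only for reproducibility
N_STRATEGIES, N_DAYS = 1000, 252  # assumed: 1000 variants, one year of daily returns


def annualised_sharpe(returns):
    """Annualised Sharpe of a daily return series (risk-free rate assumed 0)."""
    n = len(returns)
    mean = sum(returns) / n
    var = sum((r - mean) ** 2 for r in returns) / (n - 1)
    return mean / var ** 0.5 * 252 ** 0.5


# Every strategy draws from the same zero-mean distribution: no real edge anywhere.
sharpes = [
    annualised_sharpe([random.gauss(0.0, 0.01) for _ in range(N_DAYS)])
    for _ in range(N_STRATEGIES)
]

best, median = max(sharpes), statistics.median(sharpes)
print(f"best Sharpe: {best:.2f}   median Sharpe: {median:.2f}")
```

With one year of daily data, the annualised Sharpe of a zero-edge strategy has a standard error of roughly 1, so the best of 1000 usually lands several standard errors above zero while the median stays close to it.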

Definition

Introduced by Bailey & López de Prado (2014). The DSR estimates the probability that the true Sharpe is greater than zero once selection across many trials is accounted for. It depends on:

Number of trials N — how many variants were tried (multiple-testing penalty),
Sample length — short histories are less reliable,
Skew and kurtosis of the returns — fat tails / skew make the Sharpe estimate shakier.

DSR = Prob( true SR > 0 | observed SR, N trials, T, skew, kurtosis )
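Written out, following the formulation in Bailey & López de Prado (2014), the DSR is a Probabilistic Sharpe Ratio evaluated against a benchmark SR_0 instead of against zero. The notation here is an assumption of this sketch: \widehat{SR} is the observed non-annualised Sharpe, T the number of return observations, \gamma_3 the skewness, \gamma_4 the kurtosis (3 for a normal distribution), V[\{\widehat{SR}_n\}] the variance of the Sharpe ratios across the N trials, \gamma the Euler-Mascheroni constant and \Phi the standard normal CDF.

```latex
\mathrm{DSR} = \Phi\!\left(
  \frac{\left(\widehat{SR} - SR_0\right)\sqrt{T - 1}}
       {\sqrt{1 - \gamma_3\,\widehat{SR} + \frac{\gamma_4 - 1}{4}\,\widehat{SR}^{2}}}
\right),
\qquad
SR_0 = \sqrt{V\!\left[\{\widehat{SR}_n\}\right]}
\left( (1 - \gamma)\,\Phi^{-1}\!\left[1 - \frac{1}{N}\right]
     + \gamma\,\Phi^{-1}\!\left[1 - \frac{1}{N e}\right] \right)
```

Each input acts where one would expect: a larger N raises SR_0, a larger T raises the numerator, and negative skew or high kurtosis enlarges the denominator.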

Core mechanic: from the N trials and the spread of their Sharpe ratios you compute the expected maximum Sharpe under pure chance, which serves as the benchmark. Only a Sharpe that clearly clears this bar survives. A Sharpe of 2.0 from a single test is strong; the same 2.0 as the best value out of 5000 sweeps is often indistinguishable from luck.
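A minimal sketch of this mechanic using only the Python standard library. The function names `expected_max_sharpe` and `deflated_sharpe` are illustrative and are not the API of backtesting/deflated_sharpe.py. The example inputs (five years of daily data, a standard deviation of 0.5 in annualised Sharpe across trials, normal returns) are assumptions chosen to mirror the 2.0 example. Treating a single trial as "benchmark 0" is also a convention of this sketch.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant
_STD_NORMAL = NormalDist()        # standard normal: mean 0, sigma 1


def expected_max_sharpe(n_trials, sharpe_variance):
    """Expected best Sharpe among n_trials zero-edge strategies (benchmark SR_0).

    sharpe_variance is the variance of the non-annualised Sharpe across trials.
    """
    if n_trials < 2:
        return 0.0  # sketch convention: one trial means no selection, benchmark 0
    z1 = _STD_NORMAL.inv_cdf(1.0 - 1.0 / n_trials)
    z2 = _STD_NORMAL.inv_cdf(1.0 - 1.0 / (n_trials * math.e))
    return math.sqrt(sharpe_variance) * ((1.0 - EULER_GAMMA) * z1 + EULER_GAMMA * z2)


def deflated_sharpe(sr, n_trials, sharpe_variance, n_obs, skew=0.0, kurtosis=3.0):
    """DSR for a non-annualised Sharpe sr observed over n_obs returns.

    kurtosis is the raw (non-excess) kurtosis, i.e. 3.0 for normal returns.
    """
    sr0 = expected_max_sharpe(n_trials, sharpe_variance)
    denom = math.sqrt(1.0 - skew * sr + (kurtosis - 1.0) / 4.0 * sr ** 2)
    return _STD_NORMAL.cdf((sr - sr0) * math.sqrt(n_obs - 1) / denom)


# Assumed example: annualised Sharpe 2.0 on 5 years of daily data (1260 returns).
sr_daily = 2.0 / math.sqrt(252)      # convert to a per-day Sharpe
var_daily = (0.5 ** 2) / 252         # assumed spread of Sharpe across trials

dsr_single = deflated_sharpe(sr_daily, n_trials=1, sharpe_variance=var_daily, n_obs=1260)
dsr_sweep = deflated_sharpe(sr_daily, n_trials=5000, sharpe_variance=var_daily, n_obs=1260)
print(f"single test: {dsr_single:.3f}   best of 5000: {dsr_sweep:.3f}")
```

Under these assumed inputs the single-test value sits close to 1, while the best-of-5000 value falls well short of 0.95. The result is sensitive to the assumed variance across trials, so that input deserves as much care as N.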

Interpretation

The DSR is read as a probability between 0 and 1. A value above 0.95 means the Sharpe survives the multiple-testing correction. A value near 0.5 or below means the observed Sharpe is plausibly noise.

How Botty uses it

backtesting/deflated_sharpe.py implements the DSR as part of the overfit defence, alongside its sister metric, the PBO (Probability of Backtest Overfitting). The PBO uses the rankings within a sweep to check whether the selection process overfit; the DSR deflates the Sharpe value itself by the number of looks. Both address the same sin: looking many times and celebrating the lucky hit.

Limits

N is often hard to count honestly. Every informal parameter peek is a trial, so the true trial count is usually higher than documented.
The correction assumes the trials are roughly independent. Strongly correlated variants do not count like independent tests, so the raw number of variants overstates the effective N.
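For correlated variants, one simple heuristic is to interpolate linearly between the two extremes: perfectly correlated variants count as one trial, uncorrelated variants count as N. The sketch below assumes an average pairwise correlation between the variants' returns is available. The function name `effective_trials` is hypothetical, and this is a rough approximation rather than an exact correction.

```python
def effective_trials(n_trials, avg_correlation):
    """Heuristic effective number of independent trials.

    Linear interpolation: correlation 1 -> 1 trial, correlation 0 -> n_trials.
    Negative average correlations are clipped to 0 (assumption of this sketch).
    """
    rho = min(max(avg_correlation, 0.0), 1.0)
    return rho + (1.0 - rho) * n_trials


# 1000 variants of one parameter sweep, assumed average correlation of 0.9
n_eff = effective_trials(1000, 0.9)
print(f"effective trials: {n_eff:.1f}")
```

The result can be passed as the trial count when computing the benchmark. It reduces the penalty for near-duplicate variants but does nothing about undocumented trials, which push the honest N in the opposite direction.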