Botty · Bayesian Optimization

Bayesian Optimization (BO)

Smart parameter search: a probabilistic model of the objective (usually a Gaussian process) proposes the next most promising test point, balancing exploration against exploitation. It often finds good optima in 10-50 backtests instead of thousands, but it overfits more sharply than a grid and must be embedded in a walk-forward loop.

Problem

A trading strategy often has many parameters (stop distance, EMA period, filter thresholds). Every combination you test costs one backtest, and a backtest takes seconds to minutes. The search space grows exponentially with the number of parameters: 5 parameters with 10 values each give 10^5 = 100,000 backtests. Grid search (try everything) and random search (sample at random) ignore what earlier results have already revealed, so they spend most of their evaluations in regions that are obviously bad.
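The arithmetic behind that number can be checked in a few lines. This is only a sketch: the 30 seconds per backtest is an assumed figure inside the "seconds to minutes" range, not a measured one.

```python
import math

# 5 parameters with 10 candidate values each (the example from the text).
values_per_param = [10, 10, 10, 10, 10]
n_backtests = math.prod(values_per_param)

# Assumption for illustration only: one backtest takes 30 seconds.
seconds_per_backtest = 30
total_hours = n_backtests * seconds_per_backtest / 3600

print(f"{n_backtests:,} backtests, about {total_hours:.0f} hours")
```

Under that assumption the full grid takes roughly 833 hours, and every additional parameter with 10 values multiplies the total by 10.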

Idea

Bayesian optimization (BO) searches intelligently: it builds a probabilistic surrogate model of the objective f(parameter) -> performance and uses it to pick the next most promising point — instead of blindly scanning a grid.

The standard surrogate is a Gaussian process (GP): for any untested parameter point x it provides not just a prediction μ(x) but also an uncertainty σ(x). An acquisition function turns both into a decision — it balances:

Exploitation — test where μ(x) is high (probably good),
Exploration — test where σ(x) is high (still unknown).

Common acquisition functions:

  Upper Confidence Bound:   UCB(x) = μ(x) + κ · σ(x)
  Expected Improvement:     EI(x)  = E[ max(f(x) − f_best, 0) ]
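
Both functions are cheap to compute once the surrogate has delivered μ(x) and σ(x). The sketch below assumes a Gaussian posterior at x, in which case EI has the well-known closed form (μ − f_best)·Φ(z) + σ·φ(z) with z = (μ − f_best)/σ. The function names and the default κ = 2 are illustrative choices, not taken from any particular library.

```python
import math

def ucb(mu, sigma, kappa=2.0):
    """Upper Confidence Bound: prediction plus kappa times the uncertainty."""
    return mu + kappa * sigma

def expected_improvement(mu, sigma, f_best):
    """Closed-form EI for maximization, assuming a Gaussian posterior N(mu, sigma^2)."""
    if sigma <= 0.0:
        # No uncertainty left: the improvement is known exactly.
        return max(mu - f_best, 0.0)
    z = (mu - f_best) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)   # standard normal density
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))          # standard normal CDF
    return (mu - f_best) * cdf + sigma * pdf

f_best = 1.0
safe = (1.0, 0.1)    # (mu, sigma): as good as the best so far, well explored
risky = (0.8, 0.5)   # lower prediction, but very uncertain

for name, (mu, sigma) in (("safe", safe), ("risky", risky)):
    print(name, round(ucb(mu, sigma), 3), round(expected_improvement(mu, sigma, f_best), 3))
```

In this example both criteria prefer the uncertain point even though its predicted value is lower. That is the exploration term at work; a smaller κ shifts UCB back toward exploitation.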

The procedure

1. Evaluate f at a few random starting points.
2. Fit the Gaussian process to all (x, y) pairs so far.
3. Maximize the acquisition function -> next test point x*.
4. Evaluate f(x*) (= one backtest), add the result.
5. Go back to step 2 until the budget is exhausted.
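
The five steps above can be sketched in dependency-free Python. Everything here is a deliberately small illustration under stated assumptions: one parameter scaled to [0, 1], a fixed RBF kernel with length scale 0.2, a small noise term, UCB maximized over a grid of 101 candidates, and `toy_backtest` as a hypothetical stand-in for a real backtest. A production setup would use a library GP with fitted hyperparameters.

```python
import math
import random

def rbf(a, b, length=0.2):
    """Squared-exponential kernel on a 1-D parameter."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(A):
    """Lower-triangular L with L * L^T = A (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def forward(L, b):
    """Solve L y = b by forward substitution."""
    y = []
    for i, bi in enumerate(b):
        y.append((bi - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def backward(L, y):
    """Solve L^T x = y by back substitution."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def fit_gp(xs, ys, noise=1e-4):
    """Fit a constant-mean GP and return predict(x) -> (mu, sigma)."""
    mean = sum(ys) / len(ys)
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    L = cholesky(K)
    alpha = backward(L, forward(L, [y - mean for y in ys]))

    def predict(x):
        k = [rbf(a, x) for a in xs]
        mu = mean + sum(ki * ai for ki, ai in zip(k, alpha))
        v = forward(L, k)
        var = max(1.0 - sum(vi * vi for vi in v), 0.0)  # prior variance k(x, x) = 1
        return mu, math.sqrt(var)

    return predict

def toy_backtest(x):
    """Hypothetical objective: smooth, single peak at x = 0.3."""
    return -(x - 0.3) ** 2

def bayes_opt(f, budget=15, n_init=3, kappa=2.0, seed=0):
    rng = random.Random(seed)
    candidates = [i / 100 for i in range(101)]
    xs = rng.sample(candidates, n_init)            # 1. random starting points
    ys = [f(x) for x in xs]
    while len(xs) < budget:                        # 5. repeat until the budget is spent
        predict = fit_gp(xs, ys)                   # 2. fit the GP to all pairs so far

        def ucb(c):
            mu, sigma = predict(c)
            return mu + kappa * sigma

        untested = [c for c in candidates if c not in xs]
        x_next = max(untested, key=ucb)            # 3. maximize the acquisition function
        xs.append(x_next)                          # 4. one more evaluation
        ys.append(f(x_next))
    best = max(range(len(ys)), key=ys.__getitem__)
    return xs[best], ys[best], xs

best_x, best_y, history = bayes_opt(toy_backtest)
print(best_x, best_y, len(history))
```

Printing `history` shows the typical pattern: the first proposals spread across the interval where σ is large, the later ones cluster around the peak where μ is high.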

This way BO often finds good optima in 10-50 evaluations where grid search needs thousands. Well-known tools: Optuna and Hyperopt (both built around TPE, a tree-structured Parzen estimator, rather than a Gaussian process), scikit-optimize and BoTorch (GP-based).

The trap in trading

BO is designed to chase the peak — and that is precisely what is dangerous in trading:

Sharper overfitting than grid. BO aggressively hunts the highest backtest value. If that peak is just luck (a lucky cluster of trades), BO finds it more reliably than grid search and overfits harder.
Noisy objectives. Backtest Sharpe is noisy, but a Gaussian process fitted without a noise term treats every observation as an exact sample of a smooth function. Without noise modeling (or repeated evaluation) BO chases the noise.
Never optimize on the test set. BO must run inside a walk-forward loop and optimize only on train data; the out-of-sample window is used for validation only and never feeds back into the search.
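
A small experiment makes the first and third point concrete. In this sketch the data is synthetic and contains no edge at all: 200 hypothetical parameter sets whose per-trade returns are pure Gaussian noise. An exhaustive "pick the best on train" step stands in for BO or any other peak-seeking optimizer, and the window lengths are arbitrary illustrative values.

```python
import random
from statistics import mean

def walk_forward(returns_by_param, train_len, test_len):
    """Choose the best parameter on each train window, then score it on the next test window."""
    n = len(next(iter(returns_by_param.values())))
    in_sample, out_of_sample = [], []
    start = 0
    while start + train_len + test_len <= n:
        train = slice(start, start + train_len)
        test = slice(start + train_len, start + train_len + test_len)
        # The optimizer sees train data only.
        best = max(returns_by_param, key=lambda p: mean(returns_by_param[p][train]))
        in_sample.append(mean(returns_by_param[best][train]))
        # The test window is evaluated once, with the parameter already chosen.
        out_of_sample.append(mean(returns_by_param[best][test]))
        start += test_len
    return mean(in_sample), mean(out_of_sample)

rng = random.Random(0)
# Synthetic assumption: no parameter set has any real edge.
noise_only = {p: [rng.gauss(0.0, 1.0) for _ in range(900)] for p in range(200)}

in_sample, oos = walk_forward(noise_only, train_len=100, test_len=100)
print(f"in-sample mean return of the chosen parameters: {in_sample:.3f}")
print(f"out-of-sample mean return of the same choices:  {oos:.3f}")
```

The in-sample figure looks like a real edge although none exists, because the maximum over many noisy candidates is biased upward. Only the untouched out-of-sample windows expose it, and a more efficient peak finder makes the in-sample illusion stronger, not weaker.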

How Botty optimizes

Botty deliberately uses no Bayesian optimization but a grid sweep (backtesting/megasweep.py): phase 1 is a coarse grid over all structural combinations, phase 2 a finer parameter grid over the top structures, phase 3 walk-forward validation. The reason: a coarse, uniform grid plus a hard walk-forward hurdle is more robust against overfitting than an aggressive optimizer that finds the luckiest backtest peak. BO would be a candidate to speed up the expensive phase-2 search, but only with a noise model and strictly within the train window.
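
The coarse-to-fine idea behind phases 1 and 2 can be illustrated as follows. This is not the code of backtesting/megasweep.py: the structure names, parameter ranges and the `toy_score` function are all hypothetical and chosen only to make the sketch runnable.

```python
from itertools import product

def toy_score(structure, ema, stop):
    """Hypothetical stand-in for one train-window backtest."""
    base = {"ema_cross": 1.0, "breakout": 0.6, "mean_revert": 0.2}[structure]
    return base - ((ema - 34) / 10) ** 2 - (stop - 2.2) ** 2

def sweep(structures, emas, stops):
    """Score every combination and return the rows best-first."""
    rows = [(toy_score(s, e, t), {"structure": s, "ema": e, "stop": t})
            for s, e, t in product(structures, emas, stops)]
    return sorted(rows, key=lambda r: r[0], reverse=True)

# Phase 1: coarse grid over all structures.
coarse = sweep(["ema_cross", "breakout", "mean_revert"], (10, 30, 50), (1.0, 2.0, 3.0))

# Keep the best coarse row of the two best structures.
top = {}
for score, params in coarse:
    top.setdefault(params["structure"], params)
    if len(top) == 2:
        break

# Phase 2: finer grid around each surviving coarse winner.
fine = []
for p in top.values():
    emas = range(p["ema"] - 8, p["ema"] + 9, 2)
    stops = [round(p["stop"] + d / 10, 1) for d in range(-4, 5, 2)]
    fine += sweep([p["structure"]], emas, stops)

best_score, best_params = max(fine, key=lambda r: r[0])
print(len(coarse) + len(fine), "evaluations ->", best_params)

# Phase 3 (not shown): best_params must still pass walk-forward validation.
```

The sketch spends 117 evaluations instead of scanning the fine resolution everywhere, and the fine grid stays uniform around each winner rather than homing in on a single sharp peak.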

Not to be confused: Botty's BOCPD (Bayesian Online Change-Point Detection, ml/forecast/bocpd_live.py) is a feature for regime detection — it only shares the name "Bayesian" and has nothing to do with parameter optimization.

Trade-offs

✅ Drastically fewer evaluations than grid/random, ideal for expensive backtests.
✅ Delivers uncertainty too, not just a point estimate.

❌ Easily overfits noisy trading objectives and must be embedded in walk-forward.
❌ More complex (GP, acquisition, kernel choice) than a simple grid.
❌ With very many parameters (roughly >20) the Gaussian process surrogate becomes unreliable, and the cost of fitting it grows cubically with the number of evaluations.