Botty · Machine Learning / Reinforcement Learning Trading

Early ML in finance since the 1990s; deep RL since ~2016 (DeepMind), crypto application from 2018

Machine Learning · Evidence: Weak · intraday to swing · all · in research

4/10

Relevance for Botty

Either supervised learning maps features to a signal, or a deep-RL agent learns an end-to-end policy. Promising in papers, fragile live.

Core concept

Instead of handcrafted rules, a model (random forest, gradient boosting, LSTM, or deep-RL agent) learns the trade signal directly from features. In theory, ML can detect patterns humans do not see. In practice, overfitting, data leakage, and non-stationarity dominate.

Relevance for Botty

Relevance Score 4/10

Botty's ml/ module is not yet implemented. A sensible entry point is a **random-forest classifier** trained on historical signals labeled win/loss, which then acts as a filter for existing strategies. End-to-end deep RL would be over-engineering given Botty's data volume and infrastructure.

Rules

Entry

Supervised ML: features (technical indicators, order book, funding, sentiment) -> label (sign of future return)
Train/val/test on a walk-forward basis
Generate the signal from the model probability or output
RL: state = feature vector, action = {long, short, flat}, reward = PnL
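
The walk-forward evaluation mentioned above can be sketched as rolling index windows. This is a minimal sketch, not Botty's implementation: the function name `walk_forward_splits` and the window sizes are assumptions chosen for illustration. The sizes 600/200 give a 25% out-of-sample share per window.

```python
from typing import Iterator, Tuple


def walk_forward_splits(
    n_bars: int, train_size: int, test_size: int
) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges; each test window lies strictly after its train window."""
    start = 0
    while start + train_size + test_size <= n_bars:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size  # roll the whole window forward by one test block


# Hypothetical sizes: 1000 bars, 600 for training, 200 out-of-sample
splits = list(walk_forward_splits(n_bars=1000, train_size=600, test_size=200))
for train, test in splits:
    print(f"train {train.start}-{train.stop - 1}, test {test.start}-{test.stop - 1}")
```

Because the windows never shuffle, no test bar precedes a training bar. A model would be refit on each `train` range and scored only on the following `test` range.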

Exit

Supervised: inverse signal or probability threshold
RL: learns the exit endogenously as part of the policy
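
For the supervised case, entry and exit by probability threshold could look like the following sketch. The function `next_position` is hypothetical, and so is the exit threshold of 0.50. A strong opposite probability flips the position directly, which corresponds to the "inverse signal" exit.

```python
def next_position(
    prob_up: float, current: int, enter: float = 0.55, exit_: float = 0.50
) -> int:
    """Map the model's P(up) to a position in {-1, 0, 1} using entry and exit thresholds."""
    if prob_up >= enter:
        return 1  # confident long (also reverses an open short)
    if prob_up <= 1.0 - enter:
        return -1  # confident short (also reverses an open long)
    # Between the entry thresholds: keep an open position only while
    # the model still leans in its direction, otherwise go flat.
    if current == 1 and prob_up > exit_:
        return 1
    if current == -1 and prob_up < 1.0 - exit_:
        return -1
    return 0


# Made-up probability sequence to show the state transitions
probs = [0.50, 0.58, 0.53, 0.49, 0.42, 0.47, 0.51]
position, path = 0, []
for p in probs:
    position = next_position(p, position)
    path.append(position)
print(path)
```

Using a looser exit threshold than entry threshold reduces churn when the probability hovers near the entry level.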

Parameter

Name	Typ. value	Description
feature_lookback	100-1000 bars	Window for feature computation
retraining_frequency	weekly/monthly	How often the model is retrained to counter drift
walkforward_window	out-of-sample 20-30%	Share of each walk-forward window held out for testing

Pros & Cons

Pros

Can capture complex, nonlinear patterns
Adaptive when retrained cleanly
Scales with feature availability (alt-data such as on-chain, sentiment)
Active research field with plenty of tool support (Freqtrade, QuantConnect, RLlib)

Cons

Extreme overfitting risk - most impressive backtests do not hold up out-of-sample
Non-stationary markets break models quietly and quickly
RL agents are especially fragile - a small change in the training setup can produce a completely different policy
Execution realism in backtests is usually poor
Interpretability is poor, so it is hard to debug why a trade won or lost

Typical performance

notes

There are no publicly known, consistently profitable pure RL bots. Hybrids (ML as a feature generator for human strategies) work better.

gap reasons

overfitting, lookahead bias, regime change, execution costs not modeled

live performance

usually clearly negative after fees

paper performance

often 50-200% annualized in backtests

Bot suitability

High complexity, questionable added value. As a signal generator for human strategies (e.g. regime detection) it is more pragmatic than end-to-end.

Background

Variants

Supervised learning

Features -> label -> model:
- Features: indicator values, order-book imbalance, funding rate, on-chain metrics, sentiment scores
- Label: future return over horizon h, e.g. sign(close[t+24] / close[t] - 1) for a 24h forecast on 1h bars
- Model: logistic regression, random forest, gradient boosting (XGBoost), LSTM

Signal: model output -> long / short / flat
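
The label definition above, sign(close[t+h] / close[t] - 1), can be computed as in this sketch. The helper name `future_return_labels` and the price series are made up for illustration. Bars whose future is not yet known get no label rather than a guessed one.

```python
from typing import List, Optional


def future_return_labels(close: List[float], horizon: int) -> List[Optional[int]]:
    """Label each bar with the sign of the return over the next `horizon` bars."""
    labels: List[Optional[int]] = []
    for t in range(len(close)):
        if t + horizon >= len(close):
            labels.append(None)  # future price unknown: leave unlabeled
            continue
        ret = close[t + horizon] / close[t] - 1.0
        labels.append(1 if ret > 0 else (-1 if ret < 0 else 0))
    return labels


close = [100.0, 101.0, 99.0, 102.0, 102.0, 98.0]  # made-up closes
labels = future_return_labels(close, horizon=2)
print(labels)
```

The unlabeled tail has to be dropped before training. The label at bar t also depends on prices up to t+h, so the last h labeled bars before a train/test boundary overlap the test period and are usually removed as well.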

Reinforcement learning

State -> action -> reward:
- State: feature vector (as above)
- Action: discrete {long, short, flat} or continuous (position size)
- Reward: realized PnL per step
- Model: PPO, DQN, A2C

The agent learns a policy: given a state, which action maximizes long-term cumulative reward.
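
The state/action/reward loop can be made concrete with a toy environment. This is a sketch under stated assumptions: the class `ToyTradingEnv`, its two-element placeholder state, and the proportional `fee` are hypothetical. A real setup would expose the full feature vector to a PPO, DQN, or A2C implementation.

```python
from typing import List, Tuple

ACTIONS = {0: -1, 1: 0, 2: 1}  # action id -> target position: short, flat, long


class ToyTradingEnv:
    """Minimal episodic environment: reward is per-step PnL of the held position."""

    def __init__(self, close: List[float], fee: float = 0.0005):
        self.close = close
        self.fee = fee  # assumed proportional cost per unit of position change
        self.reset()

    def reset(self) -> Tuple[float, int]:
        self.t = 1
        self.position = 0
        return self._state()

    def _state(self) -> Tuple[float, int]:
        # Placeholder state: last one-bar return and the current position
        last_ret = self.close[self.t] / self.close[self.t - 1] - 1.0
        return (last_ret, self.position)

    def step(self, action: int) -> Tuple[Tuple[float, int], float, bool]:
        new_pos = ACTIONS[action]
        cost = self.fee * abs(new_pos - self.position)
        self.position = new_pos
        self.t += 1  # the position is held over the next bar
        bar_ret = self.close[self.t] / self.close[self.t - 1] - 1.0
        reward = self.position * bar_ret - cost
        done = self.t == len(self.close) - 1
        return self._state(), reward, done


env = ToyTradingEnv([100.0, 101.0, 102.01, 100.9899])
state = env.reset()
state, reward, done = env.step(2)  # go long for the next bar
print(state, round(reward, 6), done)
```

The action is chosen from information up to bar t and rewarded with the return of bar t+1, which keeps the reward free of lookahead.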

The backtest-to-live gap

The most dangerous problem in ML trading: papers and backtests show 50-200% returns, but live trading loses money. The usual causes:

Lookahead bias: features unintentionally incorporate future information (e.g. using a bar's close price for a decision that is executed at that bar's open)
Data leakage: the train/test split violates time ordering
Overfitting: the model learns noise patterns of history, not structural regularities
Execution naivety: the backtest assumes a fill exactly at the close price. Reality: slippage, partial fills, fees
Non-stationarity: crypto market structure in 2020 is not that of 2026; training data is stale
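
The execution point can be illustrated with a small cost model. This is a sketch with assumed numbers: the function `net_trade_return` is hypothetical, and the 5 bps fee and 5 bps slippage per leg are illustrative values, not measured costs.

```python
def net_trade_return(
    entry: float,
    exit_: float,
    side: int,
    fee_rate: float = 0.0005,
    slippage: float = 0.0005,
) -> float:
    """Return of one round trip (side: 1 long, -1 short) after fees and adverse slippage."""
    # Slippage moves every fill against the trader: buys fill higher, sells fill lower
    entry_fill = entry * (1 + side * slippage)
    exit_fill = exit_ * (1 - side * slippage)
    gross = side * (exit_fill / entry_fill - 1.0)
    return gross - 2 * fee_rate  # fee charged on both legs


naive = 100.3 / 100.0 - 1.0  # backtest assumption: fills exactly at the close
net = net_trade_return(100.0, 100.3, side=1)
print(f"naive {naive:.4%}, net {net:.4%}")
```

Under these assumptions a 0.30% paper gain shrinks to roughly 0.10%, so a strategy with many small-edge trades can flip from profitable to losing once costs are modeled.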

What actually works

Hybrid approaches:
- ML as a feature for human strategies (e.g. a regime classifier that says 'trending vs. ranging')
- ML as a filter on existing signals (which setups are high-probability?)
- ML for optimizing the parameters of a mechanical strategy
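
The first hybrid pattern, a regime label that selects which human strategy runs, could be wired up as below. This sketch swaps in a rule-based stand-in for the classifier (a path-efficiency ratio with an assumed threshold of 0.5) so that it runs without any ML library. A trained model would replace `classify_regime`. All function names and both toy strategies are hypothetical.

```python
from typing import Callable, Dict, List


def efficiency_ratio(close: List[float]) -> float:
    """Net price move divided by total path length: near 1 = trending, near 0 = choppy."""
    net = abs(close[-1] - close[0])
    path = sum(abs(b - a) for a, b in zip(close, close[1:]))
    return net / path if path > 0 else 0.0


def classify_regime(close: List[float], trend_threshold: float = 0.5) -> str:
    # Stand-in for a trained regime classifier
    return "trending" if efficiency_ratio(close) >= trend_threshold else "ranging"


def trend_follow(close: List[float]) -> int:
    return 1 if close[-1] > close[0] else -1


def mean_revert(close: List[float]) -> int:
    mean = sum(close) / len(close)
    return -1 if close[-1] > mean else 1


def route(close: List[float], strategies: Dict[str, Callable[[List[float]], int]]) -> int:
    """Let the regime label pick the strategy; the strategy itself stays rule-based."""
    return strategies[classify_regime(close)](close)


strategies = {"trending": trend_follow, "ranging": mean_revert}
print(route([100.0, 101.0, 102.0, 103.0, 104.0], strategies))
print(route([100.0, 101.0, 100.0, 101.0, 100.8], strategies))
```

The model only decides which strategy is active, so each trade remains explainable by a handwritten rule.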

There are no broadly documented examples of consistently profitable, purely end-to-end ML/RL bots. Jane Street, Citadel, and Two Sigma use ML, but as a building block within tightly controlled pipelines with risk management around them, not as an autonomous 'black-box trader'.

Warning from the literature

The ScienceDirect / arXiv papers on deep-RL trading almost all share a similar pattern:

Impressive in-sample results
Weak out-of-sample performance (but 'model improvement' proposed)
No live-trading follow-up studies
When live tests exist, they cover short periods in a single regime

Research on automated trading agents has a reproducibility problem.

Relevance for Botty

Botty's ml/ module is not implemented. A sensible path:

Stage 1 (pragmatic): a feature-based filter on existing strategies. Example: a random forest trained on historical EMA-crossover signals labeled win/loss, using features such as ADX, RSI, vol regime, and funding -> only trade when the predicted win probability is > 0.55.
Stage 2 (ambitious): a regime classifier that distinguishes between trend/range/transition and switches active strategies accordingly.
Stage 3 (research): deep RL on a multi-asset portfolio. Only with significant dev effort and realistic expectation management.
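
The Stage 1 filter logic can be sketched without an ML library. As a deliberately simplified stand-in for the random forest, this example estimates win probability per ADX bucket from past signals and applies the 0.55 threshold. The bucket boundary of 25, the smoothing, the function names, and the sample history are all assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def adx_bucket(adx: float) -> str:
    return "adx_high" if adx >= 25 else "adx_low"  # assumed boundary


def fit_win_rates(history: List[Tuple[float, bool]]) -> Dict[str, float]:
    """Estimate P(win) per bucket from (adx_at_signal, trade_won) pairs."""
    wins: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for adx, won in history:
        bucket = adx_bucket(adx)
        total[bucket] += 1
        wins[bucket] += int(won)
    # Laplace smoothing keeps tiny samples from producing 0% or 100%
    return {b: (wins[b] + 1) / (total[b] + 2) for b in total}


def should_trade(adx: float, win_rates: Dict[str, float], threshold: float = 0.55) -> bool:
    """Pass a signal only if its estimated win probability exceeds the threshold."""
    return win_rates.get(adx_bucket(adx), 0.0) > threshold


# Made-up outcomes of past EMA-crossover signals
history = [(30, True), (35, True), (28, False), (40, True),
           (15, False), (18, True), (12, False), (20, False)]
win_rates = fit_win_rates(history)
print(win_rates, should_trade(32, win_rates), should_trade(14, win_rates))
```

A random forest would replace `fit_win_rates` with a model over several features (ADX, RSI, vol regime, funding), while the gating step stays the same. The win rates must be fit only on signals that precede the ones being filtered.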