
Machine Learning / Reinforcement Learning Trading

Early ML in finance since the 1990s; deep RL since ~2016 (DeepMind); crypto applications from 2018
Machine Learning · Evidence: Weak · intraday to swing · all (in research)
Relevance for Botty: 4/10
Supervised learning on features -> signal, or a deep-RL agent learns an end-to-end policy. Promising in papers, fragile live.
Instead of handcrafted rules, a model (random forest, gradient boosting, LSTM, or deep-RL agent) learns the trade signal directly from features. In theory ML can detect patterns humans do not see - in practice overfitting, data leakage and non-stationarity dominate.
Relevance Score 4/10
Botty's ml/ module is not yet implemented. A sensible entry point is a random-forest classifier that labels historical signals as win/loss and thereby acts as a filter for existing strategies. End-to-end deep RL would be over-engineering given Botty's data volume and infrastructure.

Entry

  • Supervised ML: features (technical indicators, order book, funding, sentiment) -> label (sign of future return)
  • Train/val/test on a walk-forward basis
  • Generate the signal from the model probability or output
  • RL: state = feature vector, action = {long, short, flat}, reward = PnL
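
A minimal sketch of the walk-forward idea from the list above, assuming bars are indexed 0..n-1 in time order. The function name `walk_forward_splits` and the window sizes are hypothetical choices for illustration, not part of Botty. The only property it is meant to show is that every test block lies strictly after its training block.

```python
from typing import Iterator, Tuple


def walk_forward_splits(
    n_bars: int, train_size: int, test_size: int
) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges; the test block always follows the train block."""
    start = 0
    while start + train_size + test_size <= n_bars:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size  # roll the whole window forward by one test block


# Hypothetical sizes: 600 training bars, 200 out-of-sample bars (25% of each window).
splits = list(walk_forward_splits(n_bars=1000, train_size=600, test_size=200))
for train, test in splits:
    print(f"train {train.start}-{train.stop - 1}, test {test.start}-{test.stop - 1}")
```

Rolling by one test block means each bar is used out-of-sample at most once, so the concatenated test predictions form a single unbroken out-of-sample track record.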

Exit

  • Supervised: inverse signal or probability threshold
  • RL: learns the exit endogenously as part of the policy
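
The probability-threshold exit for the supervised case can be sketched as a small state machine. This is a long-only simplification; `p_up` stands for the model's estimated probability of a positive return, and the two thresholds (0.55 to enter, 0.50 to exit) are hypothetical values chosen only to show the gap between entry and exit levels.

```python
def next_position(position: int, p_up: float,
                  enter: float = 0.55, exit_below: float = 0.50) -> int:
    """Return the new position (1 = long, 0 = flat) given the model's P(up)."""
    if position == 0 and p_up > enter:
        return 1  # enter only on a clear edge
    if position == 1 and p_up < exit_below:
        return 0  # exit once the edge has disappeared
    return position  # otherwise hold the current state


def run_positions(probs):
    """Apply next_position bar by bar, starting flat."""
    position, path = 0, []
    for p in probs:
        position = next_position(position, p)
        path.append(position)
    return path


path = run_positions([0.52, 0.58, 0.53, 0.49, 0.56])
print(path)  # [0, 1, 1, 0, 1]
```

Using a lower exit threshold than entry threshold keeps the position from flipping on every small wobble in the model output, which would otherwise multiply fees.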

Name                  Typ. value             Description
feature_lookback      100-1000 bars          Window for feature computation
retraining_frequency  weekly/monthly         Against drift
walkforward_window    out-of-sample 20-30%   Robust testing

Pros

  • Can capture complex, nonlinear patterns
  • Adaptive when retrained cleanly
  • Scales with feature availability (alt-data such as on-chain, sentiment)
  • Active research field with plenty of tool support (Freqtrade, QuantConnect, RLlib)

Cons

  • Extreme overfitting risk: the large majority of impressive-looking backtests do not hold up in live trading
  • Non-stationary markets break models quietly and quickly
  • RL agents are especially fragile: a small change in the training setup can produce a completely different policy
  • Execution realism in backtests is usually poor
  • Interpretability is essentially zero - hard to debug why it won/lost

Notes: No publicly known, consistently profitable pure-RL bots. Hybrids (ML as a feature generator for human strategies) work better.
Gap reasons: overfitting, lookahead bias, regime change, execution costs not modeled
Live performance: usually clearly negative after fees
Paper performance: often 50-200% annualized in backtests

High complexity, questionable added value. As a signal generator for human strategies (e.g. regime detection) it is more pragmatic than end-to-end.

Variants

Supervised learning

Features -> label -> model:

  • Features: indicator values, order-book imbalance, funding rate, on-chain metrics, sentiment scores
  • Label: future return over horizon h, e.g. sign(close[t+24] / close[t] - 1) for a 24h forecast
  • Model: logistic regression, random forest, gradient boosting (XGBoost), LSTM

Signal: model output -> long / short / flat
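
The label definition sign(close[t+h] / close[t] - 1) can be written out as below. This is a sketch on a made-up price list; `make_labels` is a hypothetical helper name. Note that the last h bars have no label, because their future close is not yet known, and they have to be dropped from training rather than filled in.

```python
def make_labels(close, horizon):
    """Label each bar t with the sign of the return from t to t + horizon."""
    labels = []
    for t in range(len(close) - horizon):  # the last `horizon` bars get no label
        ret = close[t + horizon] / close[t] - 1.0
        labels.append(1 if ret > 0 else (-1 if ret < 0 else 0))
    return labels


# Toy prices for illustration only; horizon=2 instead of 24 to keep it short.
close = [100.0, 101.0, 99.0, 102.0, 102.0, 98.0]
labels = make_labels(close, horizon=2)
print(labels)  # [-1, 1, 1, -1]
```

The label at bar t uses close[t + horizon], so the features paired with it may only use data up to and including bar t.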

Reinforcement learning

State -> action -> reward:

  • State: feature vector (as above)
  • Action: discrete {long, short, flat} or continuous (position size)
  • Reward: realized PnL per step
  • Model: PPO, DQN, A2C

The agent learns a policy: given a state, which action maximizes long-term cumulative reward.
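
To make the state/action/reward loop concrete, here is a toy environment, not tied to any RL library. The class name `TradingEnv`, the one-element state (last one-bar return) and the fee-free reward are all simplifying assumptions; a real setup would use the full feature vector and subtract costs.

```python
from dataclasses import dataclass
from typing import List, Tuple

ACTIONS = {"short": -1, "flat": 0, "long": 1}


@dataclass
class TradingEnv:
    closes: List[float]
    t: int = 0  # index of the current bar

    def state(self) -> Tuple[float]:
        """Toy state: the most recent one-bar return (0.0 on the first bar)."""
        if self.t == 0:
            return (0.0,)
        return (self.closes[self.t] / self.closes[self.t - 1] - 1.0,)

    def step(self, action: str):
        """Hold the chosen position for one bar; reward is that bar's PnL (no fees)."""
        position = ACTIONS[action]
        bar_return = self.closes[self.t + 1] / self.closes[self.t] - 1.0
        reward = position * bar_return
        self.t += 1
        done = self.t >= len(self.closes) - 1  # no further bar to trade into
        return self.state(), reward, done


env = TradingEnv(closes=[100.0, 110.0, 99.0])
_, r1, done1 = env.step("long")   # price rises 10%: long earns about +0.10
_, r2, done2 = env.step("short")  # price falls 10%: short earns about +0.10
```

An agent such as PPO or DQN would repeatedly call `step` and adjust its policy to maximize the sum of these rewards over an episode.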

The backtest-to-live gap

The most dangerous problem in ML trading: papers and backtests show 50-200% returns, but the same strategy loses money live. The main causes:

  1. Lookahead bias: features unintentionally incorporate future information (e.g. using a bar's close price for a decision that is executed at that bar's open)
  2. Data leakage: the train/test split violates time ordering
  3. Overfitting: the model learns noise patterns of history, not structural regularities
  4. Execution naivety: the backtest assumes a fill exactly at the close price. Reality: slippage, partial fills, fees
  5. Non-stationarity: crypto market structure in 2020 is not that of 2026; training data is stale
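
The effect of execution naivety (point 4) is easy to show with arithmetic. The sketch below uses a simple additive approximation in which fee and slippage are paid once on entry and once on exit. All numbers are hypothetical and chosen only to illustrate how a small per-trade edge can flip sign; they are not measurements.

```python
def net_return(gross_return: float, fee_rate: float, slippage: float) -> float:
    """Approximate round-trip net return: fee and slippage paid on entry and on exit."""
    cost = 2 * (fee_rate + slippage)
    return gross_return - cost


# Hypothetical values for illustration only.
gross_per_trade = 0.0010  # 0.10% average edge per trade in the naive backtest
fee_rate = 0.0005         # 0.05% fee per side
slippage = 0.0002         # 0.02% adverse fill per side

naive = net_return(gross_per_trade, fee_rate=0.0, slippage=0.0)
realistic = net_return(gross_per_trade, fee_rate=fee_rate, slippage=slippage)
print(f"naive: {naive:+.4%}, after costs: {realistic:+.4%}")
```

With these assumed numbers the round-trip cost (0.14%) exceeds the edge (0.10%), so a strategy that looks profitable at close-price fills loses money on every average trade.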

What actually works

Hybrid approaches:

  • ML as a feature for human strategies (e.g. a regime classifier that says 'trending vs. ranging')
  • ML as a filter on existing signals (which setups are high-probability?)
  • ML for optimizing the parameters of a mechanical strategy

There are no broadly documented, consistently profitable examples of pure end-to-end ML/RL bots. Jane Street, Citadel and Two Sigma use ML, but as a building block within tightly controlled pipelines with risk management around them, not as an autonomous 'black-box trader'.

Warning from the literature

The ScienceDirect / arXiv papers on deep-RL trading almost all share a similar pattern:

  • Impressive in-sample results
  • Weak out-of-sample performance (but 'model improvement' proposed)
  • No live-trading follow-up studies
  • When live tests exist, they cover short periods in a single regime

Research on automated trading agents has a reproducibility problem.

Relevance for Botty

Botty's ml/ module is not implemented. A sensible path:

  1. Stage 1 (pragmatic): a feature-based filter on existing strategies. Example: a random forest trained on features (ADX, RSI, volatility regime, funding) to classify historical EMA-crossover signals as win/loss; trade only when the predicted win probability exceeds 0.55.

  2. Stage 2 (ambitious): a regime classifier that distinguishes between trend/range/transition and switches active strategies accordingly.

  3. Stage 3 (research): deep RL on a multi-asset portfolio. Only with significant dev effort and realistic expectation management.
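
The Stage 1 filter can be sketched without any ML library by swapping the random forest for a much simpler stand-in: an empirical win-rate table per feature bucket. This is explicitly not a random forest; it only illustrates the filtering logic (estimate a win probability from labeled historical signals, then trade only above 0.55). The function names, the ADX cutoff of 25, the two-feature bucket and the toy history are all assumptions for illustration, not Botty code or real results.

```python
from collections import defaultdict


def bucket(features):
    """Coarse feature bucket: (strong trend?, positive funding?)."""
    return (features["adx"] >= 25, features["funding"] > 0)


def fit_win_rates(history):
    """Estimate P(win | bucket) from labeled historical signals."""
    wins, totals = defaultdict(int), defaultdict(int)
    for features, won in history:
        key = bucket(features)
        totals[key] += 1
        wins[key] += int(won)
    return {key: wins[key] / totals[key] for key in totals}


def should_trade(features, win_rates, threshold=0.55):
    """Pass a base-strategy signal only if its estimated win probability is high enough."""
    # Buckets never seen in training default to 0.0: no evidence, no trade.
    return win_rates.get(bucket(features), 0.0) > threshold


# Toy labeled signals (features, won?) for illustration only.
history = [
    ({"adx": 30, "funding": 0.01}, True),
    ({"adx": 32, "funding": 0.02}, True),
    ({"adx": 28, "funding": 0.01}, False),
    ({"adx": 15, "funding": 0.01}, False),
    ({"adx": 12, "funding": 0.03}, True),
    ({"adx": 18, "funding": 0.02}, False),
]
win_rates = fit_win_rates(history)

take_strong = should_trade({"adx": 35, "funding": 0.01}, win_rates)  # bucket win rate 2/3
take_weak = should_trade({"adx": 10, "funding": 0.01}, win_rates)    # bucket win rate 1/3
```

Whatever model fills this role, the win rates must be fitted only on signals that precede the period being evaluated; otherwise the filter itself leaks future information.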