CUSUM-Filter

Indicator concept
Cumulative-sum event filter (López de Prado / Page)
Accumulates returns in two buckets (up/down pressure) and marks an event as soon as a threshold is breached — then resets. A vol-adaptive event sampler ('don't sample every candle'), not a buy/sell signal.

What is the CUSUM filter?

CUSUM stands for cumulative sum. It originated in quality control (E. S. Page, 1954) as a 'control chart' for detecting a drift in the mean, and was popularised in financial ML by López de Prado as an event filter. That is exactly the variant Thomas Skinner (Delta Trend Trading) shows in the video 'STOP Sampling Every Candle'.

Idea: two running sums of the returns, one for upward and one for downward pressure. When either sum breaches the threshold, an event is marked and that sum alone is reset to zero:

S_up = max(0, S_up + r_t)      # accumulated upward pressure
S_dn = min(0, S_dn + r_t)      # accumulated downward pressure

if S_up >=  h:  → Up-Event,   S_up = 0
if S_dn <= -h:  → Down-Event, S_dn = 0

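r_t = (log) return per bar, h = threshold. Crucially: h is vol-/ATR-normalised (h = k·ATR) instead of fixed. Because r_t is a return, the ATR has to be expressed in the same units (for example ATR divided by the close), otherwise the comparison mixes price units with returns. With a fixed threshold you would get spammed with events during high-vol phases and almost none during quiet ones; vol-normalised, the events are spread far more evenly across regimes.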
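A minimal sketch of the recursion above in plain Python. This is not Botty's ml/events.py; the function names, signatures and the warm-up handling are assumptions made for illustration, and a rolling standard deviation of past returns stands in for an ATR-based threshold.

```python
import math
import statistics
from typing import List, Optional, Sequence, Tuple


def log_returns(closes: Sequence[float]) -> List[float]:
    """Per-bar log returns r_t = ln(close_t / close_{t-1})."""
    return [math.log(b / a) for a, b in zip(closes, closes[1:])]


def rolling_vol_threshold(
    returns: Sequence[float], window: int, k: float
) -> List[Optional[float]]:
    """h_t = k * std of the `window` returns strictly before index t.

    Assumption: rolling std replaces ATR here. None marks warm-up bars.
    Requires window >= 1.
    """
    out: List[Optional[float]] = []
    for t in range(len(returns)):
        past = returns[max(0, t - window):t]
        out.append(k * statistics.pstdev(past) if len(past) == window else None)
    return out


def cusum_filter(
    returns: Sequence[float], thresholds: Sequence[Optional[float]]
) -> List[Tuple[int, int]]:
    """Return (return_index, direction) events; +1 = up, -1 = down."""
    s_up = s_dn = 0.0
    events: List[Tuple[int, int]] = []
    for t, (r, h) in enumerate(zip(returns, thresholds)):
        if h is None or h <= 0.0:
            continue  # no usable threshold yet: skip the bar entirely
        s_up = max(0.0, s_up + r)  # accumulated upward pressure
        s_dn = min(0.0, s_dn + r)  # accumulated downward pressure
        if s_up >= h:
            events.append((t, +1))
            s_up = 0.0  # only the breached sum is reset
        if s_dn <= -h:
            events.append((t, -1))
            s_dn = 0.0
    return events


closes = [100.0, 101.0, 102.0, 103.0, 102.0, 101.0]
r = log_returns(closes)
# Fixed threshold for readability; index 2 is the return ending at closes[3].
print(cusum_filter(r, [0.025] * len(r)))  # [(2, 1)]
```

In practice the fixed list would be swapped for rolling_vol_threshold(r, window, k), so that h moves with the recent volatility.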

What it gives us

The event sampler for the 'don't sample every candle' discipline:

  1. Noise out. Forecasting at every bar means predicting normal intraday noise with no catalyst → low accuracy. CUSUM filters down to the bars where a meaningful directional move has accumulated — the only ones with anything potentially learnable.
  2. Partner to Triple-Barrier. CUSUM first defines when an event occurs, then Triple-Barrier labels what happens afterwards. Exactly the 'event-defining → outcome features' pipeline.
  3. One tuning lever. k controls the event density: a low k gives many events (drifting back towards noise), a high k gives few (sample too small). You tune to a sensible frequency.
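The effect of the tuning lever in point 3 can be sketched with a small sweep over k. Everything here is illustrative: the returns are synthetic Gaussian noise with a known sigma, the threshold is simply h = k * sigma, and the helper names are hypothetical.

```python
import random
from typing import Dict, Sequence


def count_events(returns: Sequence[float], h: float) -> int:
    """Number of CUSUM events for a fixed threshold h > 0."""
    s_up = s_dn = 0.0
    n = 0
    for r in returns:
        s_up = max(0.0, s_up + r)
        s_dn = min(0.0, s_dn + r)
        if s_up >= h:
            n += 1
            s_up = 0.0
        if s_dn <= -h:
            n += 1
            s_dn = 0.0
    return n


def sweep_k(
    returns: Sequence[float], sigma: float, ks: Sequence[float]
) -> Dict[float, float]:
    """Event density (events per bar) for each multiplier k."""
    return {k: count_events(returns, k * sigma) / len(returns) for k in ks}


SIGMA = 0.01  # assumed per-bar volatility of the synthetic series
rng = random.Random(42)
returns = [rng.gauss(0.0, SIGMA) for _ in range(5000)]

density = sweep_k(returns, SIGMA, ks=(0.5, 2.0, 8.0))
for k, d in density.items():
    print(f"k={k:>4}: {d:.3f} events per bar")
```

On real data the same sweep would be run against the vol-normalised threshold, and k chosen so that the resulting sample is neither mostly noise nor too small to train on.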

Important: the events are not signals. An up-event does not mean 'long'. It is only a timestamp: 'here it is worth looking'. The edge only emerges from contextualising features + a model on top of them.

Where it sits in Botty — the ML module

Home: ml/, as an event sampler (ml/events.py) that returns a list of event timestamps. Experiments in ml/experiments/ use it to subset the bars before computing context features and labelling them via Triple-Barrier.
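The subset-then-label flow can be sketched as follows. This is a simplified illustration, not the code in ml/experiments/: the function names are hypothetical, the threshold is fixed for brevity, and the barriers are checked on closes only (no intrabar highs or lows).

```python
import math
from typing import List, Sequence, Tuple


def cusum_event_bars(closes: Sequence[float], h: float) -> List[int]:
    """Bar indices at which the CUSUM filter fires (fixed h > 0)."""
    s_up = s_dn = 0.0
    bars: List[int] = []
    for i in range(1, len(closes)):
        r = math.log(closes[i] / closes[i - 1])
        s_up = max(0.0, s_up + r)
        s_dn = min(0.0, s_dn + r)
        if s_up >= h:
            bars.append(i)
            s_up = 0.0
        elif s_dn <= -h:  # with a fixed h both sides cannot fire on one bar
            bars.append(i)
            s_dn = 0.0
    return bars


def triple_barrier_label(
    closes: Sequence[float], t: int,
    take_profit: float, stop_loss: float, max_hold: int,
) -> int:
    """+1 if the upper barrier is hit first, -1 if the lower, 0 on timeout."""
    entry = closes[t]
    upper = entry * (1.0 + take_profit)
    lower = entry * (1.0 - stop_loss)
    for price in closes[t + 1 : t + 1 + max_hold]:  # only bars after the event
        if price >= upper:
            return 1
        if price <= lower:
            return -1
    return 0


def label_events(closes: Sequence[float], h: float) -> List[Tuple[int, int]]:
    """Sample bars with CUSUM, then label each sampled bar."""
    return [
        (t, triple_barrier_label(closes, t, 0.015, 0.015, max_hold=3))
        for t in cusum_event_bars(closes, h)
    ]


closes = [100, 101, 102, 103, 105, 106, 104, 102, 100, 99, 98, 99]
print(label_events(closes, h=0.025))
# [(3, 1), (5, -1), (7, -1), (9, 0)]
```

Bar 5 is an up-event that ends up with label -1, which is a small reminder that the event direction is not the outcome.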

Not strategies/conditions/ — at least not first. This is Botty's architectural boundary: ml/ delivers findings, strategies/ implements signals only after validation. Since CUSUM events are explicitly not signals, an 'entry on CUSUM event' would be edge-less. Only once an ml/ study shows that CUSUM events + context have predictive power does a derived entry move into strategies/.

CUSUM vs. BOCPD

Both detect 'something has changed', but differently:

                   CUSUM filter                     BOCPD
Type               frequentist, threshold-based     Bayesian, probabilistic
Output             event yes/no (reset)             P(fresh structural break)
Cost               very cheap, 1 parameter          more expensive, model
Status in Botty    implemented (ml/events.py)       live (ml/forecast/bocpd_live.py)

For pure event sampling, CUSUM is lighter; as an early vol-warning signal, BOCPD is stronger. They don't compete — they complement each other.

Honest assessment

  • Research infrastructure (sampling), not an edge in itself. Belongs to the live-readiness discipline.
  • Don't confuse the two CUSUM variants: the event filter (López de Prado / Skinner — this one) and the classic changepoint detector (mean shift). For Botty we mean the event filter.
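  • Causal & lookahead-free: the sums use only returns up to the current bar. This holds only as long as the threshold h is also computed from past data.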
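One way to sanity-check the causality claim is a prefix test: events found on a truncated series must match the events found on the full series up to the cut. The sketch below is an assumption-laden illustration (synthetic returns, rolling standard deviation as the threshold, hypothetical function names), not Botty's test suite.

```python
import random
import statistics
from typing import List, Sequence, Tuple


def cusum_events_causal(
    returns: Sequence[float], window: int, k: float
) -> List[Tuple[int, int]]:
    """CUSUM events with h_t = k * std of the `window` returns before t."""
    s_up = s_dn = 0.0
    events: List[Tuple[int, int]] = []
    for t, r in enumerate(returns):
        past = returns[max(0, t - window):t]  # strictly before bar t
        if len(past) < window:
            continue  # warm-up: no threshold available yet
        h = k * statistics.pstdev(past)
        if h <= 0.0:
            continue
        s_up = max(0.0, s_up + r)
        s_dn = min(0.0, s_dn + r)
        if s_up >= h:
            events.append((t, +1))
            s_up = 0.0
        if s_dn <= -h:
            events.append((t, -1))
            s_dn = 0.0
    return events


def is_prefix_stable(
    returns: Sequence[float], window: int, k: float, cut: int
) -> bool:
    """True if truncating the series at `cut` leaves earlier events unchanged."""
    full = cusum_events_causal(returns, window, k)
    prefix = cusum_events_causal(returns[:cut], window, k)
    return prefix == [e for e in full if e[0] < cut]


rng = random.Random(7)
returns = [rng.gauss(0.0, 0.01) for _ in range(500)]
print(all(is_prefix_stable(returns, 20, 2.0, cut) for cut in (50, 200, 499)))
```

A threshold built from a full-sample statistic (for example the standard deviation of the whole series) would fail this check, because later bars would change earlier events.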

Status in Botty: implemented in ml/events.py (cusum_events + vol_threshold); pilot pipeline in ml/experiments/cusum_triple_barrier/ (64% downsampling on 1h BTC, +1 base rate, walk-forward stable).