Overfitting and Backtesting

Quick Reference

Optimizing a strategy on historical data to the point where it works perfectly in the backtest but fails in live trading.

Definition

Overfitting = fitting the noise instead of the signal in historical data.

Result: fantastic backtest, disastrous live trading.

How It Happens

Classic Example

I test 100 different parameter sets for EWMAC:
  • 99 lose money in the backtest
  • 1 has SR 2.5 (!!)
  • But: the "winner" is just lucky noise
  • Live trading: it performs like the other 99 (badly)

A data-mining casualty.
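A small simulation makes the data-mining trap concrete. This is an illustrative sketch only: it assumes every "parameter set" produces pure Gaussian noise as daily returns (zero true edge), 256 trading days per year and a zero risk-free rate; `annualized_sharpe` and `data_mining_demo` are hypothetical names.

```python
import random
import statistics

TRADING_DAYS = 256  # assumption: trading days per year used to annualize


def annualized_sharpe(daily_returns):
    """Annualized Sharpe ratio of daily returns, assuming a zero risk-free rate."""
    sd = statistics.stdev(daily_returns)
    if sd == 0:
        return 0.0
    return statistics.mean(daily_returns) / sd * TRADING_DAYS ** 0.5


def data_mining_demo(n_variants=100, n_days=10 * TRADING_DAYS, seed=1):
    """Backtest n_variants 'strategies' that are pure noise, pick the best one,
    then give it a fresh stretch of noise as its 'live' period.

    Returns (in-sample SRs, index of best variant, best in-sample SR, live SR).
    """
    rng = random.Random(seed)

    def noise():
        # Zero-mean daily returns: no variant has any real edge.
        return [rng.gauss(0.0, 0.01) for _ in range(n_days)]

    in_sample = [annualized_sharpe(noise()) for _ in range(n_variants)]
    best = max(range(n_variants), key=in_sample.__getitem__)
    # The "winner" was only lucky, so its live period is just new noise.
    live = annualized_sharpe(noise())
    return in_sample, best, in_sample[best], live


srs, best, best_is, live = data_mining_demo()
print(f"best of {len(srs)} in-sample SR: {best_is:.2f}, live SR: {live:.2f}")
```

By construction the selected SR is the maximum of 100 noisy estimates, while the live SR is an unbiased draw centred on zero, so the gap between the two is pure selection effect.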

Danger Signs

Overfitting Red Flags

  1. Backtest SR > 1.5 (too good to be true)
  2. Too many parameters tested (>5-10)
  3. Short data history (<5 years)
  4. Complex strategy with many rules
  5. Perfect fit to specific historical events
  6. No out-of-sample testing

Extreme Example

"Buy when the S&P falls 3 days in a row + VIX > 20 + it's Tuesday + it's an odd-numbered month"

It might work perfectly over 2010-2020, but it is overfitted nonsense.

Anti-Overfitting Methodologies

1. Out-of-Sample Testing

Split the data chronologically:
  • 70% training (optimize)
  • 30% testing (validate)

Never look at the test set during optimization!

Problem: if you "peek" at the test set and then re-optimize, you have contaminated it.
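A minimal sketch of the 70/30 split, assuming the data is a time-ordered Python list (no shuffling, so the test period always comes after the training period). `chronological_split` and `OneShotHoldout` are hypothetical helpers; the latter enforces the "look once" rule by refusing a second read of the test set.

```python
def chronological_split(series, train_fraction=0.70):
    """Split a time-ordered sequence into (train, test) without shuffling."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    cut = round(len(series) * train_fraction)
    return series[:cut], series[cut:]


class OneShotHoldout:
    """Wraps the test set so it can be revealed exactly once."""

    def __init__(self, data):
        self._data = data
        self._revealed = False

    def reveal(self):
        if self._revealed:
            raise RuntimeError(
                "test set already used: re-optimizing after seeing it contaminates it"
            )
        self._revealed = True
        return self._data


train, test = chronological_split(list(range(10)))
holdout = OneShotHoldout(test)
# ... optimize using `train` only ...
final_check = holdout.reveal()  # evaluate the chosen strategy exactly once
```

A second call to `reveal()` raises, which makes accidental peeking visible rather than silent.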

2. Walk-Forward / Rolling Window

Progressive validation:

Train: 2000-2005 → Test: 2006
Train: 2001-2006 → Test: 2007
Train: 2002-2007 → Test: 2008
...

More robust than a single train/test split.
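The rolling schedule above (six training years, then one test year, shifted forward one year at a time) can be generated programmatically. A minimal sketch: the function name is hypothetical, and the fixed-length (rolling rather than expanding) six-year window is read off the example.

```python
def walk_forward_windows(first_year, last_year, train_years=6, test_years=1):
    """Return (train_start, train_end, test_start, test_end) tuples for a
    fixed-length rolling window that moves forward by test_years each step."""
    windows = []
    train_start = first_year
    while train_start + train_years + test_years - 1 <= last_year:
        train_end = train_start + train_years - 1
        test_start = train_end + 1
        test_end = test_start + test_years - 1
        windows.append((train_start, train_end, test_start, test_end))
        train_start += test_years  # step so that test periods never overlap
    return windows


for train_start, train_end, test_start, test_end in walk_forward_windows(2000, 2008):
    print(f"Train: {train_start}-{train_end} -> Test: {test_start}-{test_end}")
```

Each test year is used exactly once, and every result comes from a model fitted only on earlier data.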

3. Bootstrapping

Resample historical data:
  • Create 1,000 "alternative histories"
  • Test the strategy on each one
  • The median performance is a realistic estimate

Very robust, but computationally expensive.
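A minimal bootstrap sketch, assuming IID daily returns, 256 trading days per year and a zero risk-free rate. It resamples individual days with replacement, which destroys any autocorrelation in the data (a block bootstrap would preserve it, but is not shown here). `bootstrap_sharpe` is a hypothetical name.

```python
import random
import statistics


def bootstrap_sharpe(daily_returns, n_histories=1000, seed=0, days_per_year=256):
    """Build n_histories resampled 'alternative histories' and return the
    median annualized SR plus the 5th and 95th percentiles."""
    rng = random.Random(seed)
    n = len(daily_returns)
    srs = []
    for _ in range(n_histories):
        history = rng.choices(daily_returns, k=n)  # sample days with replacement
        sd = statistics.stdev(history)
        sr = 0.0 if sd == 0 else statistics.mean(history) / sd * days_per_year ** 0.5
        srs.append(sr)
    cut_points = statistics.quantiles(srs, n=20)  # 19 cut points: 5%, 10%, ..., 95%
    return statistics.median(srs), cut_points[0], cut_points[-1]
```

Pass the strategy's daily backtest returns: the median is the point estimate, and the 5-95% range shows how much of the backtest SR could be luck.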

4. Ideas-First Testing

Crucial:
  1. Decide on the trading idea BEFORE looking at the data
  2. Test the idea once
  3. Accept the results
  4. NO re-optimization based on the results

Versus data mining: test 100 ideas, pick the best.

Realistic SR Expectations

Discount Factors

Multiply the backtest SR by the factor for the validation methodology used:

Methodology                  Discount factor
Out-of-sample bootstrap      0.75
Rolling/walk-forward         0.60
Single train/test split      0.50
In-sample only               0.25
Data mining                  0.10

Example:
  • Backtest SR = 1.0
  • Rolling-window validation
  • Realistic SR = 1.0 × 0.60 = 0.60

Maximum Believable SR

Absolute ceiling: SR 1.0 for a diversified systematic strategy.

If the backtest SR is > 1.0: overfitting is likely, even with a good methodology.
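The discount table and the SR 1.0 ceiling combine into a one-line adjustment. A minimal sketch: the dictionary keys and `realistic_sharpe` are hypothetical names, while the factors and the ceiling are exactly the ones given above.

```python
# Discount factors from the table (validation methodology -> multiplier on backtest SR).
DISCOUNT_FACTORS = {
    "out_of_sample_bootstrap": 0.75,
    "walk_forward": 0.60,
    "single_train_test_split": 0.50,
    "in_sample_only": 0.25,
    "data_mining": 0.10,
}
MAX_BELIEVABLE_SR = 1.0  # absolute ceiling for a diversified systematic strategy


def realistic_sharpe(backtest_sr, methodology):
    """Return (discounted SR, suspicious); suspicious is True when the raw
    backtest SR is above the believable ceiling. Unknown keys raise KeyError."""
    discounted = backtest_sr * DISCOUNT_FACTORS[methodology]
    return discounted, backtest_sr > MAX_BELIEVABLE_SR


print(realistic_sharpe(1.0, "walk_forward"))
```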

Overfitting vs Edge Decay

Overfitting

Never had a real edge:
  • Fitted to noise
  • Random luck in the backtest
  • Live: immediate failure

Edge Decay

Had a real edge, now gone:
  • Market changed
  • Competition increased
  • Strategy crowded

Different problems, different solutions.

Parameter Stability

Test robustness:

Optimal EWMAC speed = 16/64:
  • Test 12/48, 20/80, 32/128
  • If all perform similarly: robust ✓
  • If only 16/64 works: overfitted ✗

Robust strategies still perform acceptably when their parameters are varied.
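One way to make the robustness check mechanical is to require every tested speed to keep a reasonable share of the best variant's SR. A minimal sketch: the 50% threshold and `is_robust` are assumptions rather than a standard rule, and the SR figures below are made up purely for illustration.

```python
def is_robust(sr_by_params, min_fraction_of_best=0.5):
    """True if every tested parameter set keeps at least min_fraction_of_best
    of the best set's SR (hypothetical rule of thumb)."""
    best_sr = max(sr_by_params.values())
    if best_sr <= 0:
        return False  # nothing worked, so there is nothing to be robust about
    return all(sr >= min_fraction_of_best * best_sr for sr in sr_by_params.values())


# Illustrative (made-up) backtest SRs keyed by EWMAC (fast, slow) spans.
smooth = {(12, 48): 0.45, (16, 64): 0.50, (20, 80): 0.47, (32, 128): 0.40}
spiky = {(12, 48): 0.05, (16, 64): 0.50, (20, 80): -0.10, (32, 128): 0.02}
print(is_robust(smooth), is_robust(spiky))
```

The "spiky" profile, where only 16/64 works, is the overfitting pattern described above.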

Complexity Penalty

Occam's Razor: Simpler is better.

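Example:

Strategy A (simple):
  • Single EWMAC 16/64
  • Backtest SR: 0.50
  • 1 parameter (the speed)

Strategy B (complex):
  • 10 rules, 50 parameters
  • Backtest SR: 0.55

Choose A! The extra 0.05 of SR is tiny compared with the estimation error of a backtest SR, and the simpler strategy is more robust, with far less overfitting risk.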

Common Overfitting Traps

1. Optimization Hell

Optimize EWMAC speed → optimize forecast scalar → optimize position size → optimize stops...

Each additional optimization increases overfitting risk.

Solution: Use standard parameters, minimal optimization.

2. Survivor Bias

Testing only instruments that are still traded:
  • Misses the failures
  • Overstates performance

Solution: Include delisted/dead instruments.

3. Look-Ahead Bias

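Using information in the past that only became available later:
  • Dividends known only ex-post
  • Revised economic data (instead of the values first published)
  • Survivorship (knowing which instruments would survive)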

Solution: Careful data cleaning, point-in-time data.
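In code, the most common form of look-ahead is applying a signal to the same period's return that was used to compute it. A minimal sketch, assuming signals and returns are aligned lists where `signals[t]` is only known at the end of period t; `strategy_returns` is a hypothetical helper.

```python
def strategy_returns(signals, asset_returns, lag=1):
    """P&L from holding position signals[t - lag] during period t.

    lag=1 trades on the previous period's signal (realistic).
    lag=0 uses a signal that already 'saw' the return it trades: look-ahead.
    """
    if lag < 0:
        raise ValueError("a negative lag would trade on future signals")
    return [signals[t - lag] * asset_returns[t] for t in range(lag, len(asset_returns))]


returns = [0.01, -0.02, 0.03, -0.01]
# A "signal" that is just the sign of the same period's return (perfect foresight).
signals = [1, -1, 1, -1]
print(strategy_returns(signals, returns, lag=0))  # every period profitable: too good
print(strategy_returns(signals, returns, lag=1))
```

With the one-period lag the same "perfect" signal loses money, which is what a realistic backtest would have reported.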

4. Transaction Cost Ignorance

Backtest ignores costs:
  • Looks profitable
  • After costs: a loss

Solution: Always include realistic costs.
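A minimal sketch of netting costs out of a backtest, assuming a simple linear cost model in which each unit of position traded costs a fixed fraction of return; real costs (spreads, commissions, market impact) are usually more complex. `net_returns` is a hypothetical name and the numbers are illustrative.

```python
def net_returns(positions, asset_returns, cost_per_unit_traded):
    """Per-period P&L after costs. positions[t] is held over period t and
    moving from the previous position costs |change| * cost_per_unit_traded."""
    pnl = []
    previous = 0.0  # start flat
    for position, r in zip(positions, asset_returns):
        trading_cost = abs(position - previous) * cost_per_unit_traded
        pnl.append(position * r - trading_cost)
        previous = position
    return pnl


positions = [1, -1, 1, -1]  # flips every period: very high turnover
asset_returns = [0.002, -0.001, 0.002, -0.001]
gross = sum(net_returns(positions, asset_returns, 0.0))
net = sum(net_returns(positions, asset_returns, 0.002))
print(f"gross {gross:+.4f}, net {net:+.4f}")
```

High-turnover strategies are hit hardest, because cost scales with every change in position.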

Out-of-Sample Performance

What to expect:

Good scenario:
  • In-sample SR: 0.60
  • Out-of-sample SR: 0.45
  • Degradation: 25% (acceptable)

Bad scenario:
  • In-sample SR: 1.20
  • Out-of-sample SR: 0.10
  • Degradation: 92% (overfitted!)

Rule of thumb: Expect 20-40% degradation out-of-sample.
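The degradation figures above are simply the relative drop in SR. A minimal sketch: treating anything above 40% as a warning is one reading of the rule of thumb, not a hard threshold, and both function names are hypothetical.

```python
def sr_degradation(in_sample_sr, out_of_sample_sr):
    """Relative drop in SR from in-sample to out-of-sample (0.25 means 25%)."""
    if in_sample_sr <= 0:
        raise ValueError("degradation is only meaningful for a positive in-sample SR")
    return 1.0 - out_of_sample_sr / in_sample_sr


def within_expectations(in_sample_sr, out_of_sample_sr, max_degradation=0.40):
    """True if the out-of-sample drop is no worse than the expected 20-40% band."""
    return sr_degradation(in_sample_sr, out_of_sample_sr) <= max_degradation


print(f"{sr_degradation(0.60, 0.45):.0%}", within_expectations(0.60, 0.45))
print(f"{sr_degradation(1.20, 0.10):.0%}", within_expectations(1.20, 0.10))
```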

Statistical Significance

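Most backtests do not have enough data:

10 years of data, SR 0.50:
  • Standard error ≈ 0.33
  • ±1 standard error (about 68% confidence): SR between 0.17 and 0.83
  • 95% confidence (±1.96 standard errors): SR roughly between -0.15 and 1.15, which includes zero
  • Huge uncertainty!

Lesson: with limited data it is hard to distinguish skill from luck.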
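The figures in the 10-year example follow from a common approximation for the standard error of an estimated Sharpe ratio, SE ≈ sqrt((1 + SR²/2) / T), assuming IID normal returns with SR and T measured in the same units (here annual SR and T in years). A minimal sketch; the function names are hypothetical. With SR 0.50 and 10 years it gives SE ≈ 0.335, and the 95% interval includes zero.

```python
import math


def sharpe_standard_error(sharpe, n_periods):
    """Approximate SE of an estimated Sharpe ratio from n_periods IID normal
    returns; sharpe and n_periods must use the same period (e.g. annual SR, years)."""
    return math.sqrt((1.0 + 0.5 * sharpe ** 2) / n_periods)


def sharpe_confidence_interval(sharpe, n_periods, z=1.96):
    """Symmetric normal-approximation interval: sharpe +/- z * SE (z=1.96 ~ 95%)."""
    se = sharpe_standard_error(sharpe, n_periods)
    return sharpe - z * se, sharpe + z * se


se = sharpe_standard_error(0.50, 10)
low, high = sharpe_confidence_interval(0.50, 10)
print(f"SE {se:.3f}, 95% interval {low:.2f} to {high:.2f}")
```

Because SE shrinks only with the square root of T, quadrupling the data history merely halves the uncertainty.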

Live Trading Validation

Ultimate test:
  • Paper trade for 6-12 months
  • Trade small real money for 12+ months
  • If performance is in line with expectations: proceed
  • If it is a disaster: it was overfitted

No shame in discovering overfitting early!

Preventing Overfitting

Checklist

  • [ ] Use ideas-first approach
  • [ ] Minimize parameters (<5)
  • [ ] Out-of-sample test
  • [ ] Walk-forward validation
  • [ ] Bootstrap if possible
  • [ ] Test parameter robustness
  • [ ] Apply discount factors to SR
  • [ ] Keep strategy simple
  • [ ] Include transaction costs
  • [ ] Paper trade first

Common Mistakes

  • Excessive optimization: Testing hundreds of parameter sets
  • Peeking at test set: Contaminating validation
  • Complex strategies: 10+ rules with interactions
  • Believing backtest SR > 1.0: Unrealistic
  • No out-of-sample: Using all data for optimization
  • Ignoring costs: Profitable gross, loss net
  • Survivor bias: Only testing winners
  • Not paper trading: Going live directly after backtest

Related Concepts

  • [[Sharpe Ratio]] - often overstated in backtests
  • [[Risk Adjusted Returns]] - apply discount factors
  • [[Transaction Costs]] - critical for realistic backtest