Overfitting and Backtesting
Quick Reference
Optimizing a strategy on historical data to the point that it works perfectly in backtest but fails in live trading.
Definition
Overfitting = fitting the noise rather than the signal in historical data.
Result: a fantastic backtest, disastrous live trading.
How It Happens
Classic Example
Testing 100 different parameter sets for EWMAC:
- 99 lose money in backtest
- 1 has SR 2.5 (!!)
- But the "winner" is just lucky noise
- Live trading: it performs like the other 99 (badly)
Data mining casualty.
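The effect can be reproduced with a small simulation. The sketch below is illustrative only: it assumes 100 hypothetical "strategies" whose daily returns are pure Gaussian noise with zero true edge, 5 years of 256 trading days each, and 1% daily volatility. None of these figures come from a real EWMAC backtest.

```python
import math
import random
import statistics

def sharpe(returns, periods_per_year=256):
    """Annualized Sharpe ratio of a list of per-period returns."""
    mean = statistics.fmean(returns)
    stdev = statistics.stdev(returns)
    return mean / stdev * math.sqrt(periods_per_year)

def best_of_n_noise(n_variants=100, n_days=256 * 5, seed=42):
    """Simulate n_variants zero-edge strategies; return (best, median) backtest SR."""
    rng = random.Random(seed)
    srs = []
    for _ in range(n_variants):
        # Assumption: pure noise, zero true mean, 1% daily volatility
        daily = [rng.gauss(0.0, 0.01) for _ in range(n_days)]
        srs.append(sharpe(daily))
    return max(srs), statistics.median(srs)

best, median = best_of_n_noise()
print(f"best SR: {best:.2f}, median SR: {median:.2f}")
```

The best variant shows a clearly positive backtest SR even though every variant has zero true edge, while the median stays near zero. Picking the maximum is what manufactures the apparent performance.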
Danger Signs
Overfitting Red Flags
- SR > 1.5 in backtest (too good to be true)
- Too many parameters tested (>5-10)
- Short data history (<5 years)
- Complex strategy with many rules
- Perfect fit a eventi storici specifici
- No out-of-sample testing
Extreme Example
"Buy when the S&P falls 3 days in a row + VIX > 20 + it's Tuesday + odd-numbered month"
It might work perfectly over 2010-2020, but it is overfitted nonsense.
Anti-Overfitting Methodologies
1. Out-of-Sample Testing
Split data:
- 70% training (optimize)
- 30% testing (validate)
Never look at the test set during optimization!
Problem: if you peek at the test set and then re-optimize, you have contaminated it.
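A minimal sketch of the 70/30 split above, assuming the data is a time-ordered list of returns. The function name `chronological_split` and the sample numbers are hypothetical. The split is chronological rather than shuffled, so the test set lies strictly after the training period.

```python
def chronological_split(series, train_fraction=0.70):
    """Split a time-ordered sequence into (train, test) without shuffling."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    cut = round(len(series) * train_fraction)
    return series[:cut], series[cut:]

# Hypothetical daily returns, oldest first
returns = [0.01, -0.02, 0.005, 0.003, -0.001, 0.004, 0.002, -0.003, 0.006, 0.001]
train, test = chronological_split(returns)
print(len(train), len(test))
```

Parameters are fitted on `train` only; `test` is evaluated once, at the end.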
2. Walk-Forward / Rolling Window
Progressive validation:
Train: 2000-2005 → Test: 2006
Train: 2001-2006 → Test: 2007
Train: 2002-2007 → Test: 2008
...
More robust than a single train/test split.
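The window schedule above can be generated programmatically. This is a sketch assuming a fixed 6-year training window that rolls forward one year at a time; `rolling_windows` is a hypothetical helper name.

```python
def rolling_windows(first_year, last_year, train_years=6):
    """Yield (train_start, train_end, test_year) tuples; all years inclusive."""
    start = first_year
    # Stop once the test year would fall beyond the available data
    while start + train_years <= last_year:
        train_end = start + train_years - 1
        yield start, train_end, train_end + 1
        start += 1

for train_start, train_end, test_year in rolling_windows(2000, 2008):
    print(f"Train: {train_start}-{train_end} -> Test: {test_year}")
```

The out-of-sample results from each test year are then stitched together into one performance record.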
3. Bootstrapping
Resample historical data:
- Create 1000 "alternative histories"
- Test the strategy on each one
- The median performance is a realistic estimate
Very robust but computationally expensive.
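A simplified sketch of the idea, under a loud assumption: it resamples a strategy's own daily returns with replacement (a plain i.i.d. bootstrap) instead of re-running the strategy on resampled market data. That ignores autocorrelation, so treat it as an illustration of the mechanics. The demo returns are synthetic.

```python
import math
import random
import statistics

def sharpe(returns, periods_per_year=256):
    """Annualized Sharpe ratio of a list of per-period returns."""
    mean = statistics.fmean(returns)
    stdev = statistics.stdev(returns)
    return mean / stdev * math.sqrt(periods_per_year)

def bootstrap_sharpe(returns, n_histories=1000, seed=0):
    """Return the sorted Sharpe ratios of n_histories resampled return series."""
    rng = random.Random(seed)
    n = len(returns)
    results = []
    for _ in range(n_histories):
        # One "alternative history": n returns drawn with replacement
        sample = rng.choices(returns, k=n)
        results.append(sharpe(sample))
    return sorted(results)

# Synthetic demo data (assumption, not real strategy returns)
demo_rng = random.Random(1)
demo_returns = [demo_rng.gauss(0.0004, 0.01) for _ in range(500)]

dist = bootstrap_sharpe(demo_returns)
median_sr = statistics.median(dist)
low, high = dist[int(0.05 * len(dist))], dist[int(0.95 * len(dist))]
print(f"median SR {median_sr:.2f}, 5th-95th percentile {low:.2f} to {high:.2f}")
```

The spread between the percentiles is as informative as the median: a wide band means the backtest SR is poorly pinned down.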
4. Ideas-First Testing
Crucial:
1. Decide on the trading idea BEFORE looking at the data
2. Test the idea once
3. Accept the results
4. NO re-optimization based on the results
Vs: Data mining (test 100 ideas, pick best).
Realistic SR Expectations
Discount Factors
Apply to backtest SR:
| Methodology | Discount Factor |
|---|---|
| Out-of-sample bootstrap | 0.75 |
| Rolling/Walk-forward | 0.60 |
| Single train/test split | 0.50 |
| In-sample only | 0.25 |
| Data mining | 0.10 |
Example:
- Backtest SR = 1.0
- Rolling window validation
- Realistic SR = 1.0 × 0.60 = 0.60
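The table translates directly into a lookup. In this sketch the dictionary keys and the function name `realistic_sr` are hypothetical labels; the factors are the ones from the table.

```python
# Discount factors from the table, keyed by hypothetical methodology labels
DISCOUNT_FACTORS = {
    "out_of_sample_bootstrap": 0.75,
    "rolling_walk_forward": 0.60,
    "single_train_test_split": 0.50,
    "in_sample_only": 0.25,
    "data_mining": 0.10,
}

def realistic_sr(backtest_sr, methodology):
    """Scale a backtest Sharpe ratio by the discount for the methodology used."""
    return backtest_sr * DISCOUNT_FACTORS[methodology]

print(f"{realistic_sr(1.0, 'rolling_walk_forward'):.2f}")
```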
Maximum Believable SR
Absolute ceiling: SR 1.0 for a diversified systematic strategy.
If the backtest SR is above 1.0: probable overfitting, even with good methodology.
Overfitting vs Edge Decay
Overfitting
Never had a real edge:
- Fitted to noise
- Random luck in backtest
- Live: immediate failure
Edge Decay
Had a real edge, now gone:
- Market changed
- Competition increased
- Strategy crowded
Different problems, different solutions.
Parameter Stability
Test robustness:
EWMAC optimal speed = 16/64:
- Test 12/48, 20/80, 32/128
- If all show similar performance: robust ✓
- If only 16/64 works: overfitted ✗
Robust strategies perform OK with parameter variations.
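One way to make "similar performance" concrete is a threshold check. This sketch assumes a hypothetical rule: the chosen parameter is robust if every neighboring variant keeps at least half of its backtest SR. The 0.5 tolerance and the SR values are made-up inputs for illustration, not backtest results.

```python
def is_robust(sr_by_param, chosen, tolerance=0.5):
    """True if every other variant's SR is at least `tolerance` times the chosen SR."""
    chosen_sr = sr_by_param[chosen]
    if chosen_sr <= 0:
        return False
    neighbors = [sr for param, sr in sr_by_param.items() if param != chosen]
    return all(sr >= tolerance * chosen_sr for sr in neighbors)

# Hypothetical backtest SRs per EWMAC speed
sr_robust = {"12/48": 0.45, "16/64": 0.50, "20/80": 0.48, "32/128": 0.42}
sr_fragile = {"12/48": 0.05, "16/64": 0.50, "20/80": -0.10, "32/128": 0.02}

print(is_robust(sr_robust, "16/64"))
print(is_robust(sr_fragile, "16/64"))
```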
Complexity Penalty
Occam's Razor: Simpler is better.
Example:
Strategy A (simple):
- Single EWMAC 16/64
- Backtest SR: 0.50
- 1 parameter
Strategy B (complex):
- 10 rules
- Backtest SR: 0.55
- 50 parameters
Choose A! It is more robust and carries less overfitting risk.
Common Overfitting Traps
1. Optimization Hell
Optimize EWMAC speed → optimize forecast scalar → optimize position size → optimize stops...
Each optimization step increases the overfitting risk.
Solution: Use standard parameters, minimal optimization.
2. Survivor Bias
Testing only instruments that are still traded:
- Misses the failures
- Overstates performance
Solution: Include delisted/dead instruments.
3. Look-Ahead Bias
Using information that was not yet available at the time:
- Dividends known only ex-post
- Revised economic data
- Survivorship
Solution: Careful data cleaning, point-in-time data.
4. Transaction Cost Ignorance
Backtest ignores costs:
- Looks profitable
- After costs: loss
Solution: Always include realistic costs.
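A back-of-the-envelope sketch of how costs can flip the sign of a result. It assumes a flat cost per round trip expressed as a fraction of capital; the function name and all numbers are hypothetical.

```python
def net_annual_return(gross_return, round_trips_per_year, cost_per_round_trip):
    """Gross annual return minus total annual trading cost (fractions of capital)."""
    return gross_return - round_trips_per_year * cost_per_round_trip

# Hypothetical: 5% gross, 40 round trips a year, 0.2% cost per round trip
net = net_annual_return(0.05, 40, 0.002)
print(f"net annual return: {net:.1%}")
```

A strategy that looks profitable gross turns into a loss once turnover is priced in.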
Out-of-Sample Performance
What to expect:
Good scenario:
- In-sample SR: 0.60
- Out-of-sample SR: 0.45
- Degradation: 25% (acceptable)
Bad scenario:
- In-sample SR: 1.20
- Out-of-sample SR: 0.10
- Degradation: 92% (overfitted!)
Rule of thumb: Expect 20-40% degradation out-of-sample.
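The degradation figures are the relative drop from in-sample to out-of-sample SR. A minimal sketch, with `degradation` as a hypothetical helper name:

```python
def degradation(in_sample_sr, out_of_sample_sr):
    """Fraction of the in-sample Sharpe ratio lost out-of-sample."""
    return (in_sample_sr - out_of_sample_sr) / in_sample_sr

good = degradation(0.60, 0.45)
bad = degradation(1.20, 0.10)
print(f"good: {good:.0%}, bad: {bad:.0%}")
```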
Statistical Significance
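Not enough data for most tests:
10 years of data, SR 0.50:
- Standard error ≈ 0.33
- ±1 standard error (about 68% confidence): SR between 0.17 and 0.83
- 95% confidence (±1.96 standard errors): SR between roughly -0.15 and 1.15
- Huge uncertainty!
Lesson: it is hard to distinguish skill from luck with limited data.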
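The standard error can be approximated in a few lines. The source does not say how its figure was derived, so this sketch assumes a common large-sample approximation for i.i.d. normal returns, SE ≈ sqrt((1 + SR²/2) / T) with T in years. It gives about 0.33 to 0.34 for SR 0.50 over 10 years. The multiplier `z` sets the confidence level: 1.0 for a one-standard-error band, 1.96 for 95%.

```python
import math

def sharpe_standard_error(sr, years):
    """Approximate standard error of an annualized SR (assumes i.i.d. normal returns)."""
    return math.sqrt((1.0 + 0.5 * sr ** 2) / years)

def sharpe_interval(sr, years, z=1.96):
    """Confidence band sr +/- z standard errors; z=1.96 is roughly 95%."""
    se = sharpe_standard_error(sr, years)
    return sr - z * se, sr + z * se

se = sharpe_standard_error(0.50, 10)
one_se_band = sharpe_interval(0.50, 10, z=1.0)
band_95 = sharpe_interval(0.50, 10)
print(f"SE {se:.3f}")
print(f"+/-1 SE: {one_se_band[0]:.2f} to {one_se_band[1]:.2f}")
print(f"95%: {band_95[0]:.2f} to {band_95[1]:.2f}")
```

The 95% band includes zero, so ten years of data cannot rule out a strategy with no edge at all.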
Live Trading Validation
Ultimate test:
- Paper trade for 6-12 months
- Trade small real money for 12+ months
- If performance is in line with expectations: proceed
- If it is a disaster: the strategy was overfitted
No shame in discovering overfitting early!
Preventing Overfitting
Checklist
- [ ] Use ideas-first approach
- [ ] Minimize parameters (<5)
- [ ] Out-of-sample test
- [ ] Walk-forward validation
- [ ] Bootstrap if possible
- [ ] Test parameter robustness
- [ ] Apply discount factors to SR
- [ ] Keep strategy simple
- [ ] Include transaction costs
- [ ] Paper trade first
Common Mistakes
- Excessive optimization: Testing hundreds of parameters
- Peeking at test set: Contaminating validation
- Complex strategies: 10+ rules with interactions
- Believing backtest SR > 1.0: Unrealistic
- No out-of-sample: Using all data for optimization
- Ignoring costs: Profitable gross, loss net
- Survivor bias: Only testing winners
- Not paper trading: Going live directly after backtest
Related Concepts
- [[Sharpe Ratio]] - often overstated in backtests
- [[Risk Adjusted Returns]] - apply discount factors
- [[Transaction Costs]] - critical for realistic backtest