My Forecast Was Wrong — Until I Added Lag Features
When I first started building a sales forecasting model, I focused heavily on choosing the “best” machine learning algorithm — Linear Regression, Random Forest, XGBoost… you name it.
But no matter how many models I tried, the forecasts felt off — too smooth, too delayed, or just wrong.
That’s when I learned something that completely shifted my approach:
In time-based forecasting, what you feed the model matters more than which model you use.
And that’s where lag features come in.
🚨 The Problem
“I initially built a model using only time as input. The predictions were smooth, but totally disconnected from the real trend. That’s when I realized: the model had no memory of the past.”
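To make that concrete, here is a minimal sketch of a time-only model. The data and variable names are illustrative, not from my original project:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative weekly sales figures (a stand-in for the real series)
sales = np.array([200, 220, 215, 260, 300, 280, 310, 330], dtype=float)

# "Only time as input": the sole feature is each row's position in the series
X = np.arange(len(sales)).reshape(-1, 1)
model = LinearRegression().fit(X, sales)
trend = model.predict(X)  # a smooth line; the model has no memory of recent values
```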
✅ The Fix: Adding Lag Features
Lag features are values from the past that we bring into the current row.
Here’s what I added:
- Sales_Lag_1: sales from 1 week ago
- Sales_Lag_2: sales from 2 weeks ago
- Sales_Lag_12: sales from the same week last quarter
- Rolling_Mean_4: 4-week rolling average
- Rolling_Std_4: 4-week rolling standard deviation
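In pandas, each of these is a one-liner with shift() and rolling(). Here is a sketch, assuming a DataFrame sorted in time order with a Sales column (the synthetic data is just for illustration):

```python
import numpy as np
import pandas as pd

# Illustrative weekly sales; you need more than 12 rows before Sales_Lag_12 is defined
rng = np.random.default_rng(42)
df = pd.DataFrame({"Sales": rng.integers(200, 400, size=52)})

df["Sales_Lag_1"] = df["Sales"].shift(1)                       # sales from 1 week ago
df["Sales_Lag_2"] = df["Sales"].shift(2)                       # sales from 2 weeks ago
df["Sales_Lag_12"] = df["Sales"].shift(12)                     # same week last quarter
df["Rolling_Mean_4"] = df["Sales"].shift(1).rolling(4).mean()  # 4-week rolling average
df["Rolling_Std_4"] = df["Sales"].shift(1).rolling(4).std()    # 4-week rolling std dev

df = df.dropna()  # drop early rows whose lags reach before the start of the series
```

Note the shift(1) inside the rolling features: it keeps the current week's value out of its own feature, which would otherwise leak the target into the inputs.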
“I added lag features like lag_1 and a rolling mean (rolling_mean_12). Suddenly, the model started making smarter predictions.”
🔬 Case Study: Forecasting Sunspots
Dataset Used:
I used the publicly available Monthly Sunspots dataset — clean and ready to use.
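Loading it takes two lines. This sketch assumes the widely mirrored CSV copy in Jason Brownlee's Datasets repository on GitHub; swap in your own copy if that URL moves:

```python
import pandas as pd

# Monthly sunspots: one row per month, columns "Month" and "Sunspots"
URL = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/monthly-sunspots.csv"
df = pd.read_csv(URL, parse_dates=["Month"], index_col="Month")
print(df.head())
```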
📊 Model Comparison
| Model | Description | MAE (Error) |
|---|---|---|
| No Lag Features | Used only time index | 60.28 |
| With Lag Features | Used lag_1 + rolling_mean_12 | 13.65 |
✅ Result: MAE fell from 60.28 to 13.65 (roughly a 77% reduction) simply by giving the model memory.
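Here is a minimal sketch of the experiment behind that table. I'm assuming a plain linear regression and a chronological 80/20 split; the exact MAE values you get will depend on the model and split you choose:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

URL = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/monthly-sunspots.csv"
df = pd.read_csv(URL, parse_dates=["Month"], index_col="Month")

# Baseline feature: a plain time index. Lag features: last month's value and a
# 12-month rolling mean, both shifted so no row ever sees its own target value.
df["t"] = np.arange(len(df))
df["lag_1"] = df["Sunspots"].shift(1)
df["rolling_mean_12"] = df["Sunspots"].shift(1).rolling(12).mean()
df = df.dropna()

# Chronological 80/20 split: never shuffle a time series
split = int(len(df) * 0.8)
train, test = df.iloc[:split], df.iloc[split:]
y_train, y_test = train["Sunspots"], test["Sunspots"]

baseline = LinearRegression().fit(train[["t"]], y_train)
lagged = LinearRegression().fit(train[["lag_1", "rolling_mean_12"]], y_train)

preds_baseline = baseline.predict(test[["t"]])
preds_lagged = lagged.predict(test[["lag_1", "rolling_mean_12"]])

print("No lag features:   MAE =", round(mean_absolute_error(y_test, preds_baseline), 2))
print("With lag features: MAE =", round(mean_absolute_error(y_test, preds_lagged), 2))
```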
📈 Visual Comparison
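The original chart isn't embedded here, but continuing from the comparison snippet above (it reuses test, y_test, and the two prediction arrays), a few lines of matplotlib recreate it. The time-only forecast comes out as a smooth curve, while the lagged model tracks the actual cycles:

```python
import matplotlib.pyplot as plt

# Continues from the comparison snippet above (reuses test, y_test, and the preds)
plt.figure(figsize=(10, 4))
plt.plot(test.index, y_test, label="Actual", color="black")
plt.plot(test.index, preds_baseline, label="No lag features", linestyle="--")
plt.plot(test.index, preds_lagged, label="lag_1 + rolling_mean_12")
plt.ylabel("Monthly sunspots")
plt.legend()
plt.tight_layout()
plt.show()
```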
🧠 Final Takeaway
Adding memory to your model through lag and rolling features can be the difference between guessing and forecasting.