I keep seeing people add higher-degree polynomials “just in case”.
Today I got tired of it and built the simplest possible counter-example.
Result: the quadratic model literally got a lower test R² than plain linear regression.
Let’s watch Occam’s Razor do its thing.
More complex model ≠ better model
(That’s Occam’s Razor in action)
A simple, reproducible example showing why a more complex model is not always better, even when the true relationship is perfectly linear.
Today I created a perfectly linear dataset:
y = 2x + 3 + some random noise
Then I trained two models:
- Simple Linear Regression → Test R² ≈ 0.8626
- Polynomial Degree 2 (more complex) → Test R² ≈ 0.8623
Guess what?
The complex model fit the training data at least as well (the extra x² term can only help there)…
but it scored slightly lower on unseen test data.
This notebook walks through the entire experiment — step by step — with clean plots and real numbers.
It’s a classic illustration of overfitting and why Occam’s Razor matters in machine learning.
Let’s dive in.
1. Import Libraries & Generate Synthetic Linear Data with Noise
We generate a perfectly linear dataset based on the true relationship y = 2x + 3.
Then we add Gaussian noise to scatter the points slightly off the line — just like real-world data.
Why add noise?
Because in practice, data is never perfectly clean.
This noise tempts the polynomial model to fit random fluctuations instead of the underlying pattern — classic overfitting.
Meanwhile, the simple linear model ignores the noise and captures only the true signal.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import r2_score
# Generate perfectly linear data with noise
x = np.arange(-5.0, 5.0, 0.1)
y_true = 2 * x + 3
y_noise = 2 * np.random.normal(size=x.size)
y = y_true + y_noise
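A quick aside: np.random.normal draws fresh noise on every run, so the exact R² values below will vary slightly each time. If you want the numbers to repeat bit for bit, one option (not in the original snippet) is to seed NumPy's generator before drawing the noise:

# Optional: seed the generator so the noisy data (and the R² scores) repeat across runs
rng = np.random.default_rng(42)           # 42 is an arbitrary choice
y_noise = 2 * rng.normal(size=x.size)     # same noise scale as above
y = y_true + y_noise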
# Plot: Data + True Relationship
plt.figure(figsize=(10, 6))
plt.scatter(x, y, marker='o', color='blue', alpha=0.7, s=50, label='Observed Data', edgecolor='navy', linewidth=0.5)
plt.plot(x, y_true, color='red', linewidth=3, label='True Relationship (y = 2x + 3)')
plt.title('Synthetic Dataset with Ground Truth', fontsize=14, pad=15)
plt.xlabel('x')
plt.ylabel('y')
plt.legend(loc='lower center', bbox_to_anchor=(0.5, -0.15), ncol=2, fancybox=True, shadow=True, fontsize=11)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
2. Convert to DataFrame & Train/Test Split
We convert the data into a pandas DataFrame for easier handling and cleaner syntax (e.g., df['x'] instead of indexing arrays).
Then we randomly split it into training (roughly 80%) and testing (roughly 20%) sets.
This split is crucial — it allows us to evaluate how well each model generalizes to unseen data, which is exactly where overfitting reveals itself.
# Pack x and y into a DataFrame, then split ~80/20 with a random boolean mask
data = np.column_stack((x, y))
df = pd.DataFrame(data, columns=['x', 'y'])
msk = np.random.rand(len(df)) < 0.8
train = df[msk]
test = df[~msk]
train_x = train[['x']]
train_y = train[['y']]
test_x = test[['x']]
test_y = test[['y']]
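The boolean-mask split above gives a roughly 80/20 split that changes on every run. As an alternative sketch (not what this notebook uses), scikit-learn's train_test_split produces an exact split ratio and accepts a random_state for repeatability:

from sklearn.model_selection import train_test_split

# Exact 80/20 split with a fixed seed; 42 is an arbitrary choice
train_x2, test_x2, train_y2, test_y2 = train_test_split(
    df[['x']], df[['y']], test_size=0.2, random_state=42
)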
3. Simple Linear Regression
We fit a basic linear model and evaluate it on the test set.
reg_linear = LinearRegression()
reg_linear.fit(train_x, train_y)
coef_linear = reg_linear.coef_[0]
intercept_linear = reg_linear.intercept_
pred_linear = reg_linear.predict(test_x)
r2_linear = r2_score(test_y, pred_linear)
print(f"r^2 SimpleRegressionLinear : {r2_linear:.4f}")
print(f"coef_ : {coef_linear}")
r^2 SimpleRegressionLinear : 0.8626
coef_ : [2.03889206]
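The fitted slope (about 2.04) is already close to the true slope of 2. As a quick sanity check not printed in the original, you can also compare the fitted intercept against the true value of 3:

# Compare the fitted parameters against the true line y = 2x + 3
print(f"fitted slope     : {coef_linear[0]:.4f} (true: 2)")
print(f"fitted intercept : {intercept_linear[0]:.4f} (true: 3)")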
plt.figure(figsize=(10, 6))
plt.scatter(df.x, df.y, color='blue', marker='o', alpha=0.7, s=50, label='Observed Data', edgecolor='darkblue', linewidth=0.5)
plt.plot(df.x, intercept_linear + coef_linear * df.x, color='red', linewidth=2.5, label='Linear Regression')
plt.title('Simple Linear Regression', fontsize=14, pad=15)
plt.xlabel('x')
plt.ylabel('y')
plt.legend(loc='lower center', bbox_to_anchor=(0.5, -0.15), ncol=2, fancybox=True, shadow=True, fontsize=11)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
4. Polynomial Regression (Degree 2)
Now we increase model complexity by adding x² as a feature using PolynomialFeatures.
poly = PolynomialFeatures(degree=2)
poly_train_x = poly.fit_transform(train_x)   # fit the transformer on training data only
poly_test_x = poly.transform(test_x)         # reuse the fitted transformer on test data
reg_poly = LinearRegression()
reg_poly.fit(poly_train_x, train_y)
pred_poly = reg_poly.predict(poly_test_x)
r2_poly = r2_score(test_y, pred_poly)
coef_poly = reg_poly.coef_[0]
intercept_poly = reg_poly.intercept_
print(f"r^2 PolynomialRegressionLinear : {r2_poly:.4f}")
print(f"coef_ : {coef_poly}")
r^2 PolynomialRegressionLinear : 0.8623
coef_ : [0. 2.03959043 0.00316868]
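For reference, the transform-then-fit steps above can be folded into a single estimator with a scikit-learn Pipeline. This is just a more compact sketch of the same degree-2 model, not a different experiment:

from sklearn.pipeline import make_pipeline

# Same degree-2 polynomial model expressed as one pipeline object
poly_model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
poly_model.fit(train_x, train_y)
print(f"pipeline test r^2 : {poly_model.score(test_x, test_y):.4f}")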
5. Final Comparison & Overfitting Detection
We compare both models side by side. The polynomial model can only match or beat the linear model on the training data (here the training scores are identical to four decimals), yet it scores slightly lower on the test set.
print(f"performance simple linear regression on train data : {reg_linear.score(train_x, train_y):.4f} and test data: {reg_linear.score(test_x, test_y):.4f}")
print(f"performance poly linear regression on train data : {reg_poly.score(poly_train_x, train_y):.4f} and test data: {reg_poly.score(poly_test_x, test_y):.4f}")
performance simple linear regression on train data : 0.8926 and test data: 0.8626
performance poly linear regression on train data : 0.8926 and test data: 0.8623
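A single random split is itself a bit noisy, so one extra step (not in the original notebook) is to repeat the comparison with k-fold cross-validation on the full dataset. Shuffling matters here because x is sorted:

from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline

# Mean R² over 5 shuffled folds for each model; higher is better
cv = KFold(n_splits=5, shuffle=True, random_state=42)
linear_cv = cross_val_score(LinearRegression(), df[['x']], df['y'], cv=cv, scoring='r2')
poly_cv = cross_val_score(make_pipeline(PolynomialFeatures(degree=2), LinearRegression()),
                          df[['x']], df['y'], cv=cv, scoring='r2')
print(f"linear mean CV r^2 : {linear_cv.mean():.4f}")
print(f"poly   mean CV r^2 : {poly_cv.mean():.4f}")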
plt.figure(figsize=(10, 6))
plt.scatter(df.x, df.y, color='blue', marker='o', alpha=0.7, label='Observed Data')
plt.plot(df.x, intercept_linear + coef_linear * df.x, color='red', linewidth=2.5, label='Linear Regression')
plt.plot(df.x, intercept_poly + coef_poly[1] * df.x + coef_poly[2] * np.power(df.x, 2), color='green', linewidth=2.5, label='Polynomial Degree 2')
plt.title('Linear vs Polynomial Regression')
plt.xlabel('x')
plt.ylabel('y')
plt.legend(loc='lower center', bbox_to_anchor=(0.5, -0.15), ncol=3, fancybox=True, shadow=True)
plt.grid(True, alpha=0.3)
plt.xlim(-5, 5)
plt.tight_layout()
plt.show()
Conclusion
- The true relationship is perfectly linear
- The polynomial model (more complex) slightly overfits the training noise
- The simpler linear model achieves better generalization on test data
- This demonstrates Occam's Razor: Among models with similar explanatory power, prefer the simpler one
Less is often more in machine learning.
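If you want to watch Occam's Razor bite harder, a natural follow-up (not part of the original notebook) is to sweep the polynomial degree and track the train/test gap; typically the training R² keeps creeping up while the test R² stalls or drops as the degree grows:

from sklearn.pipeline import make_pipeline

# Train vs test R² for increasingly complex polynomial models
for degree in range(1, 8):
    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    model.fit(train_x, train_y)
    print(f"degree {degree}: train r^2 = {model.score(train_x, train_y):.4f}, "
          f"test r^2 = {model.score(test_x, test_y):.4f}")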