Linear Regression Feature Importance
Linear regression can be one of the most interpretable models, but feature importance is not just "sort the coefficients." Raw coefficients, standardized coefficients, regularized coefficients, and permutation importance answer different questions.
Quick answer
Use raw coefficients when you want to explain change in the original units. Use standardized coefficients when you want to compare feature strength across different scales. Use permutation importance when you want to know which features matter to predictive performance.
| Method | Answers | Main caveat |
|---|---|---|
| Raw coefficients | What is the effect per one original unit? | Not comparable across different units |
| Standardized coefficients | Which variables have larger model effects on a common scale? | Less direct in real-world units |
| Permutation importance | Which variables matter to validation performance? | Correlated features can hide each other |
Raw coefficients
A linear regression model predicts by adding each feature multiplied by its coefficient. A coefficient estimates how much the prediction changes when that feature increases by one unit, holding the other features constant.
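
    import pandas as pd
    from sklearn.datasets import load_diabetes
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split

    # Note: load_diabetes returns mean-centered, pre-scaled columns by default.
    # Recent scikit-learn versions accept scaled=False to load original units.
    X, y = load_diabetes(return_X_y=True, as_frame=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, random_state=42
    )

    model = LinearRegression()
    model.fit(X_train, y_train)

    # One coefficient per feature, labeled by column name and sorted by value.
    coefficients = pd.Series(
        model.coef_,
        index=X_train.columns,
    ).sort_values()
    print(coefficients)

Raw coefficients preserve units, which makes them useful for domain interpretation. They are not a fair ranking when features are measured on different scales, because the size of a coefficient depends on the unit of its feature.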
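To make the "add each feature multiplied by its coefficient" description concrete, here is a minimal sketch that rebuilds the predictions by hand and checks the one-unit interpretation. It assumes a refit on the full diabetes data rather than the train split above, and the names `ols`, `manual`, `bumped`, and `delta` are illustrative. Because this dataset ships pre-scaled by default, "one unit" here is one unit of the scaled column, not of the original measurement.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression

# Illustrative refit on the full dataset (an assumption for this sketch).
X, y = load_diabetes(return_X_y=True, as_frame=True)
ols = LinearRegression().fit(X, y)

# Rebuild predictions by hand: intercept + sum(feature * coefficient).
manual = ols.intercept_ + X.to_numpy() @ ols.coef_
matches_predict = np.allclose(manual, ols.predict(X))

# Raise one feature by one unit for a single row and keep the others fixed.
row = X.iloc[[0]].copy()
bumped = row.copy()
bumped["bmi"] += 1.0
delta = ols.predict(bumped)[0] - ols.predict(row)[0]
bmi_coef = ols.coef_[X.columns.get_loc("bmi")]

print(matches_predict)
print(delta, bmi_coef)
```

The change in prediction should equal the `bmi` coefficient up to floating-point error, which is what "holding the other features constant" means in practice.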
Standardized coefficients
To compare coefficient magnitudes across features, fit the model on standardized inputs. After scaling, a larger absolute coefficient means the prediction changes more when that feature moves by one standard deviation.

    import pandas as pd
    from sklearn.linear_model import LinearRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    # Scaling inside the pipeline means the scaler is fit on training data only.
    pipeline = make_pipeline(
        StandardScaler(),
        LinearRegression(),
    )
    pipeline.fit(X_train, y_train)

    # make_pipeline names each step after its lowercased class name.
    linear_model = pipeline.named_steps["linearregression"]
    standardized = pd.Series(
        linear_model.coef_,
        index=X_train.columns,
    )

    # Rank by magnitude; the signed values stay available in `standardized`.
    importance = standardized.abs().sort_values(ascending=False)
    print(importance)

Keep the signed standardized coefficients too. The magnitude gives a rough strength ranking; the sign tells you whether the model increases or decreases the prediction as the feature increases.
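For ordinary least squares, a coefficient fit on standardized inputs should equal the raw coefficient multiplied by that feature's standard deviation. The sketch below checks this relationship, assuming a refit on the full diabetes data; `raw_model`, `scaled_pipeline`, and `comparison` are illustrative names. It relies on `StandardScaler` using the population standard deviation (`ddof=0`).

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True, as_frame=True)

raw_model = LinearRegression().fit(X, y)
scaled_pipeline = make_pipeline(StandardScaler(), LinearRegression()).fit(X, y)

# Match StandardScaler, which divides by the population std (ddof=0).
feature_std = X.std(ddof=0)

comparison = pd.DataFrame(
    {
        "raw_times_std": raw_model.coef_ * feature_std.to_numpy(),
        "fit_on_scaled": scaled_pipeline.named_steps["linearregression"].coef_,
    },
    index=X.columns,
)
same = np.allclose(comparison["raw_times_std"], comparison["fit_on_scaled"])

print(comparison)
print(same)
```

This equivalence is specific to unpenalized least squares. With Ridge, Lasso, or Elastic Net, scaling changes how strongly each coefficient is penalized, so the two columns would generally differ.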
Permutation importance
Coefficients explain the fitted equation. Permutation importance answers a different question: how much does validation performance drop when this feature is shuffled?

    from sklearn.inspection import permutation_importance

    # Score on held-out data so the result reflects validation performance.
    result = permutation_importance(
        pipeline,
        X_test,
        y_test,
        scoring="r2",
        n_repeats=30,
        random_state=42,
    )

    # Mean and standard deviation of the score drop across the 30 shuffles.
    permutation = pd.DataFrame(
        {
            "feature": X_test.columns,
            "mean": result.importances_mean,
            "std": result.importances_std,
        }
    ).sort_values("mean", ascending=False)
    print(permutation)

For many practitioner reports, standardized coefficients and permutation importance together are more useful than either one alone.
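To show what the shuffling step does, here is a hand-rolled sketch of the same idea: shuffle one column at a time, re-score, and record the drop from the baseline. It is an approximation for illustration, not scikit-learn's implementation, and its numbers will not match `permutation_importance` exactly because the random draws differ. The names `baseline`, `drops`, and `manual_importance` are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
model = LinearRegression().fit(X_train, y_train)

# Score with every feature intact.
baseline = r2_score(y_test, model.predict(X_test))

rng = np.random.default_rng(42)
n_repeats = 30

drops = {}
for column in X_test.columns:
    scores = []
    for _ in range(n_repeats):
        shuffled = X_test.copy()
        # Break the link between this feature and the target; leave the rest.
        shuffled[column] = rng.permutation(shuffled[column].to_numpy())
        scores.append(r2_score(y_test, model.predict(shuffled)))
    # Importance is the average loss in R^2 relative to the baseline.
    drops[column] = baseline - float(np.mean(scores))

manual_importance = pd.Series(drops).sort_values(ascending=False)
print(manual_importance)
```

A value near zero means shuffling the feature barely changed the held-out score. That can happen either because the model does not rely on it or because a correlated feature carries the same signal.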
Ridge, Lasso, and Elastic Net
Regularized linear models change coefficient values on purpose. Ridge shrinks coefficients. Lasso can push some coefficients to exactly zero. Elastic Net mixes both penalties.

    import pandas as pd
    from sklearn.linear_model import ElasticNetCV
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    # Standardize first so the penalty treats every feature on the same scale.
    # ElasticNetCV picks the penalty strength by 5-fold cross-validation.
    elastic_net_pipeline = make_pipeline(
        StandardScaler(),
        ElasticNetCV(cv=5, random_state=42),
    )
    elastic_net_pipeline.fit(X_train, y_train)

    elastic_net = elastic_net_pipeline.named_steps["elasticnetcv"]
    regularized = pd.Series(
        elastic_net.coef_,
        index=X_train.columns,
    ).sort_values()
    print(regularized)

Do not interpret a zero coefficient from Lasso or Elastic Net as proof that the feature has no real-world value. With correlated features, the L1 penalty may keep one feature and suppress another similar one.
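The difference between shrinking and zeroing can be seen by sweeping the penalty strength. The sketch below is a small illustration on the full diabetes data, with penalty grids (`ridge_alphas`, `lasso_alphas`) chosen arbitrarily for this example. It tracks the overall size of the Ridge coefficient vector and counts how many Lasso coefficients are exactly zero.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso, Ridge
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True, as_frame=True)
X_scaled = StandardScaler().fit_transform(X)

# Hypothetical penalty grids, chosen only for illustration.
ridge_alphas = [0.1, 10.0, 1000.0]
lasso_alphas = [0.1, 1.0, 10.0, 100.0]

# Ridge: the L2 norm of the coefficient vector shrinks as alpha grows.
ridge_norms = [
    float(np.linalg.norm(Ridge(alpha=a).fit(X_scaled, y).coef_))
    for a in ridge_alphas
]

# Lasso: count coefficients that are exactly zero at each alpha.
lasso_zero_counts = [
    int(np.sum(Lasso(alpha=a).fit(X_scaled, y).coef_ == 0.0))
    for a in lasso_alphas
]

print(dict(zip(ridge_alphas, ridge_norms)))
print(dict(zip(lasso_alphas, lasso_zero_counts)))
```

Ridge coefficients get smaller but typically stay nonzero, while a strong enough Lasso penalty removes every feature. Which features drop out first depends on the penalty and on how features overlap, not only on their real-world relevance.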
Multicollinearity
Linear regression coefficients can become unstable when features are strongly correlated. Two useful features can compete for the same signal, causing signs and magnitudes to change across samples or model settings.

    # Absolute pairwise correlations between training features.
    correlations = X_train.corr().abs()

    # Reshape the matrix into one row per (feature_a, feature_b) pair.
    high_corr = correlations.stack().reset_index()
    high_corr.columns = ["feature_a", "feature_b", "correlation"]

    # Keep each unordered pair once and drop self-correlations.
    high_corr = high_corr[
        high_corr["feature_a"] < high_corr["feature_b"]
    ].sort_values("correlation", ascending=False)
    print(high_corr.head(10))

If correlated features are important, report them as a group or compare results after removing redundant columns. Avoid pretending the model has precisely separated their individual effects.
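One way to check whether coefficients "change across samples" is to refit the model on bootstrap resamples and look at the spread. This is a sketch of that idea rather than a method taken from the guide's sources; the resample count `n_boot` and the names `boot` and `stability` are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True, as_frame=True)
rng = np.random.default_rng(42)
n_boot = 200  # hypothetical number of resamples

rows = []
for _ in range(n_boot):
    # Sample row positions with replacement and refit from scratch.
    idx = rng.integers(0, len(X), size=len(X))
    pipe = make_pipeline(StandardScaler(), LinearRegression())
    pipe.fit(X.iloc[idx], y.iloc[idx])
    rows.append(pipe.named_steps["linearregression"].coef_)

boot = pd.DataFrame(rows, columns=X.columns)

# A large std, or a share_positive far from 0 and 1, signals instability.
stability = pd.DataFrame(
    {
        "mean": boot.mean(),
        "std": boot.std(),
        "share_positive": (boot > 0).mean(),
    }
).sort_values("std", ascending=False)
print(stability)
```

Features whose sign flips across a sizable share of resamples are poor candidates for individual interpretation, even if their coefficient looks large in a single fit.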
Plot the results
Plot signed coefficients when direction matters. Plot absolute values when you want a magnitude ranking.

    import matplotlib.pyplot as plt

    top_n = 12  # upper bound; fewer bars appear if there are fewer features

    # Select the largest coefficients by magnitude, then plot them signed.
    plot_data = standardized.reindex(
        standardized.abs().sort_values(ascending=False).head(top_n).index
    ).sort_values()

    ax = plot_data.plot.barh(
        figsize=(8, 6),
        color="#2563eb",
    )
    ax.set_title("Standardized linear regression coefficients")
    ax.set_xlabel("Coefficient after standardizing inputs")
    ax.set_ylabel("")
    plt.tight_layout()
    plt.show()

If you plot permutation importance, include the variation across repeats.

    # Sort ascending so the most important feature is drawn at the top.
    plot_data = permutation.head(top_n).sort_values("mean")

    fig, ax = plt.subplots(figsize=(8, 6))
    ax.barh(
        plot_data["feature"],
        plot_data["mean"],
        xerr=plot_data["std"],  # error bars show spread across repeats
        color="#2563eb",
    )
    ax.set_title("Permutation importance")
    ax.set_xlabel("Mean validation score decrease")
    ax.set_ylabel("")
    plt.tight_layout()
    plt.show()

How to interpret safely
- Raw coefficients are best for unit-based interpretation.
- Standardized coefficients are better for comparing feature magnitudes.
- Permutation importance is better for predictive reliance.
- Coefficients are conditional on the other features in the model.
- Correlated features can make rankings unstable.
- Coefficients are not causal unless the data and study design support causal claims.
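The point that coefficients are conditional on the other features can be illustrated by refitting after dropping one column. The sketch below removes `s1`, chosen here only as an example because it overlaps heavily with `s2` in this dataset, and compares standardized coefficients before and after. The helper `standardized_coefs` is a hypothetical convenience function, not part of scikit-learn.

```python
import pandas as pd
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True, as_frame=True)


def standardized_coefs(frame: pd.DataFrame) -> pd.Series:
    """Fit OLS on standardized columns and return labeled coefficients."""
    pipe = make_pipeline(StandardScaler(), LinearRegression()).fit(frame, y)
    return pd.Series(
        pipe.named_steps["linearregression"].coef_,
        index=frame.columns,
    )


full = standardized_coefs(X)
without_s1 = standardized_coefs(X.drop(columns=["s1"]))

# Rows align by feature name; the dropped feature shows NaN in the second fit.
comparison = pd.DataFrame({"all_features": full, "without_s1": without_s1})
comparison["change"] = comparison["without_s1"] - comparison["all_features"]
print(comparison)
```

A feature's coefficient describes its contribution given everything else in the model, so removing or adding a related column can change its magnitude and even its sign.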
What to report
A linear model can look transparent while still being easy to misinterpret. Report enough context for another practitioner to judge the explanation.
- The model type: ordinary least squares, Ridge, Lasso, or Elastic Net.
- Whether coefficients are raw or standardized.
- The validation metric and model performance.
- Any scaling, transformations, or feature engineering.
- Important correlated feature groups.
- Whether permutation importance agrees with coefficient rankings.
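As one possible way to assemble these items, the sketch below builds a single table with signed standardized coefficients, permutation importance, and their ranks, plus a rank-agreement figure. The layout and the names `report`, `rank_agreement`, and `test_r2` are assumptions made for illustration, not a prescribed reporting format.

```python
import pandas as pd
from sklearn.datasets import load_diabetes
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

pipe = make_pipeline(StandardScaler(), LinearRegression()).fit(X_train, y_train)
coef = pd.Series(
    pipe.named_steps["linearregression"].coef_,
    index=X_train.columns,
)
perm = permutation_importance(
    pipe, X_test, y_test, scoring="r2", n_repeats=30, random_state=42
)

report = pd.DataFrame(
    {
        "standardized_coef": coef,
        "abs_coef_rank": coef.abs().rank(ascending=False),
        "permutation_mean": perm.importances_mean,
        "permutation_std": perm.importances_std,
    },
    index=X_train.columns,
)
report["permutation_rank"] = report["permutation_mean"].rank(ascending=False)

# Pearson correlation of the two rank columns (a Spearman-style agreement).
rank_agreement = report["abs_coef_rank"].corr(report["permutation_rank"])
test_r2 = pipe.score(X_test, y_test)  # held-out R^2 for context

print(report.sort_values("permutation_rank"))
print(rank_agreement, test_r2)
```

Low agreement between the two rankings is worth reporting in its own right, since it often points to correlated features sharing signal.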
Sources: scikit-learn LinearRegression, StandardScaler, permutation importance, and linear models user guide.