Linear Regression Feature Importance

Linear regression can be one of the most interpretable models, but feature importance is not just "sort the coefficients." Raw coefficients, standardized coefficients, regularized coefficients, and permutation importance answer different questions.

Quick answer

Use raw coefficients when you want to explain change in the original units. Use standardized coefficients when you want to compare feature strength across different scales. Use permutation importance when you want to know which features matter to predictive performance.

Method	Answers	Main caveat
Raw coefficients	What is the effect per one original unit?	Not comparable across different units
Standardized coefficients	Which variables have larger model effects on a common scale?	Less direct in real-world units
Permutation importance	Which variables matter to validation performance?	Correlated features can hide each other

Raw coefficients

A linear regression model predicts by adding each feature multiplied by its coefficient. A coefficient estimates how much the prediction changes when that feature increases by one unit, holding the other features constant.

from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
import pandas as pd

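# By default load_diabetes returns mean-centered, rescaled features;
# scaled=False (scikit-learn 1.1+) keeps the original units.
X, y = load_diabetes(return_X_y=True, as_frame=True, scaled=False)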
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)

coefficients = pd.Series(
    model.coef_,
    index=X_train.columns,
).sort_values()

print(coefficients)

Raw coefficients preserve units, which makes them useful for domain interpretation. They do not give a fair ranking when features are measured on different scales, because the size of each coefficient depends on the unit of its feature.
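To see how strongly a raw coefficient depends on units, the sketch below fits the same model twice on synthetic data: once with a feature in grams and once with the same feature in milligrams. The data, the names `dose_g` and `age_years`, and the generating coefficients are assumptions invented for illustration; none of them come from the diabetes dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 200

# Hypothetical features and target, invented for this illustration.
dose_g = rng.uniform(0.1, 2.0, size=n)
age_years = rng.uniform(20, 80, size=n)
response = 5.0 * dose_g + 0.1 * age_years + rng.normal(0, 0.5, size=n)

X_grams = np.column_stack([dose_g, age_years])
X_milligrams = np.column_stack([dose_g * 1000, age_years])  # same doses, new unit

coef_grams = LinearRegression().fit(X_grams, response).coef_
coef_milligrams = LinearRegression().fit(X_milligrams, response).coef_

print("dose coefficient per gram:     ", coef_grams[0])
print("dose coefficient per milligram:", coef_milligrams[0])
print("age coefficients (both fits):  ", coef_grams[1], coef_milligrams[1])
```

The dose coefficient shrinks by a factor of 1000 while the age coefficient and the predictions stay the same, so sorting raw coefficients can rank the same feature differently depending only on its unit.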

Standardized coefficients

To compare coefficient magnitudes across features, fit the model on standardized inputs. After scaling, a larger absolute coefficient means the prediction changes more when that feature moves by one standard deviation.

from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
import pandas as pd

pipeline = make_pipeline(
    StandardScaler(),
    LinearRegression(),
)
pipeline.fit(X_train, y_train)

linear_model = pipeline.named_steps["linearregression"]

standardized = pd.Series(
    linear_model.coef_,
    index=X_train.columns,
)

importance = standardized.abs().sort_values(ascending=False)

print(importance)

Keep the signed standardized coefficients too. The magnitude gives a rough strength ranking; the sign tells you whether the model increases or decreases the prediction as the feature increases.
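For ordinary least squares, a standardized coefficient is the raw coefficient multiplied by the feature's standard deviation. The sketch below checks that relationship on the diabetes data. It is a minimal check, not a general rule: the names `raw_model`, `scaled_model`, and `rescaled_raw` are illustrative, and the identity is assumed only for unregularized models fitted with an intercept.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True, as_frame=True)

# Fit once on the features as loaded and once on standardized features.
raw_model = LinearRegression().fit(X, y)

scaler = StandardScaler().fit(X)
scaled_model = LinearRegression().fit(scaler.transform(X), y)

# scaler.scale_ holds the standard deviation of each feature.
rescaled_raw = raw_model.coef_ * scaler.scale_

print(np.allclose(rescaled_raw, scaled_model.coef_))
```

Because the two sets of coefficients differ only by this rescaling, you can report raw coefficients for unit-based statements and standardized ones for ranking without fitting two conceptually different models.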

Permutation importance

Coefficients explain the fitted equation. Permutation importance answers a different question: how much does performance on held-out data (here, the test split) drop when this feature's values are randomly shuffled?

from sklearn.inspection import permutation_importance

result = permutation_importance(
    pipeline,
    X_test,
    y_test,
    scoring="r2",
    n_repeats=30,
    random_state=42,
)

permutation = pd.DataFrame(
    zip(X_test.columns, result.importances_mean, result.importances_std),
    columns=["feature", "mean", "std"],
).sort_values("mean", ascending=False)

print(permutation)

For many practitioner reports, standardized coefficients and permutation importance together are more useful than either one alone.
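One way to put the two views together is a single table with both values and their ranks. The following is a sketch under the same setup as the earlier snippets (diabetes data, a `StandardScaler` plus `LinearRegression` pipeline); the `comparison` table and its column names are illustrative choices, not a standard report format.

```python
import pandas as pd
from sklearn.datasets import load_diabetes
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

pipeline = make_pipeline(StandardScaler(), LinearRegression())
pipeline.fit(X_train, y_train)

result = permutation_importance(
    pipeline, X_test, y_test, scoring="r2", n_repeats=30, random_state=42
)

# One row per feature: signed coefficient, permutation summary, and ranks.
comparison = pd.DataFrame(
    {
        "standardized_coef": pipeline.named_steps["linearregression"].coef_,
        "permutation_mean": result.importances_mean,
        "permutation_std": result.importances_std,
    },
    index=X_train.columns,
)
comparison["coef_rank"] = comparison["standardized_coef"].abs().rank(ascending=False)
comparison["permutation_rank"] = comparison["permutation_mean"].rank(ascending=False)

print(comparison.sort_values("coef_rank"))
```

Features whose two ranks disagree sharply are worth a closer look, since correlated inputs are a common reason for the gap.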

Ridge, Lasso, and Elastic Net

Regularized linear models change coefficient values on purpose. Ridge shrinks coefficients. Lasso can push some coefficients to exactly zero. Elastic Net mixes both penalties.

from sklearn.linear_model import ElasticNetCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
import pandas as pd

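# Use a new name so the earlier LinearRegression `model` is not overwritten.
enet_pipeline = make_pipeline(
    StandardScaler(),
    ElasticNetCV(cv=5, random_state=42),
)
enet_pipeline.fit(X_train, y_train)

elastic_net = enet_pipeline.named_steps["elasticnetcv"]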

regularized = pd.Series(
    elastic_net.coef_,
    index=X_train.columns,
).sort_values()

print(regularized)

Do not interpret a zero Lasso or Elastic Net coefficient as proof that the feature has no real-world value. With correlated features, the L1 penalty may keep one feature and suppress another that carries similar information.
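The sketch below illustrates that warning on synthetic data. `feature_b` is a noisy copy of `feature_a`, so it predicts the target reasonably well on its own, yet a Lasso fit on both features assigns it a coefficient of zero. The data, the feature names, and the fixed `alpha=0.5` are assumptions chosen to make the effect easy to see; in practice the penalty strength would be tuned.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 5000

# Hypothetical data: feature_b is feature_a plus measurement noise.
signal = rng.normal(size=n)
feature_a = signal
feature_b = signal + 0.5 * rng.normal(size=n)
y = 2.0 * signal + 0.5 * rng.normal(size=n)

X = StandardScaler().fit_transform(np.column_stack([feature_a, feature_b]))

# Lasso on both features: the L1 penalty keeps feature_a and drops feature_b.
lasso = Lasso(alpha=0.5).fit(X, y)

# feature_b on its own still explains much of the target.
r2_b_alone = LinearRegression().fit(X[:, [1]], y).score(X[:, [1]], y)

print("Lasso coefficients (a, b):", lasso.coef_)
print("R^2 using feature_b alone:", round(r2_b_alone, 3))
```

A zero here means only that `feature_b` adds nothing once `feature_a` is in the model under this penalty. It does not mean `feature_b` is unrelated to the target.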

Multicollinearity

Linear regression coefficients can become unstable when features are strongly correlated. Two useful features can compete for the same signal, causing signs and magnitudes to change across samples or model settings.

correlations = X_train.corr().abs()
high_corr = correlations.stack().reset_index()
high_corr.columns = ["feature_a", "feature_b", "correlation"]

# Keep each pair once and drop self-correlations (feature_a == feature_b).
high_corr = high_corr[
    high_corr["feature_a"] < high_corr["feature_b"]
].sort_values("correlation", ascending=False)

print(high_corr.head(10))

If correlated features are important, report them as a group or compare results after removing redundant columns. Avoid pretending the model has precisely separated their individual effects.
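A bootstrap refit is one way to see this instability, and to see why reporting a correlated group can be safer than reporting its members. The sketch below uses synthetic data in which `x2` nearly duplicates `x1` and `x3` is independent; the data, the 200 resamples, and the variable names are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 300

# Hypothetical data: x2 is almost a copy of x1, x3 is unrelated to both.
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)
x3 = rng.normal(size=n)
y = x1 + x2 + x3 + rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

# Refit on bootstrap resamples and collect the coefficients.
coefs = []
for _ in range(200):
    idx = rng.integers(0, n, size=n)  # sample rows with replacement
    coefs.append(LinearRegression().fit(X[idx], y[idx]).coef_)
coefs = np.asarray(coefs)

coef_std = coefs.std(axis=0)                  # spread of each coefficient
group_std = (coefs[:, 0] + coefs[:, 1]).std()  # spread of the x1 + x2 total

print("std of x1, x2, x3 coefficients:", coef_std)
print("std of the combined x1 + x2 effect:", group_std)
```

The individual coefficients for `x1` and `x2` swing widely between resamples, while their sum and the coefficient for `x3` stay comparatively stable. That pattern suggests the model can estimate the pair's joint effect much better than it can split the effect between them.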

Plot the results

Plot signed coefficients when direction matters. Plot absolute values when you want a magnitude ranking.

import matplotlib.pyplot as plt

top_n = 12
plot_data = standardized.reindex(
    standardized.abs().sort_values(ascending=False).head(top_n).index
).sort_values()

ax = plot_data.plot.barh(
    figsize=(8, 6),
    color="#2563eb",
)
ax.set_title("Standardized linear regression coefficients")
ax.set_xlabel("Coefficient after standardizing inputs")
ax.set_ylabel("")
plt.tight_layout()
plt.show()

If you plot permutation importance, include the variation across repeats.

plot_data = permutation.head(top_n).sort_values("mean")

fig, ax = plt.subplots(figsize=(8, 6))
ax.barh(
    plot_data["feature"],
    plot_data["mean"],
    xerr=plot_data["std"],
    color="#2563eb",
)
ax.set_title("Permutation importance")
ax.set_xlabel("Mean validation score decrease")
ax.set_ylabel("")
plt.tight_layout()
plt.show()

How to interpret safely

Raw coefficients are best for unit-based interpretation.
Standardized coefficients are better for comparing feature magnitudes.
Permutation importance is better for predictive reliance.
Coefficients are conditional on the other features in the model.
Correlated features can make rankings unstable.
Coefficients are not causal unless the data and study design support causal claims.

What to report

A linear model can look transparent while still being easy to misinterpret. Report enough context for another practitioner to judge the explanation.

The model type: ordinary least squares, Ridge, Lasso, or Elastic Net.
Whether coefficients are raw or standardized.
The validation metric and model performance.
Any scaling, transformations, or feature engineering.
Important correlated feature groups.
Whether permutation importance agrees with coefficient rankings.

Sources: scikit-learn LinearRegression, StandardScaler, permutation importance, and linear models user guide.
