Model guide
Logistic Regression Feature Importance
Logistic regression can be highly interpretable, but its coefficients are not changes in probability. They are changes in log-odds. For feature importance, that means you need to choose whether you care about signed direction, comparable magnitude, odds ratios, or validation performance.
Quick answer
For logistic regression, use signed coefficients to understand direction, standardized coefficients to compare feature magnitudes, odds ratios to communicate multiplicative changes in odds, and permutation importance to measure predictive reliance on validation data.
| Method | Answers | Main caveat |
|---|---|---|
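| Raw coefficients | How does a one-unit change affect log-odds? | Different feature scales are not comparable |
| Standardized coefficients | Which features have the largest effect per standard deviation? | Correlated features can make coefficients unstable |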
| Odds ratios | How are odds multiplied by a one-unit change? | Odds are not the same as probability |
| Permutation importance | Which features matter to validation performance? | Correlated features can hide each other |
Fit a logistic regression model
Scaling is usually a good default for logistic regression, especially when using regularization. This example uses a pipeline so the same preprocessing is applied during training and evaluation.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=42, stratify=y
)
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(max_iter=500, random_state=42),
)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))

Coefficients and log-odds
Logistic regression models the log-odds of the positive class. A positive coefficient raises the log-odds as the feature increases; a negative coefficient lowers them. A larger absolute coefficient means a stronger effect on the log-odds, but magnitudes are only comparable across features that share a scale.
import pandas as pd

classifier = model.named_steps["logisticregression"]
coefficients = pd.Series(
    classifier.coef_[0],
    index=X_train.columns,
).sort_values()
print(coefficients)
Because this model was fit after StandardScaler, these
coefficients describe one-standard-deviation changes in each input, not
one raw-unit changes.
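
If you need coefficients in raw units, one option is to undo the scaling algebraically. The sketch below is an illustration rather than part of the original workflow: it refits the same kind of pipeline on the full dataset for brevity, then divides each standardized coefficient by the scaler's `scale_` and adjusts the intercept to match. The names `pipeline`, `raw_coef`, and `raw_intercept` are assumptions introduced here.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Refit on the full dataset for brevity (an assumption of this sketch).
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=500))
pipeline.fit(X, y)

scaler = pipeline.named_steps["standardscaler"]
clf = pipeline.named_steps["logisticregression"]

# z = (x - mean) / scale, so w * z = (w / scale) * x - w * mean / scale.
raw_coef = clf.coef_[0] / scaler.scale_
raw_intercept = clf.intercept_[0] - np.sum(
    clf.coef_[0] * scaler.mean_ / scaler.scale_
)

raw_unit_coefficients = pd.Series(raw_coef, index=X.columns).sort_values()
print(raw_unit_coefficients)

# Sanity check: the raw-unit equation reproduces the pipeline's log-odds.
manual_log_odds = X.to_numpy() @ raw_coef + raw_intercept
assert np.allclose(manual_log_odds, pipeline.decision_function(X))
```

Raw-unit coefficients are useful for stating "per one unit of this measurement" effects, but they are not comparable across features with different units.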
Odds ratios
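Exponentiating a logistic regression coefficient turns it into an odds ratio. An odds ratio above 1 means the odds of the positive class increase as the feature increases; an odds ratio below 1 means they decrease. Because this model was fit on standardized features, each ratio here describes a one-standard-deviation increase.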
import numpy as np
import pandas as pd

odds_ratios = pd.Series(
    np.exp(classifier.coef_[0]),
    index=X_train.columns,
).sort_values()
print(odds_ratios)

Odds are not probabilities
An odds ratio of 2 does not mean the probability doubles. It means the odds double, holding the other model inputs constant.
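
To make the distinction concrete, the following minimal sketch applies an odds ratio of 2 to a few baseline probabilities. The helper names and the baseline values are hypothetical, chosen only for illustration.

```python
def odds_to_probability(odds: float) -> float:
    """Convert odds p / (1 - p) back to a probability p."""
    return odds / (1.0 + odds)


def apply_odds_ratio(probability: float, odds_ratio: float) -> float:
    """Return the probability after multiplying the odds by odds_ratio."""
    odds = probability / (1.0 - probability)
    return odds_to_probability(odds * odds_ratio)


# Hypothetical baseline probabilities, chosen only for illustration.
for baseline in (0.10, 0.50, 0.90):
    updated = apply_odds_ratio(baseline, odds_ratio=2.0)
    print(f"baseline={baseline:.2f} -> updated={updated:.3f}")
```

With an odds ratio of 2, a baseline of 0.50 rises to about 0.67 and a baseline of 0.90 rises to about 0.95, so the change in probability depends on where you start.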
Standardized coefficients
Standardized coefficients are often the cleanest coefficient-based importance ranking for logistic regression. They put features on a common scale before comparing coefficient magnitudes.
standardized_importance = coefficients.abs().sort_values(ascending=False)
print(standardized_importance.head(15))

Keep the signed coefficients nearby. Magnitude tells you strength; sign tells you direction.
Permutation importance
Coefficients describe the fitted equation. Permutation importance measures how much validation performance drops when a feature is shuffled. This is often better when you care about predictive reliance.
from sklearn.inspection import permutation_importance

result = permutation_importance(
    model,
    X_test,
    y_test,
    scoring="roc_auc",
    n_repeats=30,
    random_state=42,
)
permutation = pd.DataFrame(
    zip(X_test.columns, result.importances_mean, result.importances_std),
    columns=["feature", "mean", "std"],
).sort_values("mean", ascending=False)
print(permutation.head(15))

Use a metric that matches the decision. AUC, log loss, recall, precision, and accuracy can produce different rankings.
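
One way to see how much the ranking depends on the metric is to repeat the permutation run once per scorer and compare ranks side by side. This is a sketch under assumptions: the three metrics and the reduced `n_repeats=10` are illustrative choices, and the names `metrics`, `comparison`, and `ranks` are introduced here.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=42, stratify=y
)
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(max_iter=500, random_state=42),
)
model.fit(X_train, y_train)

# One permutation run per metric, using standard scikit-learn scorer names.
metrics = ["roc_auc", "neg_log_loss", "accuracy"]
mean_importance = {}
for metric in metrics:
    result = permutation_importance(
        model, X_test, y_test, scoring=metric, n_repeats=10, random_state=42
    )
    mean_importance[metric] = result.importances_mean

comparison = pd.DataFrame(mean_importance, index=X_test.columns)

# Rank 1 is the most important feature under each metric; ties share the lower rank.
ranks = comparison.rank(ascending=False, method="min").astype(int)
print(ranks.sort_values("roc_auc").head(10))
```

If a feature ranks high under one metric and low under another, report the metric alongside the ranking rather than presenting a single list.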
Regularization
Scikit-learn's LogisticRegression applies L2 regularization by default. Smaller values of C mean stronger regularization. L2 regularization shrinks coefficients toward zero; L1 regularization can set some coefficients exactly to zero, but only some solvers support it (for example liblinear and saga).
l1_model = make_pipeline(
    StandardScaler(),
    LogisticRegression(
        penalty="l1",
        solver="liblinear",
        C=0.5,
        max_iter=500,
        random_state=42,
    ),
)
l1_model.fit(X_train, y_train)
l1_classifier = l1_model.named_steps["logisticregression"]
l1_coefficients = pd.Series(
    l1_classifier.coef_[0],
    index=X_train.columns,
).sort_values()
print(l1_coefficients)

Do not treat a zero L1 coefficient as proof that a feature is irrelevant. With correlated features, the model may keep one feature and suppress another similar one.
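
A small synthetic experiment can illustrate this suppression effect. Everything in the sketch below is hypothetical: the data is generated so that `signal` drives the label, `copy` is a noisy duplicate of it, and `noise` is unrelated, and the value of `C` is an arbitrary choice. Which of the two correlated columns ends up with the smaller coefficient is not guaranteed.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_samples = 500

# Synthetic data (hypothetical): `signal` drives the label, `copy` is a
# near-duplicate of `signal`, and `noise` is unrelated to the label.
signal = rng.normal(size=n_samples)
copy = signal + rng.normal(scale=0.05, size=n_samples)
noise = rng.normal(size=n_samples)
y = (signal + rng.normal(scale=0.5, size=n_samples) > 0).astype(int)

X = StandardScaler().fit_transform(np.column_stack([signal, copy, noise]))

# Strong L1 regularization (small C) encourages sparse coefficients.
l1 = LogisticRegression(penalty="l1", solver="liblinear", C=0.05)
l1.fit(X, y)

for name, value in zip(["signal", "copy", "noise"], l1.coef_[0]):
    print(f"{name:>6}: {value:+.3f}")
```

If one of the two correlated columns is shrunk to or near zero here, that reflects redundancy with its twin, not irrelevance to the label.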
Multiclass models
In multiclass logistic regression, coef_ has one row per
class. Each class gets its own coefficient vector, so feature importance
can differ by class.
from sklearn.datasets import load_wine

X_multi, y_multi = load_wine(return_X_y=True, as_frame=True)
# Use separate names so the binary example's train/test split is not overwritten.
X_multi_train, X_multi_test, y_multi_train, y_multi_test = train_test_split(
    X_multi, y_multi, random_state=42, stratify=y_multi
)
multi_model = make_pipeline(
    StandardScaler(),
    LogisticRegression(max_iter=500, random_state=42),
)
multi_model.fit(X_multi_train, y_multi_train)
multi_classifier = multi_model.named_steps["logisticregression"]
class_0_coefficients = pd.Series(
    multi_classifier.coef_[0],
    index=X_multi_train.columns,
).sort_values()
print(class_0_coefficients)

Do not collapse multiclass coefficients into one ranking unless you are clear about how you aggregated them.
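
If you do aggregate, it helps to keep the per-class view next to the aggregate. The sketch below shows one possible approach, assuming a mean absolute coefficient across classes; that aggregation is an illustrative choice, not a standard. The model is refit on the full wine dataset for brevity, and the names `per_class` and `mean_abs` are introduced here.

```python
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X_wine, y_wine = load_wine(return_X_y=True, as_frame=True)
wine_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=500))
wine_model.fit(X_wine, y_wine)
wine_classifier = wine_model.named_steps["logisticregression"]

# Rows are features; columns are class labels taken from classes_.
per_class = pd.DataFrame(
    wine_classifier.coef_.T,
    index=X_wine.columns,
    columns=[f"class_{label}" for label in wine_classifier.classes_],
)

# One possible aggregation (an assumption): mean absolute coefficient per feature.
per_class["mean_abs"] = per_class.abs().mean(axis=1)
print(per_class.sort_values("mean_abs", ascending=False).round(3))
```

Reporting the signed per-class columns together with the aggregate makes it visible when a feature matters strongly for one class and barely for the others.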
Plot the results
Plot signed coefficients when direction matters. Plot absolute coefficients when you want a magnitude ranking.
import matplotlib.pyplot as plt

top_n = 12
plot_data = coefficients.reindex(
    coefficients.abs().sort_values(ascending=False).head(top_n).index
).sort_values()
ax = plot_data.plot.barh(
    figsize=(8, 6),
    color="#2563eb",
)
ax.set_title("Logistic regression coefficients")
ax.set_xlabel("Standardized coefficient")
ax.set_ylabel("")
plt.tight_layout()
plt.show()

For permutation importance, include uncertainty from repeated shuffles.
plot_data = permutation.head(top_n).sort_values("mean")
fig, ax = plt.subplots(figsize=(8, 6))
ax.barh(
    plot_data["feature"],
    plot_data["mean"],
    xerr=plot_data["std"],
    color="#2563eb",
)
ax.set_title("Permutation importance")
ax.set_xlabel("Mean validation score decrease")
ax.set_ylabel("")
plt.tight_layout()
plt.show()

How to interpret safely
- Coefficients describe changes in log-odds, not direct changes in probability.
- Odds ratios are often easier to communicate, but they are still not probabilities.
- Standardized coefficients are better for comparing magnitudes across features.
- Permutation importance is better for validation-performance reliance.
- Correlated features can make coefficients unstable or suppress each other.
- Coefficient signs and magnitudes are not causal unless the study design supports causality.
What to report
- The target class and whether the model is binary or multiclass.
- Whether coefficients are raw or standardized.
- The regularization penalty, solver, and C value.
- The validation metric and model performance.
- Whether permutation importance agrees with coefficient rankings.
- Known correlated features, leakage risks, and excluded columns.
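
Several of the items above can be collected directly from a fitted pipeline. The following is a minimal sketch, assuming the same breast cancer setup used earlier; the `report` dictionary and its keys are hypothetical and should be adapted to your own reporting template. The penalty is written by hand because this sketch relies on scikit-learn's default L2 penalty.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=42, stratify=y
)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=500))
model.fit(X_train, y_train)
classifier = model.named_steps["logisticregression"]
params = classifier.get_params()

# Hypothetical report structure; the keys are illustrative, not a standard.
report = {
    "classes": classifier.classes_.tolist(),
    # In a binary model, coef_ describes the second entry of classes_.
    "positive_class": classifier.classes_[1].item(),
    "problem_type": "binary" if len(classifier.classes_) == 2 else "multiclass",
    "coefficients": "standardized (fit after StandardScaler)",
    "penalty": "l2",  # stated by hand: the default penalty in this sketch
    "solver": params["solver"],
    "C": params["C"],
    "validation_metric": "roc_auc",
    "validation_score": roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]),
}
for key, value in report.items():
    print(f"{key}: {value}")
```

Items such as correlated features, leakage risks, and excluded columns still need to be documented by hand, since the fitted model does not record them.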
Sources: scikit-learn LogisticRegression, scikit-learn logistic regression user guide, StandardScaler, and permutation importance.