Logistic Regression Feature Importance

Logistic regression can be highly interpretable, but its coefficients are not changes in probability. They are changes in log-odds. For feature importance, that means you need to choose whether you care about signed direction, comparable magnitude, odds ratios, or validation performance.

Quick answer

For logistic regression, use signed coefficients to understand direction, standardized coefficients to compare feature magnitudes, odds ratios to communicate multiplicative changes in odds, and permutation importance to measure predictive reliance on validation data.

Method | Answers | Main caveat
Raw coefficients | How does a one-unit change affect log-odds? | Different feature scales are not comparable
Odds ratios | How are odds multiplied by a one-unit change? | Odds are not the same as probability
Permutation importance | Which features matter to validation performance? | Correlated features can hide each other

Fit a logistic regression model

Scaling is usually a good default for logistic regression, especially when using regularization. This example uses a pipeline so the same preprocessing is applied during training and evaluation.

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=42, stratify=y
)

model = make_pipeline(
    StandardScaler(),
    LogisticRegression(max_iter=500, random_state=42),
)
model.fit(X_train, y_train)

print(model.score(X_test, y_test))

Coefficients and log-odds

Logistic regression models the log-odds of the positive class. A positive coefficient increases the log-odds as the feature increases. A negative coefficient decreases them. Coefficient magnitudes are only comparable across features that share a scale; on a common scale, a larger absolute coefficient means a larger change in log-odds per unit of the feature.

import pandas as pd

# make_pipeline names each step after its lowercased class name.
classifier = model.named_steps["logisticregression"]

coefficients = pd.Series(
    classifier.coef_[0],
    index=X_train.columns,
).sort_values()

print(coefficients)

Because this model was fit after StandardScaler, these coefficients describe one-standard-deviation changes in each input, not one raw-unit changes.
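If you need coefficients in raw units instead, one option is to undo the scaling algebraically. The sketch below refits the same pipeline so it runs on its own, then divides each coefficient by the scaler's per-feature standard deviation (`scale_`) and adjusts the intercept using `mean_`. The names `raw_coef`, `raw_intercept`, and `raw_coefficients` are illustrative, and the check at the end only confirms that the rescaled equation reproduces the pipeline's log-odds on this dataset.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=42, stratify=y
)

model = make_pipeline(
    StandardScaler(),
    LogisticRegression(max_iter=500, random_state=42),
)
model.fit(X_train, y_train)

scaler = model.named_steps["standardscaler"]
classifier = model.named_steps["logisticregression"]

# Coefficients as fitted: log-odds change per one standard deviation.
std_coef = classifier.coef_[0]

# Dividing by each feature's training standard deviation gives the
# log-odds change per one raw unit of that feature.
raw_coef = std_coef / scaler.scale_

# The intercept absorbs the centering step: z = (x - mean) / scale.
raw_intercept = classifier.intercept_[0] - np.sum(
    std_coef * scaler.mean_ / scaler.scale_
)

# Sanity check: the raw-unit equation should match the pipeline's log-odds.
manual_log_odds = X_test.to_numpy() @ raw_coef + raw_intercept
pipeline_log_odds = model.decision_function(X_test)
print(np.allclose(manual_log_odds, pipeline_log_odds))

raw_coefficients = pd.Series(raw_coef, index=X_train.columns).sort_values()
print(raw_coefficients)
```

Raw-unit coefficients are useful for statements such as "per additional unit of this measurement", but their magnitudes should not be ranked against each other because the features are on different scales.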

Odds ratios

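Exponentiating a logistic regression coefficient turns it into an odds ratio: the factor by which the odds of the positive class are multiplied when the feature increases by one unit. An odds ratio above 1 means the odds increase as the feature increases. An odds ratio below 1 means the odds decrease. Because the model here was fit on scaled inputs, one unit is one standard deviation.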

import numpy as np
import pandas as pd

odds_ratios = pd.Series(
    np.exp(classifier.coef_[0]),
    index=X_train.columns,
).sort_values()

print(odds_ratios)

Odds are not probabilities

An odds ratio of 2 does not mean the probability doubles. It means the odds double, holding the other model inputs constant.
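A short arithmetic sketch makes the difference concrete. The helper `apply_odds_ratio` is a hypothetical function written for this illustration, not part of scikit-learn; it converts a baseline probability to odds, multiplies by the odds ratio, and converts back.

```python
def apply_odds_ratio(probability, odds_ratio):
    """Return the probability after multiplying the odds by odds_ratio."""
    odds = probability / (1 - probability)
    new_odds = odds * odds_ratio
    return new_odds / (1 + new_odds)


# The same odds ratio moves different baselines by different amounts.
for baseline in (0.10, 0.50, 0.90):
    updated = apply_odds_ratio(baseline, odds_ratio=2.0)
    print(f"{baseline:.2f} -> {updated:.3f}")
```

With an odds ratio of 2, a baseline probability of 0.10 rises to roughly 0.18, 0.50 rises to roughly 0.67, and 0.90 rises to roughly 0.95. None of these is a doubling of probability, and the size of the shift depends on where you start.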

Standardized coefficients

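Standardized coefficients are often the cleanest coefficient-based importance ranking for logistic regression. They put features on a common scale before comparing coefficient magnitudes. Because the pipeline above applies StandardScaler before fitting, the coefficients series already holds standardized coefficients, so the ranking is simply their absolute values.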

standardized_importance = coefficients.abs().sort_values(ascending=False)

print(standardized_importance.head(15))

Keep the signed coefficients nearby. Magnitude tells you strength; sign tells you direction.

Permutation importance

Coefficients describe the fitted equation. Permutation importance measures how much performance on held-out data drops when a single feature is shuffled. It is often the better choice when you care about how much the model's predictions rely on a feature rather than about the form of the equation.

from sklearn.inspection import permutation_importance

result = permutation_importance(
    model,
    X_test,
    y_test,
    scoring="roc_auc",
    n_repeats=30,
    random_state=42,
)

permutation = pd.DataFrame(
    zip(X_test.columns, result.importances_mean, result.importances_std),
    columns=["feature", "mean", "std"],
).sort_values("mean", ascending=False)

print(permutation.head(15))

Use a metric that matches the decision. AUC, log loss, recall, precision, and accuracy can produce different rankings.
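To see how much the ranking depends on the metric, you can repeat the calculation once per scorer and compare the results side by side. This is a minimal sketch that refits the same pipeline so it runs on its own; the choice of three metrics, the lower `n_repeats=10`, and the names `rankings` and `comparison` are assumptions made for illustration.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=42, stratify=y
)

model = make_pipeline(
    StandardScaler(),
    LogisticRegression(max_iter=500, random_state=42),
)
model.fit(X_train, y_train)

metrics = ["roc_auc", "neg_log_loss", "accuracy"]
rankings = {}

# One permutation run per metric, with the same seed so that only
# the scorer differs between runs.
for metric in metrics:
    result = permutation_importance(
        model,
        X_test,
        y_test,
        scoring=metric,
        n_repeats=10,
        random_state=42,
    )
    rankings[metric] = pd.Series(result.importances_mean, index=X_test.columns)

# One column of mean importances per metric, one row per feature.
comparison = pd.DataFrame(rankings)
print(comparison.sort_values("roc_auc", ascending=False).head(10))
```

The columns are on different scales (a drop in AUC is not comparable to a drop in log loss), so compare the order of features within each column rather than the raw numbers across columns.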

Regularization

Scikit-learn's LogisticRegression applies L2 regularization by default. Smaller values of C mean stronger regularization. L2 regularization shrinks coefficients toward zero; L1 regularization can set some coefficients exactly to zero, but it requires a solver that supports it, such as liblinear or saga.

l1_model = make_pipeline(
    StandardScaler(),
    LogisticRegression(
        penalty="l1",
        solver="liblinear",
        C=0.5,
        max_iter=500,
        random_state=42,
    ),
)
l1_model.fit(X_train, y_train)

l1_classifier = l1_model.named_steps["logisticregression"]

l1_coefficients = pd.Series(
    l1_classifier.coef_[0],
    index=X_train.columns,
).sort_values()

print(l1_coefficients)

Do not treat a zero L1 coefficient as proof that a feature is irrelevant. With correlated features, the model may keep one feature and suppress another similar one.
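One way to check whether a zeroed feature has a close stand-in is to look up, for each feature with a zero coefficient, the retained feature it correlates with most strongly in the training data. The sketch below refits the L1 model with the same settings as the example so it runs on its own; the `report` table and its column names are illustrative, and how many features end up at zero depends on the data and on C.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=42, stratify=y
)

l1_model = make_pipeline(
    StandardScaler(),
    LogisticRegression(
        penalty="l1",
        solver="liblinear",
        C=0.5,
        max_iter=500,
        random_state=42,
    ),
)
l1_model.fit(X_train, y_train)

coef = pd.Series(
    l1_model.named_steps["logisticregression"].coef_[0],
    index=X_train.columns,
)
dropped = coef[coef == 0].index
kept = coef[coef != 0].index

# Absolute pairwise correlations between training features.
abs_corr = X_train.corr().abs()

report = pd.DataFrame(columns=["closest_kept", "abs_corr"])
if len(dropped) > 0 and len(kept) > 0:
    # Rows: zeroed features. Columns: features the model retained.
    block = abs_corr.loc[dropped, kept]
    report = pd.DataFrame(
        {
            "closest_kept": block.idxmax(axis=1),
            "abs_corr": block.max(axis=1),
        }
    ).sort_values("abs_corr", ascending=False)

print(f"{len(dropped)} of {len(coef)} coefficients are exactly zero")
print(report)
```

A zeroed feature with a high correlation to a retained one is a candidate for "suppressed, not irrelevant". This is a descriptive check only; it does not prove which of the two features carries the signal.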

Multiclass models

In multiclass logistic regression, coef_ has one row per class. Each class gets its own coefficient vector, so feature importance can differ by class.

from sklearn.datasets import load_wine

X_multi, y_multi = load_wine(return_X_y=True, as_frame=True)
# Use separate names so the binary example's train/test split is not overwritten.
X_multi_train, X_multi_test, y_multi_train, y_multi_test = train_test_split(
    X_multi, y_multi, random_state=42, stratify=y_multi
)

multi_model = make_pipeline(
    StandardScaler(),
    LogisticRegression(max_iter=500, random_state=42),
)
multi_model.fit(X_multi_train, y_multi_train)

multi_classifier = multi_model.named_steps["logisticregression"]

# Row 0 of coef_ holds the coefficients for the first label in classes_.
class_0_coefficients = pd.Series(
    multi_classifier.coef_[0],
    index=X_multi_train.columns,
).sort_values()

print(class_0_coefficients)

Do not collapse multiclass coefficients into one ranking unless you are clear about how you aggregated them.
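If you do want a single ranking, it helps to keep the per-class table next to it and name the aggregation explicitly. The sketch below is one possible approach, not a standard: it refits the wine model so it runs on its own, lays out one column of coefficients per class, and adds a `mean_abs` column holding the mean absolute coefficient across classes. The column names and the choice of mean absolute value are assumptions made for this illustration.

```python
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X_multi, y_multi = load_wine(return_X_y=True, as_frame=True)
Xm_train, Xm_test, ym_train, ym_test = train_test_split(
    X_multi, y_multi, random_state=42, stratify=y_multi
)

multi_model = make_pipeline(
    StandardScaler(),
    LogisticRegression(max_iter=500, random_state=42),
)
multi_model.fit(Xm_train, ym_train)
multi_classifier = multi_model.named_steps["logisticregression"]

# One column per class, one row per feature; rows of coef_ follow classes_.
per_class = pd.DataFrame(
    multi_classifier.coef_,
    index=[f"class_{label}" for label in multi_classifier.classes_],
    columns=Xm_train.columns,
).T

# Explicit aggregation: mean absolute standardized coefficient across classes.
# The right-hand side is computed before the new column is added.
per_class["mean_abs"] = per_class.abs().mean(axis=1)

print(per_class.sort_values("mean_abs", ascending=False))
```

Other aggregations, such as the maximum absolute coefficient across classes, can give a different order, which is why the aggregation should be stated alongside the ranking.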

Plot the results

Plot signed coefficients when direction matters. Plot absolute coefficients when you want a magnitude ranking.

import matplotlib.pyplot as plt

top_n = 12
plot_data = coefficients.reindex(
    coefficients.abs().sort_values(ascending=False).head(top_n).index
).sort_values()

ax = plot_data.plot.barh(
    figsize=(8, 6),
    color="#2563eb",
)
ax.set_title("Logistic regression coefficients")
ax.set_xlabel("Standardized coefficient")
ax.set_ylabel("")
plt.tight_layout()
plt.show()

For permutation importance, include uncertainty from repeated shuffles.

plot_data = permutation.head(top_n).sort_values("mean")

fig, ax = plt.subplots(figsize=(8, 6))
ax.barh(
    plot_data["feature"],
    plot_data["mean"],
    xerr=plot_data["std"],
    color="#2563eb",
)
ax.set_title("Permutation importance")
ax.set_xlabel("Mean validation score decrease")
ax.set_ylabel("")
plt.tight_layout()
plt.show()

How to interpret safely

  • Coefficients describe changes in log-odds, not direct changes in probability.
  • Odds ratios are often easier to communicate, but they are still not probabilities.
  • Standardized coefficients are better for comparing magnitudes across features.
  • Permutation importance is better for measuring how much validation performance relies on a feature.
  • Correlated features can make coefficients unstable or suppress each other.
  • Coefficient signs and magnitudes are not causal unless the study design supports causality.

What to report

  • The target class and whether the model is binary or multiclass.
  • Whether coefficients are raw or standardized.
  • The regularization penalty, solver, and C value.
  • The validation metric and model performance.
  • Whether permutation importance agrees with coefficient rankings.
  • Known correlated features, leakage risks, and excluded columns.
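For the agreement check in the list above, one simple summary is a Spearman rank correlation between absolute standardized coefficients and mean permutation importances. The sketch below refits the binary pipeline so it runs on its own and computes Spearman as a Pearson correlation of ranks; the variable names and the use of a single correlation figure are assumptions made for illustration, and a per-feature comparison is usually worth reporting too.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=42, stratify=y
)

model = make_pipeline(
    StandardScaler(),
    LogisticRegression(max_iter=500, random_state=42),
)
model.fit(X_train, y_train)
classifier = model.named_steps["logisticregression"]

# Coefficient-based importance: absolute standardized coefficients.
coef_importance = pd.Series(
    np.abs(classifier.coef_[0]), index=X_train.columns
)

# Performance-based importance on the held-out split.
result = permutation_importance(
    model, X_test, y_test, scoring="roc_auc", n_repeats=10, random_state=42
)
perm_importance = pd.Series(result.importances_mean, index=X_test.columns)

# Spearman correlation is the Pearson correlation of the ranks.
spearman = coef_importance.rank().corr(perm_importance.rank())
print(f"Spearman rank correlation: {spearman:.2f}")

# Side-by-side ranks (1 = most important) for the report.
ranks = pd.DataFrame(
    {
        "coef_rank": coef_importance.rank(ascending=False),
        "perm_rank": perm_importance.rank(ascending=False),
    }
).sort_values("coef_rank")
print(ranks.head(10))
```

A low correlation is not necessarily a problem. With correlated features, shuffling one of them may cost little performance even when its coefficient is large, so disagreement is itself worth reporting.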

Sources: scikit-learn LogisticRegression, scikit-learn logistic regression user guide, StandardScaler, and permutation importance.
