Comparison guide

SHAP vs Feature Importance

Feature importance usually ranks features by how much they matter to a model overall. SHAP values explain how feature values contribute to individual predictions, and those local explanations can be aggregated into global summaries.

The practical difference

Use feature importance when the question is about model-level reliance: which inputs matter most to predictive performance or to the fitted model's internal split behavior? Use SHAP when the question is about contribution: for this row, which feature values pushed the prediction up or down from a baseline?

Feature importance

Best for answering: which columns does this model rely on most, under this importance method?

SHAP values

Best for answering: how did each feature push this specific prediction away from a baseline?

Question	Better first tool	Why
Which columns matter most overall?	Permutation importance	It connects importance to a validation metric.
Why did this customer, claim, or loan score high?	SHAP	It decomposes one model output into feature contributions.
Do high feature values push predictions up or down?	SHAP summary plots	They show direction and spread, not only rank.
Is the model worse without this feature?	Permutation importance or ablation	SHAP explains predictions; it is not a retraining experiment.

Local vs global explanations

A local explanation describes one prediction. A global explanation summarizes behavior across many predictions. The distinction matters because SHAP starts local, while most feature importance methods start global.

Local SHAP explanation

For one row, the baseline value plus the sum of that row's SHAP values equals the model output for that row. Positive values push the output higher in the chosen output space. Negative values push it lower.
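
The additivity property can be checked by hand on a tiny example. The sketch below is a brute-force Shapley computation, not the algorithm the shap library uses; the model function, the background rows, and the explained row are all hypothetical, and missing features are filled in from background rows (an interventional-style assumption).

```python
from itertools import combinations
from math import factorial


def model(x):
    # Hypothetical fitted model with one interaction term.
    return 2.0 * x[0] + x[1] * x[2]


def value(subset, row, background):
    # Mean model output when features in `subset` take the row's values
    # and all other features are taken from each background row in turn.
    total = 0.0
    for b in background:
        mixed = [row[i] if i in subset else b[i] for i in range(len(row))]
        total += model(mixed)
    return total / len(background)


def shapley_values(row, background):
    # Exact Shapley values by enumerating every feature subset.
    n = len(row)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                gain = value(s | {i}, row, background) - value(s, row, background)
                phi[i] += weight * gain
    return phi


background = [[0.0, 1.0, 1.0], [1.0, 0.0, 2.0], [2.0, 2.0, 0.0]]
row = [3.0, 1.0, 4.0]

base_value = value(set(), row, background)  # mean output over the background
phi = shapley_values(row, background)

# Baseline plus contributions should reproduce the model output for this row.
print(base_value + sum(phi), model(row))
```

Enumeration grows exponentially with the number of features, so this is only useful for building intuition on a handful of inputs.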

Global SHAP summary

A global SHAP ranking is usually computed by taking the mean absolute SHAP value for each feature across a dataset. This is still an aggregation of local contributions, not a direct test of what happens when a feature is removed or permuted.
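
As a rough illustration of that aggregation, the sketch below ranks features by mean absolute value and compares the result with a plain signed mean. The SHAP matrix and feature names are made-up numbers, not output from a real model.

```python
# Hypothetical SHAP values: one row per prediction, one column per feature.
feature_names = ["age", "income", "tenure"]
shap_values = [
    [0.30, -0.10, 0.02],
    [-0.28, 0.12, 0.01],
    [0.31, -0.09, 0.03],
    [-0.29, 0.11, 0.02],
]

n_rows = len(shap_values)
columns = list(zip(*shap_values))

# Global summary: mean absolute contribution per feature.
mean_abs = {
    name: sum(abs(v) for v in col) / n_rows
    for name, col in zip(feature_names, columns)
}

# Signed mean, shown only for contrast: opposite-sign contributions cancel.
mean_signed = {
    name: sum(col) / n_rows for name, col in zip(feature_names, columns)
}

ranking = sorted(mean_abs, key=mean_abs.get, reverse=True)
print(ranking)
print(mean_signed)
```

In this toy data "age" has the largest contributions row by row, yet its signed mean is close to zero because the pushes alternate in direction. That is why the global ranking uses absolute values.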

The baseline is part of the claim

A SHAP explanation is relative to an expected model output over a background distribution. Change the background data or output space, and the explanation can change.
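
A small sketch of that dependence, assuming a linear model with features treated as independent, where each contribution reduces to the coefficient times the distance from the background mean. The coefficients, the row, and both background sets are hypothetical.

```python
from statistics import mean

weights = [0.8, -1.5]  # hypothetical linear model coefficients
intercept = 0.2


def predict(x):
    return intercept + sum(w * v for w, v in zip(weights, x))


def linear_shap(row, background):
    # For a linear model with independent features, the contribution of
    # feature i is w_i * (x_i - mean of feature i over the background).
    col_means = [mean(col) for col in zip(*background)]
    base = predict(col_means)  # equals the mean prediction for a linear model
    phi = [w * (v - m) for w, v, m in zip(weights, row, col_means)]
    return base, phi


row = [2.0, 1.0]
background_a = [[0.0, 0.0], [2.0, 2.0]]  # feature means: 1.0, 1.0
background_b = [[2.0, 0.0], [4.0, 0.0]]  # feature means: 3.0, 0.0

base_a, phi_a = linear_shap(row, background_a)
base_b, phi_b = linear_shap(row, background_b)

print(base_a, phi_a)
print(base_b, phi_b)
```

The same row and the same model give a positive contribution for the first feature under one background and a negative one under the other, while both explanations still sum to the same prediction.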

How SHAP values differ from feature importance

Feature importance is a family of global ranking methods. SHAP is a local attribution method with a global summary option. Treating their outputs as interchangeable is a common mistake.

Unit: feature importance gives one score per feature; SHAP gives one contribution per feature per row and per model output.
Direction: many importance rankings only say "large" or "small"; SHAP values also show whether a feature value moved the output up or down.
Additivity: for each row, the baseline plus that row's SHAP values is designed to sum to the explained model output. Standard feature importance scores usually do not have that row-level accounting.
Performance link: permutation importance measures a score drop. SHAP values explain model outputs and do not directly say how much a metric would fall without a feature.
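
To make the performance link concrete, here is a hand-rolled sketch of permutation importance: shuffle one column, re-score, and record how much the error rises. The model function and the synthetic data are hypothetical stand-ins; in practice scikit-learn's sklearn.inspection.permutation_importance does this against a fitted estimator.

```python
import random


def model(row):
    # Hypothetical fitted regressor: heavy reliance on x0, weak on x1, none on x2.
    return 3.0 * row[0] + 0.5 * row[1]


def mse(X, y):
    return sum((model(r) - t) ** 2 for r, t in zip(X, y)) / len(y)


rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(500)]
y = [3.0 * r[0] + 0.5 * r[1] + rng.gauss(0, 0.1) for r in X]

baseline_error = mse(X, y)


def permutation_importance(col, n_repeats=5):
    # Average increase in MSE after shuffling one column across rows.
    increases = []
    for _ in range(n_repeats):
        shuffled = [r[col] for r in X]
        rng.shuffle(shuffled)
        X_perm = [r[:col] + [v] + r[col + 1:] for r, v in zip(X, shuffled)]
        increases.append(mse(X_perm, y) - baseline_error)
    return sum(increases) / n_repeats


importances = [permutation_importance(c) for c in range(3)]
print(importances)
```

Each score is a change in a validation metric, one number per feature. Nothing here says which direction a feature pushed any individual prediction, which is the part SHAP supplies.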

Example with a tree model

For tree ensembles, shap.TreeExplainer is the usual practical starting point. Pass a representative background sample when you want interventional-style explanations. This matters most for probability outputs, which TreeExplainer typically computes against background data.

import shap
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=42, stratify=y
)

model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

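# Background rows define the baseline that contributions are measured against.
background = shap.sample(X_train, 100, random_state=42)
explainer = shap.TreeExplainer(
    model,
    data=background,
    model_output="probability",  # explain predicted probabilities
    feature_perturbation="interventional",
)

# Returns an Explanation object with values, base values, and feature data.
explanation = explainer(X_test)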

The returned explanation object contains feature-level contributions, feature values, base values, feature names, and output information when available. For a scikit-learn binary classifier, the result may contain one explanation per class, so select the output you intend to explain before plotting or reporting.

Beeswarm, bar, and waterfall-style usage

The three most useful SHAP views answer different questions. A beeswarm shows distribution and direction. A bar plot gives a compact global ranking. A waterfall plot explains one row.

# Choose one model output before plotting.
# Index 1 on the last axis corresponds to model.classes_[1]; confirm that
# this is the class you intend to explain.
positive_class = explanation[:, :, 1]

# Global: direction, spread, and rank across rows.
shap.plots.beeswarm(positive_class, max_display=15)

# Global: compact ranking by mean absolute SHAP value.
shap.plots.bar(positive_class.abs.mean(0), max_display=15)

# Local: one prediction explained from baseline to model output.
row_index = 0
shap.plots.waterfall(positive_class[row_index])

Do not turn every SHAP plot into a ranking

A bar plot hides heterogeneity. A feature can have a modest average contribution but be decisive for a subset of rows. Check the beeswarm or dependence-style plots before compressing the story to a top-10 list.
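
The effect is easy to reproduce with made-up numbers. In the sketch below, the SHAP matrix and feature names are hypothetical: one feature contributes a little on every row, the other contributes nothing on most rows and a lot on one.

```python
feature_names = ["utilization", "recent_default"]

# Hypothetical SHAP values for ten rows.
# "recent_default" is zero everywhere except the last row.
shap_values = [[0.2, 0.0], [-0.2, 0.0]] * 4 + [[0.2, 0.0], [-0.2, 1.5]]

n_rows = len(shap_values)
columns = list(zip(*shap_values))

# What a bar plot would rank on.
mean_abs = {
    name: sum(abs(v) for v in col) / n_rows
    for name, col in zip(feature_names, columns)
}

# Largest single-row contribution, which the average hides.
max_abs = {
    name: max(abs(v) for v in col)
    for name, col in zip(feature_names, columns)
}

top_by_mean = max(mean_abs, key=mean_abs.get)
top_by_max = max(max_abs, key=max_abs.get)
print(top_by_mean, top_by_max)
```

A ranking by mean absolute value puts "utilization" first, while the single largest contribution in the data belongs to "recent_default". A beeswarm would show that outlying row; a bar plot would not.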

Classes and model-output caveats

SHAP values explain a specific model output. For regression this is usually straightforward. For classification, decide whether you are explaining a raw margin, a probability, a loss value, or a specific class output.

For binary classification, confirm whether the explanation has one output or two class outputs. Do not assume class index 1 is correct without checking the model's class order.
For multiclass classification, create separate summaries for the classes that matter. A single averaged plot can hide opposite effects across classes.
Raw outputs and probabilities are different explanation targets. A contribution in log-odds space should not be reported as a percentage-point change in probability.
If you use model_output="probability", document the background dataset and perturbation mode used to compute the values.

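# Inspect classes and output shape before selecting a class.
print(model.classes_)            # class order defines the last axis
print(explanation.values.shape)  # (rows, features, classes) when there is one output per class

# Look up the position of the label you want to explain instead of hard-coding it.
target_label = 1
positive_label_position = list(model.classes_).index(target_label)
positive_class = explanation[:, :, positive_label_position]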
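
The caveat above about log-odds versus probability comes from the shape of the logistic function. The sketch below applies the same hypothetical log-odds contribution of +1.0 at three different starting points and measures the resulting change in probability.

```python
from math import exp


def sigmoid(z):
    # Maps a log-odds value to a probability.
    return 1.0 / (1.0 + exp(-z))


contribution = 1.0  # hypothetical SHAP value in log-odds units

deltas = []
for base_logit in (-4.0, 0.0, 4.0):
    before = sigmoid(base_logit)
    after = sigmoid(base_logit + contribution)
    deltas.append(after - before)
    print(f"logit {base_logit:+.1f}: p {before:.3f} -> {after:.3f}")
```

The probability change is largest near a log-odds of zero and much smaller at either extreme, so a fixed log-odds contribution has no single percentage-point equivalent.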

Correlated-feature limitations

SHAP values do not make correlation disappear. When two features carry similar information, attribution can be split, shifted, or made sensitive to the background distribution and feature-dependence assumption. This is especially important for engineered features, duplicate measurements, lagged variables, one-hot groups, and proxy variables.

Practical checks

Group one-hot encoded or highly related features when reporting.
Compare SHAP summaries with permutation importance or ablation tests.
Look for feature pairs where one feature's importance shifts when the other is removed.
Avoid causal language unless the data and study design support it.
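
For the grouping check, one simple approach is to sum the SHAP values of the columns that belong together. This relies on additivity and is a reporting convenience, not a recomputation of attributions for the group. The feature names, group mapping, and values below are hypothetical.

```python
feature_names = ["age", "region=north", "region=south", "region=west"]

# Hypothetical mapping from reporting group to the encoded columns in it.
groups = {
    "age": ["age"],
    "region": ["region=north", "region=south", "region=west"],
}

# Hypothetical SHAP values for one row, in the same order as feature_names.
shap_row = [0.12, 0.05, -0.02, 0.08]

index = {name: i for i, name in enumerate(feature_names)}

# Sum member contributions so each group gets one reported value.
grouped = {
    group: sum(shap_row[index[member]] for member in members)
    for group, members in groups.items()
}

print(grouped)
```

Because the grouped values still sum to the same total, the row-level accounting from baseline to model output is preserved. For a global summary, sum within each row first and take absolute values afterwards.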

When SHAP is overkill

SHAP is powerful, but it adds computational cost and interpretation complexity. For many model-review tasks, a simpler method is easier to defend.

If you only need a global ranking tied to validation performance, use permutation importance first.
If you are debugging a tree model during development, built-in tree importance may be enough.
If stakeholders do not need row-level explanations, SHAP plots may add detail without improving the decision.
If the model is linear and features are well prepared, coefficients plus permutation importance may be clearer.

Reporting checklist

A useful SHAP report states exactly what was explained and avoids treating attribution as proof of causality or model quality.

Model type, training data window, and evaluation dataset.
Explainer used, background dataset, and number of background rows.
Output explained: raw score, probability, loss, class, or target.
Whether the plot is local or global.
Top features by mean absolute SHAP value, with direction from beeswarm or dependence plots.
Known correlated feature groups and proxy variables.
Validation metric and a separate feature-importance or ablation check for high-stakes claims.
Plain-language caveat: the explanation is for the fitted model, not necessarily the real-world causal process.

Do not skip validation

SHAP explains model output. It does not prove the model is good, fair, causal, or suitable for a decision.