XGBoost Feature Importance
XGBoost can rank features from the fitted booster itself, or you can measure feature reliance with permutation importance on validation data. The practical mistake is treating every importance column as the same thing. Gain, weight, cover, total gain, and permutation importance answer different questions.
Quick answer
For most practitioner reports, start with gain or
total_gain from the fitted booster, then compare the ranking
with permutation importance on held-out data. Built-in XGBoost
importance is fast and useful for model diagnostics. Permutation
importance is slower, but it is tied to the metric and dataset you care
about.
| Method | Best first use | Main caveat |
|---|---|---|
| gain | Which features made useful splits on average | Averages can hide how often a feature was used |
| total_gain | Overall contribution to split improvement | Can favor features used many times |
| weight | Split frequency debugging | Frequent splits are not necessarily valuable splits |
| Permutation | Held-out metric reliance | Correlated features can mask each other |
Fit an XGBoost model
The examples below use the scikit-learn estimator interface because it
fits naturally into cross-validation, pipelines, metrics, and
sklearn.inspection.permutation_importance. Passing
importance_type makes the meaning of
feature_importances_ explicit.
from xgboost import XGBClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
import pandas as pd
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.25,
    random_state=42,
    stratify=y,
)

model = XGBClassifier(
    n_estimators=300,
    learning_rate=0.05,
    max_depth=3,
    subsample=0.9,
    colsample_bytree=0.9,
    eval_metric="logloss",
    importance_type="gain",
    random_state=42,
)
model.fit(X_train, y_train)

proba = model.predict_proba(X_test)[:, 1]
print("ROC AUC:", roc_auc_score(y_test, proba))

Always compute importance for a model whose validation performance is already acceptable. Feature importance from a weak or leaky model is usually a description of the modeling problem, not a trustworthy explanation.
Use feature_importances_
The scikit-learn wrapper exposes one score per feature through
feature_importances_. For tree boosters, the property uses
the estimator's configured importance_type, such as
gain, weight, cover,
total_gain, or total_cover.
gain_importance = pd.Series(
    model.feature_importances_,
    index=X_train.columns,
    name="gain",
).sort_values(ascending=False)
print(gain_importance.head(10))

This is convenient when you already use the estimator API. It is less flexible than querying the underlying booster, especially when you want several importance types side by side.
Use get_score
model.get_booster().get_score() reads importance directly
from the fitted booster. XGBoost omits features that were never used in
a split, so reindex the result back to your full feature list before
sorting, joining, or reporting.
booster = model.get_booster()

gain = pd.Series(
    booster.get_score(importance_type="gain"),
    name="gain",
)
total_gain = pd.Series(
    booster.get_score(importance_type="total_gain"),
    name="total_gain",
)
weight = pd.Series(
    booster.get_score(importance_type="weight"),
    name="weight",
)

booster_importance = pd.concat(
    [gain, total_gain, weight],
    axis=1,
).reindex(X_train.columns).fillna(0)
booster_importance = booster_importance.sort_values(
    "total_gain",
    ascending=False,
)
print(booster_importance.head(10))

Feature name check
If you fit with a NumPy array instead of a DataFrame, the booster may
use generated names such as f0, f1, and
f2. Prefer fitting with a DataFrame or keep a reliable
mapping from model columns back to source feature names.
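If the model was already fitted on an array, one way to recover readable names is to parse the generated keys. This is a minimal sketch in plain Python: raw_scores stands in for a get_score() result, and both its values and the feature_names list are hypothetical.

```python
# Hypothetical get_score() output for a model fitted on a NumPy array.
# Feature 1 is absent because it was never used in a split.
raw_scores = {"f0": 3.5, "f2": 1.25}

# Hypothetical source column names, in the same order as the array columns.
feature_names = ["age", "income", "tenure"]


def rename_scores(raw_scores, feature_names):
    """Map generated keys such as 'f2' back to source names, filling gaps with 0."""
    named = {name: 0.0 for name in feature_names}
    for key, value in raw_scores.items():
        index = int(key[1:])  # "f2" -> 2
        named[feature_names[index]] = value
    return named


named_scores = rename_scores(raw_scores, feature_names)
print(named_scores)
```

The mapping is only as reliable as the column order, so it is worth storing feature_names alongside the model artifact.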
Importance types
XGBoost builds an additive model of trees. Each tree split chooses a feature and threshold that improve the objective. The built-in importance types summarize those split decisions after training.
weight
Counts how many times a feature is used to split data across the trees. It is useful for debugging tree structure, but it does not measure how much each split improved the model. A feature can appear often in small, low-value splits.
gain
Averages the improvement in the training objective for splits that used the feature. This is often the most useful built-in ranking when you want to know which features made strong splits when they were selected.
cover
Averages the coverage of splits that used the feature. In XGBoost, the cover of a split is the sum of the second-order gradients (Hessians) of the training instances routed through it, so it tracks how much weighted training data reached the split. Treat it as a split-reach diagnostic, not as predictive value.
total_gain and total_cover
Sum gain or cover across all splits that used the feature. These can be better than averages when you care about aggregate model usage, but they naturally reward features that appear in many splits.
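The relationship between these types is easier to see on a toy example. The sketch below uses made-up split records, not values from a real booster, and computes each importance type the way the definitions above describe: weight is a count, total_gain and total_cover are sums, and gain and cover are those sums divided by the count.

```python
from collections import defaultdict

# Hypothetical split records: (feature, gain, cover).
splits = [
    ("mean radius", 12.0, 100.0),
    ("mean radius", 4.0, 40.0),
    ("worst area", 9.0, 80.0),
    ("mean radius", 2.0, 20.0),
]


def summarize(splits):
    """Aggregate per-split statistics into the five importance types."""
    stats = defaultdict(
        lambda: {"weight": 0, "total_gain": 0.0, "total_cover": 0.0}
    )
    for feature, gain, cover in splits:
        entry = stats[feature]
        entry["weight"] += 1
        entry["total_gain"] += gain
        entry["total_cover"] += cover
    for entry in stats.values():
        # Averages are totals divided by the number of splits.
        entry["gain"] = entry["total_gain"] / entry["weight"]
        entry["cover"] = entry["total_cover"] / entry["weight"]
    return dict(stats)


summary = summarize(splits)
for feature, entry in summary.items():
    print(feature, entry)
```

In this toy data, mean radius leads on weight and total_gain because it was used three times, while worst area leads on average gain from a single strong split. That is the kind of ranking flip to expect when switching importance types.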
Plot importance
For quick inspection, XGBoost includes plot_importance. Use
an explicit importance_type and limit the number of
displayed features so the chart remains readable.
from xgboost import plot_importance
import matplotlib.pyplot as plt
ax = plot_importance(
    model,
    importance_type="gain",
    max_num_features=15,
    height=0.5,
    show_values=False,
)
ax.set_title("XGBoost feature importance by gain")
ax.set_xlabel("Average gain")
plt.tight_layout()
plt.show()
For reports, you will often get a cleaner result by plotting your own
sorted Series. That lets you control labels, normalization,
confidence intervals, and side-by-side comparisons.
top_gain = gain_importance.head(15).sort_values()
ax = top_gain.plot.barh(figsize=(7, 5))
ax.set_title("Top XGBoost features by gain")
ax.set_xlabel("Gain importance")
ax.set_ylabel("")
plt.tight_layout()
plt.show()

Compare with permutation importance
Built-in XGBoost importance describes the fitted booster's split behavior. Permutation importance asks a different question: how much does a chosen validation metric drop when one feature is shuffled?
from sklearn.inspection import permutation_importance
permutation = permutation_importance(
    model,
    X_test,
    y_test,
    n_repeats=20,
    random_state=42,
    scoring="roc_auc",
)

permutation_scores = pd.Series(
    permutation.importances_mean,
    index=X_test.columns,
    name="permutation_auc_drop",
).sort_values(ascending=False)

comparison = pd.concat(
    [
        gain_importance.rename("xgboost_gain"),
        permutation_scores,
    ],
    axis=1,
).fillna(0)
comparison = comparison.sort_values(
    "permutation_auc_drop",
    ascending=False,
)
print(comparison.head(10))

Disagreement is not automatically a problem. It can reveal correlated features, redundant variables, train-test drift, leakage, or features that helped training-time splits without improving held-out performance.
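To make the shuffling idea concrete, here is a hand-rolled sketch of permutation importance in plain Python. It is not how scikit-learn implements it. The toy_model function, the random data, and the accuracy metric are all illustrative assumptions, chosen so that one column is used by the model and the other is ignored.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows where the model's prediction equals the label."""
    correct = sum(model(row) == label for row, label in zip(rows, labels))
    return correct / len(rows)


def permutation_drop(model, rows, labels, column, n_repeats=10, seed=0):
    """Mean drop in accuracy after shuffling one column, over n_repeats shuffles."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    drops = []
    for _ in range(n_repeats):
        shuffled = [row[column] for row in rows]
        rng.shuffle(shuffled)
        # Rebuild each row with the shuffled value in the target column.
        permuted = [
            row[:column] + [value] + row[column + 1:]
            for row, value in zip(rows, shuffled)
        ]
        drops.append(baseline - accuracy(model, permuted, labels))
    return sum(drops) / n_repeats


def toy_model(row):
    """Hypothetical stand-in for a fitted model: it only looks at column 0."""
    return int(row[0] > 0.5)


data_rng = random.Random(1)
rows = [[data_rng.random(), data_rng.random()] for _ in range(200)]
labels = [toy_model(row) for row in rows]

drop_used = permutation_drop(toy_model, rows, labels, column=0)
drop_unused = permutation_drop(toy_model, rows, labels, column=1)
print("Drop when shuffling the used column:", drop_used)
print("Drop when shuffling the ignored column:", drop_unused)
```

Shuffling the ignored column leaves every prediction unchanged, so its drop is zero. A feature the model never reads cannot earn permutation importance, whatever its correlation with the target.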
Interpretation caveats
- Feature importance is model-specific. A different objective, preprocessing pipeline, random seed, or train-test split can change the ranking.
- Built-in importance is based on training-time split statistics. It does not prove that a feature improves future performance.
- Correlated predictors split credit. One feature can look weak because another feature carries similar information.
- High importance does not imply causality. It means the model used information associated with that feature under the fitted training setup.
- Leakage features often look extremely important. Investigate any top feature that would not be available, stable, or legitimate at prediction time.
- One-hot encoded or target-encoded features may need to be grouped back to their source field before the result is meaningful to a stakeholder.
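For the last point, grouping can be as simple as summing encoded columns under their source field. The sketch below assumes an additive importance type such as total_gain or weight; summing averaged types such as gain is harder to justify. The scores and the source_field mapping are hypothetical.

```python
from collections import defaultdict

# Hypothetical total_gain values for one-hot encoded and plain columns.
scores = {
    "color_red": 4.0,
    "color_blue": 1.5,
    "color_green": 0.5,
    "size": 3.0,
}

# Hypothetical mapping from encoded column to source field.
source_field = {
    "color_red": "color",
    "color_blue": "color",
    "color_green": "color",
}


def group_importance(scores, source_field):
    """Sum importance over columns that share a source field."""
    grouped = defaultdict(float)
    for column, value in scores.items():
        # Columns without a mapping are treated as their own source field.
        grouped[source_field.get(column, column)] += value
    return dict(grouped)


grouped_scores = group_importance(scores, source_field)
print(grouped_scores)
```

Here no single color column outranks size, but the color field as a whole does, which is usually the comparison a stakeholder cares about.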
What to report
A useful XGBoost feature-importance report should make the calculation reproducible and avoid overclaiming. Include the model objective, the validation metric, the dataset split, the importance type, and whether the ranking came from training-time booster statistics or held-out permutation tests.
Good phrasing
"On the validation split, shuffling mean radius reduced ROC AUC more than shuffling any other feature. In the fitted XGBoost model, mean radius also had the highest total gain. This suggests the model relies heavily on this variable, but it does not establish a causal effect."
Avoid phrasing
"Mean radius causes the prediction because it is the most important feature." That conclusion requires a causal design, not a feature importance ranking.
API reference: XGBoost Python API documentation.