CIS520 Machine Learning | Lectures / Variable Importance

Variable importance measures can be based entirely on input/output behavior (e.g., the effect on prediction accuracy of removing the feature), or they can be specific to the model in question.
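One way to make the input/output view concrete is drop-column importance: retrain the model without a feature and measure how much test accuracy falls. The sketch below is a minimal stdlib-only illustration that assumes a caller-supplied `fit` and `score`. The function names, the 1-nearest-neighbour model and the toy data are hypothetical, chosen only so the example runs.

```python
import math


def drop_column_importance(fit, score, X_train, y_train, X_test, y_test):
    """Importance of feature j = baseline score - score of a model retrained without j."""

    def without(X, j):
        # Copy every row, leaving out column j.
        return [row[:j] + row[j + 1:] for row in X]

    baseline = score(fit(X_train, y_train), X_test, y_test)
    importances = []
    for j in range(len(X_train[0])):
        model = fit(without(X_train, j), y_train)
        importances.append(baseline - score(model, without(X_test, j), y_test))
    return importances


# Hypothetical model: 1-nearest-neighbour, whose "training" just stores the data.
def fit_1nn(X, y):
    return list(zip(X, y))


def accuracy_1nn(model, X, y):
    correct = 0
    for row, label in zip(X, y):
        _, predicted = min(model, key=lambda pair: math.dist(pair[0], row))
        correct += predicted == label
    return correct / len(y)


# Toy data: feature 0 separates the classes, feature 1 is noise.
X_train = [[0.0, 0.2], [0.0, 0.9], [10.0, 0.1], [10.0, 0.8]]
y_train = [0, 0, 1, 1]
X_test = [[1.0, 0.75], [9.0, 0.25]]
y_test = [0, 1]

print(drop_column_importance(fit_1nn, accuracy_1nn, X_train, y_train, X_test, y_test))
# [1.0, 0.0]
```

Because the model is retrained for every feature, this approach is model-agnostic but costs one extra fit per feature.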

Linear Models

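A scaled measure of the regression coefficients. Raw coefficients depend on the units of each feature, so they must be put on a common scale before they can be compared.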

E.g., the absolute value of the t-statistic for each coefficient, or the correlation of each feature with y.
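Both measures can be computed directly. The sketch below fits ordinary least squares with an intercept via the normal equations, then reports |t_j| = |beta_j| / SE(beta_j), where SE(beta_j) = sqrt(sigma^2 [(X^T X)^-1]_jj) and sigma^2 is the residual sum of squares divided by (n - p). Unlike a raw coefficient, the t-statistic does not change if a feature is rescaled. The function names and toy data are hypothetical, and the naive matrix inverse is only meant for tiny, well-conditioned problems.

```python
import math


def invert(A):
    """Gauss-Jordan inverse with partial pivoting (fine for tiny, well-conditioned matrices)."""
    n = len(A)
    M = [list(map(float, row)) + [float(i == k) for k in range(n)] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]


def abs_t_statistics(X, y):
    """|beta_j / SE(beta_j)| for each feature of an OLS fit with an intercept."""
    rows = [[1.0] + list(r) for r in X]  # prepend intercept column
    n, p = len(rows), len(rows[0])
    xtx = [[sum(r[i] * r[k] for r in rows) for k in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    inv = invert(xtx)
    beta = [sum(inv[i][k] * xty[k] for k in range(p)) for i in range(p)]
    resid = [t - sum(b * v for b, v in zip(beta, r)) for r, t in zip(rows, y)]
    sigma2 = sum(e * e for e in resid) / (n - p)  # unbiased residual variance
    # Skip index 0 (the intercept); report one score per feature.
    return [abs(beta[j]) / math.sqrt(sigma2 * inv[j][j]) for j in range(1, p)]


def correlations_with_y(X, y):
    """Pearson correlation of each feature column with y."""

    def pearson(a, b):
        ma, mb = sum(a) / len(a), sum(b) / len(b)
        cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
        return cov / math.sqrt(sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b))

    return [pearson([row[j] for row in X], y) for j in range(len(X[0]))]


# Toy data: y is roughly 2 * x0; x1 is unrelated.
X = [[1, 3], [2, 1], [3, 4], [4, 1], [5, 5], [6, 2]]
y = [2.1, 3.8, 6.05, 8.15, 9.9, 12.0]
print(abs_t_statistics(X, y))
print(correlations_with_y(X, y))
```

The t-statistic accounts for the other features in the model, while the per-feature correlation ignores them; the two rankings can disagree when features are correlated with each other.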

Decision Trees

From the CART package: “To calculate a variable importance score, CART looks at the improvement measure attributable to each variable in its role as either a primary or a surrogate splitter. The values of ALL these improvements are summed over each node and totaled, and are then scaled relative to the best performing variable. The variable with the highest sum of improvements is scored 100, and all other variables will have lower scores ranging downwards toward zero. A variable can obtain an importance score of zero in CART only if it never appears as either a primary or a surrogate splitter. Because such a variable plays no role anywhere in the tree, eliminating it from the data set should make no difference to the results.”
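The summing-and-scaling step described above can be sketched as follows. This assumes the per-split improvement values (for example, impurity decreases) have already been computed and are passed in as (variable, improvement) pairs, one per primary or surrogate split. The function name and the tree are hypothetical illustrations, not CART's own code.

```python
def cart_importance(splits, variables):
    """Sum each variable's improvements (primary and surrogate), then scale so the best is 100."""
    totals = {v: 0.0 for v in variables}
    for variable, improvement in splits:
        totals[variable] += improvement
    best = max(totals.values())
    if best == 0:
        return totals  # no splits at all: every variable scores zero
    return {v: 100.0 * total / best for v, total in totals.items()}


# Hypothetical tree: one (variable, improvement) pair per primary or surrogate split.
splits = [
    ("age", 40.0),     # node 1, primary splitter
    ("income", 25.0),  # node 1, surrogate
    ("age", 10.0),     # node 2, primary splitter
    ("zip", 5.0),      # node 2, surrogate
]
print(cart_importance(splits, ["age", "income", "zip", "height"]))
# {'age': 100.0, 'income': 50.0, 'zip': 10.0, 'height': 0.0}
```

Here "height" never appears as a primary or surrogate splitter, so it is the only variable scored zero.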

Random Forest

From the R randomForest package: “For each tree, the prediction accuracy on the out-of-bag portion of the data is recorded. Then the same is done after permuting each predictor variable. The differences between the two accuracies are then averaged over all trees, and normalized by the standard error. For regression, the MSE is computed on the out-of-bag data for each tree, and then the same computed after permuting a variable. The differences are averaged and normalized by the standard error. If the standard error is equal to 0 for a variable, the division is not done.”
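The quoted procedure can be sketched as follows, assuming each tree is a callable that maps a feature row to a prediction and that each tree's out-of-bag rows are supplied alongside it. "Standard error" is read here as the standard deviation of the per-tree differences divided by the square root of the number of trees; that interpretation, the function names, and the stump "forest" are assumptions for illustration, not the randomForest package's implementation. Passing `loss=mse` gives the regression variant.

```python
import math
import random
import statistics


def error_rate(tree, rows, labels):
    return sum(tree(r) != t for r, t in zip(rows, labels)) / len(labels)


def mse(tree, rows, labels):
    return sum((tree(r) - t) ** 2 for r, t in zip(rows, labels)) / len(labels)


def oob_permutation_importance(trees, oob_sets, n_features, loss=error_rate, seed=0):
    """Per-tree loss increase on OOB data after permuting one feature, averaged over
    trees and divided by its standard error (division skipped when that is 0)."""
    rng = random.Random(seed)
    scores = []
    for j in range(n_features):
        diffs = []
        for tree, oob in zip(trees, oob_sets):
            rows = [list(r) for r, _ in oob]
            labels = [t for _, t in oob]
            column = [r[j] for r in rows]
            rng.shuffle(column)  # permute feature j within this tree's OOB rows
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            # With error_rate this equals OOB accuracy minus accuracy after permuting.
            diffs.append(loss(tree, permuted, labels) - loss(tree, rows, labels))
        mean = statistics.fmean(diffs)
        se = statistics.stdev(diffs) / math.sqrt(len(diffs)) if len(diffs) > 1 else 0.0
        scores.append(mean / se if se > 0 else mean)
    return scores


# Hypothetical "forest": three stumps that only look at feature 0.
def make_stump(threshold):
    return lambda row: 1 if row[0] > threshold else 0


trees = [make_stump(t) for t in (0.4, 0.5, 0.6)]
oob = [([0.1, 3.0], 0), ([0.2, 1.0], 0), ([0.8, 2.0], 1), ([0.9, 4.0], 1)]
oob_sets = [oob, oob[1:], oob[:3]]  # each tree gets its own (toy) out-of-bag subset
print(oob_permutation_importance(trees, oob_sets, n_features=2))
```

Feature 1 is never used by any stump, so permuting it leaves every tree's predictions unchanged: all differences are zero, the standard error is zero, and the score is reported undivided as 0.0.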
