
Random forests

Given {$n$} observations, each with {$p$} predictors.

Input: {$m \ll p$}, the number of predictors to sample at each split (often {$\sqrt{p}$}), and {$f$}, the fraction of the data to use for training.

Repeat many times:

  • Choose a training set by drawing {$f \cdot n$} training cases with replacement. This is called bagging (bootstrap aggregating)
  • Build a decision tree as follows (see the sketch after this list)
    • For each node of the tree, randomly choose {$m$} variables and find the best split from among those {$m$} variables
    • Repeat until the full tree is built (no pruning)
      • Sometimes people just do this with “stumps”: trees with a single split
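
A minimal sketch of this training loop in Python, assuming numpy arrays and using scikit-learn's DecisionTreeClassifier, whose max_features option handles the "best split among {$m$} variables" step. The name train_forest and its defaults are illustrative assumptions, not part of these notes:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def train_forest(X, y, n_trees=100, f=1.0, seed=0):
        """Grow a forest of unpruned, feature-subsampled trees (illustrative sketch)."""
        rng = np.random.default_rng(seed)
        n, p = X.shape
        m = max(1, int(np.sqrt(p)))  # often sqrt(p), as noted above
        forest = []
        for _ in range(n_trees):
            # Bagging: draw f*n training cases with replacement
            idx = rng.integers(0, n, size=int(f * n))
            # max_features=m: each node considers only m randomly chosen variables;
            # the default settings grow the full tree with no pruning
            tree = DecisionTreeClassifier(max_features=m,
                                          random_state=int(rng.integers(1 << 31)))
            tree.fit(X[idx], y[idx])
            forest.append(tree)
        return forest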

To predict, take the modal classification (‘majority vote’) over all the trees.
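
A matching sketch of the majority vote, assuming the train_forest() helper above (again an illustration, not the notes' own code):

    import numpy as np
    from collections import Counter

    def predict_forest(forest, X):
        # Each tree casts one vote per case; return the modal classification
        all_votes = [tree.predict(X) for tree in forest]
        return np.array([Counter(votes).most_common(1)[0][0]
                         for votes in zip(*all_votes)])

For example, predict_forest(train_forest(X_train, y_train), X_test) would return one predicted class per test case (X_train, y_train, and X_test being hypothetical data arrays).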

See also Wikipedia: https://en.wikipedia.org/wiki/Random_forest

Back to Lectures
