Nested Cross-validation
The simplest "honest" estimate of test error requires breaking the data into three sets:
1) a training set, which is used to estimate the model parameters: $\arg\min_w \mathrm{Loss}(y, f(x; w))$
2) a validation set, which is used to select the hyperparameters: for each candidate hyperparameter setting, train on the training set and keep the setting that gives the lowest error on the validation set.
3) a test set, which is used to estimate how accurate the model is. Recall that the model was trained on the training set using the hyperparameters that gave the lowest loss on the validation set; we now evaluate it on this third, untouched set. (A code sketch of this procedure follows the list.)
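Below is a minimal sketch of the three-way split, assuming a scikit-learn style model (Ridge regression, with its alpha as the single hyperparameter) and synthetic data; the variable names and the grid of alpha values are illustrative, not part of the original procedure.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Hypothetical data for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=300)

# 60% training, 20% validation, 20% test.
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

# (1)-(2): for each hyperparameter value, fit on the training set and
# score on the validation set; keep the best value.
best_alpha, best_val_loss = None, np.inf
for alpha in [0.01, 0.1, 1.0, 10.0]:
    model = Ridge(alpha=alpha).fit(X_train, y_train)
    val_loss = mean_squared_error(y_val, model.predict(X_val))
    if val_loss < best_val_loss:
        best_alpha, best_val_loss = alpha, val_loss

# (3): the model trained with the selected hyperparameter is evaluated
# on the untouched test set; this is the "honest" error estimate.
final_model = Ridge(alpha=best_alpha).fit(X_train, y_train)
test_loss = mean_squared_error(y_test, final_model.predict(X_test))
print(best_alpha, test_loss)
```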
If there is not enough labeled data, we can approximate the above procedure using nested cross-validation.
1. Outer loop: Divide the data into 10 roughly equal pieces. Remove each tenth in turn (as your test set) and carry out the inner loop (step 2) on the remaining 90% to get a model. Evaluate that model on the held-out tenth. The average loss over all 10 held-out tenths is your test set loss estimate.
2. Inner loop: 2a. Divide the 90% (the training set) into 10 folds again. For each candidate value of the hyperparameters, fit the model on 9 of the folds and evaluate it on the held-out fold, repeating so that each fold is held out once. Pick the hyperparameter set that does best on average over all 10 held-out ("validation") folds.
2b. Train a model on the entire training and validation set (90% of the total data) using the selected hyperparameters. This is the model to evaluate on the 10% test set.
3. After you have done the above for each of the 10 outer folds, you can estimate the total error as the average of the 10 held-out losses. If you want a single model, retrain it on all the data using the average (or median or such) of the hyperparameters that were selected on each of the 10 "outer" folds. A code sketch of the full procedure appears below.
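The following is a minimal sketch of the nested procedure under the same illustrative assumptions as above (Ridge regression, alpha as the hyperparameter, synthetic data); the fold counts match the 10/10 split described in the text.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

# Hypothetical data for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=300)

alphas = [0.01, 0.1, 1.0, 10.0]
outer = KFold(n_splits=10, shuffle=True, random_state=0)
outer_losses, chosen_alphas = [], []

# 1. Outer loop: each fold serves once as the held-out test set.
for train_idx, test_idx in outer.split(X):
    X_tr, y_tr = X[train_idx], y[train_idx]   # the 90%
    X_te, y_te = X[test_idx], y[test_idx]     # the held-out 10%

    # 2a. Inner loop: 10-fold CV on the 90% scores each hyperparameter.
    inner = KFold(n_splits=10, shuffle=True, random_state=1)
    mean_val_loss = []
    for alpha in alphas:
        fold_losses = []
        for tr, val in inner.split(X_tr):
            m = Ridge(alpha=alpha).fit(X_tr[tr], y_tr[tr])
            fold_losses.append(mean_squared_error(y_tr[val], m.predict(X_tr[val])))
        mean_val_loss.append(np.mean(fold_losses))
    best_alpha = alphas[int(np.argmin(mean_val_loss))]
    chosen_alphas.append(best_alpha)

    # 2b. Refit on the full 90% with the selected hyperparameter,
    #     then evaluate on the outer test fold.
    m = Ridge(alpha=best_alpha).fit(X_tr, y_tr)
    outer_losses.append(mean_squared_error(y_te, m.predict(X_te)))

# 3. The average over the 10 outer folds is the test-error estimate.
print(np.mean(outer_losses))

# Optional single final model: retrain on all the data using, e.g.,
# the median of the hyperparameters chosen across the outer folds.
final_model = Ridge(alpha=float(np.median(chosen_alphas))).fit(X, y)
```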