FAQ: What are cross-validation and bootstrapping?
Cross Validation
In k-fold cross-validation, you divide the data into k subsets of (approximately) equal size. You train the net k times, each time leaving out one of the subsets from training, and using only that omitted subset to compute whatever error criterion interests you.
If k equals the sample size, this is called "leave-one-out" cross-validation. "Leave-v-out" is a more elaborate and expensive version of cross-validation that involves leaving out all possible subsets of v cases.
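The k-fold procedure above can be sketched in a few lines. This is a minimal illustration, not from the FAQ itself: the function names are my own, and the "model" is just a trained mean predictor standing in for a net, scored by mean squared error on the held-out fold.

```python
import numpy as np

def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and split them into k
    (approximately) equal-sized folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), k)

def k_fold_cv(X, y, k, fit, error):
    """Train k times, each time leaving one fold out of training,
    and average the error computed on the omitted fold only."""
    folds = k_fold_indices(len(X), k)
    errors = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(X[train_idx], y[train_idx])
        errors.append(error(model, X[test_idx], y[test_idx]))
    return float(np.mean(errors))

if __name__ == "__main__":
    # Toy stand-in for a net: "fitting" returns the training mean,
    # and the error criterion is mean squared error.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(20, 1))
    y = rng.normal(size=20)
    fit = lambda X, y: y.mean()
    error = lambda m, X, y: float(np.mean((y - m) ** 2))
    print("5-fold CV error:", k_fold_cv(X, y, 5, fit, error))
```

Setting k equal to the sample size in this sketch gives leave-one-out cross-validation: each "fold" is then a single case.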
Split Sample or Hold Out
Cross-validation is quite different from the "split-sample" or "hold-out" method that is commonly used for early stopping in NNs. In the split-sample method, only a single subset (the validation set) is used to estimate the generalization error, instead of k different subsets; i.e., there is no "crossing".
While various people have suggested that cross-validation be applied to early stopping, the proper way of doing so is not obvious.
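The split-sample method as used for early stopping can be sketched as follows. This is my own illustrative sketch, not a prescribed recipe: it assumes a linear model trained by gradient descent, and the function name and its parameters (patience, validation fraction) are hypothetical choices, though patience-based stopping on a single held-out validation set is the common pattern.

```python
import numpy as np

def train_with_early_stopping(X, y, val_frac=0.25, lr=0.01,
                              max_epochs=500, patience=10, seed=0):
    """Split-sample early stopping: hold out ONE validation set,
    train on the rest, and stop when the validation error has not
    improved for `patience` consecutive epochs."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_val = int(len(X) * val_frac)
    val, train = idx[:n_val], idx[n_val:]
    Xt, yt, Xv, yv = X[train], y[train], X[val], y[val]

    w = np.zeros(X.shape[1])              # linear model, squared error
    best_w, best_err, waited = w.copy(), np.inf, 0
    for epoch in range(max_epochs):
        grad = 2 * Xt.T @ (Xt @ w - yt) / len(yt)
        w -= lr * grad
        val_err = float(np.mean((Xv @ w - yv) ** 2))
        if val_err < best_err:            # new best on the validation set
            best_w, best_err, waited = w.copy(), val_err, 0
        else:
            waited += 1
            if waited >= patience:        # no improvement: stop training
                break
    return best_w, best_err
```

Note that only the single validation set estimates the generalization error here; there is no averaging over k folds, which is exactly the difference from cross-validation described above.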
The rest of the FAQ is also worth reading: it defines and discusses jackknifing and bootstrapping.
The MATLAB Neural Network Toolbox manual does not use either term (as far as I can see; I was mistaken earlier in thinking that this is called cross-validation in MATLAB). It uses the term "early stopping" for improving generalisation.
(p. 5-55, Neural Network Toolbox User's Guide, Version 4)
PPS: There is more to the term "cross-validation" and the ambiguous way it is used in the literature. I have seen more than one paper use the term in place of early stopping, among other things. I will investigate that later if necessary; otherwise I will stick to the definition above.