Learning Speed Curves

Research Project Name

Learning Speed Curves: Prediction of Average Case Learning Using VC-Dimension Analysis and Regression

Objective

The objective of this project is to predict the learning speed curve for an inductive learning algorithm when given just a small number of examples drawn from the target distribution. This objective differs from most existing research in two ways: the goal is to predict average-case performance, not worst-case performance, and to produce results for both noisy and noise-free input.

Results

Existing general regression techniques are analyzed for their ability to produce accurate predictive learning-speed curves.

A new method of general regression is presented and implemented in a system called SEER. The new method’s model, called the Effective Dimension Model, is based on the Vapnik-Chervonenkis dimension.

The described experimental results show that SEER accurately predicts, from a small sample of cases, with and without noise, the number of cases required to achieve a desired level of classification accuracy.

The resulting average learning speed curves can be used in several ways. If the goal is to achieve a particular inductive learning accuracy, the prediction algorithm predicts how many examples are needed. If a given number of additional training examples will be collected, the prediction algorithm predicts the resulting accuracy. If noise-free examples and noisy examples take different amounts of time to collect, SEER predicts which approach yields the higher classification accuracy.
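The first two uses can be illustrated with a minimal sketch. It does not implement SEER or the Effective Dimension Model, whose functional form is not given here. Instead it assumes a simple power-law curve, error(n) = a * n^(-b), fitted by ordinary least squares in log-log space. The function names and the sample measurements are hypothetical and chosen only for illustration.

```python
import math


def fit_power_law(sizes, errors):
    """Fit error(n) = a * n**(-b) by least squares on log(error) vs. log(n).

    Assumption: a power law stands in for the learning-speed curve.
    This is NOT the Effective Dimension Model used by SEER.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (a, b)


def predicted_error(a, b, n):
    """Predicted error rate after training on n examples."""
    return a * n ** (-b)


def examples_needed(a, b, target_error):
    """Smallest n whose predicted error is at most target_error."""
    return math.ceil((a / target_error) ** (1.0 / b))


# Synthetic, noise-free measurements generated from error = n**-0.5.
# They are illustrative only and are not data from the dissertation.
sizes = [10, 20, 40, 80]
errors = [n ** -0.5 for n in sizes]

a, b = fit_power_law(sizes, errors)

# Use 1: how many examples are needed to reach 95% accuracy (5% error)?
print("examples needed:", examples_needed(a, b, 0.05))

# Use 2: what error is expected if training grows to 200 examples?
print("predicted error at n=200:", predicted_error(a, b, 200))
```

Fitting in log-log space keeps the sketch dependency-free. A real predictor would also need a noise model and a curve family suited to the learner, which is the gap the dissertation's maximum likelihood regression is meant to address.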

Publications

Carl Myers Kadie (1995). "Seer: Maximum Likelihood Regression for Learning-Speed Curves," Ph.D. Dissertation, Department of Computer Science, University of Illinois, Urbana-Champaign, July 1995, 106 pages.

Kadie, C. M. and Wilkins, D. C. "Speed Curves: Prediction of Average Case Performance Using VC-Dimension Analysis and Regression," draft manuscript, 42 pages.