Characterizing Model Errors and Differences

Stephen D. Bay and Michael J. Pazzani
Department of Information and Computer Science
University of California, Irvine
Irvine, CA 92697, USA
{sbay, pazzani}@ics.uci.edu

Abstract

A critical component of applying machine learning algorithms is evaluating the performance of the induced models and using that evaluation to guide further development. Traditionally, the most common evaluation metric is error or loss; however, this provides very little information that the designer can use when constructing a system. We argue that an evaluation method should provide detailed feedback on the performance of an algorithm and that this feedback should be expressed in the language of the problem: our goal is to characterize model errors, or the differences between models, in the feature space. We provide a framework for this that allows different algorithms to be used as the discovery engine, and we consider two approaches: (1) a classification strategy, where we use a standard rule learner such as C5; and (2) a descriptive paradigm, where we use a new discovery algorithm, a contrast set miner. We show that C5 suffers from several problems that make it unsuitable for this task.
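To make the descriptive approach concrete, the sketch below splits instances into groups (here, those a model classified correctly versus those it got wrong) and searches for conjunctions of attribute-value pairs whose support differs between the groups. This is a minimal brute-force illustration under stated assumptions, not the authors' contrast set miner: it omits statistical significance testing and search pruning, and the names `support`, `contrast_sets`, `max_len`, and `min_dev`, as well as the toy data, are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def support(itemset, rows):
    """Fraction of rows matching every (attribute, value) pair in itemset."""
    if not rows:
        return 0.0
    hits = sum(all(row.get(a) == v for a, v in itemset) for row in rows)
    return hits / len(rows)


def contrast_sets(groups, max_len=2, min_dev=0.3):
    """Hypothetical brute-force search: return conjunctions of up to max_len
    attribute-value pairs whose support differs across groups by at least
    min_dev. No significance test or pruning is applied."""
    # Every attribute-value pair observed in any group (repr keeps sorting safe).
    items = sorted(
        {(a, v) for rows in groups.values() for row in rows for a, v in row.items()},
        key=repr,
    )
    results = []
    for k in range(1, max_len + 1):
        for itemset in combinations(items, k):
            # Skip conjunctions that test the same attribute more than once.
            if len({a for a, _ in itemset}) < k:
                continue
            sups = {g: support(itemset, rows) for g, rows in groups.items()}
            dev = max(sups.values()) - min(sups.values())
            if dev >= min_dev:
                results.append((itemset, sups, dev))
    # Largest support difference first.
    results.sort(key=lambda r: -r[2])
    return results


# Toy data (invented): instances grouped by whether a model classified them correctly.
groups = {
    "correct": [
        {"outlook": "sunny", "wind": "weak"},
        {"outlook": "sunny", "wind": "strong"},
        {"outlook": "overcast", "wind": "weak"},
        {"outlook": "overcast", "wind": "strong"},
    ],
    "error": [
        {"outlook": "rain", "wind": "weak"},
        {"outlook": "rain", "wind": "strong"},
    ],
}

for itemset, sups, dev in contrast_sets(groups)[:3]:
    print(itemset, sups, round(dev, 2))
```

On this toy data the strongest contrast is `outlook = rain`, which describes the errors in terms of the input features rather than as a single error rate. The same search could characterize differences between two models by grouping instances on whether the models' predictions agree.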
