Discovering and Describing Category Differences: What makes a discovered difference insightful?

Stephen D. Bay and Michael J. Pazzani
Department of Information and Computer Science
University of California, Irvine
Irvine, CA 92697, USA
{sbay, pazzani}@ics.uci.edu

Abstract

Many organizations have turned to computer analysis to deal with the explosion of available electronic data. The goal of this analysis is to gain insight and new knowledge about their core activities. A common query is to compare several categories (e.g., customers who default on loans versus those who do not) to discover previously unknown differences between them. Current mining algorithms can produce rules that differentiate the groups with high accuracy, but human domain experts often find these results neither insightful nor useful. In this paper, we take a step toward understanding how humans interpret discovered rules by presenting a case study: we compare the responses of admissions officers (domain experts) to the output of two data mining algorithms that attempt to find out why admitted students choose to enroll or not enroll at UC Irvine. We analyze the responses and identify several factors that affect whether experts find the discovered rules insightful.
