Stephen Bay
Research
I am interested in Machine Learning, Data Mining, and Knowledge Discovery. I have worked on projects in the following areas:
- Detecting anomalies and outliers in large data sets.
I am interested in using machine learning techniques to discover outliers and anomalies in large and complex data sets. I ran a symposium on this topic with Pat Langley.
- Revising models of gene regulation.
- Monitoring and Modeling the Space Station Electrical Power System. The International Space Station is an extremely large and complicated system composed of many interacting components. I examined methods for building monitoring models that combine machine learning techniques with domain knowledge.
- Detecting group differences. An important descriptive analysis task is understanding the differences between several contrasting groups from observational data. These groups could represent different categories of objects, such as male and female students, or the same category over multiple time periods, as with consecutive cohorts of students (e.g., computer science students from 1993 through 1999).
- Cognitive aspects of knowledge discovery. To date, most work in data mining has not paid attention to cognitive factors that affect the usability of discovered results by humans. As a result, many mining algorithms return results that are either not credible or simply not interesting to a user. I have begun investigating some cognitive factors that affect humans' ability to use data mining results.
- Combining nearest neighbor classifiers. There are many combining algorithms that dramatically improve the performance of classifiers such as decision trees and neural networks but are not effective with nearest neighbor classifiers. I examined methods for combining nearest neighbor classifiers to improve accuracy.
Professional Activities
- Symposium on Machine Learning for Anomaly Detection, Stanford 2004
- Program Committee, ACM SIGKDD, 2004
- Program Committee, IEEE International Conference on Data Mining, ICDM 2003
- Program Committee, International Conference on Machine Learning, ICML 2002, 2003, 2004
- Program Committee, International Workshop on Active Mining (AM-2002), ICDM 2002
- Panelist, NSF Information and Intelligent Systems, 1999
Publications
Complete list (reverse chronological)
Some recent papers:
- Bay, S., Saito, K., Ueda, N., Langley, P. A Framework for Discovering Anomalous Regimes in Multivariate Time-Series Data with Local Models.
- Bay, S., Chrisman, L., Pohorille, A., and Shrager, J. (to appear). Temporal Aggregation Bias and Inference of Causal Regulatory Networks. Journal of Computational Biology.
- Shrager, J., Labiosa, R., Massar, J. P., Travers, M., Bay, S., Elhai, J., Pohorille, A., Arrigo, K., Langley, P., Bhaya, D., and Grossman, A. (2004). The BioLingua Multi-Cyanobacterial BioComputation Platform, and its Application in Cyclodynamic Microarray Analysis of Cyanobacterial Light Acclimation. Gordon Research Conference. Roscoff, France.
- Shrager, J., Labiosa, R., Bay, S., Arrigo, K., Bhaya, D., Tu, C., Grossman, A. (2004). Genome-wide Analyses of Light Driven and Circadian Expression Response in Synechocystis PCC 6803. Proceedings of the 12th American Geophysical Union Ocean Sciences Meeting.
- George, D., Saito, K., Langley, P., Bay, S., & Arrigo, K. (2003). Discovering ecosystem models from time-series data. Proceedings of the Sixth International Conference on Discovery Science. Postscript.
- Bay, S. D., and Schwabacher, M. (2003). Mining Distance-Based Outliers in Near Linear Time with Randomization and a Simple Pruning Rule. Proceedings of The Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. PDF. Postscript. Slides.
My BibTeX file myrefs.bib.
Knowledge Discovery in Databases Archive
I used to be the librarian and maintainer for the new UCI KDD Archive. This is an online repository of large databases designed to encompass a wide variety of data types and analysis tasks. It expands on the current UCI Machine Learning Archive by storing databases that are much larger and involve tasks other than just classification.
Software
Teaching
I taught ICS 171: Introduction to Artificial Intelligence at UCI in Summer 2000. The course webpage has detailed problem sets with solutions. If you are an instructor, I can make the LaTeX source available.
Fun Stuff
Long Beach, California: jellyfish
Balboa Park, San Diego: roots
Venice Beach, California: two curious dogs, Kara
San Francisco, California: Exploratorium, Golden Gate Bridge
Ensenada, Mexico: a little island, La Bufadora (calm), La Bufadora (active), KC
Air and Space Museum, Washington D.C.: Wright Flyer
Juggling: 3 Clubs, 5 Balls
Banff, Alberta: 10000+ft, at the top, a wild eep (Does anybody know what animal this really is?)
Arches National Park, Utah: a neat looking rock, underneath it
Yellowstone, Wyoming: Old Faithful, Emerald Pool, Smoke Mountain, somewhere in the park, Grand Canyon of Yellowstone
Grand Canyon, Arizona: North Side
Moab, Utah: slickrock, more slickrock
Algonquin Park, Ontario: resting after a 2km portage, a moose
Salt Lake City, Utah: biggest pipe organ I have ever seen
Don't try this at home: fire
My Favorite Stories
Funny Stuff
Links: ArtGangLa, Eamonn, Engine Turning, photos, SFBookArts.com