Computational Induction of Scientific Process Models

This project aims to develop a framework that unifies two separate but central themes in information technology -- computational simulation of models to explain important phenomena and computational induction of knowledge from observed regularities in data. Unlike most previous work in machine learning and data mining, the approach emphasizes methods that generate knowledge in established scientific formalisms, incorporate domain knowledge where possible, focus on causal and explanatory models, address induction from observational time-series data, and are embedded in a simulation environment which scientists can use for model development.

Our approach revolves around a new class of models that consist of interacting quantitative processes and the problem of inducing such models from time-series data. Computational challenges that we will address include reducing overfitting and variance, inducing conditions on processes, handling large, heterogeneous data sets with missing values, and scaling to complex models. We will incorporate the resulting algorithms in a trainable simulation environment that lets users construct models manually or induce them from data, then simulate their behavior. Experimental evaluation will involve both Earth Science observations from the Ross Sea and synthetic data.
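
As a rough illustration of what inducing a model of interacting quantitative processes from time-series data can involve, the following minimal Python sketch treats a model as a set of named processes, each adding a term to the derivatives of the variables it affects. It simulates a model with simple Euler integration and then searches exhaustively over process subsets and a small grid of rate values for the combination that best matches an observed trajectory. The process library, the variable names (prey, pred), the fixed 0.5 conversion factor, the rate grid, and the exhaustive search are all assumptions made for this example. They are not the project's actual algorithms or the Prometheus software.

```python
from itertools import combinations, product

# Hypothetical library of generic processes. Each maps the current state and
# a rate parameter to its contribution to the derivatives of the variables.
PROCESSES = {
    "prey_growth": lambda s, r: {"prey": r * s["prey"]},
    "predation": lambda s, r: {
        "prey": -r * s["prey"] * s["pred"],
        "pred": 0.5 * r * s["prey"] * s["pred"],  # assumed conversion factor
    },
    "pred_loss": lambda s, r: {"pred": -r * s["pred"]},
    "prey_loss": lambda s, r: {"prey": -r * s["prey"]},
}


def simulate(model, init, steps=100, dt=0.05):
    """Euler-integrate a model given as {process_name: rate}."""
    state = dict(init)
    trajectory = [dict(state)]
    for _ in range(steps):
        deriv = {var: 0.0 for var in state}
        # Processes interact by summing their effects on shared variables.
        for name, rate in model.items():
            for var, delta in PROCESSES[name](state, rate).items():
                deriv[var] += delta
        state = {var: state[var] + dt * deriv[var] for var in state}
        trajectory.append(dict(state))
    return trajectory


def squared_error(predicted, observed):
    """Sum of squared differences over all time points and variables."""
    total = 0.0
    for p, o in zip(predicted, observed):
        for var in o:
            diff = p[var] - o[var]
            total += diff * diff
    return total


def induce(observed, init, rate_grid=(0.5, 1.0, 1.5), max_processes=3):
    """Search process subsets and grid rates; return (error, best model)."""
    best_error, best_model = float("inf"), None
    steps = len(observed) - 1
    for k in range(1, max_processes + 1):
        for names in combinations(PROCESSES, k):
            for rates in product(rate_grid, repeat=k):
                model = dict(zip(names, rates))
                error = squared_error(simulate(model, init, steps), observed)
                if error < best_error:
                    best_error, best_model = error, model
    return best_error, best_model


if __name__ == "__main__":
    # Synthetic data generated from a known model, then recovered by search.
    target = {"prey_growth": 1.0, "predation": 1.0, "pred_loss": 0.5}
    start = {"prey": 1.0, "pred": 0.5}
    data = simulate(target, start)
    print(induce(data, start))
```

The sketch fits noise-free synthetic data whose true rates lie on the grid, so it sidesteps the challenges listed above: overfitting, conditions on processes, missing values, and scaling. A realistic system would replace the grid with numerical parameter estimation and the exhaustive enumeration with a guided search.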

The trainable simulation environment will let Earth scientists search the space of candidate models systematically, producing more accurate models in much less time. Moreover, the novel computational methods should aid model construction in other fields like systems biology and engineering. Both the environment and sample models will be used in courses and made available through future versions of this Web site.

Our early work on this effort was funded by NTT Communication Science Laboratories, Nippon Telegraph and Telephone Corporation, with later support coming through Grant IIS-0326059 from the National Science Foundation. Current funding comes from Grant N00014-11-1-0107 from the Office of Naval Research. Researchers currently involved in the effort include Pat Langley and Adam Arvay. Past contributors to the project include Will Bridewell, Kevin Arrigo, Richard Billington, Chunki Park, Christine Desmarais, Bernard Widrow, Nima Asgharbeygi, Narges Bani Asadi, Tahir Azim, Dorrit Billman, Stuart Borrett, Matthew Bravo, Jed Crosby, Yi Ding, Matthew Janes, Danny Korenblum, Fabian Lischka, Stephen Racunas, Nikhil Raghavan, Tamar Shinar, Oren Shiran, and Jeff Shrager. In addition, the ISLE/Stanford team collaborates with Saso Dzeroski and Ljupco Todorovski in the Department of Intelligent Systems at the Jozef Stefan Institute in Ljubljana, Slovenia.

Inductive Process Modeling Software

Find out more about the Prometheus modeling environment and download an initial version.

Related Publications

Langley, P., & Arvay, A. (2019). Scientific discovery, process models, and the social sciences. In M. Addis, P. C. R. Lane, P. D. Sozou, & F. Gobet (Eds.), Scientific discovery in the social sciences. Heidelberg: Springer.
Langley, P. (2019). Scientific discovery, causal explanation, and process model induction. Mind & Society, 18, 43-56.
Langley, P., & Arvay, A. (2017). Flexible model induction through heuristic process discovery. Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (pp. 4415-4421). San Francisco: AAAI Press.
Arvay, A., & Langley, P. (2016). Selective induction of rate-based process models. Proceedings of the Fourth Annual Conference on Cognitive Systems. Evanston, IL.
Arvay, A., & Langley, P. (2016). Heuristic adaptation of scientific process models. Advances in Cognitive Systems, 4, 207-226.
Arvay, A., & Langley, P. (2015). Heuristic adaptation of rate-based process models. Proceedings of the Third Annual Conference on Cognitive Systems. Atlanta, GA.
Langley, P., & Arvay, A. (2015). Heuristic induction of rate-based process models. Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence (pp. 537-543). Austin, TX: AAAI Press.
Todorovski, L., Bridewell, W., & Langley, P. (2012). Discovering constraints for inductive process modeling. Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence (pp. 256-262). Toronto: AAAI Press.
Park, C., Bridewell, W., & Langley, P. (2010). Integrated systems for inducing spatio-temporal process models. Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence (pp. 1555-1560). Atlanta: AAAI Press.
Bridewell, W., & Todorovski, L. (2010). The induction and transfer of declarative bias. Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence (pp. 401-406). Atlanta: AAAI Press.
Bridewell, W., & Langley, P. (2010). Two kinds of knowledge in scientific discovery. Topics in Cognitive Science, 2, 36-52.
Bridewell, W., Borrett, S. R., & Langley, P. (2009). Supporting the construction of dynamic scientific models. In A. Markman & K. L. Wood (Eds.), Tools for innovation. New York: Oxford University Press.
Langley, P., & Bridewell, W. (2008). Processes and constraints in explanatory scientific discovery. Proceedings of the Thirtieth Annual Meeting of the Cognitive Science Society. Washington, D.C.
Bridewell, W., Langley, P., Todorovski, L., & Dzeroski, S. (2008). Inductive process modeling. Machine Learning, 71, 1-32.
Bridewell, W., Borrett, S., & Todorovski, L. (2007). Extracting constraints for process modeling. Proceedings of the Fourth International Conference on Knowledge Capture (pp. 87-94). Whistler, BC.
Bridewell, W., & Todorovski, L. (2007). Learning declarative bias. Proceedings of the Seventeenth International Conference on Inductive Logic Programming. Corvallis, OR.
Borrett, S. R., Bridewell, W., Langley, P., & Arrigo, K. R. (2007). A method for representing and developing process models. Ecological Complexity, 4, 1-12.
Bridewell, W., Sanchez, J. N., Langley, P., & Billman, D. (2006). An interactive environment for the modeling and discovery of scientific knowledge. International Journal of Human-Computer Studies, 64, 1099-1114.
Bridewell, W., Langley, P., Racunas, S., & Borrett, S. R. (2006). Learning process models with missing data. Proceedings of the Seventeenth European Conference on Machine Learning (pp. 557-565). Berlin: Springer.
Langley, P., Shiran, O., Shrager, J., Todorovski, L., & Pohorille, A. (2006). Constructing explanatory process models from biological data and knowledge. AI in Medicine, 37, 191-201.
Asgharbeygi, N., Bay, S., Langley, P., & Arrigo, K. (2006). Inductive revision of quantitative process models. Ecological Modelling, 194, 70-79.
Bridewell, W., Bani Asadi, N., Langley, P., & Todorovski, L. (2005). Reducing overfitting in process model induction. Proceedings of the Twenty-Second International Conference on Machine Learning (pp. 81-88). Bonn, Germany.
Todorovski, L., Bridewell, W., Shiran, O., & Langley, P. (2005). Inducing hierarchical process models in dynamic domains. Proceedings of the Twentieth National Conference on Artificial Intelligence (pp. 892-897). Pittsburgh, PA: AAAI Press.
Asgharbeygi, N., Bay, S., Langley, P., & Arrigo, K. (2004). Computational revision of ecological process models. Proceedings of the Fourth International Workshop on Environmental Applications of Machine Learning (pp. 13-14). Bled, Slovenia.
Langley, P., Shrager, J., Asgharbeygi, N., Bay, S., & Pohorille, A. (2004). Inducing explanatory process models from biological time series. Proceedings of the Ninth Workshop on Intelligent Data Analysis and Data Mining (pp. 85-90). Stanford, CA.
Lavrac, N., Motoda, H., Fawcett, T., Holte, R., Langley, P., & Adriaans, P. (2004). Lessons learned from data mining applications and collaborative problem solving. Machine Learning, 57, 13-34.
George, D., Saito, K., Langley, P., Bay, S., & Arrigo, K. (2003). Discovering ecosystem models from time-series data. Proceedings of the Sixth International Conference on Discovery Science (pp. 141-152). Sapporo, Japan: Springer.
Sanchez, J. N., & Langley, P. (2003). An interactive environment for scientific model construction. Proceedings of the Second International Conference on Knowledge Capture (pp. 138-145). Sanibel Island, FL: ACM Press.
Langley, P., George, D., Bay, S., & Saito, K. (2003). Robust induction of process models from time-series data. Proceedings of the Twentieth International Conference on Machine Learning (pp. 432-439).
Langley, P., Sanchez, J., Todorovski, L., & Dzeroski, S. (2002). Inducing process models from continuous data. Proceedings of the Nineteenth International Conference on Machine Learning (pp. 347-354). Sydney: Morgan Kaufmann.

Project-Related Presentations

For more information, send electronic mail to langley@isle.org