Computational Scientific Discovery

I became fascinated with the nature of scientific discovery as an undergraduate at TCU, and the interest has remained to this day. My dissertation work at CMU focused on Bacon, an AI system that rediscovered numeric laws from the history of physics. Herbert Simon served as my advisor and contributed many ideas to the effort. Gary Bradshaw and I extended the system to handle additional laws, including ones from the history of chemistry. After Jan Zytkow joined our group, we developed new systems (Stahl and Dalton) that dealt with the discovery of qualitative laws and structural models. This CMU work forms the basis of my early publications on scientific discovery, culminating in our 1987 book (see below).

After moving to UCI, I continued my collaborations with graduate students there. Donald Rose and I refined the Stahl work on chemical discovery and devised a hill-climbing system, Revolver, that handled aspects of particle physics. Randy Jones and I developed Eureka, a computational model of scientific insight that relied on analogical reasoning combined with spreading activation retrieval. And with Bernd Nordhausen, I constructed IDS, a system that integrated our previous work on taxonomy formation, discovery of qualitative laws, and finding numeric relations. In addition, Jeff Shrager and I organized a symposium on scientific discovery and edited a book reporting recent work in the area.

My activities in scientific discovery slowed down during my time at NASA Ames, Siemens, and Stanford, but for funding reasons rather than for lack of interest. After this hiatus, a collaboration with Sakir Kocabas led to some new results in particle physics and astrophysics, followed by joint work with Jeff Shrager, Kazumi Saito, Mark Schwabacher, Chris Potter, Andrew Pohorille, and others on approaches to computational discovery in biology and in Earth science.

Over the past decade, most of my discovery research has focused on a new framework, inductive process modeling, that combines background knowledge in the form of generic processes with time-series data to construct explanatory models stated as sets of differential equations. The basic approach carries out exhaustive search through a space of model structures followed by gradient descent through the parameter space for each candidate structure. Later work extended the framework to use constraints among processes to guide search through the structure space and even to induce constraints to discriminate between successful and unsuccessful structures. This effort involved collaborations with Will Bridewell, Ljupco Todorovski, Saso Dzeroski, Kevin Arrigo, Stuart Borrett, and many others.
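
As a rough illustration of the two-level search described above, the Python sketch below enumerates every nonempty combination of a few generic processes and fits each candidate's parameters by gradient descent on simulation error. It is a toy under stated assumptions, not the published systems: the three processes, the single variable x, the Euler simulator, the finite-difference gradient, and all names and numeric settings are hypothetical choices made for the example.

```python
import itertools
import math

# Hypothetical generic processes for one variable x.
# Each maps (x, parameter) to that process's contribution to dx/dt.
GENERIC_PROCESSES = {
    "growth": lambda x, p: p * x,
    "crowding": lambda x, p: -p * x * x,
    "inflow": lambda x, p: p,
}

def simulate(structure, params, x0, dt, steps):
    """Euler-integrate the model; return None if the trajectory diverges."""
    x, traj = x0, [x0]
    for _ in range(steps):
        x += dt * sum(GENERIC_PROCESSES[n](x, p) for n, p in zip(structure, params))
        if not math.isfinite(x) or abs(x) > 1e6:
            return None
        traj.append(x)
    return traj

def sse(structure, params, data, dt):
    """Sum of squared errors between the simulated and observed series."""
    traj = simulate(structure, params, data[0], dt, len(data) - 1)
    if traj is None:
        return math.inf
    return sum((a - b) ** 2 for a, b in zip(traj, data))

def fit_parameters(structure, data, dt, iters=200, eps=1e-6):
    """Gradient descent with a numerical gradient and step halving.
    Every gradient evaluation requires fresh simulations."""
    params = [0.1] * len(structure)  # assumed starting point
    loss = sse(structure, params, data, dt)
    step = 1e-3
    for _ in range(iters):
        grad = []
        for i in range(len(params)):
            bumped = params[:]
            bumped[i] += eps
            grad.append((sse(structure, bumped, data, dt) - loss) / eps)
        trial = [p - step * g for p, g in zip(params, grad)]
        trial_loss = sse(structure, trial, data, dt)
        if trial_loss < loss:  # accept only steps that reduce the error
            params, loss, step = trial, trial_loss, step * 2
        else:
            step /= 2
    return params, loss

def search(data, dt):
    """Exhaustive search over structures, parameter fitting for each one."""
    names = list(GENERIC_PROCESSES)
    results = []
    for k in range(1, len(names) + 1):
        for structure in itertools.combinations(names, k):
            params, loss = fit_parameters(structure, data, dt)
            results.append((loss, structure, params))
    return sorted(results, key=lambda r: r[0])

if __name__ == "__main__":
    # Synthetic observations from logistic growth (growth plus crowding).
    data = simulate(("growth", "crowding"), [1.0, 0.1], 1.0, 0.1, 50)
    for loss, structure, params in search(data, 0.1):
        print(structure, [round(p, 3) for p in params], round(loss, 4))
```

With a fixed iteration budget, the descent may stop short of the best parameters for a structure, and ranking by raw squared error tends to favor larger structures; the sketch makes no attempt to address either issue.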

Most recently, Adam Arvay and I have developed and implemented a new approach to inducing process models that associates a rate expression with each process P and that assumes each of P's derivatives is proportional to this rate. Together, these assumptions let us estimate parameters not with gradient descent search, which requires repeated simulations and can find local optima, but with multiple linear regression. The resulting systems are far more robust than their predecessors and run nearly a million times faster, even on simple tasks. Our latest efforts have extended this approach to support adaptation of models to new settings and to find more complex equations through a form of variable selection.
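
To make the rate-based idea concrete, here is a minimal Python sketch under assumed conditions. It takes a predator-prey system with three hypothetical processes whose rates are x (prey growth), x*y (predation), and y (predator loss), estimates each variable's derivative from the time series by central differences, and then regresses those derivatives on the process rates. The example system, the function names, and the numeric values are all assumptions for illustration; the sketch also regresses every derivative on every rate rather than selecting which processes to include.

```python
# Illustrative sketch only. The predator-prey processes, their rate
# expressions, and all numeric values are assumptions made for this example.

def simulate(a, b, c, d, x0, y0, dt, steps):
    """Synthetic data from dx/dt = a*x - b*x*y, dy/dt = c*x*y - d*y (RK4)."""
    def deriv(x, y):
        return a * x - b * x * y, c * x * y - d * y
    xs, ys = [x0], [y0]
    x, y = x0, y0
    for _ in range(steps):
        k1 = deriv(x, y)
        k2 = deriv(x + 0.5 * dt * k1[0], y + 0.5 * dt * k1[1])
        k3 = deriv(x + 0.5 * dt * k2[0], y + 0.5 * dt * k2[1])
        k4 = deriv(x + dt * k3[0], y + dt * k3[1])
        x += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
        y += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
        xs.append(x)
        ys.append(y)
    return xs, ys

def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    w = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][k] * w[k] for k in range(i + 1, n))
        w[i] = (M[i][n] - s) / M[i][i]
    return w

def linear_regression(features, targets):
    """Multiple linear regression (no intercept) via the normal equations."""
    n = len(features[0])
    XtX = [[sum(f[i] * f[j] for f in features) for j in range(n)] for i in range(n)]
    Xty = [sum(f[i] * t for f, t in zip(features, targets)) for i in range(n)]
    return solve(XtX, Xty)

def fit_rate_based(xs, ys, dt):
    """Regress each estimated derivative on the three process rates."""
    idx = range(1, len(xs) - 1)
    dx = [(xs[i + 1] - xs[i - 1]) / (2 * dt) for i in idx]  # central differences
    dy = [(ys[i + 1] - ys[i - 1]) / (2 * dt) for i in idx]
    rates = [[xs[i], xs[i] * ys[i], ys[i]] for i in idx]  # growth, predation, loss
    return linear_regression(rates, dx), linear_regression(rates, dy)

if __name__ == "__main__":
    xs, ys = simulate(1.0, 0.1, 0.02, 0.5, 40.0, 9.0, 0.01, 2000)
    wx, wy = fit_rate_based(xs, ys, 0.01)
    print("dx/dt coefficients on [x, x*y, y]:", wx)
    print("dy/dt coefficients on [x, x*y, y]:", wy)
```

Because the rates are computed directly from the observed values, the fit needs no simulation and no starting guess for the parameters, which is the contrast with gradient descent drawn in the paragraph above.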


Papers on Induction of Rate-Based Process Modeling

Langley, P., & Arvay, A. (2019). Scientific discovery, process models, and the social sciences. In M. Addis, P. C. R. Lane, P. D. Sozou, & F. Gobet (Eds.), Scientific discovery in the social sciences. Heidelberg: Springer.

Langley, P. (2019). Scientific discovery, causal explanation, and process model induction. Mind & Society, 18, 43-56.

Langley, P., & Arvay, A. (2017). Flexible model induction through heuristic process discovery. Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (pp. 4415-4421). San Francisco: AAAI Press.

Arvay, A., & Langley, P. (2016). Selective induction of rate-based process models. Proceedings of the Fourth Annual Conference on Cognitive Systems. Evanston, IL.

Arvay, A., & Langley, P. (2016). Heuristic adaptation of scientific process models. Advances in Cognitive Systems, 4, 207-226.

Arvay, A., & Langley, P. (2015). Heuristic adaptation of rate-based process models. Proceedings of the Third Annual Conference on Cognitive Systems. Atlanta, GA.

Langley, P., & Arvay, A. (2015). Heuristic induction of rate-based process models. Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence (pp. 537-543). Austin, TX: AAAI Press.


Earlier Papers on Inductive Process Modeling

Todorovski, L., Bridewell, W., & Langley, P. (2012). Discovering constraints for inductive process modeling. Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence (pp. 256-262). Toronto: AAAI Press.

Park, C., Bridewell, W., & Langley, P. (2010). Integrated systems for inducing spatio-temporal process models. Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence (pp. 1555-1560). Atlanta: AAAI Press.

Bridewell, W., & Langley, P. (2010). Two kinds of knowledge in scientific discovery. Topics in Cognitive Science, 2, 36-52.

Bridewell, W., Borrett, S. R., & Langley, P. (2009). Supporting the construction of dynamic scientific models. In A. Markman & K. L. Wood (Eds.), Tools for innovation. New York: Oxford University Press.

Langley, P., & Bridewell, W. (2008). Processes and constraints in explanatory scientific discovery. Proceedings of the Thirtieth Annual Meeting of the Cognitive Science Society. Washington, D.C.

Bridewell, W., Langley, P., Todorovski, L., & Dzeroski, S. (2008). Inductive process modeling. Machine Learning, 71, 1-32.

Borrett, S. R., Bridewell, W., Langley, P., & Arrigo, K. R. (2007). A method for representing and developing process models. Ecological Complexity, 4, 1-12.

Bridewell, W., Langley, P., Racunas, S., & Borrett, S. R. (2006). Learning process models with missing data. Proceedings of the Seventeenth European Conference on Machine Learning (pp. 557-565). Berlin: Springer.

Bridewell, W., Sanchez, J. N., Langley, P., & Billman, D. (2006). An interactive environment for the modeling and discovery of scientific knowledge. International Journal of Human-Computer Studies, 64, 1099-1114.

Langley, P., Shiran, O., Shrager, J., Todorovski, L., & Pohorille, A. (2006). Constructing explanatory process models from biological data and knowledge. AI in Medicine, 37, 191-201.

Asgharbeygi, N., Bay, S., Langley, P., & Arrigo, K. (2006). Inductive revision of quantitative process models. Ecological Modelling, 194, 70-79.

Bridewell, W., Bani Asadi, N., Langley, P., & Todorovski, L. (2005). Reducing overfitting in process model induction. Proceedings of the Twenty-Second International Conference on Machine Learning (pp. 81-88). Bonn, Germany.

Todorovski, L., Bridewell, W., Shiran, O., & Langley, P. (2005). Inducing hierarchical process models in dynamic domains. Proceedings of the Twentieth National Conference on Artificial Intelligence (pp. 892-897). Pittsburgh, PA: AAAI Press.

Asgharbeygi, N., Bay, S., Langley, P., & Arrigo, K. (2004). Computational revision of ecological process models. Proceedings of the Fourth International Workshop on Environmental Applications of Machine Learning (pp. 13-14). Bled, Slovenia.

Langley, P., Shrager, J., Asgharbeygi, N., Bay, S., & Pohorille, A. (2004). Inducing explanatory process models from biological time series. Proceedings of the Ninth Workshop on Intelligent Data Analysis and Data Mining (pp. 85-90). Stanford, CA.

George, D., Saito, K., Langley, P., Bay, S., & Arrigo, K. (2003). Discovering ecosystem models from time-series data. Proceedings of the Sixth International Conference on Discovery Science (pp. 141-152). Sapporo, Japan: Springer.

Sanchez, J. N., & Langley, P. (2003). An interactive environment for scientific model construction. Proceedings of the Second International Conference on Knowledge Capture (pp. 138-145). Sanibel Island, FL: ACM Press.

Langley, P., George, D., Bay, S., & Saito, K. (2003). Robust induction of process models from time-series data. Proceedings of the Twentieth International Conference on Machine Learning (pp. 432-439).

Langley, P., Sanchez, J., Todorovski, L., & Dzeroski, S. (2002). Inducing process models from continuous data. Proceedings of the Nineteenth International Conference on Machine Learning (pp. 347-354). Sydney: Morgan Kaufmann.

Papers on Equation Discovery

Schwabacher, M., & Langley, P. (2007). Discovering communicable scientific knowledge from spatio-temporal data. In S. Dzeroski & L. Todorovski (Eds.), Computational discovery of communicable scientific knowledge. Berlin: Springer.

Saito, K., & Langley, P. (2007). Quantitative revision of scientific models. In S. Dzeroski & L. Todorovski (Eds.), Computational discovery of communicable scientific knowledge. Berlin: Springer.

Todorovski, L., Dzeroski, S., Langley, P., & Potter, C. (2003). Using equation discovery to revise an Earth ecosystem model of carbon net production. Ecological Modelling, 170, 141-154.

Bay, S. D., Shapiro, D. G., & Langley, P. (2002). Revising engineering models: Combining computational discovery with knowledge. Proceedings of the Thirteenth European Conference on Machine Learning (pp. 10-22). Helsinki, Finland.

Saito, K., Langley, P., Grenager, T., Potter, C., Torregrosa, A., & Klooster, S. A. (2001). Computational revision of quantitative scientific models. Proceedings of the Fourth International Conference on Discovery Science (pp. 336-349). Washington, D.C.: Springer.

Schwabacher, M., & Langley, P. (2001). Discovering communicable scientific knowledge from spatio-temporal data. Proceedings of the Eighteenth International Conference on Machine Learning (pp. 489-496). Williamstown, MA: Morgan Kaufmann.

Nordhausen, B., & Langley, P. (1990). A robust approach to numeric discovery. Proceedings of the Seventh International Conference on Machine Learning (pp. 411-418). Austin, TX: Morgan Kaufmann.

Langley, P., & Zytkow, J. M. (1989). Data-driven approaches to empirical discovery. Artificial Intelligence, 40, 283-312.

Langley, P., Bradshaw, G. L., & Simon, H. A. (1987). Heuristics for empirical discovery. In L. Bolc (Ed.), Computational models of learning. Berlin: Springer-Verlag.

Langley, P., Bradshaw, G. L., & Simon, H. A. (1983). Rediscovering chemistry with the Bacon system. In R. S. Michalski, J. G. Carbonell, & T. M. Mitchell (Eds.), Machine learning: An artificial intelligence approach. San Mateo, CA: Morgan Kaufmann.

Langley, P., Bradshaw, G. L., & Simon, H. A. (1982). Data-driven and expectation-driven discovery of empirical laws. Proceedings of the Fourth Biennial Conference of the Canadian Society for Computational Studies of Intelligence (pp. 137-143). Saskatoon, Saskatchewan.

Langley, P., Bradshaw, G. L., & Simon, H. A. (1981). Bacon.5: The discovery of conservation laws. Proceedings of the Seventh International Joint Conference on Artificial Intelligence (pp. 121-126). Vancouver, British Columbia: Morgan Kaufmann.

Simon, H. A., Langley, P., & Bradshaw, G. L. (1981). Scientific discovery as problem solving. Synthese, 47, 1-27.

Langley, P. (1981). Data-driven discovery of physical laws. Cognitive Science, 5, 31-54.

Bradshaw, G. L., Langley, P., & Simon, H. A. (1980). Bacon.4: The discovery of intrinsic properties. Proceedings of the Third Biennial Conference of the Canadian Society for Computational Studies of Intelligence (pp. 19-25). Victoria, British Columbia.

Langley, P. (1979). Rediscovering physics with Bacon.3. Proceedings of the Sixth International Joint Conference on Artificial Intelligence (pp. 505-507). Tokyo, Japan: Morgan Kaufmann.

Langley, P. (1979). A production system model for the induction of mathematical functions. Behavioral Science, 24, 121-139.

Langley, P. (1978). Bacon.1: A general discovery system. Proceedings of the Second Biennial Conference of the Canadian Society for Computational Studies of Intelligence (pp. 173-180). Toronto, Ontario.

Langley, P. (1977). Bacon: A production system that discovers empirical laws. Proceedings of the Fifth International Joint Conference on Artificial Intelligence (p. 344). Cambridge, MA: Morgan Kaufmann.

Papers on Qualitative Discovery

Bay, S. D., Shrager, J., Pohorille, A., & Langley, P. (2003). Revising regulatory networks: From expression data to linear causal models. Journal of Biomedical Informatics, 35, 289-297.

Chrisman, L., Langley, P., & Bay, S. (2003). Incorporating biological knowledge into evaluation of causal regulatory hypotheses. Proceedings of the Pacific Symposium on Biocomputing (pp. 128-139). Lihue, Hawaii.

Saito, K., Bay, S., & Langley, P. (2002). Revising qualitative models of gene regulation. Proceedings of the Fifth International Conference on Discovery Science (pp. 59-70).

Shrager, J., Langley, P., & Pohorille, A. (2002). Guiding revision of regulatory models with expression data. Proceedings of the Pacific Symposium on Biocomputing (pp. 486-497). Lihue, Hawaii.

Kocabas, S., & Langley, P. (2000). Computer generation of process explanations in nuclear astrophysics. International Journal of Human-Computer Studies, 53, 377-392.

Kocabas, S., & Langley, P. (1998). Generating process explanations in nuclear astrophysics. Proceedings of the ECAI-98 Workshop on Machine Discovery (pp. 4-9). Brighton, UK.

Kocabas, S., & Langley, P. (1995). Integration of research tasks for modeling discoveries in particle physics. Proceedings of the AAAI Spring Symposium on Systematic Methods of Scientific Discovery (pp. 87-92). Stanford, CA: AAAI Press.

Rose, D. (1988). Using domain knowledge to aid scientific theory revision. Proceedings of the Fifth International Workshop on Machine Learning (pp. 272-277). Ithaca, NY: Morgan Kaufmann.

Rose, D., & Langley, P. (1988). A hill-climbing approach to machine discovery. Proceedings of the Fifth International Conference on Machine Learning (pp. 367-373). Ann Arbor, MI: Morgan Kaufmann.

Langley, P., & Jones, R. (1988). A computational model of scientific insight. In R. Sternberg (Ed.), The nature of creativity. Cambridge University Press.

Jones, R., & Langley, P. (1988). A theory of scientific problem solving. Proceedings of the Tenth Conference of the Cognitive Science Society (pp. 244-250). Montreal, Quebec: Lawrence Erlbaum.

Rose, D., & Langley, P. (1987). Belief revision and induction. Proceedings of the Ninth Conference of the Cognitive Science Society (pp. 748-752). Seattle, WA: Lawrence Erlbaum.

Zytkow, J. M., Langley, P., & Simon, H. A. (1987). Computer system of discovery Stahl. Studia Filozoficzne or Zagadnienia Naukoznawstwa, 23, 518-536.

Rose, D., & Langley, P. (1986). Chemical discovery as belief revision. Machine Learning, 1, 423-451.

Zytkow, J. M., & Simon, H. A. (1986). A theory of historical discovery: The construction of componential models. Machine Learning, 1, 107-137.

Rose, D., & Langley, P. (1986). Stahlp: Belief revision in scientific discovery. Proceedings of the Fifth National Conference of the American Association for Artificial Intelligence (pp. 528-532). Philadelphia, PA: Morgan Kaufmann.

Jones, R. (1986). Generating predictions to aid the scientific discovery process. Proceedings of the Fifth National Conference of the American Association for Artificial Intelligence (pp. 513-517). Philadelphia, PA: Morgan Kaufmann.

Nordhausen, B. (1986). Conceptual clustering using relational information. Proceedings of the Fifth National Conference of the American Association for Artificial Intelligence (pp. 508-512). Philadelphia, PA: Morgan Kaufmann.

Langley, P., Simon, H. A., Zytkow, J. M., & Fisher, D. H. (1985). Discovering qualitative empirical laws (Technical Report No. 85-18). Irvine: University of California, Department of Information & Computer Science.

Zytkow, J., Langley, P., & Simon, H. A. (1984). A model of early chemical reasoning. Proceedings of the Sixth Conference of the Cognitive Science Society (pp. 378-381). Boulder, CO: Lawrence Erlbaum.

Langley, P., Bradshaw, G. L., Zytkow, J., & Simon, H. A. (1983). Three facets of scientific discovery. Proceedings of the Eighth International Joint Conference on Artificial Intelligence (pp. 465-468). Karlsruhe, West Germany: Morgan Kaufmann.

Papers on Integrated Discovery

Langley, P. (in press). Integrated systems for computational scientific discovery. Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence. Vancouver, BC: AAAI Press.

Kocabas, S., & Langley, P. (2001). An integrated framework for extended discovery in particle physics. Proceedings of the Fourth International Conference on Discovery Science (pp. 182-195). Washington, D.C.: Springer.

Nordhausen, B., & Langley, P. (1993). An integrated framework for empirical discovery. Machine Learning, 12, 17-47.

Nordhausen, B., & Langley, P. (1990). An integrated approach to empirical discovery. In J. Shrager & P. Langley (Eds.), Computational models of scientific discovery and theory formation. San Mateo, CA: Morgan Kaufmann.

Nordhausen, B., & Langley, P. (1987). Towards an integrated discovery system. Proceedings of the Tenth International Joint Conference on Artificial Intelligence (pp. 198-200). Milan, Italy: Morgan Kaufmann.

Langley, P., & Nordhausen, B. (1986). A framework for empirical discovery. Proceedings of the International Meeting on Advances in Learning. Les Arcs, France.

Generic Publications on Scientific Discovery

Langley, P. (2021). Agents of exploration and discovery. AI Magazine, 42, 72-82.

Dzeroski, S., Langley, P., & Todorovski, L. (2007). Computational discovery of scientific knowledge. In S. Dzeroski & L. Todorovski (Eds.), Computational discovery of communicable scientific knowledge. Berlin: Springer.

Schwabacher, M., & Langley, P. (2007). Discovering communicable scientific knowledge from spatio-temporal data. In S. Dzeroski & L. Todorovski (Eds.), Computational discovery of communicable scientific knowledge. Berlin: Springer.

Langley, P. (2002). Lessons for the computational discovery of scientific knowledge. Proceedings of the First International Workshop on Data Mining Lessons Learned (pp. 9-12). Sydney.

Langley, P., Shrager, J., & Saito, K. (2002). Computational discovery of communicable scientific knowledge. In L. Magnani, N. J. Nersessian, & C. Pizzi (Eds.), Logical and Computational Aspects of Model-Based Reasoning. Dordrecht: Kluwer Academic.

Dzeroski, S., & Langley, P. (2001). Computational discovery of communicable knowledge: Symposium report. Proceedings of the Fourth International Conference on Discovery Science (pp. 45-49). Washington, D.C.: Springer.

Langley, P., Magnani, L., Cheng, P. C.-H., Gordon, A., Kocabas, S., & Sleeman, D. H. (2001). Computational models of historical scientific discoveries. Proceedings of the Twenty-Third Annual Conference of the Cognitive Science Society (p. 3). Edinburgh: Lawrence Erlbaum.

Langley, P. (2000). The computational support of scientific discovery. International Journal of Human-Computer Studies, 53, 1149-1164.

Langley, P. (1999). The computer-aided discovery of scientific knowledge. Proceedings of the First International Conference on Discovery Science. Fukuoka, Japan: Springer.

Langley, P. (1995). Stages in the process of scientific discovery. Proceedings of the AAAI Spring Symposium on Systematic Methods for Scientific Discovery (p. 93). Stanford, CA: AAAI Press.

Shrager, J., & Langley, P. (Eds.) (1990). Computational models of scientific discovery and theory formation. San Mateo, CA: Morgan Kaufmann.

Shrager, J., & Langley, P. (1990). Computational approaches to scientific discovery. In J. Shrager & P. Langley (Eds.), Computational models of scientific discovery and theory formation. San Mateo, CA: Morgan Kaufmann.

Langley, P., Simon, H. A., Bradshaw, G. L., & Zytkow, J. M. (1987). Scientific discovery: Computational explorations of the creative processes. Cambridge, MA: MIT Press.

Langley, P., Zytkow, J., Simon, H. A., & Bradshaw, G. L. (1986). The search for regularity: Four aspects of scientific discovery. In R. S. Michalski, J. G. Carbonell, & T. M. Mitchell (Eds.), Machine learning: An artificial intelligence approach (Vol. 2). San Mateo, CA: Morgan Kaufmann.

Bradshaw, G. L., Langley, P., & Simon, H. A. (1983). Studying scientific discovery by computer simulation. Science, 222, 971-975.

For more information, send electronic mail to langley@isle.org


© 1997 Institute for the Study of Learning and Expertise. All rights reserved.