Cognitive Architectures for Physical Agents

While I was studying at CMU, I became infected with two ideas: the AI community's dream of creating a complete intelligent agent and Allen Newell's vision of a unified theory of the cognitive architecture. Unfortunately, most work on the latter topic then focused on purely intellectual tasks and ignored issues of perception and action in the physical world. My interests in language acquisition convinced me that an agent's concepts should be grounded in sensori-motor descriptions, but most work in AI and cognitive science sidestepped this issue.

Discussions with Jaime Carbonell and others convinced me that we needed to develop architectures for complete physical agents, but the state of robotics did not seem ready to support such work. Instead, we organized meetings to discuss the idea of developing a simulated physical environment as a testbed for our research. Although some participants, like Glenn Iba, argued for discrete 'gridworld' environments, others felt strongly that we needed a continuous environment that was as much like the physical world as possible.

This decision led to the World Modelers Project, which produced a 1981 paper on the simulated environment and our aims for it. We secured funding from the Army Research Institute to develop the simulation and an agent architecture. Greg Hood, Hans Tallis, and Klaus Gross were involved in the project at CMU, while John Gennari, Wayne Iba, Kevin Thompson, Patrick Young, and John Allen joined after I moved to UCI in 1984. At Irvine, our work led to designs for the Icarus architecture, and we implemented components intended to operate in the simulated world, but we never constructed the complete system.

Our work on the World Modelers Project, and our goal of building complete physical agents, seems to have been ahead of its time. Now there is growing interest in physical agents that reside in virtual environments and also considerable work on architectures for robotics, which has advanced to the stage where complete agents are possible. Of course, few people know about our early work along these lines, but then AI has never had much sense of history. In hindsight, it would have made sense to work in a more restricted physical domain with well-defined but still challenging tasks. One such domain is flight control, which I used to test a more recent design of Icarus that I implemented between 1994 and 1996 under AFOSR funding. Another involves driving behavior, which Dan Shapiro and I used to test an even more recent version of the architecture that we designed and implemented at DaimlerChrysler Research & Technology Center between 1998 and 2000.

The latest incarnation of Icarus encodes knowledge as reactive skills, each of which specifies the goal-relevant reactions to a class of situations. A skill consists of three elements stated in terms of logical expressions: a set of objectives, a set of requirements or preconditions, and a set of alternate means for accomplishing the objective under those conditions. Each objective, requirement, or means can refer to primitive actions/sensors or to other Icarus skills, thus imposing a hierarchical organization on long-term memory. Each skill also has an associated utility cast as a linear function of sensory attributes.
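
As a rough illustration, the three-part skill structure and its linear utility might be sketched in Python as follows. This is not Icarus syntax: the class name Skill, its field names, the use of Python predicates in place of logical expressions, and the driving example are all assumptions made for the sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Union

# Hypothetical types: a state maps sensory attribute names to numbers,
# and a condition is a predicate standing in for a logical expression.
State = Dict[str, float]
Condition = Callable[[State], bool]


@dataclass
class Skill:
    """Illustrative stand-in for a reactive skill (not the Icarus format)."""
    name: str
    objectives: List[Condition] = field(default_factory=list)
    requirements: List[Condition] = field(default_factory=list)
    # Each means is either a primitive action name or another skill,
    # which is what gives long-term memory its hierarchical shape.
    means: List[Union["Skill", str]] = field(default_factory=list)
    # Linear utility: bias + sum of weight * sensory attribute.
    weights: Dict[str, float] = field(default_factory=dict)
    bias: float = 0.0

    def utility(self, state: State) -> float:
        return self.bias + sum(
            w * state.get(attr, 0.0) for attr, w in self.weights.items()
        )


# Assumed driving example: slowing down is worth more at higher speeds.
slow_down = Skill(
    name="slow-down",
    objectives=[lambda s: s["speed"] <= 20.0],
    means=["brake"],
    weights={"speed": 0.5},
    bias=-1.0,
)
keep_safe = Skill(name="keep-safe", means=[slow_down, "coast"])

print(slow_down.utility({"speed": 30.0}))  # -1.0 + 0.5 * 30.0 = 14.0
```

Here keep_safe refers to slow_down as one of its means, so the hierarchy arises from skills naming other skills rather than from a separate index.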

The basic Icarus interpreter operates on a recognize-act cycle but, unlike many architectures, focuses on reactive execution of existing skills rather than on problem-space search. Given a top-level skill to pursue, on each cycle the system first checks the objective field for that skill. If the objectives are true, nothing further needs to be done, but, if not, the interpreter examines the requirements to determine if the preconditions for action are met. If not, Icarus invokes a subskill associated with the failed requirement in an effort to satisfy it; otherwise, it selects one of the alternate means and calls on the primitive action or subskill associated with it. The architecture selects the alternative with the highest expected utility as predicted by the linear function associated with each skill.
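
The cycle described above can be sketched in a few lines of Python. This is a simplified reading of the prose, not the Icarus interpreter itself: the names Skill and step, the pairing of each requirement with a repair subskill, and the driving attributes (gap, in_lane, speed) are assumptions introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

State = Dict[str, float]
Condition = Callable[[State], bool]


@dataclass
class Skill:
    """Hypothetical skill record; a skill with `action` set is primitive."""
    name: str
    objectives: List[Condition] = field(default_factory=list)
    # Each requirement is paired with the subskill invoked when it fails.
    requirements: List[Tuple[Condition, "Skill"]] = field(default_factory=list)
    means: List["Skill"] = field(default_factory=list)
    action: Optional[str] = None
    weights: Dict[str, float] = field(default_factory=dict)

    def utility(self, state: State) -> float:
        return sum(w * state.get(a, 0.0) for a, w in self.weights.items())


def step(skill: Skill, state: State) -> Optional[str]:
    """One recognize-act cycle: return a primitive action name or None."""
    if skill.action is not None:
        return skill.action
    # 1. If the objectives already hold, nothing needs to be done.
    if skill.objectives and all(test(state) for test in skill.objectives):
        return None
    # 2. If a requirement fails, pursue the subskill that should repair it.
    for test, repair in skill.requirements:
        if not test(state):
            return step(repair, state)
    # 3. Otherwise pick the means with the highest predicted utility.
    if not skill.means:
        return None
    best = max(skill.means, key=lambda m: m.utility(state))
    return step(best, state)


steer = Skill("steer-to-lane", action="steer")
brake = Skill("brake", action="brake", weights={"speed": 1.0})
coast = Skill("coast", action="coast", weights={"gap": 1.0})
keep_gap = Skill(
    "keep-gap",
    objectives=[lambda s: s["gap"] >= 20.0],
    requirements=[(lambda s: s["in_lane"] == 1.0, steer)],
    means=[brake, coast],
)

print(step(keep_gap, {"gap": 10.0, "in_lane": 1.0, "speed": 25.0}))
```

Because step starts again from the top-level skill on every call, the chosen action can change from one cycle to the next as the state changes, which is the sense in which execution is reactive rather than search-based.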

The central learning method in Icarus involves estimating these utility functions from delayed reward. On each cycle, the system receives a numeric reward signal from the environment, which it uses to update the expected values for recently executed skills. In particular, the architecture uses an on-line, model-free version of reinforcement learning that propagates value backwards over time. However, because Icarus skills can refer to subskills, it extends standard approaches to operate in a hierarchical manner, so that it calculates value estimates over a stack of state-action pairs, rather than a single pair. Over time, the system learns to select means that lead to higher reward signals, and its utilization of hierarchical skills to modulate this process makes learning much more rapid than in traditional reinforcement learning methods.
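
To make the idea of updating a stack of state-action pairs concrete, here is a minimal Python sketch. It is only one plausible reading of the description above and not the actual Icarus update rule: it assumes a temporal-difference update on linear weights, applied to every skill on the previously active stack, with each skill's target taken from the entry at the same depth of the next stack. The names predict and update_stack, and the parameters alpha and gamma, are hypothetical.

```python
from typing import Dict, List, Tuple

Features = Dict[str, float]             # sensory attributes seen by a skill
Stack = List[Tuple[str, Features]]      # (skill name, features), top-level first


def predict(weights: Features, features: Features) -> float:
    """Linear value estimate: sum of weight * attribute."""
    return sum(weights.get(k, 0.0) * v for k, v in features.items())


def update_stack(
    weights: Dict[str, Features],
    prev_stack: Stack,
    next_stack: Stack,
    reward: float,
    alpha: float = 0.1,
    gamma: float = 0.9,
) -> None:
    """Assumed update: move every skill on the previous stack toward
    reward + gamma * value of the skill at the same depth on the next stack."""
    errors = []
    # Compute all errors first so one update cannot affect another's target.
    for depth, (skill, feats) in enumerate(prev_stack):
        future = 0.0
        if depth < len(next_stack):
            nxt_skill, nxt_feats = next_stack[depth]
            future = predict(weights.get(nxt_skill, {}), nxt_feats)
        current = predict(weights.get(skill, {}), feats)
        errors.append(reward + gamma * future - current)
    for (skill, feats), error in zip(prev_stack, errors):
        w = weights.setdefault(skill, {})
        for k, v in feats.items():
            w[k] = w.get(k, 0.0) + alpha * error * v


weights: Dict[str, Features] = {}
stack: Stack = [("drive", {"bias": 1.0}), ("brake", {"bias": 1.0})]
update_stack(weights, stack, stack, reward=1.0, alpha=0.5)
print(weights)
```

In this sketch a single reward signal adjusts both the top-level skill and the subskill it invoked, which is the sense in which value is estimated over the whole stack rather than over a single pair.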

We are currently extending the Icarus architecture to support additional capabilities, including categorization, planning, and monitoring, along with representations, performance elements, and learning mechanisms to support such behaviors. We are also looking at new physical domains in which to evaluate the framework.


Related Publications

Langley, P., Laird, J. E., & Rogers, S. (2006). Cognitive architectures: Research issues and challenges (Technical Report). Computational Learning Laboratory, CSLI, Stanford University, CA.

Langley, P., & Choi, D. (2006). A unified cognitive architecture for physical agents. Proceedings of the Twenty-First National Conference on Artificial Intelligence. Boston: AAAI Press.

Nejati, N., Langley, P., & Konik, T. (2006). Learning hierarchical task networks by observation. Proceedings of the Twenty-Third International Conference on Machine Learning (pp. 665-672). Pittsburgh, PA.

Asgharbeygi, N., Stracuzzi, D., & Langley, P. (2006). Relational temporal difference learning. Proceedings of the Twenty-Third International Conference on Machine Learning (pp. 49-56). Pittsburgh, PA.

Langley, P. (2006). Cognitive architectures and general intelligent systems. AI Magazine, 27, 33-44.

Langley, P., & Choi, D. (2006). Learning recursive control programs from problem solving. Journal of Machine Learning Research, 7, 493-518.

Langley, P. (2005). An adaptive architecture for physical agents. Proceedings of the 2005 IEEE/WIC/ACM International Conference on Intelligent Agent Technology (pp. 18-25). Compiegne, France: IEEE Computer Society Press.

Choi, D., & Langley, P. (2005). Learning teleoreactive logic programs from problem solving. Proceedings of the Fifteenth International Conference on Inductive Logic Programming (pp. 51-68). Bonn, Germany: Springer.

Asgharbeygi, N., Nejati, N., Langley, P., & Arai, S. (2005). Guiding inference through relational reinforcement learning. Proceedings of the Fifteenth International Conference on Inductive Logic Programming (pp. 20-37). Bonn, Germany: Springer.

Langley, P., & Rogers, S. (2005). An extended theory of human problem solving. Proceedings of the Twenty-Seventh Annual Meeting of the Cognitive Science Society. Stresa, Italy.

Langley, P., Choi, D., & Rogers, S. (2005). Interleaving learning, problem solving, and execution in the Icarus architecture (Technical Report). Computational Learning Laboratory, CSLI, Stanford University, CA.

Langley, P., & Rogers, S. (2004). Cumulative learning of hierarchical skills. Proceedings of the Third International Conference on Development and Learning. San Diego, CA: IEEE Press.

Langley, P. (2004). Cognitive architectures and the construction of intelligent agents. Proceedings of the AAAI-2004 Workshop on Intelligent Agent Architectures (p. 82). Stanford, CA.

Langley, P., Arai, S., & Shapiro, D. (2004). Model-based learning with hierarchical relational skills. Proceedings of the ICML-2004 Workshop on Relational Reinforcement Learning. Banff, Alberta.

Langley, P., Cummings, K., & Shapiro, D. (2004). Hierarchical skills and cognitive architectures. Proceedings of the Twenty-Sixth Annual Conference of the Cognitive Science Society (pp. 779-784). Chicago, IL.

Choi, D., Kaufman, M., Langley, P., Nejati, N., & Shapiro, D. (2004). An architecture for persistent reactive behavior. Proceedings of the Third International Joint Conference on Autonomous Agents and Multi Agent Systems (pp. 988-995). New York: ACM Press.

Langley, P., Choi, D., & Shapiro, D. (2004). A cognitive architecture for physical agents (Technical Report). Institute for the Study of Learning and Expertise, Palo Alto, CA.

Ichise, R., Shapiro, D., & Langley, P. (in press). Structured program induction from behavioral traces. IEICE Transactions on Information and Systems (in Japanese).

Langley, P., Shapiro, D., Aycinena, M., & Siliski, M. (2003). A value-driven architecture for intelligent behavior. Proceedings of the IJCAI-2003 Workshop on Cognitive Modeling of Agents and Multi-Agent Interactions (pp. 10-18). Acapulco, Mexico.

Langley, P., & Laird, J. E. (2002). Cognitive architectures: Research issues and challenges (Technical Report). Institute for the Study of Learning and Expertise, Palo Alto, CA.

Ichise, R., Shapiro, D. G., & Langley, P. (2002). Learning hierarchical skills from observation. Proceedings of the Fifth International Conference on Discovery Science (pp. 247-258).

Shapiro, D., & Langley, P. (2002). Separating skills from preference: Using learning to program by reward. Proceedings of the Nineteenth International Conference on Machine Learning (pp. 570-577). Sydney: Morgan Kaufmann.

Shapiro, D., Langley, P., & Shachter, R. (2001). Using background knowledge to speed reinforcement learning in physical agents. Proceedings of the Fifth International Conference on Autonomous Agents (pp. 254-261). Montreal: ACM Press.

Shapiro, D., & Langley, P. (1999). Controlling physical agents through reactive logic programming. Proceedings of the Third International Conference on Autonomous Agents (pp. 386-387). Seattle: ACM Press.

Langley, P. (1997). Learning to sense selectively in physical domains. Proceedings of the First International Conference on Autonomous Agents (pp. 217-226). Marina del Rey, CA: ACM Press.

Langley, P., Iba, W., & Shrager, J. (1994). Reactive and automatic behavior in plan execution. Proceedings of the Second International Conference on AI Planning Systems (pp. 299-304). Chicago: AAAI Press.

Langley, P., McKusick, K. B., Allen, J. A., Iba, W. F., & Thompson, K. (1991). A design for the Icarus architecture. SIGART Bulletin, 2, 104-109.

Langley, P., Thompson, K., Iba, W. F., Gennari, J., & Allen, J. A. (1989). An integrated cognitive architecture for autonomous agents (Technical Report 89-28). Irvine: University of California, Department of Information & Computer Science.

Iba, W., & Langley, P. (1987). A computational theory of motor learning. Computational Intelligence, 3, 338-350.

Langley, P., Nicholas, D., Klahr, D., & Hood, G. (1981). A simulated world for modeling learning and development. Proceedings of the Third Conference of the Cognitive Science Society (pp. 274-276). Berkeley, CA.

For more information, send electronic mail to langley@isle.org


© 1997 Institute for the Study of Learning and Expertise. All rights reserved.