Cognitive Architectures for Physical Agents
While I was studying at CMU, I became infected with two ideas:
the AI community's dream of creating a complete intelligent agent
and Allen Newell's vision of a unified theory of the cognitive
architecture. Unfortunately, most work on the latter topic then
focused on purely intellectual tasks and ignored issues of
perception and action in the physical world. My interests in
language acquisition convinced me that an agent's concepts
should be grounded in sensori-motor descriptions, but most
work in AI and cognitive science sidestepped this issue.
Discussions with Jaime Carbonell and others convinced me
that we needed to develop architectures for complete physical
agents, but the state of robotics did not seem ready to support
such work. Instead, we organized meetings to discuss the idea
of developing a simulated physical environment as a testbed for
our research. Although some participants, like Glenn Iba, argued
for discrete `gridworld' environments, others felt strongly that
we needed a continuous environment that was as much like the
physical world as possible.
This decision led to the World Modelers Project and, in turn, to
a 1981 paper on the simulated environment and our aims for it.
We secured funding from the Army Research Institute to develop
the simulation and an agent architecture. Greg Hood, Hans
Tallis, and Klaus Gross were involved in the project at CMU, while
John Gennari, Wayne Iba, Kevin Thompson, Patrick Young, and John
Allen joined after I moved to UCI in 1984. At Irvine, our work
led to designs for the Icarus architecture, and we implemented
components intended to operate in the simulated world, but we never
constructed the complete system.
Our work on the World Modelers Project, and our goal of building
complete physical agents, seem to have been ahead of their time.
Now there is growing interest in physical agents that reside in
virtual environments and also considerable work on architectures
for robotics, which has advanced to the stage where complete agents
are possible. Of course, few people know about our early work along
these lines, but then AI has never had much sense of history.
In hindsight, it would have made sense to work in a more restricted
physical domain with well-defined but still challenging tasks. One
such domain is flight control, which I used to test a more recent
design of Icarus that I implemented between 1994 and 1996 under AFOSR
funding. Another involves driving behavior, which Dan Shapiro and I
used to test an even more recent version of the architecture that we designed
and implemented at DaimlerChrysler Research & Technology Center between
1998 and 2000.
The latest incarnation of Icarus encodes knowledge as reactive skills,
each of which specifies the goal-relevant reactions to a class of
situations. A skill consists of three elements stated in terms of
logical expressions: a set of objectives, a set of requirements or
preconditions, and a set of alternate means for accomplishing the
objective under those conditions. Each objective, requirement, or
means can refer to primitive actions/sensors or to other Icarus
skills, thus imposing a hierarchical organization on long-term
memory. Each skill also has an associated utility cast as a linear
function of sensory attributes.
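As a rough illustration, the skill structure described above might be sketched in Python as follows. This is not the Icarus implementation: the names `Condition`, `Skill`, `achieved_by`, `weights`, and `bias`, the use of Python callables in place of logical expressions, and the driving example with its numbers are all assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Union

Percepts = Dict[str, float]  # sensory attribute name -> current value


@dataclass
class Condition:
    """A test over percepts (stand-in for a logical expression),
    optionally tied to a skill that can make it true."""
    test: Callable[[Percepts], bool]
    achieved_by: Optional["Skill"] = None


@dataclass
class Skill:
    name: str
    objectives: List[Condition]
    requirements: List[Condition]
    means: List[Union[str, "Skill"]]  # primitive action names or subskills
    weights: Dict[str, float] = field(default_factory=dict)
    bias: float = 0.0

    def utility(self, percepts: Percepts) -> float:
        """Utility as a linear function of sensory attributes."""
        return self.bias + sum(
            w * percepts.get(attr, 0.0) for attr, w in self.weights.items()
        )


# Hypothetical driving skills; attribute names and thresholds are invented.
slow_down = Skill(
    name="slow-down",
    objectives=[Condition(lambda p: p["speed"] <= 25.0)],
    requirements=[],
    means=["brake"],
)
keep_gap = Skill(
    name="keep-gap",
    objectives=[Condition(lambda p: p["gap"] >= 30.0)],
    requirements=[Condition(lambda p: p["speed"] <= 25.0, achieved_by=slow_down)],
    means=["coast", slow_down],
    weights={"gap": 0.05, "speed": -0.02},
)
print(keep_gap.utility({"gap": 20.0, "speed": 30.0}))
```

Because `means` and `achieved_by` can hold other `Skill` objects, the hierarchy arises from references between skills rather than from a separate tree structure.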
The basic Icarus interpreter operates on a recognize-act cycle but,
unlike many architectures, focuses on reactive execution of existing
skills rather than on problem-space search. Given a top-level skill
to pursue, on each cycle the system first checks the objective field
for that skill. If the objectives are true, nothing further needs to
be done, but, if not, the interpreter examines the requirements to
determine if the preconditions for action are met. If not, Icarus
invokes a subskill associated with the failed requirement in an effort
to satisfy it; otherwise, it selects one of the alternate means and
calls on the primitive action or subskill associated with it. The
architecture selects the alternative with the highest expected utility
as predicted by the linear function associated with each skill.
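The cycle just described can be sketched as a short recursive function. This is a minimal illustration under stated assumptions, not the actual interpreter: the `Skill` class, the `step` function, the treatment of primitive actions as leaf skills with an `action` field, and the example skills and numbers are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Percepts = Dict[str, float]
Test = Callable[[Percepts], bool]


@dataclass
class Skill:
    name: str
    objectives: List[Test] = field(default_factory=list)
    # Each requirement pairs a test with the subskill invoked when it fails.
    requirements: List[Tuple[Test, "Skill"]] = field(default_factory=list)
    means: List["Skill"] = field(default_factory=list)
    action: Optional[str] = None  # set only on primitive (leaf) skills
    weights: Dict[str, float] = field(default_factory=dict)

    def utility(self, p: Percepts) -> float:
        return sum(w * p.get(k, 0.0) for k, w in self.weights.items())


def step(skill: Skill, p: Percepts) -> Optional[str]:
    """Run one cycle; return the primitive action chosen, or None if done."""
    if skill.action is not None:
        return skill.action                 # primitive: simply act
    if all(test(p) for test in skill.objectives):
        return None                         # objectives hold, nothing to do
    for test, subskill in skill.requirements:
        if not test(p):
            return step(subskill, p)        # try to satisfy failed requirement
    best = max(skill.means, key=lambda m: m.utility(p))
    return step(best, p)                    # highest expected utility wins


brake = Skill("brake", action="brake", weights={"speed": 0.1})
coast = Skill("coast", action="coast", weights={"gap": 0.1})
steer = Skill("steer-to-lane", action="steer")
keep_gap = Skill(
    "keep-gap",
    objectives=[lambda p: p["gap"] >= 30.0],
    requirements=[(lambda p: p["in_lane"] > 0.5, steer)],
    means=[brake, coast],
)
print(step(keep_gap, {"gap": 40.0, "speed": 20.0, "in_lane": 1.0}))  # None
print(step(keep_gap, {"gap": 10.0, "speed": 20.0, "in_lane": 0.0}))  # steer
print(step(keep_gap, {"gap": 10.0, "speed": 30.0, "in_lane": 1.0}))  # brake
```

Note that `step` is re-entered from the top-level skill on every cycle, so the choice among means is re-evaluated against fresh percepts each time instead of committing to a plan.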
The central learning method in Icarus involves estimating these
utility functions from delayed reward. On each cycle, the system
receives a numeric reward signal from the environment, which it
uses to update the expected values for recently executed skills.
In particular, the architecture uses an on-line, model-free version
of reinforcement learning that propagates value backwards over time.
However, because Icarus skills can refer to subskills, it extends
standard approaches to operate in a hierarchical manner, so that it
calculates value estimates over a stack of state-action pairs, rather
than a single pair. Over time, the system learns to select means that
lead to higher reward signals, and its use of hierarchical
skills to modulate this process makes learning much more rapid than
in traditional reinforcement learning methods.
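One way to picture a value update over a stack of skills is the sketch below. It swaps in a generic semi-gradient SARSA-style rule with linear function approximation and should not be read as the update Icarus actually uses: the choice to sum per-skill estimates over the stack, the function names `stack_value` and `update`, the parameters `alpha` and `gamma`, and the example data are all assumptions.

```python
from collections import defaultdict
from typing import Dict, List

Percepts = Dict[str, float]
Weights = Dict[str, Dict[str, float]]  # skill name -> attribute -> weight


def stack_value(weights: Weights, stack: List[str], p: Percepts) -> float:
    """Assumed form: sum the linear estimates of every skill on the stack."""
    return sum(weights[s][a] * v for s in stack for a, v in p.items())


def update(weights: Weights, stack: List[str], p: Percepts, reward: float,
           next_stack: List[str], next_p: Percepts,
           alpha: float = 0.1, gamma: float = 0.9) -> float:
    """Apply one temporal-difference update to all skills on the stack."""
    target = reward + gamma * stack_value(weights, next_stack, next_p)
    delta = target - stack_value(weights, stack, p)
    for s in stack:                      # every level shares the same error
        for a, v in p.items():
            weights[s][a] += alpha * delta * v
    return delta


# All weights start at zero; skill names and percepts are hypothetical.
weights: Weights = defaultdict(lambda: defaultdict(float))
p = {"bias": 1.0, "gap": 0.5}
delta = update(
    weights,
    stack=["drive", "keep-gap", "brake"],
    p=p,
    reward=1.0,
    next_stack=["drive", "keep-gap", "coast"],
    next_p={"bias": 1.0, "gap": 0.8},
)
print(delta, dict(weights["brake"]))
```

In this sketch a single reward adjusts the estimates of the top-level skill, the intermediate skill, and the primitive action together, which is one plausible reading of computing values over a stack of state-action pairs instead of a single pair.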
We are currently extending the Icarus architecture to support
additional capabilities, including categorization, planning, and
monitoring, along with representations, performance elements, and
learning mechanisms to support such behaviors. We are also looking
at new physical domains in which to evaluate the framework.
Related Publications
-
Langley, P., Laird, J. E., & Rogers, S. (2006).
Cognitive architectures: Research issues and challenges
(Technical Report). Computational Learning Laboratory, CSLI,
Stanford University, CA.
-
Langley, P., & Choi, D. (2006).
A unified cognitive architecture for physical agents.
Proceedings of the Twenty-First National Conference on Artificial
Intelligence.
Boston: AAAI Press.
-
Nejati, N., Langley, P., & Konik, T. (2006).
Learning hierarchical task networks by observation.
Proceedings of the Twenty-Third International Conference on
Machine Learning (pp. 665-672).
Pittsburgh, PA.
-
Asgharbeygi, N., Stracuzzi, D., & Langley, P. (2006).
Relational temporal difference learning.
Proceedings of the Twenty-Third International Conference on
Machine Learning (pp. 49-56).
Pittsburgh, PA.
-
Langley, P. (2006).
Cognitive architectures and general intelligent systems.
AI Magazine, 27, 33-44.
-
Langley, P., & Choi, D. (2006).
Learning recursive control programs from problem solving.
Journal of Machine Learning Research, 7, 493-518.
-
Langley, P. (2005).
An adaptive architecture for physical agents.
Proceedings of the 2005 IEEE/WIC/ACM International Conference on
Intelligent Agent Technology (pp. 18-25).
Compiegne, France: IEEE Computer Society Press.
-
Choi, D., & Langley, P. (2005).
Learning teleoreactive logic programs from problem solving.
Proceedings of the Fifteenth International Conference on
Inductive Logic Programming
(pp. 51-68). Bonn, Germany: Springer.
-
Asgharbeygi, N., Nejati, N., Langley, P., & Arai, S. (2005).
Guiding inference through relational reinforcement learning.
Proceedings of the Fifteenth International Conference on
Inductive Logic Programming
(pp. 20-37). Bonn, Germany: Springer.
-
Langley, P., & Rogers, S. (2005).
An extended theory of human problem solving.
Proceedings of the Twenty-Seventh Annual Meeting of the Cognitive
Science Society. Stresa, Italy.
-
Langley, P., Choi, D., & Rogers, S. (2005).
Interleaving learning, problem solving, and execution in the
Icarus architecture
(Technical Report). Computational Learning Laboratory, CSLI, Stanford
University, CA.
-
Langley, P., & Rogers, S. (2004).
Cumulative learning of hierarchical skills.
Proceedings of the Third International Conference on Development
and Learning. San Diego, CA: IEEE Press.
-
Langley, P. (2004).
Cognitive architectures and the construction of intelligent agents.
Proceedings of the AAAI-2004 Workshop on Intelligent Agent
Architectures (p. 82). Stanford, CA.
-
Langley, P., Arai, S., & Shapiro, D. (2004).
Model-based learning with hierarchical relational skills.
Proceedings of the ICML-2004 Workshop on Relational Reinforcement
Learning. Banff, Alberta.
-
Langley, P., Cummings, K., & Shapiro, D. (2004).
Hierarchical skills and cognitive architectures.
Proceedings of the Twenty-Sixth Annual Conference of the Cognitive
Science Society (pp. 779-784). Chicago, IL.
-
Choi, D., Kaufman, M., Langley, P., Nejati, N., & Shapiro, D. (2004).
An architecture for persistent reactive behavior.
Proceedings of the Third International Joint Conference on
Autonomous Agents and Multi Agent Systems (pp. 988-995).
New York: ACM Press.
-
Langley, P., Choi, D., & Shapiro, D. (2004).
A cognitive architecture for physical agents
(Technical Report). Institute for the Study of Learning and Expertise,
Palo Alto, CA.
-
Ichise, R., Shapiro, D., & Langley, P. (in press). Structured program
induction from behavioral traces. IEICE Transactions on Information
and Systems (in Japanese).
-
Langley, P., Shapiro, D., Aycinena, M., & Siliski, M. (2003).
A value-driven architecture for intelligent behavior.
Proceedings of the IJCAI-2003 Workshop on Cognitive Modeling of
Agents and Multi-Agent Interactions (pp. 10-18). Acapulco, Mexico.
-
Langley, P., & Laird, J. E. (2002).
Cognitive architectures: Research issues and challenges
(Technical Report). Institute for the Study of Learning and Expertise,
Palo Alto, CA.
-
Ichise, R., Shapiro, D. G., & Langley, P. (2002).
Learning hierarchical skills from observation.
Proceedings of the Fifth International Conference on Discovery
Science (pp. 247-258).
-
Shapiro, D., & Langley, P. (2002).
Separating skills from preference: Using learning to program by reward.
Proceedings of the Nineteenth International Conference on Machine
Learning (pp. 570-577). Sydney: Morgan Kaufmann.
-
Shapiro, D., Langley, P., & Shachter, R. (2001).
Using background knowledge to speed reinforcement learning in physical
agents.
Proceedings of the Fifth International Conference on Autonomous
Agents (pp. 254-261). Montreal: ACM Press.
-
Shapiro, D., & Langley, P. (1999).
Controlling physical agents through reactive logic programming.
Proceedings of the Third International Conference on Autonomous
Agents (pp. 386-387). Seattle: ACM Press.
-
Langley, P. (1997).
Learning to sense selectively in physical domains.
Proceedings of the First International Conference on Autonomous
Agents (pp. 217-226). Marina del Rey, CA: ACM Press.
-
Langley, P., Iba, W., & Shrager, J. (1994).
Reactive and automatic behavior in plan execution.
Proceedings of the Second International Conference on AI Planning
Systems (pp. 299-304). Chicago: AAAI Press.
-
Langley, P., McKusick, K. B., Allen, J. A., Iba, W. F., &
Thompson, K. (1991).
A design for the Icarus architecture.
SIGART Bulletin, 2, 104-109.
-
Langley, P., Thompson, K., Iba, W. F., Gennari, J., &
Allen, J. A. (1989).
An integrated cognitive architecture for autonomous agents
(Technical Report 89-28). Irvine: University of California,
Department of Information & Computer Science.
-
Iba, W., & Langley, P. (1987).
A computational theory of motor learning.
Computational Intelligence, 3, 338-350.
-
Langley, P., Nicholas, D., Klahr, D., & Hood, G. (1981).
A simulated world for modeling learning and development. Proceedings
of the Third Conference of the Cognitive Science Society (pp. 274-276).
Berkeley, CA.
For more information, send electronic mail to
langley@isle.org