Dialogue Systems:

Multi-Modal Conversational Interfaces for Activity-Based Dialogues

The main aim of the Conversational Interfaces project at CSLI is to build a general-purpose dialogue system that supports multi-modal (i.e. speech and gesture) activity-oriented dialogues with devices, applications, or services. An "activity" is a task or collection of tasks that a device can carry out (for example: switch the VCR on, record a TV show between 7 and 8 pm, switch off). In one application (the WITAS system), the device is an autonomous mobile robot helicopter, which can carry out activities such as searching for an object, following a vehicle, flying to a location, and taking off and landing. In another system (the MURI project), the device is an intelligent tutoring system.

Our Conversational Interfaces architecture allows each device's designers to supply an activity model for their device, which is compiled as part of the dialogue manager. The dialogue moves used in the interfaces are domain-general, and thus re-usable across different devices. Each device's dialogue manager manipulates an Activity Tree (similar to a hierarchical task network, or HTN) describing the device's current and planned activities, which is built and modified through dialogue with a human operator. Human-computer dialogue is then used to monitor the progress of activities, and to modify existing tasks and construct new ones for the device.
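As a rough illustration of the Activity Tree idea, the sketch below models a tree of current and planned activities that an operator's utterances could extend or revise. It is a minimal sketch only: the class name `Activity`, the status labels, and the activity names are assumptions made for this example and are not taken from the project's actual code or activity models.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Activity:
    """One node of a hypothetical activity tree (names are illustrative)."""
    name: str
    status: str = "planned"  # assumed labels: planned, current, done, cancelled
    children: List["Activity"] = field(default_factory=list)

    def add(self, child: "Activity") -> "Activity":
        """Attach a sub-activity and return it so calls can be chained."""
        self.children.append(child)
        return child

    def find(self, name: str) -> Optional["Activity"]:
        """Depth-first search for the first activity with the given name."""
        if self.name == name:
            return self
        for child in self.children:
            hit = child.find(name)
            if hit is not None:
                return hit
        return None

    def report(self, depth: int = 0) -> List[str]:
        """Render the tree as indented lines, e.g. for a status report."""
        lines = ["  " * depth + f"{self.name} [{self.status}]"]
        for child in self.children:
            lines.extend(child.report(depth + 1))
        return lines


# Build a small tree, as dialogue with an operator might do step by step.
root = Activity("mission", status="current")
search = root.add(Activity("search_for_vehicle", status="current"))
search.add(Activity("take_off", status="done"))
search.add(Activity("fly_to_location", status="current"))
root.add(Activity("follow_vehicle"))

# A later operator utterance revises a planned activity.
root.find("follow_vehicle").status = "cancelled"

print("\n".join(root.report()))
```

Keeping the tree separate from the dialogue moves mirrors the division described above: the moves stay domain-general, while the tree content comes from each device's activity model.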

Documentation on system installation and our OAA components can be found here

We collaborate with the RIALIST Group at NASA.

Video clips

These are video clips of a recent demonstration system (2001). The user and robot must use dialogue to collaborate in the joint activities of finding and tracking vehicles. Speech recognition is speaker-independent. Clips: Start of a mission, Mid-mission.

Dialogue Information State Logs

An example of HTML logs of information state updates generated during dialogue with the system (click on "Next" to see the next information state).

General Information

Our systems use a common software base consisting of the Open Agent Architecture, the Nuance speech recogniser, Gemini (SRI's natural language parser and generator), and speech synthesis using Festival.

Our systems are able to handle "unscriptable" dialogues where there is no finite state transition network describing a conversation, and no clear end state for a conversation. This distinguishes them from dialogue systems in the "form-filling" paradigm (such as many travel-planning systems), in which a state transition network suffices to control dialogue flow. Our long-term research aim is to address specific theoretical questions through the development of the system. For instance:

  • What is the right level of abstraction at which to describe dialogue moves, and what structures best represent dialogue context?
  • How do dialogue contributions "update" the context of a conversation?
  • How can we build robust conversational interfaces?
  • What is an effective multi-modal communication act, and how can such acts be generated?
  • How should the interface adapt to different states of the world, the dialogue, the user, and the device?
  • What notion of dialogue context or "information state" is appropriate in multi-modal contexts?

The WITAS Unmanned Aerial Vehicle (UAV), under development at Linköping University, Sweden, is an autonomous mobile helicopter with onboard AI, adjustable with respect to the operating environment and operator decisions. We have built a multi-modal communication interface for this robot, capable of complex dialogues about the UAV's tasks and state, and about situations as they unfold on the ground.

The interface supports complex dialogues between the operator and the UAV using natural conversational language. The multi-modal aspects of the interface derive from the ability to combine speech, text, graphics, gestures, live video, and sensor data in the same communication. Dialogues about multiple topics can be interleaved, in contrast to familiar "form-filling" dialogue systems where an inflexible ordering of inputs and outputs is required.
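To make the contrast with form-filling concrete, the sketch below keeps a small information state in which several topics can be open at once and can be resolved in any order. It is only an illustration under stated assumptions: the `InformationState` fields, the move names (`ask`, `answer`, `report`), and the topics are hypothetical and do not reproduce the project's actual dialogue moves or information-state format.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class InformationState:
    """Hypothetical dialogue context; field names are illustrative only."""
    open_topics: List[str] = field(default_factory=list)  # most recent last
    facts: Dict[str, str] = field(default_factory=dict)   # resolved content
    history: List[str] = field(default_factory=list)      # moves seen so far


def update(state: InformationState, move: str, topic: str,
           content: str = "") -> InformationState:
    """Apply one dialogue move to the state (assumed moves: ask/answer/report)."""
    state.history.append(f"{move}:{topic}")
    if move == "ask":
        # Opening a topic never requires earlier topics to be closed first.
        if topic in state.open_topics:
            state.open_topics.remove(topic)
        state.open_topics.append(topic)
    elif move == "answer":
        # An answer may target any open topic, not just the latest one.
        state.facts[topic] = content
        if topic in state.open_topics:
            state.open_topics.remove(topic)
    elif move == "report":
        # Unprompted information is recorded without opening a topic.
        state.facts[topic] = content
    else:
        raise ValueError(f"unknown move: {move}")
    return state


state = InformationState()
update(state, "ask", "destination")
update(state, "ask", "altitude")  # second topic opened before the first closes
update(state, "answer", "destination", "the tower")  # answered out of order

print(state.open_topics)
print(state.facts)
```

A form-filling system would typically insist on resolving `destination` before moving on to `altitude`; here the order of moves is left to the participants.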

Papers from this project
