r11 - 13 Jul 2005 - 20:01:21 - JeffRussell

Research in the Semlab

Below are some of the lab's current general areas of interest, and the specific research themes we are pursuing within them. For more information on how these are applied in our current projects, see our project pages.

Modelling Dialogue Context

The majority of our work fits under the general heading of computational pragmatics: analyzing and modelling the context that emerges in a dialogue, and using it in automatic processing, either to better understand what dialogue contributions mean or to generate suitable system contributions. For several years, the lab has worked on advancing the information-state update (ISU) approach to dialogue processing (see our publications) and on using it to build practical human-computer spoken dialogue systems. We are now investigating extensions of this basic approach in several areas, in particular its application to multi-device and multi-party dialogue.
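For readers unfamiliar with the ISU approach, here is a minimal sketch of the general idea. It is an illustration, not the lab's implementation. The dialogue context is held in an explicit information state, and each incoming dialogue move triggers update rules that modify that state. The state fields (`qud`, `shared`), the move types, and the example contents are all hypothetical simplifications.

```python
from dataclasses import dataclass, field


@dataclass
class InfoState:
    """Hypothetical, highly simplified information state."""
    qud: list = field(default_factory=list)     # questions under discussion; top of stack is last
    shared: dict = field(default_factory=dict)  # resolved question -> agreed answer


@dataclass
class Move:
    speaker: str
    kind: str     # illustrative move types only: "ask" or "answer"
    content: str


def update(state: InfoState, move: Move) -> InfoState:
    """Apply the (single, illustrative) update rule matching the move type."""
    if move.kind == "ask":
        # Asking raises a new question under discussion.
        state.qud.append(move.content)
    elif move.kind == "answer" and state.qud:
        # Answering resolves the most recently raised open question.
        question = state.qud.pop()
        state.shared[question] = move.content
    # Other moves (or answers with no open question) leave the state unchanged.
    return state


state = InfoState()
for move in [Move("system", "ask", "destination?"), Move("user", "answer", "Palo Alto")]:
    state = update(state, move)
print(state.shared)  # {'destination?': 'Palo Alto'}
print(state.qud)     # []
```

Real ISU systems use much richer states and rule sets. The point here is only the separation between an explicit context representation and the rules that update it.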

Multi-Device Dialogue: Situations with more than one dialogue-capable device pose new challenges. Examples include systems that communicate with teams of humans & robots, and systems that control an intelligent house or car with multiple speech-controlled devices (e.g. stereo, phone, navigation systems). Firstly, the dialogue context depends on which device(s) are being addressed, so otherwise identical inputs may need to be interpreted and acted on in very different ways; secondly, determining which device is being addressed is itself a hard problem. Specific current efforts include:

  • Applying an extended ISU approach in a multi-device, multi-threaded in-car dialogue system
  • Allowing 'plug-&-play' devices to specify their own dialogue capabilities and contextual effects
  • Using device & discourse context to disambiguate between multiple possible interpretation hypotheses (including intended device)
  • Using device context to improve interpretation by adjusting expectations (for parsing and speech recognition)
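The last two items can be pictured as a rescoring step. This is a minimal sketch under stated assumptions, not a description of our in-car system. Each interpretation hypothesis pairs an intended device with a command. Its recognizer log-probability is combined with a prior reflecting the current device context, here simply which device was addressed last. The device names, commands, probabilities, and the `stickiness` prior are all hypothetical.

```python
import math
from typing import NamedTuple


class Hypothesis(NamedTuple):
    device: str         # intended device (hypothetical device names)
    command: str        # interpreted command
    asr_logprob: float  # log-probability from recognition/parsing


def device_prior(device: str, last_device: str,
                 stickiness: float = 0.7, n_devices: int = 3) -> float:
    """Hypothetical context prior: the most recently addressed device is favoured."""
    if device == last_device:
        return stickiness
    # Remaining probability mass is shared evenly among the other devices.
    return (1.0 - stickiness) / (n_devices - 1)


def best_interpretation(hyps, last_device):
    """Rescore hypotheses by adding the log device prior to the recognizer score."""
    def score(h):
        return h.asr_logprob + math.log(device_prior(h.device, last_device))
    return max(hyps, key=score)


# Acoustically similar inputs aimed at different devices.
hyps = [
    Hypothesis("stereo", "next track", math.log(0.5)),
    Hypothesis("navigation", "next turn", math.log(0.4)),
]
print(best_interpretation(hyps, last_device="navigation").device)  # navigation
print(best_interpretation(hyps, last_device="stereo").device)      # stereo
```

The same recognizer output is resolved differently depending on context, which is the effect described above. In practice the prior would come from richer device and discourse context rather than a single hand-set parameter.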

Multi-Party Dialogue: Intelligent personal assistants and systems for automatic meeting recording and reporting must understand human-human dialogue, often with more than two participants. Here, the discourse processes and the required models of context can differ significantly from standard models designed for human-computer interaction. Determining who is being addressed is no longer trivial; simple question-answer exchanges become more complex; non-verbal communication becomes more common; and it becomes important to model how arguments unfold, how agreement or disagreement is reached, and how decisions are made. Current topics here include:

  • Developing an automatic meeting assistant to track, interpret and summarize multi-party dialogue
  • Modelling discourse structure and context for multi-party dialogue, including modelling & processing argumentative and decision-making discourse
  • Developing annotation techniques & tools for multi-modal discourse and its structure
  • Developing ontologies to support multi-party discourse modelling and natural multi-modal interpretation
  • Addressee detection using verbal, physical and speech-based features
  • Dialogue act detection using machine-learning techniques

Interpretation & Generation for Dialogue

Another feature of these more complex types of dialogue is that the language people use is more complex than in the simple human-computer domains on which so much dialogue research has concentrated. For example, human-human dialogue is usually freer, less formal & constrained, and open- or wide-domain, requiring different parsing & interpretation techniques and more expressive representations. Language generation in multi-device scenarios presents challenges, since different devices require different behaviours in different contexts. It also presents opportunities, since well-chosen system output can make user input more predictable and easier to interpret. We are:

  • Developing robust open-domain parsing techniques for dialogue, using large lexical resources such as WordNet, FrameNet and VerbNet
  • Recognizing decisions, agreement and action items in human-human meetings
  • Using general and domain-specific ontologies to inform and disambiguate interpretation
  • Integrating multi-modal communication, including physical gestures, with linguistically-derived semantic representations
  • Using context-based NL generation techniques to aid interpretation, via generation and ranking of expected possible inputs
  • Using user modelling & user history to influence generation, allowing more natural system output, without unnecessary repetition of already-familiar information, and taking user preferences into account

Working with Uncertainty and Error

Working with spoken dialogue necessarily requires taking noise, error and uncertainty into account: speech recognition is never perfect, and parsing is never unambiguous. Until recently, most ISU approaches (including our own) have essentially sidestepped this problem. They try to improve interpretation accuracy, usually in a domain-specific way, and then reject or clarify whenever confidence is too low. As we start to deal with more complex representations, broader domains, multiple devices and human-human dialogue, this becomes impractical or ineffective. We are now pursuing various techniques, including probabilistic and statistical ones, to reduce ambiguity and uncertainty more effectively where possible:

  • Integrating machine-learning & probabilistic techniques into our ISU systems, to help identify:
    • the most likely of the alternative speech recognition hypotheses or parser outputs in n-best lists or lattices
    • the addressed device in a multi-device system, and the intended user move
    • when a user is addressing an open-microphone system (rather than speaking to another person), removing the need for push-to-talk buttons
  • Developing optimal (in context) clarification strategies to confirm hypotheses or fill in missing information when required
  • Identifying, interpreting and responding to user (driver) clarification requests during in-car navigation or direction-giving dialogues
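Both the baseline behaviour described above (accept, clarify or reject depending on confidence) and the goal of choosing clarification strategies in context can be framed as an expected-cost decision. The following minimal sketch assumes a single probability that the top hypothesis is correct and uses hand-set, hypothetical costs. In a real system the costs would depend on context (for example, how risky a misinterpreted navigation command is), and the probability would come from learned models.

```python
# Hypothetical costs for each action, split by whether the top hypothesis is correct.
COSTS = {
    "accept":  {True: 0.0, False: 10.0},  # acting on a wrong hypothesis is expensive
    "confirm": {True: 1.0, False: 6.0},   # costs a turn; a "no" also forces a correction
    "reject":  {True: 4.0, False: 4.0},   # ask the user to repeat, whatever was said
}


def choose_action(p_correct: float, costs=COSTS) -> str:
    """Pick the action with the lowest expected cost given P(hypothesis correct)."""
    def expected_cost(action):
        c = costs[action]
        return p_correct * c[True] + (1.0 - p_correct) * c[False]
    return min(costs, key=expected_cost)


for p in (0.95, 0.6, 0.1):
    print(p, choose_action(p))
# 0.95 accept
# 0.6 confirm
# 0.1 reject
```

With fixed costs this reduces to two confidence thresholds. Making the costs depend on the dialogue context is what allows the strategy to vary from one situation to the next.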

A more radical departure from the standard ISU architecture, though, is to work with uncertainty, allowing ambiguity in the dialogue state itself by using probabilistic models of context. We are investigating this in two settings. The first is human-computer systems: a single strategy must still be chosen to generate system output at each stage, but maintaining multiple hypotheses can aid error recovery when the strategy chosen on the basis of the most likely hypothesis turns out to be wrong or problematic. The second is systems that process human-human dialogue, where multiple hypotheses can be maintained over longer periods so that later discourse can be taken into account when determining the overall most likely interpretation(s):

  • Maintaining multiple hypotheses about user moves and dialogue context in multi-device dialogue
  • Using Markov models (including unsupervised approaches) to identify likely changes in topic state in multi-party dialogue
  • Probabilistic modelling of multi-party dialogue processes (dialogue move relations and decision-making structure), and maintenance of uncertainty about them
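Keeping ambiguity in the dialogue state can be pictured as maintaining a probability distribution over alternative context hypotheses. The distribution is reweighted as each new contribution arrives, and only the least likely hypotheses are pruned. The sketch below is a generic Bayesian update with a fixed-size beam, given purely as an illustration. The hypothesis labels and likelihood values are invented, and a real system would obtain likelihoods from its recognition, parsing and context models.

```python
def update_beliefs(beliefs: dict, likelihoods: dict, beam: int = 3,
                   floor: float = 1e-6) -> dict:
    """One Bayesian update over competing dialogue-state hypotheses.

    beliefs:     hypothesis -> prior probability
    likelihoods: hypothesis -> P(new observation | hypothesis)
    Hypotheses missing from `likelihoods` get a small floor value rather than zero,
    so they are down-weighted but not discarded outright.
    """
    posterior = {h: p * likelihoods.get(h, floor) for h, p in beliefs.items()}
    # Keep only the `beam` most probable hypotheses, then renormalise.
    kept = sorted(posterior.items(), key=lambda kv: kv[1], reverse=True)[:beam]
    total = sum(p for _, p in kept)
    return {h: p / total for h, p in kept}


beliefs = {"user wants route": 0.5, "user wants traffic": 0.3, "user wants music": 0.2}
# A later utterance (e.g. "the fastest one") fits the route reading best.
beliefs = update_beliefs(beliefs, {"user wants route": 0.8,
                                   "user wants traffic": 0.3,
                                   "user wants music": 0.05}, beam=2)
print(max(beliefs, key=beliefs.get))  # user wants route
```

The less likely "traffic" reading survives the update with reduced weight, so it can still be recovered if later discourse favours it.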

Combining Multiple Information Sources

All of these are hard problems, often compounded by high levels of noise and error. Rather than looking for a single best method for each, we are increasingly interested in tackling them by combining multiple sources of information. These may be multiple raw data streams (e.g. gestures, writing and speech), multiple processing agents (e.g. shallow and deep parsing), or multiple levels of interpretation (e.g. topic segmentation and discourse structure; discourse structure and semantics). We are currently working on:

  • Developing ontologies and knowledge-bases using description logics, for integration and inference over the hypotheses produced by multiple independent agents
  • Using physical gesture recognition to extend and disambiguate semantic parsing and contextual interpretation
  • Combining lexical & semantic information with features of the discourse (verbal & non-verbal) to improve topic segmentation
  • Using shallow topic segmentation & identification to constrain semantic issue-recognition and discourse structure modelling - and vice versa
  • Using co-training between independent agents to improve their performance over time without supervision by user or system designer
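Weighted log-linear fusion (a product-of-experts style combination) is one simple way to illustrate combining independent information sources. Each source, such as a speech interpreter or a gesture recognizer, assigns a probability to each candidate interpretation. The combined score multiplies these probabilities, each raised to a per-source weight. This is a generic technique and not necessarily the one used in our systems. The sources, weights and numbers below are hypothetical.

```python
import math


def fuse(source_dists: dict, weights: dict, eps: float = 1e-9) -> dict:
    """Weighted log-linear fusion of per-source distributions over candidates.

    source_dists: source name -> {candidate: probability}
    weights:      source name -> non-negative weight (reliability of the source)
    """
    candidates = set().union(*(d.keys() for d in source_dists.values()))
    log_scores = {
        c: sum(weights[s] * math.log(d.get(c, 0.0) + eps)
               for s, d in source_dists.items())
        for c in candidates
    }
    # Normalise with the log-sum-exp trick for numerical stability.
    m = max(log_scores.values())
    z = sum(math.exp(v - m) for v in log_scores.values())
    return {c: math.exp(v - m) / z for c, v in log_scores.items()}


# "Put it there": speech alone cannot tell which box is meant; the gesture can.
speech = {"red box": 0.5, "blue box": 0.5}
gesture = {"red box": 0.2, "blue box": 0.8}
combined = fuse({"speech": speech, "gesture": gesture},
                {"speech": 1.0, "gesture": 0.7})
print(max(combined, key=combined.get))  # blue box
```

The weights let a less reliable source contribute without dominating. Approaches such as co-training go further by letting the sources improve one another over time.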

More information about these themes and their practical application can be found on our project pages.

Public.SemlabResearch moved from Semlab.ResearchAgenda on 13 Jul 2005 - 18:13 by MatthewPurver

Copyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.