Research in the Semlab
These are some of the lab's current general areas of interest, and specific
research themes that are being pursued. For more information on their specific
practical applications in the projects we are working on, see our
project pages.
Modelling Dialogue Context
The majority of our work fits under the general heading of computational
pragmatics - analyzing and modelling the emerging context in a dialogue and
using it in automatic processing, either to improve our understanding of what
dialogue contributions mean, or to improve our generation of suitable automatic
contributions. For several years, the lab has worked on advancing the
information-state update (ISU) approach to dialogue processing (see our
publications) and using it to build practical
human-computer spoken dialogue systems. We are now investigating extensions of
this basic approach in many areas: in particular its application to
multi-device and multi-party dialogue.
Multi-Device Dialogue: Situations with more than one dialogue-capable device -
such as systems which communicate with teams of humans & robots, or systems
which control an intelligent house or car, with multiple speech-controlled
devices (e.g. stereo, phone, navigation systems) - pose new challenges.
Firstly, the dialogue context is affected by the device(s) being addressed, with
otherwise identical inputs needing to be interpreted and reacted to in very
different ways; secondly, determining which device is being addressed is a hard
problem in the first place. Specific current efforts include:
- Applying an extended ISU approach in a multi-device, multi-threaded in-car dialogue system
- Allowing 'plug-&-play' devices to specify their own dialogue capabilities and contextual effects
- Using device & discourse context to disambiguate between multiple possible interpretation hypotheses (including intended device)
- Using device context to improve interpretation by adjusting expectations (for parsing and speech recognition)
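As a rough illustration of the disambiguation idea in the list above, the following sketch ranks competing interpretation hypotheses by combining recognizer confidence with device context. It is a minimal sketch under stated assumptions, not a description of our systems: the `Hypothesis` class, the `rank_hypotheses` function, the multiplicative scoring rule and all numeric values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    """One candidate reading of a user utterance (hypothetical structure)."""
    device: str        # candidate addressed device
    move: str          # candidate user move, e.g. "play" or "call"
    confidence: float  # recognizer/parser confidence in [0, 1]


def rank_hypotheses(hypotheses, device_prior, move_expectation):
    """Sort hypotheses by confidence * P(device) * P(move | device).

    device_prior:     dict mapping device -> contextual probability that
                      the device is the one being addressed
    move_expectation: dict mapping device -> {move -> probability that the
                      move is expected in the device's current state}
    Unknown devices or moves are assumed to score 0.
    """
    def score(h):
        prior = device_prior.get(h.device, 0.0)
        expected = move_expectation.get(h.device, {}).get(h.move, 0.0)
        return h.confidence * prior * expected

    return sorted(hypotheses, key=score, reverse=True)


# Illustrative values only: the recognizer slightly prefers the phone
# reading, but the discourse context says the stereo was just addressed.
hyps = [
    Hypothesis("phone", "call", 0.55),
    Hypothesis("stereo", "play", 0.45),
]
device_prior = {"phone": 0.2, "stereo": 0.8}
move_expectation = {"phone": {"call": 0.9}, "stereo": {"play": 0.7}}

best = rank_hypotheses(hyps, device_prior, move_expectation)[0]
print(best.device, best.move)
```

With these example values the context outweighs the small difference in recognizer confidence, so the stereo reading is ranked first. A real system would learn or estimate the priors rather than fix them by hand.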
Multi-Party Dialogue: When developing intelligent personal assistants or
systems for automatic meeting recording and reporting, computers must understand
human-human dialogue, often with more than two participants. Here, the discourse
processes and required models of context can differ significantly from standard
models designed for human-computer interaction. Determining who is being
addressed is no longer trivial; simple question-answer exchanges become more
complex; non-verbal communication becomes more common; and modelling how
arguments unfold, agreement or disagreement is reached and decisions are made
becomes important. Current topics here include:
- Developing an automatic meeting assistant to track, interpret and summarize multi-party dialogue
- Modelling discourse structure and context for multi-party dialogue, including modelling & processing argumentative and decision-making discourse
- Developing annotation techniques & tools for multi-modal discourse and its structure
- Developing ontologies to support multi-party discourse modelling and natural multi-modal interpretation
- Addressee detection using verbal, physical and speech-based features
- Dialogue act detection using machine-learning techniques
Interpretation & Generation for Dialogue
Another feature of these more complex types of dialogue is that the language
people use is more complex than in the simple human-computer domains that so
much dialogue research has concentrated on. For example, human-human dialogue is
usually freer, less formal & constrained, and open- or wide-domain, requiring
different parsing & interpretation techniques and more expressive
representations. Language generation also presents challenges in multi-device
scenarios, with different devices requiring different behaviours in different
contexts; but also opportunities, with system output providing a potential key
to more predictable & interpretable user input. We are:
- Developing robust open-domain parsing techniques for dialogue, using large lexical resources such as WordNet, FrameNet and VerbNet
- Recognizing decisions, agreement and action items in human-human meetings
- Using general and domain-specific ontologies to support and disambiguate interpretation
- Integrating multi-modal communication, including physical gestures, with linguistically-derived semantic representations
- Using context-based NL generation techniques to aid interpretation, via generation and ranking of expected possible inputs
- Using user modelling & user history to influence generation, allowing more natural system output, without unnecessary repetition of already-familiar information, and taking user preferences into account
Working with Uncertainty and Error
Working with spoken dialogue necessarily requires taking noise, error and
uncertainty into account - speech recognition is never perfect, and parsing
never unambiguous. Until recently, most ISU approaches (including our own) have
essentially tried to avoid this problem by trying to improve interpretation
accuracy, usually in a domain-specific way, then rejecting or clarifying
whenever confidence is too low - but as we start to deal with more complex
representations, broader domains, multiple devices and human-human dialogue this
becomes impractical or ineffective. We are now pursuing various techniques
(including probabilistic and statistical techniques) to give more effective ways
of reducing ambiguity and uncertainty where possible:
- Integrating machine-learning & probabilistic techniques into our ISU systems, to help identify:
  - the most likely alternative speech recognition hypotheses or parser outputs from n-best lists or lattices
  - the addressed device in a multi-device system, and the intended user move
  - when a user is addressing an open-microphone system (rather than speaking to another person), removing the need for push-to-talk buttons
- Developing optimal (in context) clarification strategies to confirm hypotheses or fill in missing information when required
- Identifying, interpreting and responding to user (driver) clarification requests during in-car navigation or direction-giving dialogues
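For reference, the conventional confidence-threshold strategy described at the start of this section (accept when confident, otherwise clarify or reject) can be sketched as follows. This is a simplified illustration: the function name `choose_action` and both threshold values are assumptions made for the example, not parameters of any particular system.

```python
def choose_action(nbest, accept_threshold=0.8, reject_threshold=0.3):
    """Pick a system action from an n-best list of interpretations.

    nbest: list of (interpretation, confidence) pairs, with confidence
           in [0, 1]. Thresholds are illustrative defaults.
    Returns a (action, interpretation) pair, where action is one of
    "accept", "clarify" or "reject".
    """
    if not nbest:
        return ("reject", None)

    # Take the single most confident hypothesis; all others are discarded.
    interpretation, confidence = max(nbest, key=lambda pair: pair[1])

    if confidence >= accept_threshold:
        return ("accept", interpretation)
    if confidence >= reject_threshold:
        # Not confident enough to act, but worth confirming with the user.
        return ("clarify", interpretation)
    return ("reject", None)


print(choose_action([("play jazz", 0.9), ("call Jazz", 0.1)]))
print(choose_action([("play jazz", 0.5), ("call Jazz", 0.4)]))
print(choose_action([("play jazz", 0.2)]))
```

Note that this policy throws away every hypothesis except the top one at each turn, which is one reason it scales poorly to broader domains and multiple devices.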
A more radical departure from the standard ISU architecture, though, is to work
with uncertainty, allowing ambiguity in the dialogue state itself by use of
probabilistic models of context. We are investigating the application of this
both to human-computer systems, where although an optimal strategy must be
chosen to generate system output at each stage, maintaining multiple hypotheses
can help error recovery when the most likely strategy turns out to be wrong or
problematic; and to systems processing human-human dialogue, where multiple
hypotheses can be maintained longer-term, allowing later discourse to be taken
into account to determine the overall most likely interpretation(s):
- Maintaining multiple hypotheses about user moves and dialogue context in multi-device dialogue
- Using Markov models (including unsupervised approaches) to identify likely changes in topic state in multi-party dialogue
- Probabilistic modelling of, and maintenance of uncertainty about, multi-party dialogue processes (dialogue move relations and decision-making structure)
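The idea of keeping several hypotheses alive and letting later discourse revise them can be illustrated with a simple Bayesian update over a distribution of candidate user moves. This is a minimal sketch, assuming a flat set of mutually exclusive hypotheses; the function `update_belief`, the hypothesis labels and the probabilities are all hypothetical, and a realistic model of dialogue context would be considerably richer.

```python
def update_belief(belief, likelihood):
    """One Bayesian update step: posterior is proportional to prior * likelihood.

    belief:     dict mapping hypothesis -> prior probability
    likelihood: dict mapping hypothesis -> P(new evidence | hypothesis);
                hypotheses missing from this dict are assumed to have
                likelihood 0
    Returns a new, normalized dict. If the evidence rules out every
    hypothesis, the prior is returned unchanged (an arbitrary fallback
    chosen for this sketch).
    """
    unnormalized = {h: p * likelihood.get(h, 0.0) for h, p in belief.items()}
    total = sum(unnormalized.values())
    if total == 0.0:
        return dict(belief)
    return {h: p / total for h, p in unnormalized.items()}


# Illustrative values only. After the first utterance, "play_music" is
# the more likely user move, but "call_contact" is kept as an alternative.
belief = {"play_music": 0.6, "call_contact": 0.4}

# A follow-up utterance fits the second hypothesis much better.
followup_likelihood = {"play_music": 0.1, "call_contact": 0.9}

posterior = update_belief(belief, followup_likelihood)
print(max(posterior, key=posterior.get))
```

Because the less likely hypothesis was retained rather than discarded, the later evidence can promote it to the most likely interpretation without restarting the dialogue.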
Combining Multiple Information Sources
All of these are hard problems, compounded by often high levels of noise and
error, and rather than looking for one best method for each, we are increasingly
interested in treating them by combining multiple sources of information -
whether multiple raw data streams (e.g. gestures, writing and speech), multiple
processing agents (e.g. shallow and deep parsing), or multiple levels of
interpretation (e.g. topic segmentation and discourse structure; discourse
structure and semantics). We are currently working on:
- Developing ontologies and knowledge-bases using description logics, for integration and inference over the hypotheses produced by multiple independent agents
- Using physical gesture recognition to extend and disambiguate semantic parsing and contextual interpretation
- Combining lexical & semantic information with features of the discourse (verbal & non-verbal) to improve topic segmentation
- Using shallow topic segmentation & identification to constrain semantic issue-recognition and discourse structure modelling - and vice versa
- Using co-training between independent agents to improve their performance over time without supervision by user or system designer
More information about these themes and their practical application can be found
on our
project pages.