MMDO (Multimodal Discourse Ontology)
This document provides a common-sense description and draft specification for ontologies which may be used as a common language for information exchange between agents in the "task discussion" domain of the CALO project, i.e. the meeting environment. The ontology is a central component of CSLI's
MoKb (Meeting Ontology Knowledge Base) system and in the current system serves as the principal interface to "meeting knowledge" for other components in CALO, such as Query Manager.
The ontologies defined here model the following domains of meeting environment knowledge:
- (human) agent communication (based on CLib Communication Model)
- physical media, multimodality, and sensors
- spatial objects, relations, and physical actions and gestures
- linguistic characterizations of communicative acts (written, spoken, and sketched)
- discourse, dialogue, and communicative context
- discourse phases and discourse types
- argumentation and decision-making
- topics, hotspots, and other abstract features of discourse
The ontologies defined here do not model many remaining elements of relevance to the meeting-understanding domain, particularly those having to do with the "subject matter" of the meeting discourse. We leave the modelling of these application-specific aspects to other ontologies in CALO (which can be viewed at
http://www.cs.utexas.edu/users/mfkb/RKF/tree/specs/ontologies/). These include (partial list):
- an ontology of projects, tasks, milestones, etc.
- an office ontology including computers, purchasing etc.
- an ontology of people and qualities of a person
We do, however, intend the ontologies presented here to be merged to a certain extent back into a central CALO ontology, at the discretion of the CALO ontologists.
We define three separate ontologies in this document to cover the domains listed above. For each ontology, we define an umbrella concept which provides a frontier for the concepts being modelled. For each of the ontologies, we constrain ourselves to modelling the following. The filenames listed are the OWL ontologies which are distributed as part of CSLI's CALO components in the folder
calo/lib/ontology.
| Ontologies for MoKb |
| Ontology | filename | Umbrella Concept |
| Physical Awareness Ontology | undefined | concepts directly associated with the physical qualities of the discourse environment and their sensing |
| Multimodal Discourse Ontology | undefined | concepts directly associated with natural multimodal human discourse, independent of the topic or subject matter of that discourse |
All the ontologies have been designed with influence from several directions, including:
- the need to fit the model into the model provided by CLib and the CALO ontology
- the need to effectively model the types of information currently being generated by components in the task-discussion system
- the need to define important concepts that will be directly relevant to IET test questions
- the need to provide an abstract communicative framework for discourse understanding components, such that everyone will be able to find a convenient interface to their own more specific models of communicative and linguistic behaviors
- the need to have something abstract enough that it can persist through subsequent years of system improvements
The following sections attempt to spell out, using plain English and references to explicitly defined classes in the ontology, a conceptualization of the domains listed above, in a way which takes into account the influences and constraints listed above. We also encourage you to see the document
MokbOntologyNotes, which contains some preliminary examples and a general notepad of ideas for the ontology which have not yet been properly integrated into the model.
This document also contains many concrete examples of various kinds. In particular, we describe examples of phenomena which relate to specific interpretations of the general concepts in the ontology. We also list some IET questions and describe how they might be answered using the model. For each of these, we attempt to provide relevant MoKb update and query statements.
- Descriptions which apply to expected phenomena in the meeting environment are listed in a red font.
- Descriptions which relate these concepts to general CALO-wide uses, particularly for answering of IET questions, are listed in a green font.
- Concrete query and update examples for the MoKb are listed in grey.
- References to actual concept classes in the ontology are written in a bold monotype.
Each concept is presented as a row in a table. In most of these listings, the concept class name is given, followed by a listing of the slots for the class. How these elements interact, and what they mean, is described in plain English as much as possible. To the best of my ability, I have listed every single class and slot which might be used by a component in the current system.
It should be noted that the model is meant primarily as a starting point for further specification of critical details by other task-discussion research groups. The current generic model should be flexible enough to interface with these specific models of communicative or physical behavior while providing a convenient interface into the CLib CALO ontology. We encourage all the teams to get involved with the definition of these ontologies.
If you have any questions about this document or the ontology, or how it relates to
MoKb or CSLI's Dialogue Understanding system for CALO, email John Niekrasz at
niekrasz@csli.stanford.edu.
The CLib
The ontologies described here are intended to be modular additions to the CLib ontology. We therefore build our model around existing models and concepts in that ontology wherever possible. The majority of the discourse model is built on the Communication Model component of CLib, while the physical awareness elements make use of spatial entity concepts in the CLib. Every component makes use of the highly generic classes in the CLib. We describe these connections to CLib below. We provide a brief English description of each concept, but we recommend that any user of these ontologies first become familiar with this set of concepts by going directly to the CLib and its documentation, because our descriptions may not match the intentions of the original authors.
Upper-level CLib Classes
| Generically Useful CLib Concept Classes |
Entity | .has-part | Something which is. Entities can be characterized as being made up of parts, which are themselves Entities. |
Event | .time-during, .subevent, .agent, .object, .recipient | Something which happens. Every event has a time interval during which it occurred. |
Role | .played-by, .in-event | Some role an Entity plays in an Event. |
Person | | A human. |
Action | | An Event that is performed by an agent. |
Activity | | A loosely constructed collection of Events. |
TimeInterval | .time-ends, .time-begins | An interval of time. This is the object which maintains temporal information about an event that has occurred. You need to create one of these objects for each event, unless there is a mechanism in place in the ontology which allows you to infer the time from some related event. |
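As an illustration only, the upper-level classes above could be mirrored in code roughly as follows. The Python layout, the attribute names (slot names with hyphens replaced by underscores), and the use of plain seconds for times are assumptions made for this sketch; they are not taken from the CLib or from the OWL files.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entity:
    """Something which is; may be made up of parts (.has-part)."""
    name: str
    has_part: List["Entity"] = field(default_factory=list)


@dataclass
class TimeInterval:
    """An interval of time (.time-begins, .time-ends), here in seconds."""
    time_begins: float
    time_ends: float

    def duration(self) -> float:
        return self.time_ends - self.time_begins


@dataclass
class Event:
    """Something which happens; each Event carries its own TimeInterval."""
    name: str
    time_during: TimeInterval
    agent: Optional[Entity] = None
    object: Optional[Entity] = None
    recipient: Optional[Entity] = None
    subevent: List["Event"] = field(default_factory=list)


# Hypothetical example: one event, with one TimeInterval created for it.
alice = Entity("Alice")
greeting = Event("greeting", TimeInterval(12.0, 13.5), agent=alice)
```

The point of the sketch is the last row of the table: the temporal information lives in a separate TimeInterval object which is created once per Event.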
The CLib Communication Model and related concepts (DIAGRAM)
In this section we describe the concepts present in the CLib ontology which relate to communication, and the Communication Model presented in the document
http://www.cs.utexas.edu/~kbarker/working_notes/gpd/gpd07280301-Communication.html. From the outline presented in that document, and through an investigation of surrounding concepts in the CLib, we provide our explication of the concepts in the model. We refer the reader to the
Communication Model diagram and propose the following highly general conceptual divisions for that diagram to assist interpretation:
- The bottom row consists of physically-relevant concepts
- The middle row consists of symbolic and linguistic representations
- The top row consists of cognitive- and domain- relevant concepts
- For a single communicative act between two individuals, the actions on the right side of the diagram can be conceived of as the attempt of the recipient to reconstruct the left side of the diagram.
- For a third-party discourse understanding system like CALO, we are attempting to construct a complete diagram for each agent-agent relation in play, though in practical terms this simply involves constructing the left side and the entities along the center column.
| Communication Model Event Classes |
Communicate | .base, .object, .recipient, .subevent, .agent | A single, unified act of discourse or communication between two agents. This is the essential building block from which models of discourse are built (see Communicative Act in the FIPA ACL). The communication is from the agent to the recipient. Subevents can be Express, Convey, and Interpret. The base is a Message and the object is an Information. |
Express | .agent, .object, .result | The component of a Communicate action which involves the agent formulating linguistic Messages to be conveyed. Note that a single communicative act may involve the Conveying of multiple Messages, such as the utterance of the word "that" while pointing to an object, in which case the Express action generates multiple Messages to be Conveyed. |
Convey | .agent, .object, .result | To deliver a single linguistic message through some physical means. This action encompasses the Embody, Transmit, and Sense actions. |
Interpret | .agent, .object, .result | To infer, deduce, or otherwise understand the Information being communicated in a linguistic Message. |
Embody | .agent, .object, .result | To encode a Message into a physical form, a Signal, to be Transmitted through some Medium. For example, to Speak, Write, or Gesticulate. |
Transmit | .agent, .object, .result | The physical action which takes place in order to Convey a Message via some Signal through some Medium. We are unsure if this is truly an action. |
Sense | .agent, .object, .result | To sense a Signal |
| Communication Model Entity Classes |
Information | | In the context of a Communicate action, this is a set of facts, objects, or questions which the action is "about" (the propositional content which is being communicated). The division between this information and general discourse-contextual information is not easily made, but currently we take this set to include the intended direct reference in the domain (i.e. after anaphora resolution etc.) but not further indirect inferences such as conversational implicatures. |
Message | .information-content, .base | This is a symbolic realization of a communication (see "Abstract" in GOLD). For example, a set of words in English, or perhaps a sequence of symbolic hand gestures in the language of gesture. We posit an abstract feature-structure representation for this called LinguisticUnit. |
Signal | | This is the measurable physical realization of the communicated information. |
| Communication Model Supporting Concepts |
base | No known common interpretation. Can often be taken as the method by which an action is achieved - you Communicate some Information via its Message base, you Convey a Message via its Signal base, but this doesn't generalise to all classes. |
Medium | A Medium is a role played by a tangible entity, the base of the Transmit element of a communicative action. For example, the role Sound plays in a speaking action. |
Language | A set of linguistic constructions or symbols. |
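The decomposition of a Communicate act into its subevents can be sketched as a small event tree. This is only our reading of the tables above: the `Act` class, the helper names, and the choice of a single Interpret per Communicate are assumptions of the sketch, not part of the CLib.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Act:
    """A generic event node: a class name plus its subevents."""
    kind: str
    subevents: List["Act"] = field(default_factory=list)


def make_communicate(num_messages: int) -> Act:
    """Build one Communicate act that Conveys `num_messages` Messages."""
    conveys = [
        # Convey encompasses the Embody, Transmit, and Sense actions.
        Act("Convey", [Act("Embody"), Act("Transmit"), Act("Sense")])
        for _ in range(num_messages)
    ]
    return Act("Communicate", [Act("Express"), *conveys, Act("Interpret")])


def count(act: Act, kind: str) -> int:
    """Count the nodes of a given kind in the event tree."""
    return (act.kind == kind) + sum(count(sub, kind) for sub in act.subevents)


# Saying "that" while pointing: one Express, two Messages to Convey.
pointing = make_communicate(num_messages=2)
```

Here `count(pointing, "Convey")` gives 2 while `count(pointing, "Express")` gives 1, matching the multimodal example in the Express row.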
Other Important Classes in CLib
| Spatial Entities and Physical Aspects |
SpatialEntity | An entity which has spatial existence and relations to other spatial entities. |
PhysicalObject | An entity which has a physical existence as an object. |
Meeting | .meeting-participant, .meeting-room | An Activity where people get together. |
Device | An inanimate object which has a maker, model, etc. |
Document | An entity, such as a gantt chart. |
Computer | A computer. |
Special Notes
It seems that we need a connection between a
Computer and the
Person who is using it. This allows the system to hypothesize which
Person is speaking into a
Microphone which is attached to the
Computer or is in view of the
Cameras that are attached to the
FramDevice which is attached to the
Computer. This should also be linked up with the Meeting Recorder Client, which allows users to log in to the computer and announce their status not only as a
.meeting-participant in a
Meeting but as a
.user of the
Computer.
Input From Task Setup
The Physical Awareness Ontology
We now go beyond the CLib to define new concepts and classes. In this ontology, we focus on concepts which are relevant to the physical aspects of the meeting environment and their sensing. Note that we at CSLI are only aware generally of the capabilities of certain PA devices, and not aware of an appropriate model for the spatial objects in the meeting space. We therefore leave much of this model undefined and in some cases simply provide some examples of situations which we feel would be useful for understanding the dialogue. We hope that a model can be made to account for these situations.
Spatial Objects and Relationships
Some spatial concepts are currently modelled in the CLib as part of
SpatialEntity, but it appears as though the PA teams will want a more detailed model of the relationships between physical entities, with something like a 6D account of object location and orientation. I have left this domain basically unspecified in the hopes that some of you will have a model that you wish to put forth.
Multimodality and Communicative Media
Modelling multimodal dialogue and physical sensing both require an explication of the physical concepts which allow us to distinguish between the various modes (modalities) of communication. To this end, we elaborate on some ideas surrounding the
Medium role concept in CLib. The basic point is to lay out the set of physical things which can play this role in communicative and perceptive acts. From this, we can classify these acts into more intuitive ones, grounded in our multimodal, multi-sensor context.
Instances of TangibleEntities which can play the Medium role |
Sound | Plays the Medium role in microphone- or audition-based communication (spoken discourse). |
Light | Plays the Medium role in camera- or vision-based communication (gestural discourse). |
Ink | Plays the Medium role in whiteboard-, document-, or two-dimensional communication (sketching). |
Sensors and Hardware Devices
For practical purposes, the physical things we care about are those which our sensors are able to sense. We therefore posit the notion of a
Sensor which is our direct link to the physical
Medium characterized above.
The meeting room environment contains many different devices which do sensing. Typically, the devices are
TangibleEntities which are actually a combination of multiple
Sensors into one physical structure. Here we define the specific classes of hardware sensing devices in the CALO meeting environment.
All of the above are also
Device entities, and therefore may have attributes such as make, model, version, etc. This may be helpful to signal processing algorithms for calibration of frequency response and light sensitivity when analysing recordings produced by various versions of the
Sensor components.
A
Sensor is also a piece of computer hardware and should be classed as such. In the CLib, this is the
Device or
ComputerComponent class. By virtue of being a
Device, a
Sensor is therefore a
SpatialEntity and therefore has spatial attributes like orientation and position which could then be used as calibration parameters, etc.
Recordings are the most essential information resource for doing understanding, since they are the output of our
Sensor devices. From these information resources, we are able to do analysis and further understanding of the communicative behaviors we may or may not have observed. To model this, we posit an
Action called
Record whose
agent must be a
Sensor. The
result of the
Record action is a
Recording.
These are closely related to the notion of
Sense and
Signal in the Communication Model, but we are not making the connection yet. This is because the sensing actions being performed by the sensors are not participating in a communicative behavior. Rather, they are observing, and from this information we will later posit
Communicate acts and their subevents (if we happen to detect some communication going on).
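The constraint on the Record action (its agent must be a Sensor, and its result is a Recording) can be sketched as a simple runtime check. The class fields and the `record` function below are hypothetical names chosen for this illustration; in the ontology itself the constraint would be expressed as a restriction on the agent slot.

```python
from dataclasses import dataclass


@dataclass
class Sensor:
    name: str


@dataclass
class Person:
    name: str


@dataclass
class Recording:
    source: str  # name of the Sensor that produced the Recording


def record(agent) -> Recording:
    """Perform a Record action; the agent must be a Sensor."""
    if not isinstance(agent, Sensor):
        raise TypeError("the agent of a Record action must be a Sensor")
    return Recording(source=agent.name)


# Hypothetical example: a far-field microphone produces one Recording.
far_field = record(Sensor("far-field-mic"))
```

Calling `record(Person("Alice"))` raises a TypeError, since a Person cannot be the agent of a Record action in this model.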
Non-Communicative Physical Acts
For the time being, we simply posit the following classes of
Actions which enumerate some currently available examples from David's Integrator.
(DIAGRAM)
The following is a model which encapsulates the kind of knowledge being produced by Paul's activity tracker. This and the following
Actions are performed by a
Person in or near their seat.
(DIAGRAM)
Sit | .agent, .object, .time-during | To be sitting. The .object is the seat being sat in. |
Stand | .agent, .time-during | To be standing. |
SitDown | .agent, .time-during | To transition from standing to sitting. |
StandUp | .agent, .time-during | To transition from sitting to standing. |
Fidget | .agent, .time-during | To fidget. |
Artifacts
The physical presence of documents is important to CALO. We interpret an "artifact" to be a
Document. We posit an action which states that the document is physically present in the meeting room. This allows us to answer IET questions pertaining to the "focus" of a discussion on a particular artifact.
Display | .object, .site | An Action performed on a Document which positions the document within view of the agents at the .site. The .agent is generally unknown. |
The Meeting Environment (DIAGRAM)
Now that we have all of the tools for describing sensors, recordings, physical relationships, and physical events, we can use these to construct a method for representing the state of the meeting room. This will be the "glue" which connects all of the various aspects of physical awareness together in one cohesive structure. We propose the following model and look to the PA team to come up with a final solution.
Currently, the model posits a
Meeting activity which is connected to a
Place which is the room and
.meeting-participants. We expect to receive this information from Task Setup, and take
.meeting-participants to mean the "intended" participants in the meeting. The Meeting Recorder posits a
Register action which is performed by a person when registering with the MR. Also, there is the basic
BeingInRoom action which is the type of attendance detected by the AV integrator. It is proposed that this action also serve to say whether the various computers and devices are in the room as well. All of this together helps to define the various high-level notions of the
Meeting activity and how
Persons participate in it.
Register | .agent, .object, .instrument, .time-during | An Action performed by a Person which states that they have officially registered into a Meeting (using the Meeting Recorder). |
The Multimodal Discourse Ontology
This section describes additions to the CLib & CALO ontology which elaborate the generic communication model to a model of multimodal communicative processes in human-human and human-computer dialogue. Its subsections proceed from more general concepts to more specific concepts (with each subsection building on previous ones, without depending on later ones).
Discourses
The major analytical unit in the discourse ontology is a
Discourse, which we define as an
Event with a collection of
Communicate events as
subevents which cohere into a unified communicative exchange of information between agents. An example is a conversation or coherent discussion, in which the
Communicate subevents are the contributions (verbal or otherwise) of its participants.
Discourses are not the same as
Meetings; rather, a single
Meeting will have one or more
Discourses as
subevents. The division between them may be due to: the switch from one major meeting topic or "phase" to another; large temporal gaps due to major disruptions; or simultaneous "side-conversations" on different topics with separate participants. There should be no "dialogical" connections between utterances in different
Discourses. There may be spatial, temporal, or referential connections, but no connections which would disrupt the interpretation of the communicative acts if broken.
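Of the three division criteria listed above, only the temporal one lends itself to a purely mechanical sketch. The following groups the start times of Communicate events into candidate Discourses wherever the gap between consecutive events exceeds a threshold. The function name, the use of start times alone, and the threshold value are all assumptions of this illustration; topic shifts and simultaneous side-conversations are not handled.

```python
from typing import List


def split_by_gap(start_times: List[float], max_gap: float) -> List[List[float]]:
    """Group event start times; a gap larger than `max_gap` opens a new group."""
    groups: List[List[float]] = []
    for t in sorted(start_times):
        if groups and t - groups[-1][-1] <= max_gap:
            groups[-1].append(t)  # close enough to the previous event
        else:
            groups.append([t])    # large gap: start a new candidate Discourse
    return groups


# Hypothetical start times (seconds) with one major disruption after t=8.
candidate_discourses = split_by_gap([0.0, 5.0, 8.0, 300.0, 304.0], max_gap=60.0)
```

With these inputs the five events fall into two groups, one before and one after the disruption.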
Discourse Types and Discourse "Phases"
Discourses come in many forms including multi-party discussions, human-computer information-seeking, monologues, etc. In this section we create a small taxonomy of
Discourse types for use in CALO. To make such a taxonomy is very difficult due to the substantial role the physical and domain context play in determining the type of discourse. We consider this problem to nearly exceed the scope of the ontology outlined at the beginning of this document. Nevertheless, for practical purposes, we posit the following categories of
Discourse.
| Types of Discourse |
Discussion | A Discourse which involves substantial, interposed contribution of at least two participants in a "free-flowing" dialogue. |
Presentation | A Discourse in which the large majority of contributions are made by a single agent. Note that a Presentation need not "present" external resources like slides or documents. For example, a lecture without slides can be a Presentation. Rather we conceptualize the agent as "presenting" the information contained in the communication, not the external resources which may or may not be used. |
Briefing | A Discourse where multiple parties are providing information in summary form to some other party who is there to consume it. |
There are also various
Roles that discourse participants play in the context of different phases or discourse types. The following list describes these:
Threads and Communicative Acts
This section develops a model of the internal structure of a
Discourse. A
Discourse contains as
subevents a collection of
Communicate events (individual dialogue moves) which may be organised into one or more threads (or sub-topics).
Communicate events can be related to one another as
antecedents, and to the
Issue to which they are relevant; a thread is a collection of
antecedent -related
Communicate events which share an
Issue.
We therefore posit the addition of the following slots to the
Communicate class:
Communicate | .antecedent | A previous Communicate event, but not necessarily the immediately linearly preceding one. The antecedent event is the move which the new move is immediately discourse-relevant to: e.g. the query that is being answered, or a proposal that is being clarified. |
| | .info-state | The InformationState resulting from this Communicate event once its update effects have been taken into account. |
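To make the notion of a thread concrete, the sketch below groups Communicate events by following their antecedent links, keeping an event in its antecedent's thread only when both share an Issue. The `ident` and `issue` fields are simplifications introduced for this example (in the model, the Issue is reached through .info-state.iun), and the events are assumed to arrive in temporal order so that every antecedent has already been seen.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Communicate:
    ident: str
    issue: str                        # stand-in for the Issue in .info-state.iun
    antecedent: Optional[str] = None  # ident of an earlier Communicate event


def threads(events: List[Communicate]) -> Dict[str, List[str]]:
    """Group events into threads, keyed by the ident of each thread's first move."""
    by_id = {e.ident: e for e in events}
    root_of: Dict[str, str] = {}
    result: Dict[str, List[str]] = {}
    for e in events:  # assumed temporal order: antecedents come first
        parent = by_id.get(e.antecedent) if e.antecedent else None
        if parent is not None and parent.issue == e.issue:
            root = root_of[parent.ident]  # continue the antecedent's thread
        else:
            root = e.ident                # no shared Issue: start a new thread
        root_of[e.ident] = root
        result.setdefault(root, []).append(e.ident)
    return result


# Hypothetical dialogue with two interleaved threads.
moves = [
    Communicate("c1", "who-buys"),
    Communicate("c2", "who-buys", "c1"),
    Communicate("c3", "when-meet"),
    Communicate("c4", "who-buys", "c2"),
    Communicate("c5", "when-meet", "c3"),
]
grouped = threads(moves)
```

Note that c4 joins the first thread even though c3 immediately precedes it in time, since the antecedent need not be the linearly preceding move.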
Information States
Our model of discourse and argumentative structure depends upon keeping track of the dialogue context: the issues and questions being discussed, the entities which are currently salient and can be referred to, and more generally the previous moves that have been made. The record of previous moves is available directly (the previous
Communicate events in the
Discourse, and more specifically via the chain of
antecedent slots). For other information, we use the
info-state attribute and the
InformationState class:
InformationState | .salience-list, .qud, .iun | An Entity recording the discourse context resulting from a Communicate event. |
| | .salience-list | A list of entities or events which are salient. This provides context for anaphora resolution, etc. NOTE: IET talk about artifacts being "in focus" - if this means that discourse about the object of focus was or would have been felicitous at that time, this is where to answer their questions from. |
| | .qud | A list of Questions (see below) which are currently "under discussion", in order with the most salient or maximal first. These are short-term questions determined from the propositional content of individual dialogue moves, and used for ellipsis resolution and answerhood determination. |
| | .iun | A list of Issues which are currently "under negotiation". These are the longer-term questions being discussed in this thread of the Discourse, and which can be considered as the "topic(s)" of the thread. |
For further explanation of IUNs/QUDs, see
Staffan Larsson's thesis on issue-based dialogue systems, and Jonathan Ginzburg's notion of Questions Under Discussion (QUD).
Communicative Act (Dialogue Move) Types
We can now use this machinery to move towards our principal goal in discourse understanding, which is to classify
Communicate events depending on their relationship to thread-previous and thread-subsequent
Communicate events in a discourse, and to draw relationships between these events and the objects to which they share reference. While we do not propose a nuanced model of discourse pragmatics to account for all of the facets of truly "appropriate" dialogue structure, we nevertheless posit a complete utterance-by-utterance classification.
Given the
antecedent relation and the
info-state attribute, we can specify:
- conditions on the type of
Communicate act which can be considered as an antecedent
- conditions on the
info-state values of the antecedent act
- transitions between the
antecedent info-state values and the new info-state values.
For example, if a
Communicate event introduces a new referent (perhaps an individual who is known, but has not been discussed before), the new
info-state.salience-list must add the new referent to the value of the old
antecedent.info-state.salience-list. Similarly, a
Communicate event which explicitly asks a question must add that question to the head of the
antecedent.info-state.qud list to form its own
info-state.qud list.
salience-list transitions will be due to individual gestures, referents and anaphors; we leave them aside for now.
qud transitions will be determined by the dialogue move type, which we outline next;
iun transitions by the broader rhetorical or argumentative structure, which is discussed in the next section.
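The two example transitions above (adding a new referent to the salience list, and pushing an explicitly asked question onto the head of the QUD list) can be sketched as a pure update function from the antecedent's info-state to the new one. The class layout and the `update` function are assumptions of this sketch; in particular, placing the new referent at the end of the salience list is our own choice, since the text does not fix its position.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class InformationState:
    salience_list: Tuple[str, ...] = ()
    qud: Tuple[str, ...] = ()
    iun: Tuple[str, ...] = ()


def update(prev: InformationState,
           new_referent: Optional[str] = None,
           question: Optional[str] = None) -> InformationState:
    """Derive a new info-state from the antecedent's info-state."""
    salience = prev.salience_list
    if new_referent is not None and new_referent not in salience:
        salience = salience + (new_referent,)  # position is an assumption
    qud = prev.qud
    if question is not None:
        qud = (question,) + qud                # new question goes to the head
    return InformationState(salience, qud, prev.iun)


# Hypothetical antecedent state, then a move that mentions Alice and asks a question.
before = InformationState(salience_list=("budget",), qud=("When is the deadline?",))
after = update(before, new_referent="Alice", question="Who buys the computer?")
```

The antecedent's state is left untouched, so each Communicate event keeps its own .info-state value, as the slot description requires.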
The following table outlines some basic
Communicate types, together with some of their major properties expressed as conditions on antecedents and QUD transitions.
NOTE: these should not be taken as the final or complete definitions of these types, as more detailed properties and conditions will almost certainly apply - but they should be enough to give the idea.
Argumentation, Decision-Making, and Rhetorical Relations
This section extends the model to include notions of argumentation and decision-making.
Communicate events will not only be classifiable by their short-term roles and effects, as in the previous section, but also on their longer-term role in negotiating solutions to collectively agreed-upon conversational issues. This role is defined in terms of the event's relation to the
Issue under negotiation, as expressed in the
info-state.iun attribute. Again, we refer the reader to Larsson's (2000)
Issue-Based Information System (IBIS) model.
We model an
Issue as a type of
Question (see "Units of Meaning" below). Argumentative discussions (threads) will be series of related
Communicate events which share an
Issue in their
.info-state.iun value.
Issues thus correspond to specific sub-topics or 'minor' topics, rather than the broad overall 'major' topic which might cover a whole
Discourse. In the dialogues we expect to handle as part of CALO,
Issues are likely to be questions about
Task,
Milestone,
ActionItem, and
Project entities and their attributes, although this will not necessarily be the case in other domains. We do not model these here. Rather, we model the decision-making processes which lead to their specification.
Resolving an
Issue is equivalent to making a "decision". Resolving (and/or agreeing on) an
Issue which is a question about a
Task or
ActionItem will be the model for assigning that
Task or
ActionItem, so it is via this argumentative model that we expect to answer IET test questions about such assignments.
| Subjects of Argumentation |
Issue | A Question (i.e. a Proposition with a piece of missing information to be provided, or whose truth or applicability is to be determined). More specific than a "topic", an Issue is the "content" of an argumentative discourse. |
Alternative | A whole or partial solution to the Issue Question. When an alternative is proposed by an agent, we take it to mean that the agent believes the Alternative "resolves" the Issue to some degree, either partially or completely. |
Argument | A piece of information which is meant to provide evidence either in favor or in opposition to an Alternative. In the real world, Alternatives and Arguments interact in the realm of human reasoning. In a computational model, the relationship depends on the IUN modelling strategy. |
This allows us to classify the actions taken by participants in the course of an argumentative discourse. These actions imply relationships between the
Arguments and
Alternatives to which they refer.
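As a rough illustration of how Issues, Alternatives, and Arguments relate, the sketch below records proposed Alternatives and their Arguments under an Issue, and treats resolving the Issue as choosing one proposed Alternative (the "decision"). All class fields and the `resolve` method are hypothetical; how Arguments actually bear on Alternatives depends on the IUN modelling strategy and is not modelled here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Argument:
    text: str
    in_favor: bool  # True: evidence for the Alternative; False: against it


@dataclass
class Alternative:
    text: str
    arguments: List[Argument] = field(default_factory=list)


@dataclass
class Issue:
    question: str
    alternatives: List[Alternative] = field(default_factory=list)
    resolution: Optional[Alternative] = None

    def resolve(self, alternative: Alternative) -> None:
        """Record a decision; only a proposed Alternative can resolve the Issue."""
        if alternative not in self.alternatives:
            raise ValueError("can only resolve with a proposed Alternative")
        self.resolution = alternative


# Hypothetical thread: who is going to buy the computer?
issue = Issue("Who is going to buy the computer?")
alice_buys = Alternative("Alice buys it", [Argument("She has the budget code", True)])
bob_buys = Alternative("Bob buys it", [Argument("He is away next week", False)])
issue.alternatives += [alice_buys, bob_buys]
issue.resolve(alice_buys)
```

A resolved Issue about an ActionItem would then be the basis for answering a question about who was assigned that item.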
For simplicity's sake, we currently assume that these issue-relevant rhetorical-relations operate orthogonally to the question-under-discussion model:
Communicate acts will be classified both according to their utterance-level, QUD-relevant effects, and their thread/issue-level, IUN-relevant effects. The
Information of the
Communicate act will likewise be classified both according to its short-term QUD-relevant assertion/asking content, and its longer-term IUN-relevant argumentative content. As far as the ontological model goes, this seems a reasonable first approximation.
In reality, the two models will not be independent: certain low-level communicative classes will be used to perform certain argumentative roles. For example, the raising of an issue for discussion will often coincide with the introduction of a QUD through the utterance of a question. But this is not always the case, as in the utterance "We really need to figure out who's going to buy the computer." We hope to model their interdependence in future, and express it via constraints on the relevant classes.
Physical Acts of Communication
In performing an individual
Communicate action, agents
Embody their
Message into some physical
Signal, to be
Transmitted over some
Medium. Correspondingly, other agents may
Sense the
Signal, relating it to (their version of) the
Message. We can classify
Embody actions by the
Medium which they use and the corresponding
instrument used; this allows us to specify classes for some intuitive multimodal dialogical actions. We can classify the corresponding
Sense actions similarly, though these are not currently used.
A classification of Embody actions based on the transmission Medium |
Speak | An Embody action of which the medium is Sound, with the added distinction that the agent is a Person. This is a vocalization, which may or may not be verbal, bounded by a pause or breath. In CALO, a Speak event would most likely be posited by a speech-recognition front-end, which would analyze an audio signal and determine that the user of the microphone spoke something (whether verbal or nonverbal). |
Gesticulate | An Embody action which uses the medium of Light, with the added distinction that the instrument is a PhysicalEntity (usually a body part, but not necessarily). This is a pointing, signing, or looking action. A body tracker may posit an instance of this class. |
Sketch | An Embody action which uses the medium of Ink. This would include the action of writing text or drawing figures. |
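The classification in the table above amounts to a lookup from Medium to Embody subclass, with the extra condition that a Speak action requires a Person as agent. The sketch below encodes just that much; the function and parameter names are our own, and the instrument condition on Gesticulate is deliberately left out.

```python
EMBODY_BY_MEDIUM = {"Sound": "Speak", "Light": "Gesticulate", "Ink": "Sketch"}


def classify_embody(medium: str, agent_is_person: bool = True) -> str:
    """Return the Embody subclass for a Medium, or plain 'Embody' if none applies."""
    if medium == "Sound" and not agent_is_person:
        return "Embody"  # Speak additionally requires the agent to be a Person
    return EMBODY_BY_MEDIUM.get(medium, "Embody")


# Hypothetical front-end outputs: speech recognizer, body tracker, whiteboard.
labels = [classify_embody(m) for m in ("Sound", "Light", "Ink")]
```

An unknown Medium falls back to the generic Embody class rather than raising an error, so that new media can be added to the lookup later.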
NOTE: In some cases, Sensors are simply turned on at the beginning of a meeting and then turned off at the end. Their continuous recording can be represented as a single Record action performed by the Sensor, whose outcome is a Recording. In the case of a far-field microphone, this will be a single (long) raw audio signal, and will not correspond directly to the Signal role in any single Communicate event. In other cases, like the whiteboard, recording events may not produce single long signals, but rather series of sensed strokes. In these cases, we will be able to make the assumption that the Recording does play the Signal role in a Communicate act being performed by the person writing. This might also hold with close-talking microphones, where we can probably assume that Recordings will only be recorded when the Person is producing them as the Signal role part of a Communicate act. This may not be possible in a far-field or mic-array situation where we will have to calculate which parts of the (possibly multiple) signal(s) correspond to Communicate events.
Sense sub-classes
We now turn to characterizing the Sense actions, again by Medium, and also by the different Sensors which can be instruments. In each case, the object of the Sense event will be a Recording, which can play the Signal role; the result will (eventually) be a Message. For completeness, the agent of these actions should be the software agent, or 'CALO' generally; however, we do not anticipate that these actions will be explicitly entered into MOKB at the moment -- rather, software agents will posit Signals and their associated Embody events, and the Sense action will be left implicit.
Note that this framework also allows us to posit Sense actions for meeting participants themselves (i.e. such actions could be posited with human agents). However, we do not currently intend to model the sensing side of human-human communication explicitly. This does not exclude attempting to understand who is the intended addressee (recipient) of a Communicate action; it excludes only attempting to detect whether that addressee actually perceived it. In the future, such detection might be possible through observation of grounding indications such as eye gaze or head movements.
In many cases, Sensors record Recordings which get saved in some external file for later access. However, we might also like to say that some sub-element of that Recording embodies information which can play the Signal role in a physical communicative act. For example, some subsegment of an AudioRecording may contain the information which we consider to be the Signal being transmitted as part of a Speak action. To account for this, consumers of information pertaining to the Signal role of communicative acts must reference the attributes of the Record action which generated the Recording as well as look at the timing information embodied in the Transmit event to reason about where in the file to get the desired information.
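As a minimal sketch of the lookup just described, the following Python fragment computes where in a stored recording a Signal lies, given the start time of the Record action and the time interval of the Transmit event. The class and field names (RecordAction, TransmitEvent, sample_rate, num_samples) are illustrative assumptions and are not attributes defined by MOKB.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class RecordAction:
    """Hypothetical stand-in for the Record action that produced a Recording."""
    start_time: float   # seconds, on a clock shared with the Transmit event
    sample_rate: int    # samples per second of the stored file (assumed attribute)
    num_samples: int    # total length of the stored Recording


@dataclass
class TransmitEvent:
    """Hypothetical stand-in for the Transmit event that carries the Signal."""
    start_time: float   # seconds, same clock as RecordAction.start_time
    end_time: float


def signal_sample_range(record: RecordAction, transmit: TransmitEvent) -> Tuple[int, int]:
    """Return (first, last-exclusive) sample indices of the Signal in the file."""
    if transmit.end_time < transmit.start_time:
        raise ValueError("Transmit event ends before it starts")
    # Offsets are measured relative to the moment the Sensor started recording.
    first = round((transmit.start_time - record.start_time) * record.sample_rate)
    last = round((transmit.end_time - record.start_time) * record.sample_rate)
    # Clip to the extent of the Recording in case the event overruns it.
    first = max(0, min(first, record.num_samples))
    last = max(0, min(last, record.num_samples))
    return first, last


# A one-minute recording at 16 kHz that started at t = 100 s.
record = RecordAction(start_time=100.0, sample_rate=16000, num_samples=16000 * 60)
transmit = TransmitEvent(start_time=102.5, end_time=104.0)
print(signal_sample_range(record, transmit))  # prints (40000, 64000)
```

The same arithmetic would apply to any time-indexed Recording; only the unit of the offset (samples, frames, strokes) would change.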
The Linguistic Representation
Common sense models of natural language justify intermediate representations between the physical and the cognitive. In the communication model, this is the role of the middle level and especially the Message concept: an entity which can play a part in associating meaning with a physical realization. In the following sections, we take several steps toward fleshing out this critical intermediate level. We derive some of our conceptualization from the General Ontology for Linguistic Description (GOLD). We also refer the reader to the SIL Linguistics Glossary.
As a first step, we posit a recursively-defined LinguisticUnit, which is the building-block of Messages and is a Message itself. It may be useful to conceive of the LinguisticUnit as one would conceive of a feature structure in HPSG. An instance of a LinguisticUnit is an instance of a Message or some subpart of that message. Units can be built into constructions through composition, generating the following classes of unit:
We would like to model the fact that LinguisticUnits have the same base Language as their parent LinguisticUnit. Because MOKB does not currently do reasoning, we request that the Language be specified as the .base of the root Message of the communicative act.
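The convention above can be sketched in Python as follows: only the root Message carries a base, and a consumer recovers the Language of any subordinate unit by walking up to the root. The class, its fields, and the language() helper are hypothetical illustrations, not part of the ontology.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class LinguisticUnit:
    """Illustrative recursive unit; a root unit plays the part of the Message."""
    label: str
    base: Optional[str] = None                    # Language, set on the root only
    parent: Optional["LinguisticUnit"] = None
    children: List["LinguisticUnit"] = field(default_factory=list)

    def add(self, child: "LinguisticUnit") -> "LinguisticUnit":
        """Attach a sub-unit and return it, so that calls can be chained."""
        child.parent = self
        self.children.append(child)
        return child

    def language(self) -> Optional[str]:
        """Look up the base Language by walking up to the root Message."""
        unit = self
        while unit.parent is not None:
            unit = unit.parent
        return unit.base


root = LinguisticUnit("Message", base="English")
word = root.add(LinguisticUnit("Sentence")).add(LinguisticUnit("Word"))
print(word.language())  # prints English
```

Storing the Language once at the root avoids having to keep the value consistent across every sub-unit by hand.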
Languages
Different languages are realizable in certain physical forms. For example, many natural languages like English are realizable both in Sound through speaking and in Ink through sketching. It is even possible for English to be realized in sign. Other languages like the "language of human gesture" are only realizable in Light through physical gesturing. LinguisticUnits are expressed in particular Languages, and this attribute of a Language helps us to determine how the LinguisticUnits "may" be realized.
Instances of Languages in CALO |
Language | realizable-in | Description |
English | Ink, Sound, Light | A natural language which is realizable-in Ink, Sound, and Light. |
Gesture | Light | The language of physical human gesture, realizable-in Light. |
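To illustrate how the realizable-in attribute might be used, the sketch below combines the Language instances listed above with the Medium of each Embody sub-class (Speak uses Sound, Gesticulate uses Light, Sketch uses Ink) to list the Embody actions that could realize a unit of a given Language. The dictionary layout and the function name are assumptions made for illustration only.

```python
# Language -> media it is realizable-in, following the Language instances above.
REALIZABLE_IN = {
    "English": {"Ink", "Sound", "Light"},
    "Gesture": {"Light"},
}

# Embody sub-class -> Medium, following the classification of Embody actions.
EMBODY_MEDIUM = {
    "Speak": "Sound",
    "Gesticulate": "Light",
    "Sketch": "Ink",
}


def possible_embodiments(language):
    """Return the Embody sub-classes whose Medium the Language is realizable-in."""
    media = REALIZABLE_IN[language]
    return sorted(action for action, medium in EMBODY_MEDIUM.items() if medium in media)


print(possible_embodiments("English"))  # prints ['Gesticulate', 'Sketch', 'Speak']
print(possible_embodiments("Gesture"))  # prints ['Gesticulate']
```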
Units of Meaning
Meaning, or semantic content, in Messages is represented as the value of the .information-content attribute. At the top level, the .information-content of the overall Message (or multiple Messages in different Languages when combining multiple modalities into a single Communicate) will provide the .information-content of the Information being Communicated by the overall act. At lower levels, subordinate LinguisticConstituents will have their own .information-content attribute. In general, this can be filled by any Thing, but certain types of LinguisticUnit will be restricted to have certain types of content. For example, proper names will require that their denotation be an Entity (as might pointing gestures); verbs will denote Events. At a higher level, assertions and queries will require particular types of information, Propositions and Questions, which we take to be the basic units of which the overall Communicated Information will be made:
| Basic Units of Meaning |
Proposition | The proposition that some Event holds or takes place. A subclass of Information, its information-content must be an Event. It should not be confused with the Event itself, as its truth value may be true or false (i.e. the Event may or may not hold or take place). Propositions will generally be the Information associated with Communicate acts which are assertions or proposals. |
Question | An Information which can be taken as an abstracted proposition, with an undefined truth value. Its .information-content attribute is filled by the Proposition under question, and the .params attribute is a set of Information objects which are the parameters being queried/abstracted from the proposition. For a yes/no question, this set will be empty. For a standard wh-question, it will have a single member corresponding to the wh-element; multiple wh-questions have larger sets. |
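The relationship between the two units can be sketched in Python as below. A Question wraps the Proposition under question, and the size of its params set distinguishes yes/no, wh-, and multiple wh-questions. The class layout, the use of plain strings as stand-ins for Events and Information objects, and the question_kind() helper are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Proposition:
    """The proposition that some Event holds; a string stands in for the Event."""
    information_content: str


@dataclass(frozen=True)
class Question:
    """An abstracted Proposition plus the parameters being queried."""
    information_content: Proposition
    params: FrozenSet[str] = frozenset()   # strings stand in for Information objects


def question_kind(question: Question) -> str:
    """Classify a Question by the number of abstracted parameters."""
    count = len(question.params)
    if count == 0:
        return "yes/no"
    if count == 1:
        return "wh"
    return "multiple-wh"


buy = Proposition("Purchase(agent=?who, object=?what)")
print(question_kind(Question(buy)))                                # prints yes/no
print(question_kind(Question(buy, frozenset({"?who"}))))           # prints wh
print(question_kind(Question(buy, frozenset({"?who", "?what"}))))  # prints multiple-wh
```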
In theory, as the .information-content of a parent LinguisticConstruction will be a (compositional) function of the .information-contents of its child LinguisticConstituents, this composition might be performed by reasoning over the ontology and class properties. Given the currently restricted reasoning capabilities of MOKB, for now we assume that each level will have its .information-content specified by some external agent (e.g. a parser).
Natural Language
In this section we characterize the types of LinguisticUnits used to express Messages in natural languages. The list is currently based on the units which are produced by functional parts of the system. Future development will include the integration of the LinguisticUnits used in the Gemini natural language processing system, including clauses and other super-word-level constructions.
Embodiment of Natural Language
In the section outlining LinguisticUnits and their .base, we mentioned that units of natural languages are realizable-in many modalities, which means that units of natural language like Word and Sentence can be physically realized in different manners. For example, a word in English like "travel" has both a spoken realization and a written one. In the following sections we outline the portion of the ontology which ties the abstract linguistic units of natural language described above to the events and entities which make up their physical realization.
One particular modelling strategy to focus on in this section is the tripartite division between the Message and its parts, the Embody action and its subevents, and the Signal and its parts. The conceptualization here, which transcends modality, is that the Message and its parts contain everything about the message which is completely independent of its physical realization. The Embody action and its subevents contain the information pertaining to the actual embodying of the Message in time. The Signal and its parts contain the information pertaining to the embodiment which the Embody action created. In speech, the embodiment in a Signal is fleeting, and we need a recording device to "capture" it. But for ink, "capturing" of the Signal is a natural aspect of the action because the ink records the action. Thus, the Signal for an ink sketch may have a more intuitive feel as an Entity than the Signal of a Speak event.
Spoken Natural Language (DIAGRAM)
Much of the analysis here is driven by the capabilities and modelling strategy of the speech recognizer. Namely, we are interested in the segmentation of a Speak event into segments which represent the utterance of Sentences, Words, and Phones. The strategy for modelling the Speak event is to perform a multi-layer segmentation of the entire Speak event using recursion. We use the subevent and next-event slots to capture this structure. Note that for economy, the time information of the event should be given to the lowest level, which will allow timing information to be inferred for the higher-level events.
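The following Python sketch illustrates this strategy under stated assumptions: only the lowest-level segments carry explicit times, siblings are chained through a next-event link, and the span of a higher-level segment is inferred from its subevents. The class, its field names, the phone labels, and the times are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(eq=False)
class SpeakSegment:
    """Illustrative segment of a Speak event (sentence, word, or phone level)."""
    label: str
    time: Optional[Tuple[float, float]] = None   # given on the lowest level only
    subevents: List["SpeakSegment"] = field(default_factory=list)
    next_event: Optional["SpeakSegment"] = None

    def add_subevents(self, *segments: "SpeakSegment") -> "SpeakSegment":
        """Attach ordered subevents and chain them with next_event links."""
        for earlier, later in zip(segments, segments[1:]):
            earlier.next_event = later
        self.subevents.extend(segments)
        return self

    def span(self) -> Tuple[float, float]:
        """Return the explicit time, or infer it from the subevents."""
        if self.time is not None:
            return self.time
        if not self.subevents:
            raise ValueError("no timing available for " + self.label)
        spans = [segment.span() for segment in self.subevents]
        return (min(start for start, _ in spans), max(end for _, end in spans))


# Two words, each segmented into phones with made-up times in seconds.
hi = SpeakSegment("Word:hi").add_subevents(
    SpeakSegment("Phone:HH", (0.00, 0.08)),
    SpeakSegment("Phone:AY", (0.08, 0.25)),
)
there = SpeakSegment("Word:there").add_subevents(
    SpeakSegment("Phone:DH", (0.30, 0.36)),
    SpeakSegment("Phone:EH", (0.36, 0.50)),
    SpeakSegment("Phone:R", (0.50, 0.62)),
)
sentence = SpeakSegment("Sentence").add_subevents(hi, there)

print(hi.span())        # prints (0.0, 0.25)
print(sentence.span())  # prints (0.0, 0.62)
```

Inferring spans on demand keeps the stored data small, at the cost of a traversal each time a higher-level time is requested.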
Speak | .time-during | For a Person to Embody a natural language LinguisticUnit into an AudioSignal |
AudioSignal | .pitch, .energy, .source | An Entity which is the realization of a Speak action. These entities might have some measurements attached as attributes. Those listed are currently just stubs. It will also need to reference an information resource (probably an AudioRecording which actually contains the info). |
Phone | | A type of AudioSignal which is the basic temporal unit of analysis in spoken language. |
We also propose to subclass the Phone class into a phone set based on the IPA. This may need to be changed depending on the set of phones used by Sphinx. With this model, each occurrence of a Phone as a part of an AudioSignal would actually be an instance of one of the phones in the set (see diagram).
Written Natural Language
This section should parallel the previous, but with an explanation of hand-written text.
Physical Gesture
In this section, we conceptualize the linguistic units that play a role in Messages in the language of 3D gesture. Generally, I have left this section unspecified and have simply based the concepts on my knowledge of the abilities of the 3D gesture recognizer. In each case, we are looking at LinguisticUnits that combine to form Messages. These Messages are then realized by a Gesticulate version of an Embody event.
Sketch, Graphical Objects, and Charts
Here we intend to provide a characterization of linguistic units for other modalities, like sketch and gesture. This may involve the integration of some aspects of the chart ontology that is being written by Sanjeev and Jerry.
no concepts defined yet...
Topics, Hotspots, Focus, and Abstract and Statistical Features of Discourse
Some important concepts having to do with multi-party discourse are very difficult to model concretely. This section focuses on those elements of our analysis of discourse.
| Abstract Phenomena |
TopicDiscussion | An Event which occurs as a sub-event of a Discourse. It represents a time-slice of the dialogue which is generally about a single domain concept. We characterize Discourses as always having some major topic of discussion, with the optional occurrence of a minor topic within those segments. The minor topic would be another TopicDiscussion event which is the subevent of a major topic. TopicDiscussion events will be posited by agents such as CMU's CAMSeg and the CSLI offline analysis suite; they relate directly to argumentative threads as described above, but while threads will be determined from low-level analysis of individual utterances, TopicDiscussion segments will be determined from high-level statistical analysis. Eventually, the two sources of information will be combined to provide a more reliable discourse thread structure. |
Hotspot | An Event which occurs as a sub-event of a TopicDiscussion. Again, these segments will be determined from higher-level statistical analysis (particularly prosodic information); they will eventually be used to help provide a more reliable argumentative structure by identifying areas in which important issues are being introduced, agreed upon, or disputed. |
Multi-agent communication, CALO as communicative agent
It should be noted that a single Communicate act is between two agents--an agent and a recipient. To model multi-agent discourse, we therefore must either allow Communicate actions to be tied to one another through shared elements or sub-events, or (preferably) change the model so that Communicate allows multiple recipients. We describe those mechanisms here.
To be continued...
To keep open the possibility of modelling CALO as a participant in the meeting, we conceive of CALO (and thus the dialogue interpretation system) as being an agent at the same level of communicative participation as the human participants.
To be continued...
Discrepancies with CLib, Questions, and Problems
- A base is a relation from an event to an entity, but a message has a base!
- The agent of a communicate act does not necessarily know the information being communicated. One might claim that they believe it though.
- Does not make sense to label information as being of only one type. Some utterances relate to the task at hand while also having a direct relationship to the discourse itself.
- We would argue that Write is a subclass of Embody, not Express. I can "express" information into an orthographic language without actually writing it.
Diagrams
Here is a list of all the diagrams currently available. I have attempted to follow the following guidelines in diagram construction:
- Events are in light blue
- Entities are in green
- Roles are in yellow
- literal xml-schema datatypes are in grey
For queries, we also wrap the assumed knowledge in a grey box, while providing the desired result in a magenta box.
Abstract Models:
Concrete Examples:
Examples, Examples, Examples
Examples have moved to MokbOntologyUsageExamples.
IET Test Questions
IET question descriptions have moved to IetQuestionsYearTwo.
-- JohnNiekrasz - 15 Feb 2005