r4 - 16 Nov 2005 - 23:00:27 - JohnNiekrasz

MMDO (Multimodal Discourse Ontology)

This document provides a common-sense description and draft specification for ontologies which may be used as a common language for information exchange between agents in the "task discussion" domain of the CALO project, i.e. the meeting environment. The ontology is a central component to CSLI's MoKb (Meeting Ontology Knowledge Base) system and in the current system serves as the principal interface to "meeting knowledge" for other components in CALO, such as Query Manager.

The ontologies defined here model the following domains of meeting environment knowledge:

  • (human) agent communication (based on CLib Communication Model)
  • physical media, multimodality, and sensors
  • spatial objects, relations, and physical actions and gestures
  • linguistic characterizations of communicative acts (written, spoken, and sketched)
  • discourse, dialogue, and communicative context
  • discourse phases and discourse types
  • argumentation and decision-making
  • topics, hotspots, and other abstract features of discourse

The ontologies defined here do not model many remaining elements of relevance to the meeting-understanding domain, particularly those having to do with the "subject matter" of the meeting discourse. We reserve the modelling of these application-specific aspects to other ontologies in CALO, which can be viewed at http://www.cs.utexas.edu/users/mfkb/RKF/tree/specs/ontologies/. These include (partial list):

  • an ontology of projects, tasks, milestones, etc.
  • an office ontology including computers, purchasing etc.
  • an ontology of people and qualities of a person

It is, however, an intention that the ontologies presented here be merged to a certain extent back into a central CALO ontology, at the discretion of the CALO ontologists.

We define three separate ontologies in this document to cover the domains listed above. For each ontology, we define an umbrella concept which provides a frontier for the concepts being modelled. For each of the ontologies, we constrain ourselves to modelling the following. The filenames listed are the OWL ontologies which are distributed as part of CSLI's CALO components in the folder calo/lib/ontology.

Ontologies for MoKb
Ontology filename Umbrella Concept
Physical Awareness Ontology undefined concepts directly associated with the physical qualities of the discourse environment and their sensing
Multimodal Discourse Ontology undefined concepts directly associated with natural multimodal human discourse, independent of the topic or subject matter of that discourse

All the ontologies have been designed with influence from several directions, including:

  • the need to fit the model into the model provided by CLib and the CALO ontology
  • the need to effectively model the types of information currently being generated by components in the task-discussion system
  • the need to define important concepts that will be directly relevant to IET test questions
  • the need to provide an abstract communicative framework for discourse understanding components, such that everyone will be able to find a convenient interface to their own more specific models of communicative and linguistic behaviors
  • the need to have something abstract enough that it can persist through subsequent years of system improvements

The following sections attempt to spell out, using plain English and references to explicitly defined classes in the ontology, a conceptualization of the domains listed above, in a way which takes into account the influences and constraints listed above. We also encourage you to see the document MokbOntologyNotes, which contains some preliminary examples and a general notepad of ideas for the ontology which have not yet been properly integrated into the model.

This document also contains many concrete examples of various kinds. In particular, we describe examples of phenomena which relate to specific interpretations of the general concepts in the ontology. We also list some IET questions and describe how they might be answered using the model. And for each of these, we attempt to provide relevant MOKB update and query statements.

  • Descriptions which apply to expected phenomena in the meeting environment are listed in a red font.
  • Descriptions which relate these concepts to general CALO-wide uses, particularly for answering of IET questions, are listed in a green font.
  • Concrete query and update examples for the MoKb are listed in grey.
  • References to actual concept classes in the ontology are written in a bold monotype.

Each concept is presented as a row in a table. In most of these listings, the concept class name is given, followed by a listing of the slots for the class. A description of how these elements interact, and of their meanings, is given in plain English as much as possible. To the best of my ability, I have listed every single class and slot which might be used by a component in the current system.

It should be noted that the model is meant primarily as a starting point for further specification of critical details by other task-discussion research groups. The current generic model should be flexible enough to interface with these specific models of communicative or physical behavior while providing a convenient interface into the CLib CALO ontology. We encourage all the teams to get involved with the definition of these ontologies.

If you have any questions about this document or the ontology, or how it relates to MoKb or CSLI's Dialogue Understanding system for CALO, email John Niekrasz at niekrasz@csli.stanford.edu.

The CLib

The ontologies described here are intended to be modular additions to the CLib ontology. We therefore build our model around existing models and concepts in that ontology wherever possible. The majority of the discourse model is built on the Communication Model component of CLib, while the physical awareness elements make use of spatial entity concepts in the CLib. Every component makes use of the highly generic classes in the CLib. We describe these connections to CLib below. We provide a brief English description of each concept, but we recommend that any user of these ontologies first become familiar with this set of concepts by going directly to the CLib and its documentation, because our descriptions may diverge from the intentions of the original authors.

Upper-level CLib Classes

Generically Useful CLib Concept Classes
Entity .has-part Something which is. Entities can be characterized as being made up of parts, which are themselves Entities.
Event .time-during, .subevent, .agent, .object, .recipient Something which happens. Every event has a time interval during which it occurred.
Role .played-by, .in-event Some role an Entity plays in an Event.
Person   A human.
Action   An Event that is performed by an agent.
Activity   A loosely constructed collection of Events.
TimeInterval .time-ends, .time-begins An interval of time. This is the object which maintains temporal information about an event that has occurred. You need to create one of these objects for each event, unless there is a mechanism in place in the ontology which allows you to infer the time from some related event.

The CLib Communication Model and related concepts (DIAGRAM)

In this section we describe the concepts present in the CLib ontology which relate to communication, and the Communication Model presented in the document http://www.cs.utexas.edu/~kbarker/working_notes/gpd/gpd07280301-Communication.html. From the outline presented in that document, and through an investigation of surrounding concepts in the CLib, we provide our explication of the concepts in the model. We refer the reader to the Communication Model diagram and propose the following highly general conceptual divisions for that diagram to assist interpretation:

  • The bottom row consists of physically-relevant concepts
  • The middle row consists of symbolic and linguistic representations
  • The top row consists of cognitive- and domain- relevant concepts
  • For a single communicative act between two individuals, the actions on the right side of the diagram can be conceived of as the attempt of the recipient to reconstruct the left side of the diagram.
  • For a third-party discourse understanding system like CALO, we are attempting to construct a complete diagram for each agent-agent relation in play, though in practical terms this simply involves constructing the left side and the entities along the center column.

Communication Model Event Classes
Communicate .base, .object, .recipient, .subevent, .agent A single, unified act of discourse or communication between two agents. This is the essential building block from which models of discourse are built (see Communicative Act in the FIPA ACL). The communication is from the agent to the recipient. Subevents can be Express, Convey, and Interpret. The base is a Message and the object is an Information.
Express .agent, .object, .result The component of a Communicate action which involves the agent formulating linguistic Messages to be conveyed. Note that a single communicative act may involve the Conveying of multiple Messages, such as the utterance of the word "that" while pointing to an object, in which case the Express action generates multiple Messages to be Conveyed.
Convey .agent, .object, .result To deliver a single linguistic message through some physical means. This action encompasses the Embody, Transmit, and Sense actions.
Interpret .agent, .object, .result To infer, deduce, or otherwise understand the Information being communicated in a linguistic Message.
Embody .agent, .object, .result To encode a Message into a physical form, a Signal, to be Transmitted through some Medium. For example, to Speak, Write, or Gesticulate.
Transmit .agent, .object, .result The physical action which takes place in order to Convey a Message via some Signal through some Medium. We are unsure if this is truly an action.
Sense .agent, .object, .result To sense a Signal

Communication Model Entity Classes
Information   In the context of a Communicate action, this is a set of facts, objects, or questions which the action is "about" (the propositional content which is being communicated). The division between this information and general discourse-contextual information is not easily made, but currently we take this set to include the intended direct reference in the domain (i.e. after anaphora resolution etc.) but not further indirect inferences such as conversational implicatures.
Message .information-content, .base This is a symbolic realization of a communication (see "Abstract" in GOLD). For example, a set of words in English, or perhaps a sequence of symbolic hand gestures in the language of gesture. We posit an abstract feature-structure representation for this called LinguisticUnit.
Signal   This is the measurable physical realization of the communicated information.

Communication Model Supporting Concepts
base No known common interpretation. Can often be taken as the method by which an action is achieved - you Communicate some Information via its Message base, you Convey a Message via its Signal base, but this doesn't generalise to all classes.
Medium A Medium is a role played by a tangible entity, the base of the Transmit element of a communicative action. For example, the role Sound plays in a speaking action.
Language A set of linguistic constructions or symbols.
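As a rough illustration of how the Communication Model entities relate, the following Python sketch encodes the "that" plus pointing example described under Express. It is not CLib's representation: the field names, the decision to hold several Messages in a list for the .base slot, and the agent names are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Signal:
    """Measurable physical realization of a Message."""
    medium: str  # "Sound", "Light" or "Ink" in this sketch


@dataclass
class Message:
    """Symbolic realization of a communication; its .base is a Signal."""
    symbols: str
    base: Signal


@dataclass
class Information:
    """What the Communicate action is 'about'."""
    content: List[str]


@dataclass
class Communicate:
    agent: str
    recipient: str
    information: Information  # stands in for the .object slot
    messages: List[Message] = field(default_factory=list)  # stands in for .base

    def media(self) -> List[str]:
        """Media used to Convey this act, in sorted order."""
        return sorted({m.base.medium for m in self.messages})


# One communicative act, two Messages: the spoken word and the gesture.
act = Communicate(
    agent="Alice",
    recipient="Bob",
    information=Information(["refers-to(whiteboard-1)"]),
    messages=[
        Message("that", Signal("Sound")),
        Message("<pointing gesture>", Signal("Light")),
    ],
)
print(act.media())
```

The point of the sketch is only that a single Communicate may be carried by several Messages, each embodied in a Signal through a different Medium.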

Other Important Classes in CLib

Spatial Entities and Physical Aspects
SpatialEntity An entity which has spatial existence and relations to other spatial entities.
PhysicalObject An entity which has a physical existence as an object.
Meeting .meeting-participant, .meeting-room An Activity where people get together.
Device An inanimate object which has a maker, model, etc.
Document An entity, such as a gantt chart.
Computer A computer.

Special Notes

It seems that we need a connection between a Computer and the Person who is using it. This allows the system to hypothesize that the Person speaking into a Microphone which is attached to the Computer, or in view of the Cameras that are attached to the FrameDevice which is attached to the Computer, is the Person using that Computer. This should also be linked up with the Meeting Recorder client, which allows users to log in to the computer and announce their status not only as a .meeting-participant in a Meeting but as a .user of the Computer.

Input From Task Setup

The Physical Awareness Ontology

We now go beyond the CLib to define new concepts and classes. In this ontology, we focus on concepts which are relevant to the physical aspects of the meeting environment and their sensing. Note that we at CSLI are only generally aware of the capabilities of certain PA devices, and not aware of an appropriate model for the spatial objects in the meeting space. We therefore leave much of this model undefined and in some cases simply provide some examples of situations which we feel would be useful for understanding the dialogue. We hope that a model can be made to account for these situations.

Spatial Objects and Relationships

Some spatial concepts are currently modelled in the CLib as part of SpatialEntity, but it appears as though the PA teams will want a more detailed model of the relationships between physical entities, with something like a 6D account of object location and orientation. I have left this domain basically unspecified in the hopes that some of you will have a model that you wish to put forth.

Multimodality and Communicative Media

Modelling multimodal dialogue and physical sensing both require an explication of the physical concepts which allow us to distinguish between the various modes (modalities) of communication. To this end, we elaborate on some ideas surrounding the Medium role concept in CLib. The basic point is to lay out the set of physical things which can play this role in communicative and perceptive acts. From this, we can classify these acts into more intuitive ones, grounded in our multimodal, multi-sensor context.

Instances of TangibleEntities which can play the Medium role
Sound Plays a Medium role in microphone- or audition-based communication (spoken discourse).
Light Plays the Medium role in camera- or vision-based communication (gestural discourse).
Ink Plays the Medium role in whiteboard-, document-, or two-dimensional communication (sketching).

Sensors and Hardware Devices

For practical purposes, the physical things we care about are those which our sensors are able to sense. We therefore posit the notion of a Sensor which is our direct link to the physical Medium characterized above.

Types of Sensors which sense attributes of various media
Sensor A SpatialEntity which has the ability to directly measure signals delivered through some Medium.
Camera A SpatialEntity which has the ability to directly measure signals delivered through Light.
Microphone A SpatialEntity which has the ability to directly measure signals delivered through Sound.
Smartboard A SpatialEntity which has the ability to directly measure signals delivered through Ink.

The meeting room environment contains many different devices which do sensing. Typically, the devices are TangibleEntities which are actually a combination of multiple Sensors into one physical structure. Here we define the specific classes of hardware sensing devices in the CALO meeting environment.

Hardware Sensing Devices
FrameDevice A TangibleEntity with a spatial location and orientation, etc., which has multiple Camera and Microphone Sensors as .has-part.
CameoDevice Same as a FrameDevice, except it has four Camera sensors.
CloseTalkingMicrophone A type of Microphone which is worn by a Person and moves around with the Person.
FarFieldMicrophone A far-field microphone which typically stays in a single static location (though does not have to).

All of the above are also Device entities, and therefore may have attributes such as make, model, version, etc. This may be helpful to signal processing algorithms for calibration of frequency response and light sensitivity when analysing recordings produced by various versions of the Sensor components.

A Sensor is also a piece of computer hardware and should be classed as such. In the CLib, this is the Device or ComputerComponent class. By virtue of being a Device, a Sensor is therefore a SpatialEntity and therefore has spatial attributes like orientation and position which could then be used as calibration parameters, etc.
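The Sensor and device taxonomy above can be sketched as a small class hierarchy. This is an illustrative Python rendering only, not the OWL definitions: the `medium` attribute, the `media()` helper, and the check that a CameoDevice carries exactly four Cameras are assumptions introduced for the example.

```python
class Sensor:
    medium = None  # name of the Medium this Sensor measures


class Camera(Sensor):
    medium = "Light"


class Microphone(Sensor):
    medium = "Sound"


class Smartboard(Sensor):
    medium = "Ink"


class CloseTalkingMicrophone(Microphone):
    """Worn by a Person and moves around with them."""


class FarFieldMicrophone(Microphone):
    """Typically stays in a single location."""


class FrameDevice:
    """A hardware device whose .has-part slot holds its Sensors."""

    def __init__(self, parts):
        self.has_part = list(parts)

    def media(self):
        """Sorted names of the media this device can sense."""
        return sorted({part.medium for part in self.has_part})


class CameoDevice(FrameDevice):
    """Like a FrameDevice, but with four Camera sensors."""

    def __init__(self, parts):
        super().__init__(parts)
        cameras = [p for p in self.has_part if isinstance(p, Camera)]
        if len(cameras) != 4:
            raise ValueError("a CameoDevice has four Camera sensors")


cameo = CameoDevice(
    [Camera(), Camera(), Camera(), Camera(), FarFieldMicrophone()]
)
print(cameo.media())
```

Modelling the medium on the Sensor class, rather than on each device, mirrors the text: a device is a bundle of Sensors, and what it can perceive follows from its parts.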

Recordings (DIAGRAM)

Recordings are the most essential information resource for doing understanding, since they are the output of our Sensor devices. From these information resources, we are able to do analysis and further understanding of the communicative behaviors we may or may not have observed. To model this, we posit an Action called Record whose agent must be a Sensor. The result of the Record action is a Recording.

These are closely related to the notion of Sense and Signal in the Communication Model, but we are not making the connection yet. This is because the sensing actions being performed by the sensors are not participating in a communicative behavior. Rather, they are observing, and from this information we will later posit Communicate acts and their subevents (if we happen to detect some communication going on).

Actions performed by Sensors and information resources produced by them
Record .agent, .result, .time-during An Action performed by a Sensor which produces a Recording.
Recording .sample-rate, .tracked-entity, .tracked-property, .url A Document produced by a Sensor which contains information directly measuring some physical occurrence.
AudioRecording A type of Recording produced by a Microphone
VideoRecording A type of Recording produced by a Camera
InkRecording A type of Recording produced by a Smartboard
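A minimal Python sketch of the Record action and its resulting Recording follows. The slot names come from the table above; the `record` helper, the use of plain strings for sensor kinds, the numeric time interval, and the example URL are hypothetical and serve only to show how the Recording subtype follows from the kind of Sensor.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Which Recording subtype each kind of Sensor produces (from the table).
RECORDING_TYPE = {
    "Microphone": "AudioRecording",
    "Camera": "VideoRecording",
    "Smartboard": "InkRecording",
}


@dataclass
class Recording:
    kind: str
    url: str
    sample_rate: Optional[float] = None
    tracked_entity: Optional[str] = None
    tracked_property: Optional[str] = None


@dataclass
class Record:
    agent: str                        # identifier of the Sensor
    time_during: Tuple[float, float]  # (begin, end), in seconds here
    result: Recording


def record(sensor_id, sensor_kind, begin, end, url, sample_rate=None):
    """Build a Record action whose result matches the Sensor kind."""
    if sensor_kind not in RECORDING_TYPE:
        raise ValueError(f"not a known Sensor kind: {sensor_kind}")
    recording = Recording(RECORDING_TYPE[sensor_kind], url, sample_rate)
    return Record(sensor_id, (begin, end), recording)


# Hypothetical sensor identifier and file location.
action = record("mic-1", "Microphone", 0.0, 60.0,
                "file:///tmp/mic-1.wav", sample_rate=16000.0)
print(action.result.kind)
```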

Non-Communicative Physical Acts

For the time being, we simply posit the following classes of Actions which enumerate some currently available examples from David's Integrator.

(DIAGRAM)

Spatial Events
BeingInFrontOf .agent, .object An Action performed by a SpatialEntity or Person which describes their being in front of something (a Frame).
BeingInRoom .agent, .object An Action which describes an Entity as being in the meeting room.
OrientedToward .agent, .object An Action which describes that an entity is oriented toward another entity.
Facing .agent, .object An OrientedToward which describes a Person whose head is oriented toward something (different from Look, which is a deictic gesture, see below).

The following is a model which encapsulates the kind of knowledge being produced by Paul's activity tracker. This and the following Actions are performed by a Person in or near their seat.

(DIAGRAM)

Sit .agent, .object, .time-during To be sitting. The .object is the seat being sat in.
Stand .agent, .time-during To be standing.
SitDown .agent, .time-during To transition from standing to sitting.
StandUp .agent, .time-during To transition from sitting to standing.
Fidget .agent, .time-during To fidget.

Artifacts

The physical presence of documents is important to CALO. We interpret an "artifact" to be a Document. We posit an action which states that the document is physically present in the meeting room. This allows us to answer IET questions pertaining to the "focus" of a discussion on a particular artifact.

Display .object, .site An Action performed on a Document which positions the document within view of the agents at the .site. The .agent is generally unknown.

The Meeting Environment (DIAGRAM)

Now that we have all of the tools for describing sensors, recordings, physical relationships, and physical events, we can use these to construct a method for representing the state of the meeting room. This will be the "glue" which connects all of the various aspects of physical awareness together in one cohesive structure. We propose the following model and look to the PA team to come up with a final solution.

Currently, the model posits a Meeting activity which is connected to a Place which is the room and .meeting-participants. We expect to receive this information from Task Setup, and take .meeting-participants to mean the "intended" participants in the meeting. The Meeting Recorder posits a Register action which is performed by a person when registering with the MR. Also, there is the basic BeingInRoom action which is the type of attendance detected by the AV integrator. It is proposed that this action also serve to say whether the various computers and devices are in the room as well. All of this together helps to define the various high-level notions of the Meeting activity and how Persons participate in it.

Register .agent, .object, .instrument, .time-during An Action performed by a Person which states that they have officially registered into a Meeting (using the Meeting Recorder).
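To make the three sources of attendance information concrete, here is a small Python sketch that combines intended participants (.meeting-participant from Task Setup), Register actions, and BeingInRoom detections. The `attendance` function, its category names, and the participant names are hypothetical; the ontology itself does not define these derived categories.

```python
def attendance(intended, registered, detected):
    """Combine the three attendance sources into derived categories."""
    intended = set(intended)
    registered = set(registered)
    detected = set(detected)
    return {
        # Persons with a BeingInRoom action
        "present": sorted(detected),
        # Persons who performed Register but were not detected in the room
        "registered_only": sorted(registered - detected),
        # Intended participants with no Register or BeingInRoom action
        "missing": sorted(intended - registered - detected),
        # Persons detected in the room who were not intended participants
        "unexpected": sorted(detected - intended),
    }


report = attendance(
    intended={"alice", "bob", "carol"},
    registered={"alice", "bob"},
    detected={"alice", "dave"},
)
print(report)
```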

The Multimodal Discourse Ontology

This section describes additions to the CLib & CALO ontology which elaborate the generic communication model to a model of multimodal communicative processes in human-human and human-computer dialogue. Its subsections proceed from more general concepts to more specific concepts (with each subsection building on previous ones, without depending on later ones).

Discourses

The major analytical unit in the discourse ontology is a Discourse, which we define as an Event with a collection of Communicate events as subevents which cohere into a unified communicative exchange of information between agents. An example is a conversation or coherent discussion, in which the Communicate subevents are the contributions (verbal or otherwise) of its participants.

Discourses are not the same as Meetings; rather, a single Meeting will have one or more Discourses as subevents. The division between them may be due to: the switch from one major meeting topic or "phase" to another; large temporal gaps due to major disruptions; or simultaneous "side-conversations" on different topics with separate participants. There should be no "dialogical" connections between utterances in different Discourses. There may be spatial, temporal, or referential connections, but no connections which would disrupt the interpretation of the communicative acts if broken.

The Discourse concept
Discourse An Event with a collection of Communicate events as subevents which cohere into a unified communicative exchange of information between agents.

Discourse Types and Discourse "Phases"

Discourses come in many forms including multi-party discussions, human-computer information-seeking, monologues, etc. In this section we create a small taxonomy of Discourse types for use in CALO. To make such a taxonomy is very difficult due to the substantial role the physical and domain context play in determining the type of discourse. We consider this problem to nearly exceed the scope of the ontology outlined at the beginning of this document. Nevertheless, for practical purposes, we posit the following categories of Discourse.

Types of Discourse
Discussion A Discourse which involves substantial, interposed contribution of at least two participants in a "free-flowing" dialogue.
Presentation A Discourse in which the large majority of contributions are made by a single agent. Note that a Presentation need not "present" external resources like slides or documents. For example, a lecture without slides can be a Presentation. Rather we conceptualize the agent as "presenting" the information contained in the communication, not the external resources which may or may not be used.
Briefing A Discourse where multiple parties are providing information in summary form to some other party who is there to consume it.

There are also various Roles that discourse participants play in the context of different phases or discourse types. The following list describes these:

Some Roles played in Discourse Types
Observer A Role played by a Person in a Discourse which signifies that they are not participating actively but are present nonetheless.
Participator A Role played by a Person in a Discussion which signifies that they are participating actively.
Presenter A Role played by a Person in a Presentation which signifies that they are doing the presentation.
InformationProvider A Role played by a Person in a Briefing which signifies that they are providing information.
InformationConsumer A Role played by a Person in a Briefing which signifies that they are not providing but consuming information.

Threads and Communicative Acts

This section develops a model of the internal structure of a Discourse. A Discourse contains as subevents a collection of Communicate events (individual dialogue moves) which may be organised into one or more threads (or sub-topics). Communicate events can be related to one another as antecedents, and to the Issue to which they are relevant; a thread is a collection of antecedent-related Communicate events which share an Issue.

We therefore posit the addition of the following slots to the Communicate class:

Communicate .antecedent A previous Communicate event, but not necessarily the immediately linearly preceding one. The antecedent event is the move which the new move is immediately discourse-relevant to: e.g. the query that is being answered, or a proposal that is being clarified.
  .info-state The InformationState resulting from this Communicate event once its update effects have been taken into account.
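The .antecedent slot links each move to the move it responds to, so a thread can be recovered by following those links backwards. The Python sketch below shows this under simplifying assumptions: the `label` field and the `antecedent_chain` helper are hypothetical, and the .info-state slot and Issue sharing are left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Communicate:
    label: str
    antecedent: Optional["Communicate"] = None


def antecedent_chain(move: Optional[Communicate]) -> List[str]:
    """Labels from the first move of the chain up to the given move."""
    chain = []
    while move is not None:
        chain.append(move.label)
        move = move.antecedent
    return list(reversed(chain))


ask = Communicate("Ask: who buys the computer?")
aside = Communicate("Ask: is the projector on?")  # linearly precedes the answer
answer = Communicate("Answer: John", antecedent=ask)

print(antecedent_chain(answer))
```

Note that `aside` occurs between the question and its answer in linear order but is not part of the chain, which is the distinction the .antecedent description draws.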

Information States

Our model of discourse and argumentative structure depends upon keeping track of the dialogue context: the issues and questions being discussed, the entities which are currently salient and can be referred to, and more generally the previous moves that have been made. The record of previous moves is available directly (the previous Communicate events in the Discourse, and more specifically via the chain of antecedent slots). For other information, we use the info-state attribute and the InformationState class:

.salience-list .qud .iun

InformationState   An Entity recording the discourse context resulting from a Communicate event.
  .salience-list A list of entities or events which are salient. This provides context for anaphora resolution, etc. NOTE: IET talk about artifacts being "in focus" - if this means that discourse about the object of focus was or would have been felicitous at that time, this is where to answer their questions from.
  .qud A list of Questions (see below) which are currently "under discussion", in order with the most salient or maximal first. These are short-term questions determined from the propositional content of individual dialogue moves, and used for ellipsis resolution and answerhood determination.
  .iun A list of Issues which are currently "under negotiation". These are the longer-term questions being discussed in this thread of the Discourse, and which can be considered as the "topic(s)" of the thread.

For further explanation of IUNs/QUDs, see Staffan Larsson's thesis on issue-based dialogue systems, and Jonathan Ginzburg's notion of Questions Under Discussion (QUD).

Communicative Act (Dialogue Move) Types

We can now use this machinery to move towards our principal goal in discourse understanding, which is to classify Communicate events depending on their relationship to thread-previous and thread-subsequent Communicate events in a discourse, and to draw relationships between these events and the objects to which they share reference. While we do not propose a nuanced model of discourse pragmatics to account for all of the facets of truly "appropriate" dialogue structure, we nevertheless posit a complete utterance-by-utterance classification.

Given the antecedent relation and the info-state attribute, we can specify:

  • conditions on the type of Communicate act which can be considered as an antecedent
  • conditions on the info-state values of the antecedent act
  • transitions between the antecedent info-state values and the new info-state values.

For example, if a Communicate event introduces a new referent (perhaps an individual who is known, but has not been discussed before), the new info-state.salience-list must add the new referent to the value of the old antecedent.info-state.salience-list. Similarly, a Communicate event which explicitly asks a question must add that question to the head of the antecedent.info-state.qud list to form its own info-state.qud list.

salience-list transitions will be due to individual gestures, referents and anaphors; we leave them aside for now. qud transitions will be determined by the dialogue move type, which we outline next; iun transitions by the broader rhetorical or argumentative structure, which is discussed in the next section.

The following table outlines some basic Communicate types, together with some of their major properties expressed as conditions on antecedents and QUD transitions. NOTE: these should not be taken as the final or complete definitions of these types, as more detailed properties and conditions will almost certainly apply - but they should be enough to give the idea.

Communicative Action Events | Attribute | Value
Assert | .information | some Proposition P
Assert | .info-state.qud | ?P + .antecedent.info-state.qud
Ask | .information | some Question Q
Ask | .info-state.qud | Q + .antecedent.info-state.qud
Answer | .information | some Proposition P
Answer | .antecedent | Ask
Answer | .antecedent.info-state.qud | Q where P answers Q
Answer | .info-state.qud | .antecedent.info-state.qud - Q
Nonpropositional | .information | e.g. a Greeting
Nonpropositional | .info-state.qud | .antecedent.info-state.qud
Grounding | .antecedent.info-state.qud | Q = ?max-qud(Q')
Grounding | .info-state.qud | .antecedent.info-state.qud - Q
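The QUD transitions in the table can be sketched as a single update function. The Python below is one reading of the table, not a definitive implementation: questions and propositions are plain strings, "?" + P stands for the polar question ?P, answerhood is not computed (the caller names the question being answered), and Grounding is taken to remove the maximal (first) QUD element.

```python
def next_qud(move_type, antecedent_qud, content=None):
    """Return the new .info-state.qud list (most salient first)."""
    qud = list(antecedent_qud)  # never mutate the antecedent's state
    if move_type == "Assert":
        # Asserting P raises the question of whether P holds.
        return ["?" + content] + qud
    if move_type == "Ask":
        return [content] + qud
    if move_type == "Answer":
        # content names the Question Q that the answer resolves.
        if content not in qud:
            raise ValueError("answered question is not under discussion")
        qud.remove(content)
        return qud
    if move_type == "Grounding":
        if not qud:
            raise ValueError("nothing to ground")
        return qud[1:]  # drop the maximal element
    if move_type == "Nonpropositional":
        return qud  # e.g. a Greeting leaves the QUD unchanged
    raise ValueError(f"unknown move type: {move_type}")


after_ask = next_qud("Ask", [], "who buys the computer")
after_answer = next_qud("Answer", after_ask, "who buys the computer")
after_assert = next_qud("Assert", after_answer, "John buys it")
after_grounding = next_qud("Grounding", after_assert)
print(after_ask, after_answer, after_assert, after_grounding)
```

Returning a fresh list at each step keeps every Communicate event's .info-state intact, so earlier contexts remain available along the antecedent chain.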

Argumentation, Decision-Making, and Rhetorical Relations

This section extends the model to include notions of argumentation and decision-making. Communicate events will be classifiable not only by their short-term roles and effects, as in the previous section, but also by their longer-term role in negotiating solutions to collectively agreed-upon conversational issues. This role is defined in terms of the event's relation to the Issue under negotiation, as expressed in the info-state.iun attribute. Again, we refer the reader to Larsson (2000)'s Issue-Based Information System (IBIS) model.

We model an Issue as a type of Question (see "Units of Meaning" below). Argumentative discussions (threads) will be series of related Communicate events which share an Issue in their .info-state.iun value. Issues thus correspond to specific sub-topics or 'minor' topics, rather than the broad overall 'major' topic which might cover a whole Discourse. In the dialogues we expect to handle as part of CALO, Issues are likely to be questions about Task, Milestone, ActionItem, and Project entities and their attributes, although this will not necessarily be the case in other domains. We do not model these here. Rather, we model the decision-making processes which lead to their specification.

Resolving an Issue is equivalent to making a "decision". Resolving (and/or agreeing) an Issue which is a question about a Task or ActionItem will be the model for assigning that Task or ActionItem, so it is via this argumentative model that we expect to answer IET test questions about such assignments.

Subjects of Argumentation
Issue A Question (i.e. a Proposition with a piece of missing information to be provided, or whose truth or applicability is to be determined). More specific than a "topic", an Issue is the "content" of an argumentative discourse.
Alternative A whole or partial solution to the Issue Question. When an alternative is proposed by an agent, we take it to mean that the agent believes the Alternative "resolves" the Issue to some degree, either partially or completely.
Argument A piece of information which is meant to provide evidence either in favor or in opposition to an Alternative. In the real world, Alternatives and Arguments interact in the realm of human reasoning. In a computational model, the relationship depends on the IUN modelling strategy.

This allows us to classify the actions taken by participants in the course of an argumentative discourse. These actions imply relationships between the Arguments and Alternatives to which they refer.

Argumentative Actions Details
Introduce Introduces a new Issue directly. As well as adding a new Question to .qud (as all standard moves do), also introduces the same Question to .iun
Propose Introduce an Alternative which resolves the top Issue on .iun. The question of whether to accept this Alternative is added to .iun
Accept (= "agree"). Assert acceptance of the currently maximal Alternative on .iun, removing it (and the underlying Issue if that now becomes resolved).
Reject (= "disagree"). Assert rejection of the currently maximal Alternative on .iun, removing it.
Provide Assert an Argument which provides evidence in favour of the currently maximal Alternative on .iun
Challenge Assert an Argument which provides evidence in opposition to the currently maximal Alternative on .iun
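
The effect of these actions on the information state can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the ontology: the class and method names (InfoState, Issue, Alternative, introduce, propose, and so on) are hypothetical, .qud and .iun are treated as simple stacks, Questions are reduced to strings, and a single accepted Alternative is assumed to resolve its Issue completely.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass(eq=False)
class Issue:
    """A Question under negotiation (hypothetical; reduced to a string)."""
    question: str
    resolved_by: Optional[str] = None


@dataclass(eq=False)
class Alternative:
    """A proposed whole or partial solution to an Issue."""
    text: str
    issue: Issue
    arguments_for: List[str] = field(default_factory=list)
    arguments_against: List[str] = field(default_factory=list)


@dataclass
class InfoState:
    """Hypothetical info-state holding the .qud and .iun stacks."""
    qud: List[Issue] = field(default_factory=list)
    iun: List[Union[Issue, Alternative]] = field(default_factory=list)

    def _top_issue(self) -> Issue:
        # The most recently introduced Issue still on .iun.
        for item in reversed(self.iun):
            if isinstance(item, Issue):
                return item
        raise ValueError("no Issue under negotiation")

    def _top_alternative(self) -> Alternative:
        # The "currently maximal" Alternative is taken to be the top of .iun.
        if self.iun and isinstance(self.iun[-1], Alternative):
            return self.iun[-1]
        raise ValueError("no Alternative on top of .iun")

    def introduce(self, question: str) -> Issue:
        issue = Issue(question)
        self.qud.append(issue)   # as all standard moves do
        self.iun.append(issue)   # and also under negotiation
        return issue

    def propose(self, text: str) -> Alternative:
        alt = Alternative(text, self._top_issue())
        self.iun.append(alt)     # whether to accept it is now under negotiation
        return alt

    def accept(self) -> Alternative:
        alt = self._top_alternative()
        self.iun.pop()
        # Assumption: one accepted Alternative fully resolves the Issue.
        alt.issue.resolved_by = alt.text
        if self.iun and self.iun[-1] is alt.issue:
            self.iun.pop()
        return alt

    def reject(self) -> Alternative:
        alt = self._top_alternative()
        self.iun.pop()
        return alt

    def provide(self, argument: str) -> None:
        self._top_alternative().arguments_for.append(argument)

    def challenge(self, argument: str) -> None:
        self._top_alternative().arguments_against.append(argument)
```

In this sketch, a rejected Alternative leaves its Issue on .iun so that a further Propose can address it, while Accept removes both the Alternative and the Issue it resolves.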

For simplicity's sake, we currently assume that these issue-relevant rhetorical-relations operate orthogonally to the question-under-discussion model: Communicate acts will be classified both according to their utterance-level, QUD-relevant effects, and their thread/issue-level, IUN-relevant effects. The Information of the Communicate act will likewise be classified both according to its short-term QUD-relevant assertion/asking content, and its longer-term IUN-relevant argumentative content. As far as the ontological model goes, this seems a reasonable first approximation.

In reality, the two models will not be independent: certain low-level communicative classes will be used to perform certain argumentative roles. For example, the raising of an issue for discussion will often coincide with the introduction of a QUD through the utterance of a question. But this is not always the case, as in the utterance "We really need to figure out who's going to buy the computer." We hope to model their interdependence in future, and express it via constraints on the relevant classes.

Physical Acts of Communication

In performing an individual Communicate action, agents Embody their Message into some physical Signal, to be Transmitted over some Medium. Correspondingly, other agents may Sense the Signal, relating it to (their version of) the Message. We can classify Embody actions by the Medium which they use and the corresponding instrument used; this allows us to specify classes for some intuitive multimodal dialogical actions. We can classify the corresponding Sense actions similarly, though these are not currently used.

Embody sub-classes

A classification of Embody actions based on the Medium of transmission
Speak An Embody action of which the medium is Sound, with the added distinction that the agent is a Person. This is a vocalization, which may or may not be verbal, bounded by a pause or breath. In CALO, a Speak event would most likely be posited by a speech-recognition front-end, which would analyze an audio signal and determine that the user of the microphone spoke something (whether verbal or nonverbal).
Gesticulate An Embody action which uses the medium of Light, with the added distinction that the instrument is a PhysicalEntity (usually a body part, but not necessarily). This is a pointing, signing, or looking action. A body tracker may posit an instance of this class.
Sketch An Embody action which uses the medium of Ink. This would include the action of writing text or drawing figures.

NOTE: In some cases, Sensors are simply turned on at the beginning of a meeting and then turned off at the end. Their continuous recording can be represented as a single Record action performed by the Sensor, whose outcome is a Recording. In the case of a far-field microphone, this will be a single (long) raw audio signal, and will not correspond directly to the Signal role in any single Communicate event. In other cases, like the whiteboard, recording events may not produce single long signals, but rather series of sensed strokes. In these cases, we will be able to make the assumption that the Recording does play the Signal role in a Communicate act being performed by the person writing. This might also hold with close-talking microphones, where we can probably assume that Recordings will only be recorded when the Person is producing them as the Signal of a Communicate act. This may not be possible in a far-field or mic-array situation where we will have to calculate which parts of the (possibly multiple) signal(s) correspond to Communicate events.

Sense sub-classes

We now turn to characterizing the Sense actions, again by Medium, and also by the different Sensors which can be instruments. In each case, the object of the Sense event will be a Recording, which can play the Signal role; the result will (eventually) be a Message. For completeness, the agent of these actions should be the software agent, or 'CALO' generally; however, we do not anticipate that these actions will be explicitly entered into MOKB at the moment -- rather, software agents will posit Signals and their associated Embody events, and the Sense action will be left implicit.

Note that this framework also allows us to posit Sense actions for meeting participants themselves (i.e. that actions could be posited with human agents). However, we do not currently intend to model the sensing side of human-human communication explicitly. This does not exclude attempting to understand who is the intended addressee (recipient) of a Communicate action, only attempting to detect whether that addressee actually perceived it. In the future, this might be possible through observation of grounding indications such as eye gaze and head movements.

A classification of Sense actions based on the Medium of transmission
Hear A Sense action performed with a Microphone instrument using the Medium of Sound.
See  
Read  

The Signal (DIAGRAM)

In many cases, Sensors record Recordings which get saved in some external file for later access. However, we might also like to say that some sub-element of that Recording embodies information which can play the Signal role in a physical communicative act. For example, some subsegment of an AudioRecording may contain the information which we consider to be the Signal being transmitted as part of a Speak action. To account for this, consumers of information pertaining to the Signal role of communicative acts must reference the attributes of the Record action which generated the Recording as well as look at the timing information embodied in the Transmit event to reason about where in the file to get the desired information.
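
As a rough illustration of the lookup described above, the following Python sketch computes which samples of a Recording correspond to a given Signal. It assumes, hypothetically, that the Record action exposes a start time and a sample rate, that the Transmit event exposes start and end times, and that both use the same clock in seconds; none of these attribute names are defined by the ontology.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class RecordAction:
    """Hypothetical view of the Record action that produced a Recording."""
    start_time: float     # seconds on a shared meeting clock (assumed)
    sample_rate: int      # samples per second in the Recording
    recording_path: str   # external file holding the Recording


@dataclass
class TransmitEvent:
    """Hypothetical view of the timing information of a Transmit event."""
    start_time: float
    end_time: float


def signal_sample_range(record: RecordAction,
                        transmit: TransmitEvent) -> Tuple[int, int]:
    """Return the (first, last) sample offsets of the Signal in the file."""
    if transmit.end_time < transmit.start_time:
        raise ValueError("Transmit event ends before it starts")
    if transmit.start_time < record.start_time:
        raise ValueError("Transmit event starts before the Recording")
    first = round((transmit.start_time - record.start_time) * record.sample_rate)
    last = round((transmit.end_time - record.start_time) * record.sample_rate)
    return first, last
```

A consumer would then read only that sample range from the file named by the Record action, rather than treating the whole Recording as the Signal.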

The Linguistic Representation

Common sense models of natural language justify intermediate representations between the physical and the cognitive. In the communication model, this is the role of the middle level and especially the Message concept: an entity which can play a part in associating meaning with a physical realization. In the following sections, we take several steps toward fleshing out this critical intermediate level. We derive some of our conceptualization from the General Ontology for Linguistic Description (GOLD). We also refer the reader to the SIL Linguistics Glossary.

As a first step, we posit a recursively-defined LinguisticUnit, which is the building-block of Messages and is a Message itself. It may be useful to conceive of the LinguisticUnit as one would conceive of a feature structure in HPSG. An instance of a LinguisticUnit is an instance of a Message or some subpart of that message. Units can be built into constructions through composition, generating the following classes of unit:

Linguistic units and message composition
LinguisticUnit .information-content, .base A recursively-defined generic container for linguistic units of all kinds. A LinguisticUnit is a Message. In addition, a LinguisticUnit may be composed of other LinguisticUnits through composition.
LinguisticConstituent .is-part-of One of two or more LinguisticUnits that enter into a LinguisticConstruction at any level.
LinguisticConstruction .has-part A collection of LinguisticUnits forming a larger LinguisticUnit through the use of composition.
LinguisticAtom   A LinguisticUnit which is not a LinguisticConstruction

We would like to model the fact that LinguisticUnits have the same base Language as their parent LinguisticUnit. Because MOKB does not currently do reasoning, we request that the Language be specified as the .base of the root Message of the communicative act.
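
A minimal Python sketch of this convention follows. It is an illustration only: the class mirrors the LinguisticUnit, LinguisticConstruction, and LinguisticAtom distinctions with hypothetical field and method names (parts, parent, effective_base), and it treats any unit with at least one part as a construction. The Language is stored once on the root and looked up by walking parent links, which stands in for the reasoning MOKB does not perform.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class LinguisticUnit:
    """Hypothetical recursive container for linguistic units."""
    label: str
    base: Optional[str] = None   # Language; set on the root Message only
    parts: List["LinguisticUnit"] = field(default_factory=list)            # .has-part
    parent: Optional["LinguisticUnit"] = field(default=None, repr=False)   # .is-part-of

    def add_part(self, unit: "LinguisticUnit") -> "LinguisticUnit":
        unit.parent = self
        self.parts.append(unit)
        return unit

    def is_construction(self) -> bool:
        return bool(self.parts)

    def is_atom(self) -> bool:
        return not self.parts

    def root(self) -> "LinguisticUnit":
        unit = self
        while unit.parent is not None:
            unit = unit.parent
        return unit

    def effective_base(self) -> Optional[str]:
        # Every unit shares the base Language of the root Message.
        return self.root().base


sentence = LinguisticUnit("buy the computer", base="English")
word = sentence.add_part(LinguisticUnit("computer"))
print(word.effective_base())
```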

Languages

Different languages are realizable in certain physical forms. For example, many natural languages like English are realizable both in Sound through speaking and in Ink through sketching. It is even possible for English to be realized in sign. Other languages like the "language of human gesture" are only realizable through Light and physical gesturing. LinguisticUnits are expressed in particular Languages, and this attribute of a Language helps us to determine how the LinguisticUnits "may" be realized.

Instances of Languages in CALO
Language realizable-in  
English   A natural language which is realizable-in Ink, Sound, and Light.
Gesture   The language of physical human gesture, realizable-in Light.
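
To show how realizable-in constrains embodiment, here is a small Python sketch that combines the Language instances above with the Medium of each Embody sub-class (Speak uses Sound, Gesticulate uses Light, Sketch uses Ink). The dictionary names and the function possible_embodiments are hypothetical and serve only as an illustration.

```python
# Language -> set of Media it is realizable-in (from the table above).
REALIZABLE_IN = {
    "English": {"Ink", "Sound", "Light"},
    "Gesture": {"Light"},
}

# Embody sub-class -> the Medium it uses (from the Embody sub-classes).
EMBODY_MEDIUM = {
    "Speak": "Sound",
    "Gesticulate": "Light",
    "Sketch": "Ink",
}


def possible_embodiments(language: str) -> list:
    """Return the Embody sub-classes that may realize the given Language."""
    media = REALIZABLE_IN[language]
    return sorted(action for action, medium in EMBODY_MEDIUM.items()
                  if medium in media)


print(possible_embodiments("English"))
print(possible_embodiments("Gesture"))
```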

Units of Meaning

Meaning, or semantic content, in Messages is represented as the value of the .information-content attribute. At the top level, the .information-content of the overall Message (or multiple Messages in different Languages when combining multiple modalities into a single Communicate) will provide the .information-content of the Information being Communicated by the overall act. At lower levels, subordinate LinguisticConstituents will have their own .information-content attribute. In general, this can be filled by any Thing, but certain types of LinguisticUnit will be restricted to have certain types of content. For example, proper names will require that their denotation be an Entity (as might pointing gestures); verbs will denote Events. At a higher level, assertions and queries will require particular types of information, Propositions and Questions, which we take to be the basic units of which the overall Communicated Information will be made:

Basic Units of Meaning
Proposition The proposition that some Event holds or takes place. A subclass of Information, its information-content must be an Event. It should not be confused with the Event itself, as its truth value may be true or false (i.e. the Event may or may not hold or take place). Propositions will generally be the Information associated with Communicate acts which are assertions or proposals.
Question An Information which can be taken as an abstracted proposition, with an undefined truth value. Its .information-content attribute is filled by the Proposition under question, and the .params attribute is a set of Information objects which are the parameters being queried/abstracted from the proposition. For a yes/no question, this set will be empty. For a standard wh-question, it will have a single member corresponding to the wh-element; multiple wh-questions have larger sets.
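
The relationship between a Question, its Proposition, and its .params set can be sketched as follows. This is a simplified illustration: Events and Information objects are reduced to strings, and the kind method and its return labels are hypothetical names, not part of the ontology.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Proposition:
    """The proposition that some Event holds (Event reduced to a string)."""
    information_content: str


@dataclass(frozen=True)
class Question:
    """A Proposition with zero or more parameters abstracted out."""
    information_content: Proposition
    params: FrozenSet[str] = frozenset()

    def kind(self) -> str:
        # Classify by the number of queried parameters.
        if len(self.params) == 0:
            return "yes/no"
        if len(self.params) == 1:
            return "wh"
        return "multiple-wh"


buy = Proposition("buy(agent, computer)")
print(Question(buy).kind())                                  # Will we buy it?
print(Question(buy, frozenset({"agent"})).kind())            # Who will buy it?
print(Question(buy, frozenset({"agent", "object"})).kind())  # Who will buy what?
```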

In theory, as the .information-content of a parent LinguisticConstruction will be a (compositional) function of the .information-contents of its child LinguisticConstituents, this composition might be performed by reasoning over the ontology and class properties. Given the currently restricted reasoning capabilities of MOKB, for now we assume that each level will have its .information-content specified by some external agent (e.g. a parser).

Natural Language

In this section we characterize the types of LinguisticUnits used to express Messages in natural languages. The list is currently based on the units which are produced by functional parts of the system. Future development will include the integration of the LinguisticUnits used in the Gemini natural language processing system, including clauses and other super-word-level constructions.

Types of LinguisticUnits used in Messages for natural languages
Word .has-transcription A type of LinguisticUnit. (see http://www.sil.org/linguistics/GlossaryOfLinguisticTerms/WhatIsAWord.htm)
Sentence .has-transcription A type of LinguisticUnit. (see http://www.sil.org/linguistics/GlossaryOfLinguisticTerms/WhatIsASentence.htm)
MultiSentence .has-transcription A LinguisticUnit composed of one or more Sentences.

Embodiment of Natural Language

As mentioned in the section outlining LinguisticUnits and their .base, natural languages are realizable-in many modalities, which means that units of natural language like Word and Sentence can be physically realized in different manners. For example, a word in English like "travel" has both a spoken realization and a written one. In the following sections we outline the portion of the ontology which ties the abstract linguistic units of natural language described above to the events and entities which make up their physical realization.

One particular modelling strategy to focus on in this section is the tripartite division between the Message and its parts, the Embody action and its subevents, and the Signal and its parts. The conceptualization here which transcends modality is to say that the Message and its parts contain everything about the message which is completely independent of its physical realization. The Embody action and its subevents contain the information pertaining to actual embodying of the Message in time. The Signal and its parts contain the information pertaining to the embodiment which the Embody action created. In speech, the embodiment in a Signal is fleeting, and we need a recording device to "capture" this. But for ink, "capturing" of the Signal is a natural aspect of the action because the ink records the action. Thus, the Signal for an ink sketch may have a more intuitive feel as an Entity than the Signal for a Speak event.

Spoken Natural Language (DIAGRAM)

Much of the analysis here is driven by the capabilities and modelling strategy of the speech recognizer. Namely, we are interested in the segmentation of a Speak event into segments which represent the utterance of Sentences, Words, and Phones. The strategy for modelling the Speak event is to perform a multi-layer segmentation of the entire Speak event using recursion. We use the subevent and next-event slots to capture this structure. Note that for economy, the time information of the event should be given to the lowest level, which will allow timing information to be inferred for the higher-level events.

Speak .time-during For a Person to Embody a natural language LinguisticUnit into an AudioSignal
AudioSignal .pitch, .energy, .source An Entity which is the realization of a Speak action. These entities might have some measurements attached as attributes. Those listed are currently just stubs. It will also need to reference an information resource (probably an AudioRecording which actually contains the info).
Phone   A type of AudioSignal which is the basic temporal unit of analysis in spoken language.

We also propose to subclass the Phone class into a phone set based on the IPA. This may need to be changed depending on the set of phones used by Sphinx. With this model, each occurrence of a Phone as a part of an AudioSignal would actually be an instance of one of the phones in the set (see diagram).
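
The timing convention for the multi-layer segmentation of a Speak event (times stored only at the lowest level, inferred for higher levels) can be illustrated with a short Python sketch. The Event class and time_span function are hypothetical; the ordered subevents list stands in for both the subevent and next-event slots.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(eq=False)
class Event:
    """Hypothetical segment of a Speak event."""
    label: str
    start: Optional[float] = None   # given only on lowest-level segments
    end: Optional[float] = None
    subevents: List["Event"] = field(default_factory=list)


def time_span(event: Event) -> Tuple[float, float]:
    """Infer (start, end) for an event from its lowest-level subevents."""
    if not event.subevents:
        if event.start is None or event.end is None:
            raise ValueError(f"lowest-level event {event.label!r} has no time")
        return event.start, event.end
    spans = [time_span(sub) for sub in event.subevents]
    return min(s for s, _ in spans), max(e for _, e in spans)


speak = Event("Speak", subevents=[
    Event("Sentence", subevents=[
        Event("Word: buy", subevents=[
            Event("phone segment 1", 0.10, 0.18),
            Event("phone segment 2", 0.18, 0.35),
        ]),
        Event("Word: it", subevents=[
            Event("phone segment 3", 0.40, 0.47),
            Event("phone segment 4", 0.47, 0.55),
        ]),
    ]),
])
print(time_span(speak))
```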

Written Natural Language

This section should parallel the previous, but with an explanation of hand-written text.

Physical Gesture

In this section, we conceptualize the linguistic units that play a role in Messages in the language of 3D gesture. Generally, I have left this section unspecified and have simply based the concepts on my knowledge of the abilities of the 3D gesture recognizer. In each case, we are looking at LinguisticUnits that combine to form Messages. These Messages are then realized by a Gesticulate version of an Embody event.

Some LinguisticUnits which are Conveyed using Gesticulate events
DeicticGesture A class of LinguisticUnits which makes a direct reference through physical and communicative context to some object in the world of discourse.
IconicGesture A class of LinguisticUnits which expresses information using iconic symbols.
Look A subclass of DeicticGesture which uses the eyes to reference some object or region.
Point A subclass of DeicticGesture which uses the hand or fingers to reference some object or region.

Sketch, Graphical Objects, and Charts

Here we intend to provide a characterization of linguistic units for other modalities, like sketch and gesture. This may involve the integration of some aspects of the chart ontology that is being written by Sanjeev and Jerry.

no concepts defined yet...

Topics, Hotspots, Focus, and Abstract and Statistical Features of Discourse

Some important concepts having to do with multi-party discourse are very difficult to model concretely. This section focuses on those elements of our analysis of discourse.

Abstract Phenomena
TopicDiscussion An Event which occurs as a sub-event of a Discourse. It represents a time-slice of the dialogue which is generally about a single domain concept. We characterize Discourses as always having some major topic of discussion, with the optional occurrence of a minor topic within those segments. The minor topic would be another TopicDiscussion event which is the subevent of a major topic. TopicDiscussion events will be posited by agents such as CMU's CAMSeg and the CSLI offline analysis suite; they relate directly to argumentative threads as described above, but while threads will be determined from low-level analysis of individual utterances, TopicDiscussion segments will be determined from high-level statistical analysis. Eventually, the two sources of information will be combined to provide a more reliable discourse thread structure.
Hotspot An Event which occurs as a sub-event of a TopicDiscussion. Again these segments will be determined from higher-level statistical analysis (particularly prosodic information); they will eventually be used to help provide a more reliable argumentative structure by identifying areas in which important issues are being introduced, agreed or disagreed.

Multi-agent communication, CALO as communicative agent

It should be noted that a single Communicate act is between two agents--an agent and a recipient. To model multi-agent discourse, we therefore must either allow Communicate actions to be tied to one another through shared elements or sub-events, or (preferably) must change the model so that Communicate allows multiple recipients. We describe those mechanisms here. To be continued...

To keep open the possibility of modelling CALO as a participant in the meeting, we conceive of CALO (and thus the dialogue interpretation system) as being an agent at the same level of communicative participation as the human participants. To be continued...

Discrepancies with CLib, Questions, and Problems

  • A base is a relation from an event to an entity, but a message has a base!

  • The agent of a communicate act does not necessarily know the information being communicated. One might claim that they believe it though.

  • Does not make sense to label information as being of only one type. Some utterances relate to the task at hand while also having a direct relationship to the discourse itself.

  • We would argue that Write is a subclass of Embody, not Express. I can "express" information into an orthographic language without actually writing it.

Diagrams

Here is a list of all the diagrams currently available. I have attempted to follow the following guidelines in diagram construction:

  • Events are in light blue
  • Entities are in green
  • Roles are in yellow
  • literal xml-schema datatypes are in grey

For queries, we also wrap the assumed knowledge in a grey box, while providing the desired result in a magenta box.

Abstract Models:

Concrete Examples:

Examples, Examples, Examples

Examples have moved to MokbOntologyUsageExamples.

IET Test Questions

IET question descriptions have moved to IetQuestionsYearTwo?

-- JohnNiekrasz - 15 Feb 2005

Calo.MokbOntologyY2Specification moved from Calo.MokbOntologyOldVersion on 16 Nov 2005 - 23:00 by JohnNiekrasz - put it back
 
