CALO-Specific NOMOS and OPI Information
This document describes how to use NOMOS and the OPI framework with CALO datasets. It is intended for our CALO partners who wish to annotate or view the data using NOMOS.
Obtaining and Running NOMOS
Please see the CSLI Software Manual for CALO Partners for instructions on obtaining NOMOS. Once you have obtained the NOMOS software, run the NOMOS agent from calo/run.
Importing the Y2 Data into NOMOS
Follow the instructions below to import CALO media files and to create NOMOS versions of the annotations (e.g., transcriptions and ASR output). This will allow you to use the Y2 SRI recordings in NOMOS for viewing, querying, and processing. These directions follow the same basic process for importing corpora that is described in the NOMOS Manual, which also contains instructions for loading LDC corpora such as ICSI, NIST, and ISL.
Configuration
The default NOMOS configuration properties (for the version provided in the SRI CVS) are in calo/lib/config/calo.config. As described in the NOMOS Manual, you will need to override some of these properties in a calo.local.config file.
All NOMOS import scripts require the corpora.orig.path configuration parameter to be set. It lists the locations on disk that contain the original files to import. The file.lookup.path key should be set to the same locations, since NOMOS uses it to find files for media playback. Both should be specified as absolute paths.
Next, for each corpus that NOMOS can import (or, in the CALO case, each main part of a corpus), set a parameter holding the name of the directory that contains the associated data. These values are plain directory names, not paths: the locations listed in corpora.orig.path are searched for the directory names given in the corpus-specific properties. This is how we set up our configuration file at CSLI for the Y2 data:
file.lookup.path=[/shared/corpora]
corpora.orig.path=[/shared/corpora]
corpora.orig.sricaloy2.mokbs=sri-calo-y2-mokbs
corpora.orig.sricaloy2.transcripts=sri-calo-y2-transcripts
corpora.orig.sricaloy2.recordings=sri-calo-y2-recordings
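The lookup described above (each corpus-specific directory name is searched for under the locations in corpora.orig.path) can be illustrated with a short Java sketch. This is not NOMOS code, and the class and method names are hypothetical. The sketch assumes that the bracketed list value holds comma-separated entries when more than one location is given. The example above shows only a single entry, so check the NOMOS Manual for the exact list syntax.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/**
 * Hypothetical illustration (not part of NOMOS) of resolving a corpus-specific
 * directory name, such as "sri-calo-y2-recordings", against corpora.orig.path.
 */
public class CorpusDirResolver {

    /**
     * Parses a bracketed list value such as "[/shared/corpora]".
     * ASSUMPTION: multiple entries are separated by commas.
     */
    static List<String> parseList(String value) {
        String v = value.trim();
        if (v.startsWith("[") && v.endsWith("]")) {
            v = v.substring(1, v.length() - 1);
        }
        List<String> entries = new ArrayList<>();
        for (String part : v.split(",")) {
            String p = part.trim();
            if (!p.isEmpty()) {
                entries.add(p);
            }
        }
        return entries;
    }

    /** Returns the first root/dirName that exists as a directory, trying roots in order. */
    static Optional<Path> resolve(List<String> roots, String dirName) {
        for (String root : roots) {
            Path candidate = Paths.get(root).resolve(dirName);
            if (Files.isDirectory(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Values copied from the CSLI example configuration above.
        List<String> roots = parseList("[/shared/corpora]");
        for (String root : roots) {
            // Both path keys should be absolute.
            if (!Paths.get(root).isAbsolute()) {
                System.out.println("warning: not an absolute path: " + root);
            }
        }
        String[] dirNames = {
            "sri-calo-y2-mokbs", "sri-calo-y2-transcripts", "sri-calo-y2-recordings"
        };
        for (String name : dirNames) {
            System.out.println(name + " -> "
                + resolve(roots, name).map(Path::toString).orElse("NOT FOUND"));
        }
    }
}
```

Trying the roots in order means that, if the same directory name exists under several locations, the first match wins. Whether NOMOS itself behaves this way is not stated here.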
Preparation
The import script expects the data to be laid out on disk in roughly the same way as on bigtivo, except that the various sequences are placed in a single directory. Download the .tgz files provided on the bigtivo web site and construct a file structure like this:
[corpora.orig.path]/
[corpora.orig.sricaloy2.recordings]/
seq-C/
seq-D/
...
seq-H/
1117571796000/
1117572737000/
...
1117575878000/
MOKB/
charter.ink
CAMEO_130.107.94.66/
CAMEO_130.107.94.164/
...
[corpora.orig.sricaloy2.mokbs]/
seq-G-exper-mokb.n3
seq-G-inexper-mokb.n3
seq-H-exper-mokb.n3
seq-H-inexper-mokb.n3
[corpora.orig.sricaloy2.transcripts]/
Meeting Sequence G/
...
Meeting 1
...
Transcriptions
2005_05_24_14_38_26_065_jmarlow.trs
2005_05_24_14_38_26_336_jpark.trs
2005_05_24_14_38_26_662_john_pedersen.trs
trans-14.dtd
trans-13.dtd
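Before you run the import, it may help to sanity-check the layout. The following Java sketch is hypothetical and is not part of NOMOS. It checks only the parts of the structure above that are unambiguous: the three corpus directories exist under a given root, the recordings directory contains seq-* directories, and the MOKB directory contains .n3 files. It does not check the nesting inside each sequence or inside the transcripts directory.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pre-import layout check for the Y2 data (not part of NOMOS). */
public class Y2LayoutCheck {

    /** Returns a list of problems; an empty list means the checked parts look right. */
    static List<String> check(File root, String recordings, String mokbs, String transcripts) {
        List<String> problems = new ArrayList<>();
        File recDir = new File(root, recordings);
        File mokbDir = new File(root, mokbs);
        File transDir = new File(root, transcripts);

        // Each corpus-specific directory name must exist directly under the root.
        for (File dir : new File[] {recDir, mokbDir, transDir}) {
            if (!dir.isDirectory()) {
                problems.add("missing directory: " + dir);
            }
        }
        // The recordings directory should hold the seq-* directories (seq-C ... seq-H).
        if (recDir.isDirectory()) {
            File[] seqs = recDir.listFiles(f -> f.isDirectory() && f.getName().startsWith("seq-"));
            if (seqs == null || seqs.length == 0) {
                problems.add("no seq-* directories in " + recDir);
            }
        }
        // The MOKB directory should hold the *-mokb.n3 files.
        if (mokbDir.isDirectory()) {
            File[] n3 = mokbDir.listFiles(f -> f.isFile() && f.getName().endsWith(".n3"));
            if (n3 == null || n3.length == 0) {
                problems.add("no .n3 files in " + mokbDir);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Root and directory names taken from the CSLI example configuration.
        File root = new File(args.length > 0 ? args[0] : "/shared/corpora");
        List<String> problems = check(root,
            "sri-calo-y2-recordings", "sri-calo-y2-mokbs", "sri-calo-y2-transcripts");
        if (problems.isEmpty()) {
            System.out.println("layout looks OK under " + root);
        } else {
            problems.forEach(System.out::println);
        }
    }
}
```

If you have more than one location in corpora.orig.path, run the check against the location that actually holds each directory.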
Importing
Open NOMOS, choose Run > Run Import Script..., and select the class named ImportCaloY2SriRecordings. This script writes NOMOS-compatible datasets into the directory specified by the opi.file.serialized parameter (typically named nomosarchive).
Using the data in NOMOS
The meetings are now available as NOMOS sessions. Open the one you want using the normal procedure for opening a session in NOMOS. See NomosOpiAnnotationInfo for diagrams, schema definitions, and ontology information for the available datasets.