The topic service has two versions: the OnlineTopicSegmenterService and the TopicSegmenterService (which extends the online one).
The TopicSegmenterService is meant to be more accurate and to provide more functionality than the OnlineTopicSegmenterService, at the cost of a long precomputation time for each meeting. Thanks to serialization, this precomputation is done only once per meeting. This is why most of the methods can run very fast in some situations, but seem to take forever the first time they are run on a meeting.
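The "compute once, then reuse the serialized result" behaviour described above can be sketched as follows. This is only an illustration, not the service's actual code: the class PrecomputationCache, the method loadOrCompute and the cache file location are hypothetical names, and the sketch uses standard Java serialization as a stand-in for whatever mechanism the service relies on.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Supplier;

// Hypothetical helper: not part of the topic service API.
final class PrecomputationCache {
    private PrecomputationCache() {}

    // Returns the cached value if cacheFile exists; otherwise runs the
    // (slow) computation once, serializes the result, and returns it.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T loadOrCompute(Path cacheFile, Supplier<T> compute) {
        try {
            if (Files.exists(cacheFile)) {
                // Fast path: the precomputation was already done and stored.
                try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(cacheFile))) {
                    return (T) in.readObject();
                }
            }
            // Slow path: first call for this meeting.
            T result = compute.get();
            try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(cacheFile))) {
                out.writeObject(result);
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("Cached data has an unknown class", e);
        }
    }
}
```

With such a helper, the first call for a given meeting pays the full computation cost, and every later call only pays for reading the file back.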
To explain how this works, consider the schema below, which shows the different cached states of a meeting:
- A corresponds to the original state: we know the name of the meeting and the content of the knowledge base, but nothing about the topics.
- B corresponds to the state where all the precomputations have been done, that is, all the data we compute about the meeting in order to work with its topics.
- C corresponds to the completely processed meeting: in addition to all the precomputations, we have already run the most time-consuming query, the segmentation of the meeting, and stored the results in the cache.
Each method requires a certain state of the meeting, and can also be run on any of that state's children (for example, a method that only needs A also works on a meeting in state B or C).
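One way to picture this rule is as an ordered enumeration in which a meeting satisfies a requirement if it has reached at least the required state. The sketch below assumes the states form a simple chain A, then B, then C; the enum MeetingState and the method satisfies are hypothetical names used only for illustration.

```java
// Hypothetical model of the cached states; declaration order encodes A < B < C.
enum MeetingState {
    A, // name and knowledge base known, nothing about topics
    B, // all precomputations done
    C; // segmentation also computed and cached

    // True if a meeting in this state can serve a method that requires `required`.
    boolean satisfies(MeetingState required) {
        return compareTo(required) >= 0;
    }
}
```

Under this model, a method that requires B can be called on a meeting in state B or C, but a meeting still in state A would first have to go through the precomputation.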
The following methods require only state A:
- csliSearchTopic(Topic) works on all the meetings at the same time, but does not require any heavy precomputation. Building a search index is necessary, but this only has to be done once each time a meeting is added to the corpus, and it does not take too long.
- csliGetTopicSimilarity(Topic,Topic) does not need meetings at all, and therefore does not require more than state A.
Most of the advanced methods actually require no more than B:
- extractTopicBoundaries(String,XSDDateTime)
- locateTopic(String,Topic)
- extractTopic(String,XSDDateTime)
- extractTopic(String,Pair<XSDDateTime,XSDDateTime>)
- wasDiscussed(Topic,String)
- searchTopic(Topic)
- csliLocateTopic(String,Topic)
- csliExtractTopic(String,XSDDateTime)
- csliExtractTopic(String,Pair<XSDDateTime,XSDDateTime>)
State C is required for the remaining methods:
- getTopicBreaks(String)
- csliGetTopics(String)
-- StephaneLaidebeure - 06 Jun 2006