Creating New IET Test Questions for Y3
We would like to define sets of test questions that each target a single aspect (output) of the system at several levels of understanding. The expectation is that shallower levels will reach decent performance sooner, while deeper levels may show more improvement later (and perhaps more influence of LITW).
For action items, this might involve, from shallowest to deepest:
- Identification of action-item-related utterances
- Identification of utterance function: proposal/assignment of task, assignment of deadline, acceptance, etc.
- Identification of responsible party
- Identification of deadline
- Identification of task (semantic representation)
- Relation to previous action items/tasks from knowledge base (including previous meetings)
For topic identification & segmentation, from shallowest to deepest:
- Segmentation of the discourse into segments useful for browsing
- Production of relevant keyword summaries
- Relation of topics to agenda items
- Relation of topics to each other
- Relation of topics to known tasks/projects from knowledge base (including previous meetings)
For general decisions:
- Identification of decision-making-related utterances
- Identification of utterance function: (counter-)proposal, (dis)agreement, etc.
- Identification of decision end-points
- Identification of issue under discussion (semantic representation)
- Identification of final decision (semantic representation)
- Relation of issue to previous issues from knowledge base
- Relation of issue to known projects, tasks from knowledge base
- Effect of decision on related project/tasks
For projects and tasks:
- Relation of broad topic segments to known projects/tasks
- Relation of narrow (minor topic/issue) segments to known projects/tasks
- Identification of new tasks (semantic representation)
- Effect of discussion (progress changes) on projects/tasks
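As a rough illustration of how results from such tiered question sets could be tracked, the sketch below groups hypothetical per-question results by aspect, depth level, and meeting, and reports the fraction answered correctly. The `QuestionResult` record, its field names, and the aspect labels are assumptions made for this example; they are not part of the IET or CATS tooling.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class QuestionResult:
    """One scored test question (hypothetical record layout)."""
    aspect: str    # e.g. "action_items" or "topics" (illustrative labels)
    level: int     # 1 = shallowest, following the order of the lists above
    meeting: int   # position in the meeting sequence, 1-based
    correct: bool  # whether the system's answer matched the reference


def accuracy_by_level(results):
    """Return {(aspect, level, meeting): fraction of questions correct}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for r in results:
        key = (r.aspect, r.level, r.meeting)
        totals[key] += 1
        hits[key] += int(r.correct)
    return {key: hits[key] / totals[key] for key in totals}
```

Comparing these fractions across the meeting sequence would show whether shallow levels plateau early while deeper levels keep improving.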
User-Oriented Testing
It would be useful to include some testing that reflects how an end user might perceive the system. First, it would let us test aspects (e.g. browsing, search) that the current IET approach (CATS scripts querying for specific knowledge base entries) cannot cover. Second, it might give us more useful design feedback than the current IET regime. Third, if performed at each stage of the five-meeting sequence, it could provide some LITW feedback for the test; this is reasonable, since it is how some LITW will be obtained in real usage. Using it as an evaluation test may not be acceptable to DARPA in principle, because user testing necessarily involves a user and so cannot measure the system independently. Even so, there is still a good argument for including user feedback as part of the test cycle (otherwise we would unfairly lose LITW opportunities), so it might still make sense to use the results as our own internal evaluation.
Tasks that we'd like users to perform:
- Given a topic or agenda item, find the relevant section of the meeting and review it.
- Given a topic or agenda item, find which decisions were made on it.
- Given a project or task, find which decisions were made on it.
- Given an action item from a previous meeting: was progress on it reported?
- Find the new action items which person X (you?) is responsible for.
- Given a project or task: what is its progress? (there's an IET question like this)
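To score a user task like these, a reference answer could be derived from the knowledge base and compared with what the user found. The sketch below does this for the "new action items for person X" task, using a hypothetical `ActionItem` record held in a plain list; the field names and the in-memory representation are assumptions and do not reflect the actual knowledge base schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ActionItem:
    """Hypothetical action-item record; not the real knowledge base schema."""
    task: str                       # short description of the task
    owner: str                      # responsible party
    meeting: int                    # meeting in which the item was created
    deadline: Optional[str] = None  # free-text deadline, if one was assigned


def new_action_items_for(items, person, meeting):
    """Return the action items created in `meeting` that `person` owns."""
    return [i for i in items if i.owner == person and i.meeting == meeting]
```

The same pattern (filter the reference records, then compare with the user's answer) would apply to the other tasks, with decisions or topic segments in place of action items.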