|
Discourse Segmentation of Spoken Dialogue: An Empirical Approach
MIT Laboratory for Computer Science, Spoken Language Systems Group
May 1998

Download:
Thesis (PDF, 152 pages, 46 figures, 12 tables, 1.8 Mb)
Nb discourse annotation tool (Zip, Tcl/Tk application with annotated examples, 221 kb)
Kappa coefficient program (plain text source in C)

In this thesis, the analysis of a corpus of information-seeking dialogues provides evidence about the differences between human-to-human telephone conversation on the one hand and interactive voice response systems (IVRs) and question-answer systems (QAs) on the other. In IVRs and QAs, interaction is necessarily limited a priori. In natural conversation, by contrast, either speaker can take the initiative at any time. Despite this lack of constraints, information-seeking dialogues such as getting theater showtimes and giving directions are highly structured. The goal of this thesis has been to determine empirically the extent to which structured discourse segment boundaries can be extracted from annotated transcriptions of spontaneous, natural dialogues.

The contributions of this thesis are twofold. Firstly, we developed a novel annotation tool called Nb, together with associated discourse segmentation instructions, and evaluated their performance. Our findings indicate that it is possible to obtain reliable discourse segmentation when the annotation task is limited to choosing among a few independent alternatives. The scores for the most reliable experiments are 83.9% recall, 85% precision, and a kappa coefficient of 0.82 (22 dialogues, between 7 and 9 coders per dialogue).

Secondly, the annotated data support cognitive theories of dialogue as a joint activity (Clark and Schaefer 1989, Grosz and Sidner 1990, among others) in which discourse segments are initiated by either speaker with the purpose of either repairing or preventing misunderstanding, or cooperatively finding a mutually agreed upon solution to the task at hand.
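The kappa coefficient reported above measures agreement among coders after correcting for chance. The sketch below is only an illustration and is not the C program distributed with the thesis. It assumes Fleiss' multi-coder formulation, a fixed number of coders per item, and two categories (boundary, no boundary); the function name `fleiss_kappa` and the counts in the example are hypothetical.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Chance-corrected agreement for several coders (Fleiss' formulation).

    counts[i][j] is the number of coders who put item i in category j.
    Assumption: every item is judged by the same number of coders.
    """
    n_items = len(counts)
    n_coders = sum(counts[0])
    if n_items == 0 or n_coders < 2:
        raise ValueError("need at least one item and two coders")
    if any(sum(row) != n_coders for row in counts):
        raise ValueError("every item must have the same number of coders")

    n_cats = len(counts[0])
    total = n_items * n_coders

    # Share of all judgments that fall in each category.
    p_cat = [sum(row[j] for row in counts) / total for j in range(n_cats)]

    # Observed agreement: fraction of agreeing coder pairs, averaged over items.
    per_item = [
        (sum(c * c for c in row) - n_coders) / (n_coders * (n_coders - 1))
        for row in counts
    ]
    observed = sum(per_item) / n_items

    # Agreement expected if coders chose categories at random with p_cat.
    expected = sum(p * p for p in p_cat)
    if expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")

    return (observed - expected) / (1.0 - expected)


# Hypothetical data: 5 candidate boundary positions, 8 coders each,
# columns are [boundary, no boundary].
example_counts = [[8, 0], [0, 8], [7, 1], [1, 7], [0, 8]]
example_kappa = fleiss_kappa(example_counts)
```

For these invented counts the observed agreement is 0.9 and the chance agreement is 0.52, so kappa comes out at roughly 0.79. Handling a varying number of coders per dialogue, as in the experiments described above, would need a different formulation than this sketch.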
The data also support the hypothesis that a stack data structure can model spontaneous phenomena such as repairs, fresh starts and switches between multiple active purposes.
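One way to picture the stack hypothesis is the minimal sketch below: opening a discourse segment pushes its purpose, and closing it pops back to the purpose that was suspended. The class and method names (`Segment`, `DiscourseStack`, `open`, `close`) and the sample dialogue are hypothetical illustrations, not taken from the thesis or from the Nb tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Segment:
    purpose: str      # what the segment is trying to achieve
    initiator: str    # which speaker opened it
    utterances: List[str] = field(default_factory=list)


class DiscourseStack:
    """Active discourse segments, innermost (most recent) purpose on top."""

    def __init__(self) -> None:
        self._segments: List[Segment] = []

    def open(self, purpose: str, initiator: str) -> Segment:
        """Either speaker may open a new segment; it suspends the current one."""
        segment = Segment(purpose, initiator)
        self._segments.append(segment)
        return segment

    def add(self, utterance: str) -> None:
        """Attach an utterance to the segment currently in focus."""
        if not self._segments:
            raise RuntimeError("no open segment")
        self._segments[-1].utterances.append(utterance)

    def close(self) -> Segment:
        """Finish the segment in focus and resume the one beneath it."""
        if not self._segments:
            raise RuntimeError("no open segment")
        return self._segments.pop()

    def current(self) -> Optional[Segment]:
        return self._segments[-1] if self._segments else None

    def depth(self) -> int:
        return len(self._segments)


# Invented exchange: a clarification segment nested inside the main task.
stack = DiscourseStack()
stack.open("get showtimes", initiator="caller")
stack.add("caller: What time is the movie playing tonight?")
stack.open("clarify which theater", initiator="agent")
stack.add("agent: Which theater did you have in mind?")
stack.add("caller: The one downtown.")
repair = stack.close()           # clarification resolved
resumed = stack.current()        # back to the showtimes purpose
stack.add("agent: It plays at seven and nine thirty.")
```

After the clarification is popped, the next utterance attaches to the resumed showtimes segment, which is the behavior a stack model predicts for repairs embedded in an ongoing task.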
Thesis Topic Keywords: Discourse Analysis, Dialogue, Telephone Conversations, Natural Language Processing, Discourse Segments, Discourse Segmentation, Corpus Analysis, Discourse Annotation, Content Analysis, Kappa Coefficient. |