Word Distributions for Thematic Segmentation in a Support Vector Machine Approach

Maria Georgescul, Alexander Clark, Susan Armstrong

Research output: Chapter in Book/Report/Conference proceeding › Conference paper

11 Citations (Scopus)

Abstract

We investigate the appropriateness of using a technique based on support vector machines for identifying the thematic structure of text streams. The thematic segmentation task is modeled as a binary classification problem, in which the two classes correspond to the presence or absence of a thematic boundary. We conduct experiments with this approach using features based on the distribution of words throughout the text. We provide empirical evidence that our approach is robust by showing good performance on three different data sets. In particular, a substantial improvement over previously published results of word-distribution-based systems is obtained when evaluation is done on a corpus of recorded and transcribed multi-party dialogs.
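
The binary-classification framing in the abstract can be illustrated with a minimal, dependency-free sketch: every gap between two sentences is a candidate boundary, each gap is described by word-distribution features, and a linear SVM labels it as boundary (+1) or non-boundary (-1). Everything concrete here is an assumption made for illustration, not the paper's method: the single feature (cosine similarity of word counts in k-sentence windows on either side of the gap), the window size k=2, the Pegasos stochastic sub-gradient solver, the toy document, and all function names are hypothetical.

```python
# Illustrative sketch only: the features, solver, and names are assumptions,
# not the feature set or SVM implementation used in the paper.
from collections import Counter
import math
import random


def window_counts(sentences, start, end):
    """Bag-of-words counts over sentences[start:end], clipped to the text."""
    counts = Counter()
    for sentence in sentences[max(start, 0):max(end, 0)]:
        counts.update(sentence.lower().split())
    return counts


def cosine(a, b):
    """Cosine similarity of two word-count vectors (0.0 if either is empty)."""
    dot = sum(n * b[w] for w, n in a.items() if w in b)
    norm_a = math.sqrt(sum(n * n for n in a.values()))
    norm_b = math.sqrt(sum(n * n for n in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def gap_features(sentences, i, k=2):
    """Hypothetical features for the gap between sentence i-1 and sentence i:
    lexical similarity of the k sentences on each side, plus a constant 1.0
    that acts as a (regularized) bias term."""
    left = window_counts(sentences, i - k, i)
    right = window_counts(sentences, i, i + k)
    return [cosine(left, right), 1.0]


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos: stochastic sub-gradient descent on the L2-regularized hinge loss."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for j in rng.sample(range(len(X)), len(X)):  # one shuffled pass
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[j] * sum(wi * xi for wi, xi in zip(w, X[j]))
            w = [(1.0 - eta * lam) * wi for wi in w]  # shrink: regularizer step
            if margin < 1.0:  # hinge loss is active: step toward the example
                w = [wi + eta * y[j] * xi for wi, xi in zip(w, X[j])]
    return w


def predict_boundaries(sentences, w, k=2):
    """Indices i whose preceding gap the linear SVM classifies as a boundary."""
    return [i for i in range(1, len(sentences))
            if sum(wi * xi for wi, xi in zip(w, gap_features(sentences, i, k))) > 0]


# Toy document (hypothetical): three topics, boundaries before sentences 3 and 6.
doc = [
    "cat sat mat", "cat ate fish", "fish cat mat",
    "stock market fell", "market stock rose", "stock traders market",
    "rain hill wet", "hill rain cold",
]
gold = {3, 6}
X = [gap_features(doc, i) for i in range(1, len(doc))]
y = [1 if i in gold else -1 for i in range(1, len(doc))]
w = train_linear_svm(X, y)
print(predict_boundaries(doc, w))  # prints [3, 6]
```

A linear model trained with Pegasos keeps the sketch self-contained; a kernel SVM from an established library and a richer set of word-distribution features could be substituted without changing the boundary/non-boundary framing.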

Original language: Undefined/Unknown
Title of host publication: Proceedings of CoNLL
Pages: 101-108
Number of pages: 8
Publication status: Published, 2006
