
Mining Topic Signals from Text

dc.contributor.author: Al-Halimi, Reem Khalil [en]
dc.date.accessioned: 2006-08-22T14:28:39Z
dc.date.available: 2006-08-22T14:28:39Z
dc.date.issued: 2003 [en]
dc.date.submitted: 2003 [en]
dc.description.abstract: This work aims at studying the effect of word position in text on understanding and tracking the content of written text. In this thesis we present two uses of word position in text: topic word selectors and topic flow signals. The topic word selectors identify important words, called topic words, by their spread through a text. The underlying assumption here is that words that repeat across the text are likely to be more relevant to the main topic of the text than ones that are concentrated in small segments. Our experiments show that manually selected keywords correspond more closely to topic words extracted using these selectors than to words chosen using more traditional indexing techniques. This correspondence indicates that topic words identify the topical content of the documents more than words selected using the traditional indexing measures that do not utilize word position in text. The second approach to applying word position is through topic flow signals. In this representation, words are replaced by the topics to which they refer. The flow of any one topic can then be traced throughout the document and viewed as a signal that rises when a word relevant to the topic is used and falls when an irrelevant word occurs. To reflect the flow of the topic in larger segments of text we use a simple smoothing technique. The resulting smoothed signals are shown to be correlated to the ideal topic flow signals for the same document. Finally, we characterize documents using the importance of their topic words and the spread of these words in the document. When incorporated into a Support Vector Machine classifier, this representation is shown to drastically reduce the vocabulary size and improve the classifier's performance compared to the traditional word-based, vector space representation. [en]
dc.format: application/pdf [en]
dc.format.extent: 844148 bytes
dc.format.mimetype: application/pdf
dc.identifier.uri: http://hdl.handle.net/10012/1165
dc.language.iso: en [en]
dc.pending: false [en]
dc.publisher: University of Waterloo [en]
dc.rights: Copyright: 2003, Al-Halimi, Reem Khalil. All rights reserved. [en]
dc.subject: Computer Science [en]
dc.subject: topic spread [en]
dc.subject: topic flow signals [en]
dc.subject: topic characterization [en]
dc.subject: topic words [en]
dc.subject: topic word selectors [en]
dc.subject: topic relevance measures [en]
dc.title: Mining Topic Signals from Text [en]
dc.type: Doctoral Thesis [en]
uws-etd.degree: Doctor of Philosophy [en]
uws-etd.degree.department: School of Computer Science [en]
uws.peerReviewStatus: Unreviewed [en]
uws.scholarLevel: Graduate [en]
uws.typeOfResource: Text [en]

Files

Original bundle
Name: ralhalim2003.pdf
Size: 824.36 KB
Format: Adobe Portable Document Format