Integrating Structure and Meaning: Using Holographic Reduced Representations to Improve Automatic Text Classification

dc.contributor.author: Fishbein, Jonathan Michael
dc.date.accessioned: 2008-07-10T13:40:51Z
dc.date.available: 2008-07-10T13:40:51Z
dc.date.issued: 2008-07-10T13:40:51Z
dc.date.submitted: 2008
dc.description.abstract: Current representation schemes for automatic text classification treat documents as syntactically unstructured collections of words (Bag-of-Words) or 'concepts' (Bag-of-Concepts). Past attempts to encode syntactic structure have treated part-of-speech information as another word-like feature, but have been shown to be less effective than non-structural approaches. We propose a new representation scheme using Holographic Reduced Representations (HRRs) as a technique to encode both semantic and syntactic structure, though in very different ways. This method is unique in the literature in that it encodes the structure across all features of the document vector while preserving text semantics. Our method does not increase the dimensionality of the document vectors, allowing for efficient computation and storage. We present the results of various Support Vector Machine classification experiments that demonstrate the superiority of this method over Bag-of-Concepts representations and improvement over Bag-of-Words in certain classification contexts. [en]
dc.identifier.uri: http://hdl.handle.net/10012/3819
dc.language.iso: en [en]
dc.pending: false [en]
dc.publisher: University of Waterloo [en]
dc.subject: Holographic Reduced Representations [en]
dc.subject: Vector Space Model [en]
dc.subject: Text Classification [en]
dc.subject: Parts of Speech Tagging [en]
dc.subject: Random Indexing [en]
dc.subject: Support Vector Machines [en]
dc.subject: Syntactic Structure [en]
dc.subject: Semantics [en]
dc.subject.program: System Design Engineering [en]
dc.title: Integrating Structure and Meaning: Using Holographic Reduced Representations to Improve Automatic Text Classification [en]
dc.type: Master Thesis [en]
uws-etd.degree: Master of Applied Science [en]
uws-etd.degree.department: Systems Design Engineering [en]
uws.peerReviewStatus: Unreviewed [en]
uws.scholarLevel: Graduate [en]
uws.typeOfResource: Text [en]

Files

Original bundle

Name: thesis.pdf
Size: 816.66 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 263 B
Format: Item-specific license agreed upon to submission