Show simple item record

dc.contributor.authorShaban, Khaleden
dc.date.accessioned2007-05-08 13:47:02 (GMT)
dc.date.available2007-05-08 13:47:02 (GMT)
dc.date.issued2006en
dc.date.submitted2006en
dc.identifier.urihttp://hdl.handle.net/10012/2860
dc.description.abstractThe explosive growth in the number of documents produced daily necessitates the development of effective alternatives to explore, analyze, and discover knowledge from documents. Document mining research work has emerged to devise automated means to discover and analyze useful information from documents. This work has been mainly concerned with constructing text representation models, developing distance measures to estimate similarities between documents, and utilizing that in mining processes such as document clustering, document classification, information retrieval, information filtering, and information extraction. <br /><br /> Conventional text representation methodologies consider documents as bags of words and ignore the meanings and ideas their authors want to convey. It is this deficiency that causes similarity measures to fail to perceive contextual similarity of text passages due to the variation of the words the passages contain, or at least perceive contextually dissimilar text passages as being similar because of the resemblance of words the passages have. <br /><br /> This thesis presents a new paradigm for mining documents by exploiting semantic information of their texts. A formal semantic representation of linguistic inputs is introduced and utilized to build a semantic representation scheme for documents. The representation scheme is constructed through accumulation of syntactic and semantic analysis outputs. A new distance measure is developed to determine the similarities between contents of documents. The measure is based on inexact matching of attributed trees. It involves the computation of all distinct similarity common sub-trees, and can be computed efficiently. It is believed that the proposed representation scheme along with the proposed similarity measure will enable more effective document mining processes. <br /><br /> The proposed techniques to mine documents were implemented as vital components in a mining system. A case study of semantic document clustering is presented to demonstrate the working and the efficacy of the framework. Experimental work is reported, and its results are presented and analyzed.en
dc.formatapplication/pdfen
dc.format.extent1461362 bytes
dc.format.mimetypeapplication/pdf
dc.language.isoenen
dc.publisherUniversity of Waterlooen
dc.rightsCopyright: 2006, Shaban, Khaled. All rights reserved.en
dc.subjectElectrical & Computer Engineeringen
dc.subjectDocument miningen
dc.subjectsemantic understandingen
dc.subjecttext representationen
dc.subjectsimilarity measureen
dc.subjectdocument clustering.en
dc.titleA Semantic Graph Model for Text Representation and Matching in Document Miningen
dc.typeDoctoral Thesisen
dc.pendingfalseen
uws-etd.degree.departmentElectrical and Computer Engineeringen
uws-etd.degreeDoctor of Philosophyen
uws.typeOfResourceTexten
uws.peerReviewStatusUnrevieweden
uws.scholarLevelGraduateen


Files in this item

Thumbnail

This item appears in the following Collection(s)

Show simple item record


UWSpace

University of Waterloo Library
200 University Avenue West
Waterloo, Ontario, Canada N2L 3G1
519 888 4883

All items in UWSpace are protected by copyright, with all rights reserved.

DSpace software

Service outages