
dc.contributor.author: Yang, Jinqiu
dc.date.accessioned: 2013-05-02 19:10:47 (GMT)
dc.date.available: 2013-12-17 06:00:10 (GMT)
dc.date.issued: 2013-05-02T19:10:47Z
dc.date.submitted: 2013
dc.identifier.uri: http://hdl.handle.net/10012/7514
dc.description.abstract: Code search is an integral part of software development and program comprehension. The difficulty of code search lies in the inability to guess the exact words used in the code. Therefore, it is crucial for keyword-based code search to expand queries with semantically related words, e.g., synonyms and abbreviations, to increase search effectiveness. However, relying on resources such as English dictionaries and WordNet to obtain semantically related words in software is limiting, because many words that are semantically related in software are not semantically related in English. Conversely, many words that are semantically related in English are not semantically related in software. This thesis proposes a simple and general technique to automatically infer semantically related words (referred to as rPairs) in software by leveraging the context of words in comments and code. In addition, we propose a ranking algorithm for the rPair results and study cross-project rPairs on two sets of software with similar functionality, i.e., media browsers and operating systems. We achieve reasonable accuracy on nine large and popular code bases written in C and Java. Our further evaluation against the state of the art shows that our technique can achieve higher precision and recall. In addition, the proposed ranking algorithm improves rPair extraction accuracy by bringing correct rPairs to the top of the list. Our cross-project study successfully discovers overlapping rPairs among projects of similar functionality and finds that cross-project rPairs are more likely to be correct than project-specific rPairs. Since cross-project rPairs are highly likely to be general for software of the same type, the discovered overlapping rPairs can benefit other projects of the same type that have not been analyzed. [en]
dc.language.iso: en [en]
dc.publisher: University of Waterloo [en]
dc.subject: Semantically related words [en]
dc.subject: code search [en]
dc.subject: program comprehension [en]
dc.title: SWordNet: Inferring Semantically Related Words from Software Context [en]
dc.type: Master Thesis [en]
dc.pending: true [en]
dc.subject.program: Electrical and Computer Engineering (Software Engineering) [en]
dc.description.embargoterms: 1 year [en]
uws-etd.degree.department: Electrical and Computer Engineering [en]
uws-etd.degree: Master of Science [en]
uws.typeOfResource: Text [en]
uws.peerReviewStatus: Unreviewed [en]
uws.scholarLevel: Graduate [en]




UWSpace

University of Waterloo Library
200 University Avenue West
Waterloo, Ontario, Canada N2L 3G1
519 888 4883

All items in UWSpace are protected by copyright, with all rights reserved.
