SWordNet: Inferring Semantically Related Words from Software Context

dc.contributor.author: Yang, Jinqiu
dc.date.accessioned: 2013-05-02T19:10:47Z
dc.date.available: 2013-12-17T06:00:10Z
dc.date.issued: 2013-05-02T19:10:47Z
dc.date.submitted: 2013
dc.description.abstract: Code search is an integral part of software development and program comprehension. The difficulty of code search lies in the inability to guess the exact words used in the code. Therefore, it is crucial for keyword-based code search to expand queries with semantically related words, e.g., synonyms and abbreviations, to increase search effectiveness. However, relying on resources such as English dictionaries and WordNet to obtain semantically related words in software is limiting, because many words that are semantically related in software are not semantically related in English. Conversely, many words that are semantically related in English are not semantically related in software. This thesis proposes a simple and general technique to automatically infer semantically related words (referred to as rPairs) in software by leveraging the context of words in comments and code. In addition, we propose a ranking algorithm on the rPair results and study cross-project rPairs on two sets of software with similar functionality, i.e., media browsers and operating systems. We achieve a reasonable accuracy in nine large and popular code bases written in C and Java. Our further evaluation against the state of the art shows that our technique can achieve higher precision and recall. In addition, the proposed ranking algorithm improves the rPair extraction accuracy by bringing correct rPairs to the top of the list. Our cross-project study successfully discovers overlapping rPairs among projects of similar functionality and finds that cross-project rPairs are more likely to be correct than project-specific rPairs. Since the cross-project rPairs are highly likely to be general for software of the same type, the discovered overlapping rPairs can benefit other projects of the same type that have not been analyzed. [en]
dc.description.embargoterms: 1 year [en]
dc.identifier.uri: http://hdl.handle.net/10012/7514
dc.language.iso: en [en]
dc.pending: true [en]
dc.publisher: University of Waterloo [en]
dc.subject: Semantically related words [en]
dc.subject: code search [en]
dc.subject: program comprehension [en]
dc.subject.program: Electrical and Computer Engineering (Software Engineering) [en]
dc.title: SWordNet: Inferring Semantically Related Words from Software Context [en]
dc.type: Master Thesis [en]
uws-etd.degree: Master of Science [en]
uws-etd.degree.department: Electrical and Computer Engineering [en]
uws.peerReviewStatus: Unreviewed [en]
uws.scholarLevel: Graduate [en]
uws.typeOfResource: Text [en]

Files

Original bundle
Name: Jinqiu_Yang.pdf
Size: 330.74 KB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 248 B
Format: Item-specific license agreed upon to submission