dc.contributor.author: Arumugam, Lakshmanan
… 20:53:45 (GMT)
… 20:53:45 (GMT)
dc.description.abstract: The world is moving towards an age centered on digital artifacts created by individuals. Not only are digital artifacts being created at an alarming rate, but the software that manages them is also growing faster than ever. Most software consists of a large number of source code files. Code search has therefore become an intrinsic part of the software development process, and the universe of source code is only growing. Although general-purpose web search engines such as Google and Bing are used for code search, they are not dedicated to searching software code. Moreover, keyword-based search may not return relevant documents when the search keyword is absent from the candidate documents, and it does not take into account the semantic and syntactic properties of software artifacts such as source code. Semantic search (in the context of software engineering) is an emerging area of research that explores the efficiency of searching a code base using natural language queries. In this thesis, we aim to give developers the ability to locate source code blocks and snippets through semantic search built on neural models. Neural models can represent natural language as vectors that have been shown to carry semantic meaning and that are used in various NLP tasks. Specifically, we use Code2Vec, a model that learns distributed representations of source code called code embeddings, and evaluate its performance on the task of semantically searching code snippets. The main idea behind using Code2Vec is that source code is structurally different from natural language, so a model that exploits the syntactic nature of source code can help in learning its semantic properties.
We pair Code2Vec with other neural models that represent natural language through vectors to create a hybrid model that outperforms the baseline models previously developed for the CodeSearchNet challenge benchmark. We also studied the impact of various metadata (such as the popularity of the repository and the token length of the code snippet) on the relevance of the retrieved code snippets. [en]
dc.publisher: University of Waterloo [en]
dc.subject: semantic search [en]
dc.subject: source code [en]
dc.title: Semantic code search using Code2Vec: A bag-of-paths model [en]
dc.type: Master Thesis [en]
dc.pending: false
… R. Cheriton School of Computer Science [en]
… Science [en]
… of Waterloo [en]
uws-etd.degree: Master of Mathematics [en]
uws.contributor.advisor: Nagappan, Meiyappan
uws.contributor.affiliation1: Faculty of Mathematics [en]
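
The abstract describes semantic search as comparing vector representations of natural language queries and of code snippets. The sketch below illustrates only that general retrieval idea and is not the thesis's implementation. The similarity measure (cosine similarity), the function names, and the three-dimensional vectors are all assumptions made for illustration; in the setting the abstract describes, the snippet vectors would come from a code model such as Code2Vec and the query vector from a natural language model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_snippets(query_vector, snippet_vectors):
    """Return snippet names ordered from most to least similar to the query."""
    scored = [
        (cosine_similarity(query_vector, vector), name)
        for name, vector in snippet_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored]


# Hypothetical, hand-made embeddings; real ones would be produced by trained models.
SNIPPET_VECTORS = {
    "read_file": [0.9, 0.1, 0.0],
    "sort_list": [0.0, 0.2, 0.9],
    "parse_json": [0.6, 0.7, 0.1],
}

# Stands in for an encoded query such as "open a file and read it".
QUERY_VECTOR = [1.0, 0.2, 0.0]

ranking = rank_snippets(QUERY_VECTOR, SNIPPET_VECTORS)
print(ranking)
```

Because ranking depends only on the angle between vectors, a snippet can be retrieved even when it shares no keywords with the query, which is the limitation of keyword-based search that the abstract points out.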

University of Waterloo Library
200 University Avenue West
Waterloo, Ontario, Canada N2L 3G1
519 888 4883

All items in UWSpace are protected by copyright, with all rights reserved.
