Optimizing Automated Bug Localization for Practical Use

Chakraborty, Partha

dc.contributor.advisor	Nagappan, Meiyappan
dc.contributor.author	Chakraborty, Partha
dc.date.accessioned	2024-12-13T18:54:38Z
dc.date.available	2024-12-13T18:54:38Z
dc.date.issued	2024-12-13
dc.date.submitted	2024-12-11
dc.description.abstract	A considerable share of resources and developer effort is spent on addressing software bugs. Identifying the root causes of these bugs within the codebase is crucial for their resolution. Automated bug localization tools aim to assist in this process. However, their effectiveness is often limited, leading to low adoption rates. This low adoption indicates a disparity between research goals and developers' expectations, emphasizing the need for improvements in bug localization tools. This thesis explores and addresses the challenges faced by developers and tool builders in implementing practical bug localization tools. Our research focuses on understanding developers' expectations and enhancing the tools' overall effectiveness. First, we conduct a mixed-method empirical study to understand developers' expectations. The study reveals that while developers are willing to use bug localization tools, they have concerns about accuracy and potential leakage of intellectual property. We found that only 27.5% of developers are familiar with these tools. The study indicates that developers need more reliable performance, better integration, flexibility, transparency, and contextual understanding to increase adoption and effectiveness. We also examine performance issues in bug localization tools, particularly in their foundation, the embedding model. We found that key factors in embedding techniques, such as pre-training strategies, data familiarity, and input sequence length, significantly affect performance. Our findings show that using project-specific data and pre-training methods like ELECTRA can improve model performance by 25.9%. Additionally, we explore the use of reinforcement learning (RL) in bug localization and propose an RL agent called RLocator. RLocator learns from developer feedback, making it suitable for low-data environments.
We also propose BLAZE, an efficient bug localization technique for cross-project and cross-language settings. By using dynamic chunking, a technique that dynamically adjusts the size of the model's input, together with hard example learning, BLAZE achieves up to a 144% improvement in Mean Average Precision (MAP) compared to previous tools. In conclusion, our findings highlight shortcomings in the adaptability and efficiency of current tools. We advocate for highly adaptable cross-language, cross-project bug localizers to enhance adoption rates among developers. By leveraging our observations, curated datasets, and proposed methods, tool builders can create more user-friendly bug localization tools for software developers, inspiring a new wave of innovation in this field.
dc.identifier.uri	https://hdl.handle.net/10012/21249
dc.language.iso	en
dc.pending	false
dc.publisher	University of Waterloo	en
dc.subject	bug
dc.subject	fault
dc.subject	localization
dc.subject	embedding
dc.subject	deep learning
dc.subject	machine learning
dc.subject	bug reports
dc.subject	large language model
dc.subject	llm
dc.title	Optimizing Automated Bug Localization for Practical Use
dc.type	Doctoral Thesis
uws-etd.degree	Doctor of Philosophy
uws-etd.degree.department	David R. Cheriton School of Computer Science
uws-etd.degree.discipline	Computer Science
uws-etd.degree.grantor	University of Waterloo	en
uws-etd.embargo.terms	0
uws.contributor.advisor	Nagappan, Meiyappan
uws.contributor.affiliation1	Faculty of Mathematics
uws.peerReviewStatus	Unreviewed	en
uws.published.city	Waterloo	en
uws.published.country	Canada	en
uws.published.province	Ontario	en
uws.scholarLevel	Graduate	en
uws.typeOfResource	Text	en

Files

Original bundle

Name: Chakraborty_Partha.pdf
Size: 21.05 MB
Format: Adobe Portable Document Format

Collections

Theses
Computer Science