An Experimental Study of Selected Methods towards Achieving 100% Recall of Synonyms in Software Requirements Documents

Lan, Xiaoye. Master Thesis, University of Waterloo (Computer Science), 2015. Advisor: Berry, Daniel.
Subjects: Natural language processing (Computer science); semantic computing; Synonyms; Requirements engineering.
Available: 2015-12-01. Language: en.
URI: http://hdl.handle.net/10012/10019

Software requirements documents written in natural language should avoid synonyms, which cause unnecessary confusion and ambiguity. In practice, however, synonyms remain common in requirements documents. Many tools have been developed to identify synonyms, and two metrics are often used to evaluate them: recall and precision. Recall is the ratio of the number of relevant records retrieved to the total number of relevant records in the document. Precision is the fraction of retrieved records that are relevant. Industry practice suggests that, for such tools, 100% recall is preferred over 100% precision. However, no available tool actually achieves 100% recall. The goal of this thesis is to explore computational methods that could reach 100% recall in extracting synonyms from software requirements documents. The thesis compares six WordNet-based methods and two context-based algorithmic approaches for extracting synonyms from two different types of requirements documents, and it compares the eight methods by their recall. The experimental results showed that the word co-occurrence-based method achieved the best recall in identifying synonyms in the software requirements documents. Further experiments showed that the parameter settings of the word co-occurrence-based method also affect the results. The thesis also discusses potential issues with the word co-occurrence-based method that arise in the design of the experiments. Personal factors of a document's author could influence the experimental results, but careful experimental design can avoid this influence.
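The recall and precision definitions above can be made concrete with a small Python sketch. It assumes, for illustration only, that each "record" is an unordered synonym pair represented as a `frozenset`. The function name `recall_precision` and the example word pairs are hypothetical and are not taken from the thesis.

```python
def recall_precision(retrieved, relevant):
    """Return (recall, precision) for collections of retrieved and relevant records.

    recall    = |retrieved & relevant| / |relevant|
    precision = |retrieved & relevant| / |retrieved|
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant records that were actually retrieved
    # Assumption: empty denominators are treated as 1.0 here; conventions differ.
    recall = hits / len(relevant) if relevant else 1.0
    precision = hits / len(retrieved) if retrieved else 1.0
    return recall, precision


# Hypothetical gold-standard synonym pairs and a hypothetical tool's output.
gold = [frozenset({"user", "customer"}), frozenset({"delete", "remove"}),
        frozenset({"screen", "display"})]
found = gold + [frozenset({"order", "request"}), frozenset({"system", "server"})]
print(recall_precision(found, gold))  # → (1.0, 0.6)
```

In this made-up example the tool reports every true pair (recall 1.0), while two of its five reported pairs are false positives (precision 0.6). This is the kind of trade-off the abstract describes as preferable for synonym-finding tools.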
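The abstract names the word co-occurrence-based method and notes that its parameters affect the results, but it gives no further detail. The Python sketch below shows one common way such a method can work, offered as an illustration only. It builds a context vector for each word from the words within a fixed window, then flags word pairs whose context vectors have high cosine similarity as synonym candidates. The function names, the `window` and `threshold` parameters, the stopword handling, and the choice of cosine similarity are all assumptions for this sketch. They are not the thesis's actual implementation.

```python
import math
import re
from collections import Counter, defaultdict
from itertools import combinations


def context_vectors(sentences, window=2):
    """Count, for each word, the words appearing within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = re.findall(r"[a-z]+", sentence.lower())
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def synonym_candidates(sentences, window=2, threshold=0.5, stopwords=frozenset()):
    """Return (word_a, word_b, score) pairs whose context similarity >= threshold.

    Stopwords are excluded as candidates but still count as context features.
    """
    vectors = context_vectors(sentences, window)
    words = sorted(w for w in vectors if w not in stopwords)
    pairs = []
    for a, b in combinations(words, 2):
        score = cosine(vectors[a], vectors[b])
        if score >= threshold:
            pairs.append((a, b, round(score, 3)))
    return sorted(pairs, key=lambda p: -p[2])  # most similar pairs first


# Hypothetical requirements sentences in which "user" and "customer" are synonyms.
requirements = [
    "The user shall log in to the system.",
    "The customer shall log in to the system.",
    "The user shall view the report.",
    "The customer shall view the report.",
]
for pair in synonym_candidates(requirements, stopwords={"the", "shall", "to", "in"}):
    print(pair)
```

Lowering `threshold` can only return the same or more candidate pairs, so it tends to raise recall at the cost of precision. This is one plausible way parameter settings could change the results of such a method.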