A Framework for Logical Structure Extraction from Software Requirements Documents
General-purpose rich-text editors, such as MS Word, are often used to author software requirements specifications. These specifications contain many different logical structures, such as use cases, business rules and functional requirements. Automated recognition and extraction of these logical structures is necessary to provide useful automated requirements management features, such as automated traceability, template conformance checking, guided editing and interoperability with sophisticated requirements management tools like Requisite Pro. The variability among instances of these logical structures and their attributes makes accurate recognition and extraction challenging. This thesis provides a framework for the extraction of logical structures from software requirements documents. The framework models information about the style, structure, and attributes of the logical structures in a meta-model, and uses that meta-model to extract instances of the logical structures. The meta-model also incorporates information about the variability present in the instances. The framework includes an extraction tool, ET, that reads the meta-model and extracts instances of the modelled logical structures from the documents. The framework is evaluated on a collection of real-world software requirements documents. Using the framework, different logical structures can be extracted with high precision and recall, each close to 100%. The extraction tool is fast enough for practical use, with extraction times ranging from a few milliseconds to a few seconds.
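
The idea of meta-model-driven extraction can be illustrated with a minimal Python sketch. It is not the thesis's ET tool or its meta-model format, neither of which is described in this abstract. The names StructureModel, extract and precision_recall, the regular expression, the attribute label variants and the sample document are all hypothetical, and the sketch works on plain text lines rather than on the style information of a Word document. It shows how a model that lists accepted label variants can absorb variability between instances, and how precision and recall could be computed against a hand-labelled set.

```python
import re
from dataclasses import dataclass, field


@dataclass
class StructureModel:
    """Hypothetical meta-model entry for one kind of logical structure."""
    name: str
    heading_pattern: str  # regex marking the start of an instance; must define a 'title' group
    # Canonical attribute name -> label variants accepted in documents.
    attributes: dict = field(default_factory=dict)


def extract(lines, model):
    """Return one dict per instance of `model` found in `lines`."""
    heading = re.compile(model.heading_pattern, re.IGNORECASE)
    # Map every accepted label variant (lowercased) to its canonical name.
    labels = {
        variant.lower(): canonical
        for canonical, variants in model.attributes.items()
        for variant in variants
    }
    instances, current = [], None
    for line in lines:
        text = line.strip()
        match = heading.match(text)
        if match:
            # A heading starts a new instance.
            current = {"type": model.name, "title": match.group("title").strip()}
            instances.append(current)
            continue
        if current is not None and ":" in text:
            # "Label: value" lines become attributes if the label is modelled.
            label, value = text.split(":", 1)
            canonical = labels.get(label.strip().lower())
            if canonical:
                current[canonical] = value.strip()
    return instances


def precision_recall(extracted, expected):
    """Compare extracted items against a hand-labelled reference set."""
    extracted, expected = set(extracted), set(expected)
    true_positives = len(extracted & expected)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    return precision, recall


# Assumed model of a use case, tolerating several heading and label variants.
use_case = StructureModel(
    name="use_case",
    heading_pattern=r"^use\s*case\s*\d*\s*[:\-]\s*(?P<title>.+)$",
    attributes={
        "actor": ["Actor", "Actors", "Primary Actor"],
        "precondition": ["Precondition", "Preconditions", "Pre-condition"],
    },
)

# Made-up document fragment with two differently formatted use cases.
document = [
    "Use Case 1: Withdraw cash",
    "Primary Actor: Account holder",
    "Pre-condition: Card is inserted",
    "",
    "UseCase 2 - Check balance",
    "Actors: Account holder",
]

found = extract(document, use_case)
titles = [instance["title"] for instance in found]
precision, recall = precision_recall(titles, ["Withdraw cash", "Check balance"])
```

In this sketch both use cases are recognized even though their headings and attribute labels differ, because the variants are declared in the model rather than hard-coded in the extraction logic. Supporting another logical structure, such as a business rule, would mean adding another StructureModel instead of changing extract.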