dc.contributor.author: Lin, Yuan
dc.date.accessioned: 2008-08-19 15:48:35 (GMT)
dc.date.available: 2008-08-19 15:48:35 (GMT)
dc.date.issued: 2008-08-19T15:48:35Z
dc.date.submitted: 2008-08-07
dc.identifier.uri: http://hdl.handle.net/10012/3865
dc.description.abstract: This thesis deals with fact extraction, which analyzes source code (and sometimes related artifacts) to produce facts about the code. These facts may, for example, record where in the code variables are declared and where they are used, as well as related information. The extracted facts are typically used in software reverse engineering to reconstruct the design of the program. This thesis has two main parts, each of which deals with a formal approach to fact extraction. Part 1 of the thesis deals with the question: How can we demonstrate that a fact extractor actually does its job? That is, does the extractor produce the facts that it is supposed to produce? This thesis builds on the concept of semantic completeness of a fact extractor, as defined by Tom Dean et al., and further defines source, syntax and compiler completeness. One of the contributions of this thesis is to show that in certain important cases (when the extractor is deterministic and its front end is idempotent), there is an efficient algorithm to determine whether the extractor is compiler complete. This result is surprising, considering that in general it is undecidable whether two programs are semantically equivalent, and it would seem that source code and its corresponding extracted facts are each essentially programs that are to be proved equivalent or at least sufficiently similar. The larger part of the thesis, Part 2, presents Algebraic Refers-to Analysis (ARA), a new approach to fact extraction with emphasis on the Refers-to relation. ARA provides a framework for specifying fact extraction, based on a three-step pipeline: (1) basic (lexical and syntactic) extraction, (2) a normalization step and (3) a binding step. For practical programming languages, these three steps are repeated, in stages and phases, until the Refers-to relation is computed. During the writing of this thesis, ARA pipelines for C, Java, C++, Fortran, Pascal and Ada have been designed. A prototype fact extractor for the C language has been created. Validating ARA means demonstrating that ARA pipelines satisfy programming language standards such as the ISO C++ standard. In other words, we show that ARA phases (stages and formulas) are correctly transcribed from the rules in the language standard. Compared with existing approaches such as attribute grammars, ARA has the following advantages. First, ARA formulas are concise, elegant and, more importantly, insightful. As a result, we have made some interesting discoveries about programming languages. Second, ARA is validated based on set theory and relational algebra, which is more reliable than exhaustive testing. Finally, ARA formulas are supported by existing software tools such as database management systems and relational calculators. Overall, the contributions of this thesis include (1) the invention of the concept of a hierarchy of completeness and the automatic testing of completeness, (2) the use of the relational data model in fact extraction, (3) the invention of Algebraic Refers-to Relation Analysis (ARA) and (4) the discovery of some interesting facts about programming languages. [en]
dc.language.iso: en [en]
dc.publisher: University of Waterloo [en]
dc.subject: fact extractor [en]
dc.subject: semantic analysis [en]
dc.subject: name resolution [en]
dc.subject: programming languages [en]
dc.subject: compilation [en]
dc.subject: attribute grammar [en]
dc.subject: reverse engineering [en]
dc.subject: software engineering [en]
dc.title: Completeness of Fact Extractors and a New Approach to Extraction with Emphasis on the Refers-to Relation [en]
dc.type: Doctoral Thesis [en]
dc.pending: false [en]
dc.subject.program: Computer Science [en]
uws-etd.degree.department: School of Computer Science [en]
uws-etd.degree: Doctor of Philosophy [en]
uws.typeOfResource: Text [en]
uws.peerReviewStatus: Unreviewed [en]
uws.scholarLevel: Graduate [en]


UWSpace

All items in UWSpace are protected by copyright, with all rights reserved.
