Computer Science
Permanent URI for this collection: https://uwspace.uwaterloo.ca/handle/10012/9930
This is the collection for the University of Waterloo's Cheriton School of Computer Science.
Research outputs are organized by type (e.g., Master's Thesis, Article, Conference Paper).
Waterloo faculty, students, and staff can contact us or visit the UWSpace guide to learn more about depositing their research.
Browsing Computer Science by Issue Date
Now showing 1 - 20 of 1617
Item: Efficient Simulation of Message-Passing in Distributed-Memory Architectures (University of Waterloo, 1996). Demaine, Erik.
In this thesis we propose a distributed-memory parallel-computer simulation system called PUPPET (Performance Under a Pseudo-Parallel EnvironmenT). It allows the evaluation of parallel programs run in a pseudo-parallel system, where a single processor is used to multitask the program's processes as if they were run on the simulated system. This allows development of applications and teaching of parallel programming without the use of valuable supercomputing resources. We use a standard message-passing language, MPI, so that when desired (e.g., when development is complete) the program can be run on a truly parallel system without any changes. There are several features in PUPPET that do not exist in any other simulation system. Support for all deterministic MPI features is available, including collective and non-blocking communication. Multitasking (more processes than processors) can be simulated, allowing the evaluation of load-balancing schemes. PUPPET is very loosely coupled with the program, so that a program can be run once and then evaluated on many simulated systems with multiple process-to-processor mappings. Finally, we propose a new model of direct networks that ignores network traffic, greatly improving simulation speed while often not significantly affecting accuracy.

Item: Simulated Overloading using Generic Functions in Scheme (University of Waterloo, 1997). Cox, Anthony.
This thesis investigates extending the dynamically-typed, functional programming language Scheme with simulated overloading, in order to permit the binding of multiple, distributed definitions to function names. Overloading facilitates an incremental style of programming in which functions can be defined with a base behaviour and then extended with additional behaviour as it becomes necessary to support new data types. A technique is demonstrated that allows existing functions to be extended without modification, thereby improving code reuse. Using the primitives provided by Scheme, it is possible to write functions that perform like the generic routines (functions) of the programming language EL1. These functions use the types of their arguments to determine, at run time, the computation to perform. It is shown that by gathering the definitions for an overloaded function and building a generic routine, the language appears to provide overloading. A language extension is described that adds the syntax necessary to instruct the system to gather the distributed set of definitions for an overloaded function and incrementally build an equivalent generic function. A simple type inference algorithm, necessary to support the construction of generic functions, is presented and detailed. Type inference is required to determine the domain of an overloaded function in order to generate the code needed to perform run-time overload resolution. Some limitations and possible extensions of the algorithm are discussed.
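The run-time overload resolution described in the Cox thesis above can be sketched outside Scheme as well. Below is a hypothetical Python analogue (the Generic class and the area example are invented for illustration; the thesis works in Scheme): definitions for one name are gathered into a generic routine that dispatches on the run-time type of its first argument, in the spirit of EL1-style generic routines.

```python
# Hypothetical illustration, not the thesis's Scheme implementation: a generic
# function gathers distributed definitions and resolves the overload by the
# run-time type of its first argument.

class Generic:
    """A callable that resolves overloads by argument type at run time."""
    def __init__(self, name):
        self.name = name
        self.methods = {}            # maps argument type -> implementation

    def define(self, arg_type):
        """Register one more distributed definition for this name."""
        def register(fn):
            self.methods[arg_type] = fn
            return fn
        return register

    def __call__(self, arg, *rest):
        # Walk the MRO so a definition on a base class still applies.
        for t in type(arg).__mro__:
            if t in self.methods:
                return self.methods[t](arg, *rest)
        raise TypeError(f"no definition of {self.name} for {type(arg).__name__}")

area = Generic("area")

@area.define(int)
def _(n):                            # base behaviour: square of side n
    return n * n

@area.define(tuple)
def _(dims):                         # added later for (width, height) pairs
    w, h = dims
    return w * h

print(area(3))        # -> 9
print(area((3, 4)))   # -> 12
```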
Item: Folding Orthogonal Polyhedra (University of Waterloo, 1999). Sun, Julie.
In this thesis, we study foldings of orthogonal polygons into orthogonal polyhedra. The particular problem examined here is whether a paper cutout of an orthogonal polygon with fold lines indicated folds up into a simple orthogonal polyhedron. The folds are orthogonal and the direction of each fold (upward or downward) is also given. We present a polynomial-time algorithm to solve this problem. Next we consider the same problem with the exception that the directions of the folds are not given. We prove that this problem is NP-complete. Once it has been determined that a polygon does fold into a polyhedron, we consider some restrictions on the actual folding process, modelling the case when the polyhedron is constructed from a stiff material such as sheet metal. We show an example of a polygon that cannot be folded into a polyhedron if folds can only be executed one at a time. Removing this restriction, we show another polygon that cannot be folded into a polyhedron using rigid material.

Item: A Formalization of an Extended Object Model Using Views (University of Waterloo, 2000). Nova, Luis C. M.
Reuse of software designs, experience and components is essential to making substantial improvements in software productivity, development cost, and quality. However, the many facets of reuse are still rarely used in the various phases of the software development lifecycle because of a lack of adequate theories, processes, and tools to support consistent application of reuse concepts. There is a need for approaches, including definitions, models and properties of reuse, that would provide explicit guidance to a software development team in applying reuse. In particular there is a need to provide abstractions that clearly separate the various functional concerns addressed in a software system. Separating concerns simplifies the identification of the software components that can benefit from reuse and can provide guidance on how reuse may be applied. In this thesis we present an extended model related to the separation of concerns in object-oriented design. The model, called views, indicates how an object-oriented design can be clearly separated into objects and their corresponding interfaces. In this model objects can be designed so that they are independent of their environment, because adaptation to the environment is the responsibility of the interface or view. The view can be seen as expressing the semantics of the 'glue' that joins components or objects together to create a software system. Informal versions of the views model have already been successfully applied to operational and commercial software systems. The objective of this thesis is to provide the views notion with a theoretical foundation to address reuse and separation of concerns. After clearly defining the views model, we show the formal approach to combining the objects, interfaces (views), and their interconnections into a complete software system. The objects and interfaces are defined using an object calculus based on temporal logic, while the interconnections among objects and views are specified using category theory. This formal framework provides the mathematical foundation to support the verification of the properties of both the components and the composite software system. We then show how verification can be mechanized by converting the formal version of the views model into higher-order logic and using PVS to support mechanical proofs.
Item: Hadez, a Framework for the Specification and Verification of Hypermedia Applications (University of Waterloo, 2000). Morales-Germán, Daniel.
In recent years, several methodologies for the development of hypermedia applications have been proposed. These methodologies are, primarily, guidelines to be followed during the design process. They also indicate what deliverables should be created at each of their stages. These products are usually informally specified - in the sense that they have neither a formal syntax nor formally defined semantics - and they are not required to pass validity tests. Hadez formally specifies the design of a hypermedia application, supports the verification of properties of the specification, and promotes the reuse of designs. Hadez is an object-oriented specification language with formal syntax and semantics. It is based on the formal specification languages Z and Z++, with extensions unique to hypermedia, and uses set theory and first-order predicate logic. It divides the specification of a hypermedia application into three main parts: its conceptual schema, which describes the domain-specific data and its relationships; its structural schema, which describes how this data is combined and gathered into more complex entities, called composites; and its perspective schema, which uses Abstract Design Perspectives (artifacts unique to Hadez) to indicate how these composites are mapped to hyperpages, and how the user interacts with them. Hadez provides a formal framework in which properties of a specification can be stated and checked. The specification of an application should not constrain its implementation; it is therefore independent of the platform on which the application is to be presented. As a consequence, the same design can be instantiated into different applications, each for a different hypermedia platform. Hadez can be further extended with design patterns, which enable reuse by capturing good solutions to well-known problems. Hadez characterizes patterns and makes their use readily available to the designer. Furthermore, Hadez is process independent, and is intended to be used with any of the main hypermedia design methodologies: EROM, HDM, OOHDM or RMM.
Item: An Approximation Algorithm for Character Compatibility and Fast Quartet-based Phylogenetic Tree Comparison (University of Waterloo, 2000). Tsang, John.
Phylogenetic analysis, or the inference of evolutionary history, is done routinely by biologists and is one of the most important problems in systematic biology. In this thesis, we study two computational problems in the area. First, we study the evolutionary tree reconstruction problem under the character compatibility (CC) paradigm and give a polynomial time approximation scheme (PTAS) for a variation of the formulation called fractional character compatibility (FCC), which has been proven to be NP-hard. We also present a very simple algorithm called the Ordinal Split Method (OSM) to generate bipartitions given sequence data, which can serve as a front end to the PTAS. The performance of the OSM and the validity of the FCC formulation are studied through simulation experiments. The second part of this thesis presents an efficient algorithm to compare evolutionary trees using the quartet metric. Different evolutionary hypotheses arise when different data sets are used or when different tree inference methods are applied to the same data set. Tree comparisons are routinely done by biologists to evaluate the quality of their tree inference experiments. The quartet metric has many desirable properties, but its use has been hindered by its relatively heavy computational requirements. We address this problem by giving the first O(n^2) time algorithm to compute the quartet distance between two evolutionary trees.

Item: Static Conflict Analysis of Transaction Programs (University of Waterloo, 2000). Zhang, Connie.
Transaction programs consist of read and write operations issued against the database. In a shared database system, one transaction program conflicts with another if it reads or writes data that another transaction program has written. This thesis presents a semi-automatic technique for pairwise static conflict analysis of embedded transaction programs. The analysis predicts whether a given pair of programs will conflict when executed against the database. There are several potential applications of this technique, the most obvious being transaction concurrency control in systems where it is not necessary to support arbitrary, dynamic queries and updates. By analyzing transactions in such systems before the transactions are run, it is possible to reduce or eliminate the need for locking or other dynamic concurrency control schemes.
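The pairwise conflict condition underlying the Static Conflict Analysis thesis above can be sketched directly, assuming each program's read and write sets have already been extracted (the thesis obtains them by static analysis of embedded transaction programs; the table names below are invented):

```python
# Toy sketch of the pairwise conflict test: two transactions conflict if
# either one writes data that the other reads or writes.  Read/write sets
# are given explicitly here rather than derived by static analysis.

def conflicts(rw1, rw2):
    """rw = (reads, writes), each a set of data items."""
    r1, w1 = rw1
    r2, w2 = rw2
    return bool(w1 & (r2 | w2)) or bool(w2 & (r1 | w1))

t1 = ({"accounts.balance"}, {"accounts.balance"})   # (reads, writes)
t2 = ({"accounts.balance"}, set())
t3 = ({"branches.rate"}, {"branches.rate"})

print(conflicts(t1, t2))  # True: t1 writes what t2 reads
print(conflicts(t2, t3))  # False: disjoint data, safe to run without locks
```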
Item: A Framework for Machine-Assisted Software Architecture Validation (University of Waterloo, 2000). Lichtner, Kurt.
In this thesis we propose a formal framework for specifying and validating properties of software system architectures. The framework is founded on a model of software architecture description languages (ADLs) and uses a theorem-proving based approach to formally and mechanically establish properties of architectures. Our approach allows models defined using existing ADLs to be validated against properties that may not be expressible using the original notation and tool-set. The central component of the framework is a conceptual model of architecture description languages. The model formalizes a salient, shared set of design categories, relationships and constraints that are fundamental to these notations. An advantage of an approach based on a conceptual model is that it provides a uniform view of design information across a selection of languages. This allows us to construct alternate formal representations of design information specified using existing ADLs. These representations can then be mechanically validated to ensure they meet their specific formal requirements. After defining the model we embed it in the logic of the PVS theorem-proving environment and illustrate its utility with a case study. We first demonstrate how the elements of a design are specified using the model, and then show how this representation is validated using machine-assisted proof. Our approach allows the correctness of a design to be established against a wide range of properties. We illustrate with structural properties, behavioural properties, relationships between the structural and behavioural specification, and dynamic, or evolving, aspects of a system's topology.

Item: A Scalable Partial-Order Data Structure for Distributed-System Observation (University of Waterloo, 2001). Ward, Paul.
Distributed-system observation is foundational to understanding and controlling distributed computations. Existing tools for distributed-system observation are constrained in the size of computation that they can observe by three fundamental problems. They lack scalable information collection, scalable data structures for storing and querying the information collected, and scalable information-abstraction schemes. This dissertation addresses the second of these problems. Two core problems were identified in providing a scalable data structure. First, in spite of the existence of several distributed-system-observation tools, the requirements of such a structure were not well-defined. Rather, current tools appear to be built on the basis of events as the core data structure. Events were assigned logical timestamps, typically Fidge/Mattern, as needed to capture causality. Algorithms then took advantage of additional properties of these timestamps that are not explicit in the formal semantics. This dissertation defines the data-structure interface precisely, and goes some way toward reworking algorithms in terms of that interface. The second problem is providing an efficient, scalable implementation for the defined data structure. The key issue in solving this is to provide a scalable precedence-test operation. Current tools use the Fidge/Mattern timestamp for this. While this provides a constant-time test, it requires space per event equal to the number of processes. As the number of processes increases, the space consumption becomes sufficient to affect the precedence-test time because of caching effects. It also becomes problematic when the timestamps need to be copied between processes or written to a file. Worse, existing theory suggested that the space-consumption requirement of Fidge/Mattern timestamps was optimal. In this dissertation we present two alternate timestamp algorithms that require substantially less space than does the Fidge/Mattern algorithm.
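The Fidge/Mattern precedence test that the Ward dissertation above takes as its starting point is easy to state concretely. A minimal sketch follows, with invented three-process events: the test is a single comparison given two distinct events, but each timestamp carries one entry per process, which is exactly the space cost the dissertation attacks.

```python
# Minimal sketch of the Fidge/Mattern precedence test.  Each event is
# (process_id, vector_timestamp); the vector has one entry per process,
# counting the events of that process observed so far.

def happened_before(e, f):
    """For distinct events e != f: e causally precedes f iff f's vector
    has seen at least as many of e's process's events as e itself."""
    pe, ve = e
    _, vf = f
    return ve[pe] <= vf[pe]

a = (0, [1, 0, 0])   # first event on process 0
b = (1, [1, 1, 0])   # event on process 1 after receiving a message from a
c = (2, [0, 0, 1])   # independent event on process 2

print(happened_before(a, b))  # True: a causally precedes b
print(happened_before(a, c))  # False: concurrent
print(happened_before(c, b))  # False: concurrent
```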
Item: Cache Oblivious Data Structures (University of Waterloo, 2001). Ohashi, Darin.
This thesis discusses cache oblivious data structures. These are structures which have good caching characteristics without knowing Z, the size of the cache, or L, the length of a cache line. Since the structures do not require these details for good performance, they are portable across caching systems. Another advantage of such structures is that the caching results hold for every level of cache within a multilevel cache. Two simple data structures are studied: the array used for binary search, and the linear list. As well as being cache oblivious, the structures presented in this thesis are space efficient, requiring little additional storage. We begin the discussion with a layout for a search tree within an array. This layout allows Searches to be performed in O(log n) time and in O(log n/log L) (the optimal number) cache misses. An algorithm for building this layout from a sorted array in linear time is given. One use for this layout is a heap-like implementation of the priority queue. This structure allows Inserts, Heapifies and ExtractMaxes in O(log n) time and O(log n/log L) cache misses. A priority queue using this layout can be built from an unsorted array in linear time. Besides the n spaces required to hold the data, this structure uses a constant amount of additional storage. The cache oblivious linear list allows scans of the list taking Theta(n) time and incurring Theta(n/L) (the optimal number) cache misses. The running time of insertions and deletions is not constant, but it is sub-polynomial. This structure requires ε·n additional storage, where ε is any constant greater than zero.
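The abstract above does not spell out the search-tree layout, so the following is a sketch of the well-known recursive (van Emde Boas-style) layout that achieves an O(log n/log L) cache-miss bound for search, offered as an illustration of the idea rather than as the Ohashi thesis's exact construction; the Node class and helper names are invented.

```python
# Sketch of a recursive (van Emde Boas-style) layout: split a height-h
# search tree at height h//2, lay out the top subtree contiguously, then
# each bottom subtree.  A root-to-leaf search then touches O(log n/log L)
# cache lines for any line length L -- and L never appears in the code.

class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def build_bst(keys, lo=0, hi=None):
    """Balanced BST over the sorted slice keys[lo:hi]."""
    if hi is None:
        hi = len(keys)
    if lo >= hi:
        return None
    mid = (lo + hi) // 2
    node = Node(keys[mid])
    node.left = build_bst(keys, lo, mid)
    node.right = build_bst(keys, mid + 1, hi)
    return node

def height(node):
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))

def veb_order(root, h):
    """Nodes in layout order; h bounds the height of the (possibly
    truncated) subtree considered at root."""
    if h == 1:
        return [root]
    top_h = h // 2
    # Nodes exactly top_h levels below root are the bottom-tree roots.
    frontier = [root]
    for _ in range(top_h):
        frontier = [c for n in frontier for c in (n.left, n.right) if c]
    order = veb_order(root, top_h)            # top tree, truncated
    for r in frontier:                        # bottom trees, left to right
        order += veb_order(r, h - top_h)
    return order

keys = list(range(15))                        # perfect tree of height 4
root = build_bst(keys)
print([n.key for n in veb_order(root, height(root))])
# -> [7, 3, 11, 1, 0, 2, 5, 4, 6, 9, 8, 10, 13, 12, 14]
```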
Item: Automated Analysis of Unified Modeling Language (UML) Specifications (University of Waterloo, 2001). Tanuan, Meyer C.
The Unified Modeling Language (UML) is a standard language adopted by the Object Management Group (OMG) for writing object-oriented (OO) descriptions of software systems. UML allows the analyst to add class-level and system-level constraints. However, UML does not describe how to check the correctness of these constraints. Recent studies have shown that symbolic model checking can effectively verify large software specifications. In this thesis, we investigate how to use model checking to verify constraints of UML specifications. We describe the process of specifying, translating and verifying UML specifications for an elevator example. We use the Cadence Symbolic Model Verifier (SMV) to verify the system properties. We demonstrate how to write a UML specification that can be easily translated to SMV. We propose a set of rules and guidelines to translate UML specifications to SMV, and then use these to translate a non-trivial UML elevator specification to SMV. We examine the errors detected throughout the specification, translation and verification process, to see how well these activities reveal errors, ambiguities and omissions in the user requirements.

Item: Modeling Protein Secondary Structure by Products of Dependent Experts (University of Waterloo, 2001). Cumbaa, Christian.
A phenomenon as complex as protein folding requires a complex model to approximate it. This thesis presents a bottom-up approach for building complex probabilistic models of protein secondary structure by incorporating multiple information sources, which we call experts. Expert opinions are represented by probability distributions over the set of possible structures. Bayesian treatment of a group of experts results in a consensus opinion that combines the experts' probability distributions using the operators of normalized product, quotient and exponentiation. The expression of this consensus opinion simplifies to a product of the expert opinions under two assumptions: (1) balanced training of experts, i.e., uniform prior probability over all structures, and (2) conditional independence between expert opinions, given the structure. This research also studies how Markov chains and hidden Markov models may be used to represent expert opinion. Closure properties are proven, and construction algorithms are given for the product of hidden Markov models, and the product, quotient and exponentiation of Markov chains. Algorithms for extracting single-structure predictions from these models are also given. Current product-of-experts approaches in machine learning are top-down modeling strategies that assume expert independence and require simultaneous training of all experts. This research describes a bottom-up modeling strategy that can incorporate conditionally dependent experts and assumes separately trained experts.

Item: Reliable Transport Performance in Mobile Environments (University of Waterloo, 2001). McSweeney, Martin.
Expanding the global Internet to include mobile devices is an exciting area of current research. Because of the vast size of the Internet, and because the protocols in it are already widely deployed, mobile devices must inter-operate with those protocols. Although most of the incompatibilities with mobiles have been solved, the protocols that deliver data reliably, and that account for the majority of Internet traffic, perform very poorly. A change in location causes a disruption in traffic, and disruption is dealt with by algorithms tailored only for stationary hosts. The Transmission Control Protocol (TCP) is the predominant transport-layer protocol in the Internet. In this thesis, we look at the performance of TCP in mobile environments. We provide a complete explanation for poor performance; we conduct a large number of experiments, simulations, and analyses that prove and quantify poor performance; and we propose simple and scalable solutions that address the limitations.

Item: Multi-dimensional Interval Routing Schemes (University of Waterloo, 2001). Ganjali, Yashar.
Routing messages between pairs of nodes is one of the most fundamental tasks in any distributed computing system. An Interval Routing Scheme (IRS) is a well-known, space-efficient strategy for routing messages in a network. In this scheme, each node of the network is assigned an integer label and each link at each node is labeled with an interval. The interval assigned to a link l at a node v indicates the set of destination addresses of the messages which should be forwarded through l at v. When studying interval routing schemes, there are two main problems to be considered: (a) which classes of networks support a specific routing scheme, and (b) assuming that a given network supports IRS, how good are the paths traversed by messages? The first problem is known as the characterization problem and has been studied for several types of IRS. In this thesis, we study the characterization problem for various schemes in which the labels assigned to the vertices are d-ary integer tuples (d-dimensional IRS) and the label assigned to each link of the network is a list of d 1-dimensional intervals. This is known as Multi-dimensional IRS (MIRS) and is an extension of the original IRS. We completely characterize the classes of networks which support linear MIRS (in which no interval is cyclic) and strict MIRS (in which no interval assigned to a link at a node v contains the label of v). In real networks the costs of links usually vary over time (dynamic cost links). We also give a complete characterization of the class of networks which support a certain type of MIRS that routes all messages on shortest paths in a network with dynamic cost links. The main criterion used to measure the quality of routing (the second problem) is the length of routing paths. In this thesis we also investigate this problem for MIRS and prove two lower bounds on the length of the longest routing path. These are the only known general results for MIRS. Finally, we study the relationship between various types of MIRS and the problem of drawing a hypergraph. Using some of our results we prove a tight bound on the number of dimensions of the space needed to draw a hypergraph.
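One-dimensional interval routing, the base scheme that the Multi-dimensional Interval Routing Schemes thesis above extends to d dimensions, can be sketched in a few lines (the six-node ring and its link labels below are invented):

```python
# Toy sketch of 1-dimensional interval routing: every link out of a node is
# labelled with an interval of destination labels; cyclic intervals wrap
# around modulo n.

def in_interval(dest, interval, n):
    """Is dest in [a, b] taken cyclically over labels 0..n-1?"""
    a, b = interval
    if a <= b:
        return a <= dest <= b
    return dest >= a or dest <= b          # cyclic: wraps past n-1

def next_link(links, dest, n):
    """links: {neighbour_label: interval}.  Neighbour to forward to."""
    for neighbour, interval in links.items():
        if in_interval(dest, interval, n):
            return neighbour
    raise ValueError("link labels do not cover destination %d" % dest)

# Node 0 of a 6-node ring: one link toward node 1, one toward node 5.
links_at_0 = {1: (1, 3), 5: (4, 5)}
print(next_link(links_at_0, 2, 6))   # -> 1 (route clockwise)
print(next_link(links_at_0, 5, 6))   # -> 5 (route counter-clockwise)
```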
Item: Topic-Oriented Collaborative Web Crawling (University of Waterloo, 2001). Chung, Chiasen.
A web crawler is a program that "walks" the Web to gather web resources. In order to scale to the ever-increasing Web, multiple crawling agents may be deployed in a distributed fashion to retrieve web data co-operatively. A common approach is to divide the Web into many partitions with an agent assigned to crawl within each one. If an agent obtains a web resource that is not from its partition, the resource is transferred to its rightful owner. This thesis proposes a novel approach to distributed web data gathering by partitioning the Web into topics. The proposed approach employs multiple focused crawlers to retrieve pages from various topics. When a crawler retrieves a page of another topic, it transfers the page to the appropriate crawler. This approach is known as topic-oriented collaborative web crawling. An implementation of the system was built and experimentally evaluated. In order to identify the topic of a web page, a topic classifier was incorporated into the crawling system. As the classifier categorizes only English pages, a language identifier was also introduced to distinguish English pages from non-English ones. From the experimental results, we found that redundant retrieval was low and that a resource retrieved by an agent was six times more likely to be retained than in a system using a conventional hashing approach. These numbers are strong indications that topic-oriented collaborative web crawling is a viable approach to web data gathering.

Item: Maintaining Quality of Service for Adaptive Mobile Map Clients (University of Waterloo, 2001). Abdelsalam, Wegdan Ahmad Elsay Fouad.
Mobile devices must deal with limited and dynamically varying resources, in particular the network quality of service (QoS). In addition, wireless devices have other constraints such as limited memory, battery power, and physical dimensions. Applications that execute in such environments need to adapt to the dynamic operating conditions in order to preserve an acceptable level of service as close to 100% of the time as possible. Viewing and downloading digital spatial data on mobile devices has become more popular, especially with the availability of location-aware applications that exploit GPS (Global Positioning System) receivers integrated in many of today's mobile devices. Map client applications face many challenges in accessing data across a wireless network. Vector spatial data files tend to be large, and file sizes increase unpredictably with the complexity of feature geometry. Due to the limited size of the mobile device display, viewing all the details of the map could cause unreasonable clutter and render the map useless. Even if it is feasible to transmit all the details from a QoS standpoint, doing so could pose a problem from a usability standpoint. This research effort aims to tackle the issues of QoS and usability through a client-proxy-server model in which the clients run on mobile devices. The proxy performs two functions. First, it supplies the client with vital data about the status of the system, which allows the client to make adaptive decisions aimed at maintaining QoS. Second, it performs the adaptive actions requested by the client. There are two types of adaptive actions performed by the proxy: activating and deactivating filters. When filters are activated, the amount of data transmitted from the server to the client is reduced. The client may decide to activate one or more filters either to maintain QoS or to limit clutter on the screen and enhance usability. The map client-server application and the proxy were developed in Java(tm), and a number of experiments and real-life scenarios were designed to determine the effectiveness and feasibility of the proposed adaptation model and to evaluate the performance of the proxy.
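The client-side adaptation decision in the Maintaining Quality of Service thesis above can be illustrated with a toy rule. Everything concrete below (filter names, reduction factors, sizes, the 30-second budget) is invented, since the abstract does not specify the client's policy:

```python
# Hypothetical sketch of an adaptive filter-selection rule: drop map detail,
# one filter at a time, until the estimated transfer time fits the QoS
# budget.  The thesis's client bases such decisions on status data supplied
# by the proxy; the numbers here are invented.

def choose_filters(size_bytes, bandwidth_bps, budget_s, filters):
    """filters: list of (name, fraction_of_data_remaining), mildest first.
    Returns the filters to activate and the resulting time estimate."""
    active = []
    for name, keep in filters:
        if size_bytes * 8 / bandwidth_bps <= budget_s:
            break                      # already within the QoS budget
        size_bytes *= keep             # proxy would apply this filter
        active.append(name)
    return active, size_bytes * 8 / bandwidth_bps

filters = [("drop-minor-roads", 0.6), ("simplify-geometry", 0.5),
           ("drop-labels", 0.7)]
active, eta = choose_filters(2_000_000, 64_000, 30.0, filters)
print(active, round(eta, 1))   # all three activated; estimate drops to 52.5s
```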
Item: Folding and Unfolding (University of Waterloo, 2001). Demaine, Erik.
The results of this thesis concern folding of one-dimensional objects in two dimensions: planar linkages. More precisely, a planar linkage consists of a collection of rigid bars (line segments) connected at their endpoints. Foldings of such a linkage must preserve the connections at endpoints, preserve the bar lengths, and (in our context) prevent bars from crossing. The main result of this thesis is that a planar linkage forming a collection of polygonal arcs and cycles can be folded so that all outermost arcs (not enclosed by other cycles) become straight and all outermost cycles become convex. A complementary result of this thesis is that once a cycle becomes convex, it can be folded into any other convex cycle with the same counterclockwise sequence of bar lengths. Together, these results show that the configuration space of all possible foldings of a planar arc or cycle linkage is connected. These results fall into the broader context of folding and unfolding k-dimensional objects in n-dimensional space, k ≤ n. Another contribution of this thesis is a survey of research in this field. The survey revolves around three principal aspects that have received extensive study: linkages in arbitrary dimensions (folding one-dimensional objects in two or more dimensions, including protein folding), paper folding (normally, folding two-dimensional objects in three dimensions), and folding and unfolding polyhedra (two-dimensional objects embedded in three-dimensional space).

Item: Shortest Path Queries in Very Large Spatial Databases (University of Waterloo, 2001). Zhang, Ning.
Finding shortest paths in a graph has been studied for a long time, and there are many main-memory-based algorithms for this problem. Among these, Dijkstra's shortest path algorithm is one of the most commonly used efficient algorithms for graphs with non-negative edge weights. Even more efficient algorithms have been developed recently for graphs with particular properties, such as edge weights that fall within a bounded range of integers. All of the algorithms mentioned require that the graph reside entirely in main memory. However, for very large graphs, such as the digital maps managed by Geographic Information Systems (GIS), this requirement cannot be satisfied in most cases, so the algorithms mentioned above are not appropriate. My objective in this thesis is to design and evaluate the performance of external-memory (disk-based) shortest path algorithms and data structures to solve the shortest path problem in very large digital maps. In particular, the following questions are studied: What have other researchers done on shortest path queries in very large digital maps? What could be improved on the previous works? How efficient are our new shortest path algorithms on digital maps, and what factors affect the efficiency? What can be done based on the algorithm? In this thesis, we give a disk-based Dijkstra-like algorithm to answer shortest path queries based on pre-processing information. Experiments based on our Java implementation are given to show what factors affect the running time of our algorithms.
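For context, the in-memory baseline that the Shortest Path Queries thesis above extends to external memory is Dijkstra's algorithm with a priority queue; a compact sketch follows (the road graph is invented, and the thesis's disk-based variant and pre-processing are not reproduced here):

```python
# Standard in-memory Dijkstra with a binary heap, for graphs with
# non-negative edge weights.  The thesis's contribution is a disk-based
# variant of this idea plus pre-processing, which this sketch omits.
import heapq

def dijkstra(graph, source):
    """graph: {node: [(neighbour, weight), ...]}.
    Returns shortest distances from source to every reachable node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                     # stale heap entry, skip
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

roads = {"a": [("b", 4), ("c", 1)],
         "b": [("d", 1)],
         "c": [("b", 2), ("d", 5)],
         "d": []}
print(dijkstra(roads, "a"))   # -> {'a': 0, 'b': 3, 'c': 1, 'd': 4}
```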
Item: Quantitative Testing of Probabilistic Phase Unwrapping Methods (University of Waterloo, 2001). Moran, Jodi.
The reconstruction of a phase surface from the observed principal values is required for a number of applications, including synthetic aperture radar (SAR) and magnetic resonance imaging (MRI). However, the process of reconstruction, called “phase unwrapping”, is an ill-posed problem. One class of phase-unwrapping algorithms uses smoothness prior models to remedy this situation. We categorize this class of algorithms according to the type of prior model used. Motivated by this categorization, we propose that phase-unwrapping algorithms be tested by generating phase surfaces from the prior models, and then quantifying the deviation of each reconstructed surface from the corresponding original surface. Finally, we present results of the new testing method on a selection of phase-unwrapping algorithms, including a new algorithm.

Item: COPIA: A New Software for Finding Consensus Patterns in Unaligned Protein Sequences (University of Waterloo, 2001). Liang, Chengzhi.
The consensus pattern problem (CPP) aims at finding conserved regions, or motifs, in unaligned sequences. This problem is NP-hard under various scoring schemes. To solve this problem for protein sequences more efficiently, a new scoring scheme and a randomized algorithm based on a substitution matrix are proposed here. Any practical solution to a bioinformatics problem must observe two principles: (1) the problem that it solves accurately describes the real problem; in CPP, this requires that the scoring scheme be able to distinguish a real motif from background; (2) it provides an efficient algorithm to solve the mathematical problem. A key question in protein motif-finding is how to determine the motif length. One problem in EM algorithms for CPP is how to find good starting points to reach the global optimum. These two questions are both well addressed under this scoring scheme, which makes the randomized algorithm both fast and accurate in practice. A software package, COPIA (COnsensus Pattern Identification and Analysis), has been developed implementing this algorithm. Experiments using sequences from the von Willebrand factor (vWF) family showed that it works well on finding multiple motifs and repeats. COPIA's ability to find repeats also makes it useful in illustrating the internal structure of multidomain proteins. Comparative studies using several groups of protein sequences demonstrated that COPIA performed better than commonly used motif-finding programs.
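Substitution-matrix scoring of candidate motif windows, the core ingredient of consensus-pattern finding as in the COPIA item above, can be shown in miniature. The matrix, motif and sequences below are toy inventions, not COPIA's actual scoring scheme or its randomized search:

```python
# Toy consensus-pattern scoring: score each window of each sequence against
# a candidate motif with a (tiny, invented) substitution matrix, so that
# conserved regions score high.  COPIA's real scheme and search differ.

SIM = {("A", "A"): 2, ("L", "L"): 2, ("G", "G"): 2,
       ("A", "G"): 0, ("G", "A"): 0}          # unlisted pairs score -1

def window_score(window, motif):
    return sum(SIM.get((a, b), -1) for a, b in zip(window, motif))

def best_window(seq, motif):
    """Best-scoring window of len(motif) in seq, as (score, offset)."""
    k = len(motif)
    return max((window_score(seq[i:i + k], motif), i)
               for i in range(len(seq) - k + 1))

motif = "ALG"
for seq in ["PPALGQ", "GALGAA", "PQRSTV"]:
    print(seq, best_window(seq, motif))
# The first two sequences contain the conserved "ALG" (score 6); the last
# has no good window, so its best score stays negative.
```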