Computer Science
Permanent URI for this collection: https://uwspace.uwaterloo.ca/handle/10012/9930
This is the collection for the University of Waterloo's Cheriton School of Computer Science.
Research outputs are organized by type (e.g., Master Thesis, Article, Conference Paper).
Waterloo faculty, students, and staff can contact us or visit the UWSpace guide to learn more about depositing their research.
Browsing Computer Science by Title
Item A 2-Approximation for the Height of Maximal Outerplanar Graph Drawings
(University of Waterloo, 2016-08-18) Demontigny, Philippe
In this thesis, we study drawings of maximal outerplanar graphs that place vertices on integer coordinates. We introduce a new class of graphs, called umbrellas, and a new method of splitting maximal outerplanar graphs into systems of umbrellas. By doing so, we generate a new graph parameter, called the umbrella depth (ud), that can be used to approximate the optimal height of a drawing of a maximal outerplanar graph. We show that for any maximal outerplanar graph G, we can create a flat visibility representation of G with height at most 2ud(G) + 1. This drawing can be transformed into a straight-line drawing of the same height. We then prove that the height of any drawing of G is at least ud(G) + 1, which makes our result a 2-approximation for the optimal height. The best previously known approximation algorithm gave a 4-approximation. In addition, we provide an algorithm for finding the umbrella depth of G in linear time. Lastly, we compare the umbrella depth to other graph parameters such as the pathwidth and the rooted pathwidth, which have been used in the past for outerplanar graph drawing algorithms.
Item 3-D Reconstruction from Single Projections, with Applications to Astronomical Images
(University of Waterloo, 2013-08-23T18:52:01Z) Cormier, Michael
A variety of techniques exist for three-dimensional reconstruction when multiple views are available, but less attention has been given to reconstruction when only a single view is available. Such a situation is normal in astronomy, when a galaxy (for example) is so distant that it is impossible to obtain views from significantly different angles. In this thesis I examine the problem of reconstructing the three-dimensional structure of a galaxy from this single viewpoint.
I accomplish this by taking advantage of the image formation process, symmetry relationships, and other structural assumptions that may be made about galaxies. Most galaxies are approximately symmetric in some way. Frequently, this symmetry corresponds to symmetry about an axis of rotation, which allows strong statements to be made about the relationships between luminosity at each point in the galaxy. It is through these relationships that the number of unknown values needed to describe the structure of the galaxy can be reduced to the number of constraints provided by the image, so the optimal reconstruction is well-defined. Other structural properties can also be described under this framework. I provide a mathematical framework and analyses that prove the uniqueness of solutions under certain conditions and show how uncertainty may be precisely and explicitly expressed. Empirical results are shown using real and synthetic data. I also show a comparison to a state-of-the-art two-dimensional modelling technique to demonstrate the contrasts between the two frameworks and show the important advantages of the three-dimensional approach. In combination, the theoretical and experimental aspects of this thesis demonstrate that the proposed framework is versatile, practical, and novel: a contribution to both computer science and astronomy.
Item 3D Online Multi-Object Tracking for Autonomous Driving
(University of Waterloo, 2019-08-29) Balasubramanian, Venkateshwaran
This research work focuses on exploring a novel 3D multi-object tracking architecture: 'FANTrack: 3D Multi-Object Tracking with Feature Association Network' for autonomous driving, based on tracking by detection and online tracking strategies using deep learning architectures for data association. The problem of multi-target tracking aims to assign noisy detections to an a priori unknown and time-varying number of tracked objects across a sequence of frames.
A majority of the existing solutions focus on either tediously designing cost functions or formulating the task of data association as a complex optimization problem that can be solved effectively. Instead, we exploit the power of deep learning to formulate the data association problem as inference in a CNN. To this end, we propose to learn a similarity function that combines cues from both image and spatial features of objects. The proposed approach consists of a similarity network that predicts the similarity scores of the object pairs and builds a local similarity map. Another network formulates the data association problem as inference in a CNN by using the similarity scores and spatial information. The model learns to perform global assignments in 3D purely from data, handles noisy detections and a varying number of targets, and is easy to train. Experiments on the challenging KITTI dataset show competitive results with the state of the art. The model is finally implemented in ROS and deployed on our autonomous vehicle to show the robustness and online tracking capabilities. The proposed tracker runs alongside the object detector, utilizing the resources efficiently.
Item 3D Pointing with Everyday Devices: Speed, Occlusion, Fatigue
(University of Waterloo, 2015-07-24) Pietroszek, Krzysztof
In recent years, display technology has evolved to the point where displays can be both non-stereoscopic and stereoscopic, and 3D environments can be rendered realistically on many types of displays. From movie theatres and shopping malls to conference rooms and research labs, 3D information can be deployed seamlessly. Yet, while 3D environments are commonly displayed in desktop settings, there are virtually no examples of interactive 3D environments deployed within ubiquitous environments, with the exception of console gaming. At the same time, immersive 3D environments remain - in users' minds - associated with professional work settings and virtual reality laboratories.
An excellent opportunity for 3D interactive engagements is being missed not because of economic factors, but due to the lack of interaction techniques that are easy to use in ubiquitous, everyday environments. In my dissertation, I address the lack of support for interaction with 3D environments in ubiquitous settings by designing, implementing, and evaluating 3D pointing techniques that leverage a smartphone or a smartwatch as an input device. I show that mobile and wearable devices may be especially beneficial as input devices for casual use scenarios, where specialized 3D interaction hardware may be impractical, too expensive or unavailable. Such scenarios include interactions with home theatres and intelligent homes, in workplaces and classrooms, with movie theatre screens, in shopping malls, at airports, during conference presentations and in countless other places and situations. Another contribution of my research is to increase the potential of mobile and wearable devices for efficient interaction at a distance. I do so by showing that such interactions are feasible when realized with the support of a modern smartphone or smartwatch. I also show how multimodality, when realized with everyday devices, expands and supports 3D pointing. In particular, I show how multimodality helps to address the challenges of 3D interaction: performance issues related to the limitations of the human motor system, interaction with occluded objects and the related problem of perception of depth on non-stereoscopic screens, and user subjective fatigue, measured with NASA TLX as perceived workload, that results from providing spatial input for a prolonged time. I deliver these contributions by designing three novel 3D pointing techniques that support casual, "walk-up-and-use" interaction at a distance and are fully realizable using off-the-shelf mobile and wearable devices available today.
The contributions provide evidence that democratization of 3D interaction can be realized by leveraging the pervasiveness of a device that users already carry with them: a smartphone or a smartwatch.
Item 5G RAN/MEC Slicing and Admission Control using Deep Reinforcement Learning
(University of Waterloo, 2023-01-19) Moayyedi, Arash
The 5G RAN functions can be virtualized and distributed across the radio unit (RU), distributed unit (DU), and centralized unit (CU) to facilitate flexible resource management. Complemented by multi-access edge computing (MEC), these components create network slices tailored for applications with diverse quality of service (QoS) requirements. However, as the requests for various slices arrive dynamically over time and the network resources are limited, it is non-trivial for an infrastructure provider (InP) to optimize its long-term revenue from real-time admission and embedding of slice requests. Prior works have leveraged Deep Reinforcement Learning (DRL) to address this problem; however, these solutions either do not scale to realistic topologies, require re-training of the DRL agents when facing topology changes, or do not consider the slice admission and embedding problems jointly. In this thesis, we use multi-agent DRL and Graph Attention Networks (GATs) to address these limitations. Specifically, we propose novel topology-independent admission and slicing agents that are scalable and generalizable to large and different metropolitan networks. Results show that the proposed approach converges faster and achieves up to 35.2% and 20% gain in revenue compared to heuristics and other DRL-based approaches, respectively. Additionally, we demonstrate that our approach is generalizable to scenarios and substrate networks previously unseen during training, as it maintains superior performance without re-training or re-tuning.
Finally, we extract the attention maps of the GAT, and analyze them to detect potential bottlenecks and efficiently improve network performance and InP revenue through eliminating them.
Item A Longitudinal Analysis Of Replicas in the Wild Wild Android
(University of Waterloo, 2024-09-24) Abbas Zaidi, Syeda Mashal
In this thesis, we report and study a phenomenon that contributes to Android API sprawls. We observe that OEM developers introduce private APIs that are composed by copy-paste-editing full or partial code from AOSP and other OEM APIs – we call such APIs Replicas. To quantify the prevalence of Replicas in the wildly fragmented Android ecosystem, we perform the first large-scale (security) measurement study, aiming at detecting and evaluating Replicas across 342 ROMs, manufactured by 10 vendors and spanning 7 versions. Our study is motivated by the intuition that Replicas contribute to the production of bloated custom Android codebases, add to the complexity of the Android access control mechanism and updates process, and hence may lead to access control vulnerabilities. Our study is facilitated by RepFinder, a tool we develop. It infers the core functionality of an API and detects syntactically and semantically similar APIs using static program paths. RepFinder reveals that Replicas are commonly introduced by OEMs and, more importantly, they unnecessarily introduce security enforcement anomalies. Specifically, RepFinder reports an average of 141 Replicas per studied ROM, accounting for 9% to 17% of custom APIs – where 37% (on average) are identified as under-protected. Our study thus points to the urgent need to debloat Replicas.
Item A Security Analysis of the Multi-User Ecosystem in Android Framework
(University of Waterloo, 2024-10-23) Khan, Muhammad Shahpar Nafees
The Android framework’s multi-user ecosystem introduces significant security challenges, particularly in the enforcement of user-specific access control checks.
While previous research has highlighted flaws in Android’s access control mechanism, these efforts often overlook the complexities introduced by vendor customization and the unique demands of a multi-user environment. In this thesis, we conduct a systematic analysis of the Android Open Source Project (AOSP), identifying key patterns regulating multi-user access control implementations. We use these patterns to develop MVP, a static analysis tool that examines custom vendor ROMs for missing user-specific access control checks. For example, our analysis reveals that Android’s multi-user environment is susceptible to cross-user attacks; sensitive data can be shared between profiles, and non-privileged users can manipulate privileged system settings. These findings underscore the need for rigorous enforcement of access control mechanisms to mitigate security risks in Android’s multi-user environment.
Item A+ Indexes: Highly Flexible Adjacency Lists in Graph Database Management Systems
(University of Waterloo, 2019-09-17) Khaliq, Shahid
Adjacency lists are the most fundamental storage structure in existing graph database management systems (GDBMSs) to index input graphs. Adjacency lists are universally linked-list-like per-vertex structures that allow access to a set of edges that are all adjacent to a vertex. In several systems, adjacency lists can also allow efficient access to subsets of a vertex’s adjacent edges that satisfy a fixed set of predicates, such as those that have the same label, and support a fixed set of ordering criteria, such as sorting by the ID of destination vertices of the edges. This thesis describes a highly flexible indexing subsystem for GDBMSs, which consists of two components.
The primary component, called A+ indexes, stores adjacency lists that, compared to existing adjacency lists, provide flexibility to users in three aspects: (1) in addition to per-vertex adjacency lists, users can define per-edge adjacency lists; (2) users can define adjacency lists for sets of edges that satisfy a wide range of predicates; and (3) users can specify flexible sorting criteria. Indexes in existing GDBMSs, such as adjacency list, B+ tree, or hash indexes, index as elements the vertices or edges in the input graph. The second component of our indexing subsystem is secondary B+ tree and bitmap indexes that index aggregate properties of adjacency lists in A+ indexes. Therefore, our secondary indexes effectively index adjacency lists as elements. We have implemented our indexing subsystem on top of the Graphflow GDBMS. We describe our indexes, the modifications we had to make to Graphflow’s optimizer, and our implementation. We provide extensive experiments demonstrating both the flexibility and efficiency of our indexes on a large suite of queries from several application domains.
Item Accelerating and Privatizing Diffusion Models
(University of Waterloo, 2023-08-17) Dockhorn, Tim
Diffusion models (DMs) have emerged as a powerful class of generative models. DMs offer both state-of-the-art synthesis quality and sample diversity in combination with a robust and scalable learning objective. DMs rely on a diffusion process that gradually perturbs the data towards a normal distribution, while the neural network learns to denoise. Formally, the problem reduces to learning the score function, i.e., the gradient of the log-density of the perturbed data. The reverse of the diffusion process can be approximated by a differential equation, defined by the learned score function, and can therefore be used for generation when starting from random noise.
In this thesis, we give a thorough and beginner-friendly introduction to DMs and discuss their history starting from early work on score-based generative models. Furthermore, we discuss connections to other statistical models and lay out applications of DMs, with a focus on image generative modeling. We then present CLD: a new DM based on critically-damped Langevin dynamics. CLD can be interpreted as running a joint diffusion in an extended space, where the auxiliary variables can be considered "velocities" that are coupled to the data variables as in Hamiltonian dynamics. We derive a novel score matching objective for CLD-based DMs and introduce a fast solver for the reverse diffusion process which is inspired by methods from the statistical mechanics literature. The CLD framework provides new insights into DMs and generalizes many existing DMs which are based on overdamped Langevin dynamics. Next, we present GENIE, a novel higher-order numerical solver for DMs. Many existing higher-order solvers for DMs build on finite difference schemes which break down in the large step size limit as approximations become too crude. GENIE, on the other hand, learns neural network-based models for higher-order derivatives whose precision does not depend on the step size. The additional networks in GENIE are implemented as small output heads on top of the neural backbone of the original DM, keeping the computational overhead minimal. Unlike recent sampling distillation methods that fundamentally alter the generation process in DMs, GENIE still solves the true generative differential equation, and therefore naturally enables applications such as encoding and guided sampling. The fourth chapter presents differentially private diffusion models (DPDMs), DMs trained with strict differential privacy guarantees. While modern machine learning models rely on increasingly large training datasets, data is often limited in privacy-sensitive domains.
Generative models trained on sensitive data with differential privacy guarantees can sidestep this challenge, providing access to synthetic data instead. DPDMs enforce privacy by using differentially private stochastic gradient descent for training. We thoroughly study the design space of DPDMs and propose noise multiplicity, a simple yet powerful modification of the DM training objective tailored to the differential privacy setting. We motivate and show numerically why DMs are better suited for differentially private generative modeling than one-shot generators such as generative adversarial networks or normalizing flows. Finally, we propose to distill the knowledge of large pre-trained DMs into smaller student DMs. Large-scale DMs have achieved unprecedented results across several domains; however, they generally require a large amount of GPU memory and are slow at inference time, making it difficult to deploy them in real-time or on resource-limited devices. In particular, we propose an approximate score matching objective that regresses the student model towards predictions of the teacher DM rather than the clean data as is done in standard DM training. We show that student models outperform the larger teacher model for a variety of compute budgets. Additionally, the student models may also be deployed on GPUs with significantly less memory than was required for the original teacher model.
Item Accelerating the Training of Convolutional Neural Networks for Image Segmentation with Deep Active Learning
(University of Waterloo, 2020-01-23) Chen, Wei Tao
Image semantic segmentation is an important problem in computer vision. However, training a deep neural network for semantic segmentation in supervised learning requires expensive manual labeling. Active learning (AL) addresses this problem by automatically selecting a subset of the dataset to label and iteratively improving the model. This minimizes labeling costs while maximizing performance.
Yet, deep active learning for image segmentation has not been systematically studied in the literature. This thesis offers three contributions. First, we compare six different state-of-the-art querying methods, including uncertainty, Bayesian, and out-of-distribution methods, in the context of active learning for image segmentation. The comparison uses the standard dataset Cityscapes, as well as randomly generated data, and the state-of-the-art image segmentation architecture DeepLab. Our results demonstrate subtle but robust differences between the querying methods, which we analyze and explain. Second, we propose a novel way to query images by counting the number of pixels with acquisition values above a certain threshold. Our counting method outperforms the standard averaging method. Lastly, we demonstrate that the previous two findings remain consistent for both whole images and image crops. Furthermore, we provide an in-depth discussion of deep active learning and results from supplementary experiments. First, we studied active learning in the context of image classification with the MNIST dataset. We observed an interesting phenomenon where active learning querying methods perform worse than random sampling in the early cycles but overtake random sampling at a break-even point. This break-even point can be controlled by varying model capacity, sample diversity, and temperature scaling. The difference in performances of the six querying methods is larger than in the case of image segmentation. Second, we attempt to explore the theoretical optimal query by querying samples with the lowest accuracy and querying with a trained expert model. Although they turned out to be suboptimal, their results would hopefully shed light on the subject. Lastly, we present the experiment results from using SegNet and FCN. With these architectures, our querying methods did not perform any better than random sampling. 
Nevertheless, those negative results demonstrate some of the difficulties of active learning for image segmentation.
Item Access Control Administration with Adjustable Decentralization
(University of Waterloo, 2007-09-12T15:50:52Z) Chinaei, Amir Hossein
Access control is a key function of enterprises that preserve and propagate massive data. Access control enforcement and administration are two major components of the system. On one hand, enterprises are responsible for data security; thus, consistent and reliable access control enforcement is necessary although the data may be distributed. On the other hand, data often belongs to several organizational units with various access control policies and many users; therefore, decentralized administration is needed to accommodate diverse access control needs and to avoid the central bottleneck. Yet, the required degree of decentralization varies within different organizations: some organizations may require a powerful administrator in the system, whereas others may prefer a self-governing setting in which no central administrator exists, but users fully manage their own data. Hence, a single system with adjustable decentralization will be useful for supporting various (de)centralized models within the spectrum of access control administration. Giving individual users the ability to delegate or grant privileges is a means of decentralizing access control administration. Revocation of arbitrary privileges is a means of retaining control over data. To provide flexible administration, the ability to delegate a specific privilege and the ability to revoke it should be held independently of each other and independently of the privilege itself. Moreover, supporting arbitrary user and data hierarchies, fine-grained access control, and protection of both data (end objects) and metadata (access control data) with a single uniform model will provide the most widely deployable access control system.
Conflict resolution is a major aspect of access control administration in systems. Resolving access conflicts when deriving effective privileges from explicit ones is a challenging problem in the presence of both positive and negative privileges, sophisticated data hierarchies, and diversity of conflict resolution strategies. This thesis presents a uniform access control administration model with adjustable decentralization, to protect both data and metadata. There are several contributions in this work. First, we present a novel mechanism to constrain access control administration for each object type at object creation time, as a means of adjusting the degree of decentralization for the object when the system is configured. Second, by controlling the access control metadata with the same mechanism that controls the users’ data, privileges can be granted and revoked to the extent that these actions conform to the corporation’s access control policy. Thus, this model supports a whole spectrum of access control administration, in which each model is characterized as a network of access control states, similar to a finite state automaton. The model depends on a hierarchy of access banks of authorizations which is supported by a formal semantics. Within this framework, we also introduce the self-governance property in the context of access control, and show how the model facilitates it. In particular, using this model, we introduce a conflict-free and decentralized access control administration model in which all users are able to retain complete control over their own data while they are also able to delegate any subset of their privileges to other users or user groups. We also introduce two measures to compare any two access control models in terms of the degrees of decentralization and interpretation. 
Finally, as the conflict resolution component of access control models, we incorporate a unified algorithm to resolve access conflicts by simultaneously supporting several combined strategies.
Item Accurate viscous free surfaces for buckling, coiling, and rotating liquids
(Association for Computing Machinery, 2008-07) Batty, Christopher; Bridson, Robert
We present a fully implicit Eulerian technique for simulating free surface viscous liquids which eliminates artifacts in previous approaches, efficiently supports variable viscosity, and allows the simulation of more compelling viscous behaviour than previously achieved in graphics. Our method exploits a variational principle which automatically enforces the complex boundary condition on the shear stress at the free surface, while giving rise to a simple discretization with a symmetric positive definite linear system. We demonstrate examples of our technique capturing realistic buckling, folding and coiling behavior. In addition, we explain how to handle domains whose boundary comprises both ghost fluid Dirichlet and variational Neumann parts, allowing correct behaviour at free surfaces and solid walls for both our viscous solve and the variational pressure projection of Batty et al. [BBB07].
Item Achieving Performance Objectives for Database Workloads
(University of Waterloo, 2010-08-30T21:22:49Z) Mallampalli, Anusha
In this thesis, our goal is to achieve customer-specified performance objectives for workloads in a database management system (DBMS). Competing workloads in current DBMSs have detrimental effects on performance. Differentiated levels of service become important to ensure that critical work takes priority. We design a feedback-based admission differentiation framework, which consists of three components: workload classifier, workload monitor and adaptive admission controller.
The adaptive admission controller uses the workload management capabilities of IBM DB2’s Workload Manager (WLM) to achieve the performance objectives of the most important workload by applying admission control on the rest of the work, which is less important and may or may not have performance objectives. The controller uses a feedback-based technique to automatically adjust the admission control on the less important work to achieve performance objectives for the important workload. The adaptive admission controller is implemented on an instance of DB2 to test the effectiveness of the controller.
Item Active Learning with Semi-Supervised Support Vector Machines
(University of Waterloo, 2007-05-22T16:23:10Z) Chinaei, Leila
A significant problem in many machine learning tasks is that it is time consuming and costly to gather the necessary labeled data for training the learning algorithm to a reasonable level of performance. In reality, it is often the case that a small amount of labeled data is available and that more unlabeled data could be labeled on demand at a cost. If the labeled data is obtained by a process outside of the control of the learner, then the learner is passive. If the learner picks the data to be labeled, then this becomes active learning. This has the advantage that the learner can pick data to gain specific information that will speed up the learning process. Support Vector Machines (SVMs) have many properties that make them attractive to use as a learning algorithm for many real world applications including classification tasks. Some researchers have proposed algorithms for active learning with SVMs, i.e., algorithms for choosing the next unlabeled instance to label. Their approach is supervised in nature since they do not consider all unlabeled instances while looking for the next instance.
In this thesis, we propose three new algorithms for applying active learning for SVMs in a semi-supervised setting which takes advantage of the presence of all unlabeled points. The suggested approaches might, by reducing the number of experiments needed, yield considerable savings in costly classification problems in cases where finding the training data for a classifier is expensive.
Item Active Sensing for Partially Observable Markov Decision Processes
(University of Waterloo, 2013-01-21T19:46:56Z) Koltunova, Veronika
Context information on a smart phone can be used to tailor applications for specific situations (e.g. provide tailored routing advice based on location, gas prices and traffic). However, typical context-aware smart phone applications use very limited context information such as user identity, location and time. In the future, smart phones will need to decide which of a wide range of sensors to gather information from in order to best accommodate user needs and preferences in a given context. In this thesis, we present a model for active sensor selection within decision-making processes, in which observational features are selected based on longer-term impact on the decisions made by the smart phone. This thesis formulates the problem as a partially observable Markov decision process (POMDP), and proposes a non-myopic solution to the problem using a state-of-the-art approximate planning algorithm, Symbolic Perseus. We have tested our method on three small example domains, comparing different policy types, discount factors and cost settings.
The experimental results proved that the proposed approach delivers a better policy in the situation of costly sensors, while at the same time providing the advantage of faster policy computation with less memory usage.
Item Ad-hoc Holistic Ranking Aggregation
(University of Waterloo, 2012-12-06T18:38:03Z) Saleeb, Mina
Data exploration is considered one of the major processes that enable the user to analyze massive amounts of data in order to find the most important and relevant information needed. Aggregation and Ranking are two of the most frequently used tools in data exploration. The interaction between ranking and aggregation has been studied widely from different perspectives. In this thesis, a comprehensive survey of this interaction is presented. Holistic Ranking Aggregation, a new interaction, is introduced. Finally, various algorithms are proposed to efficiently process ad-hoc holistic ranking aggregation for both monotone and generic scoring functions.
Item Adapting Component Analysis
(University of Waterloo, 2012-05-18T17:30:35Z) Dorri, Fatemeh
A main problem in machine learning is to predict the response variables of a test set given the training data and its corresponding response variables. A predictive model can perform satisfactorily only if the training data is an appropriate representative of the test data. This intuition is reflected in the assumption that the training data and the test data are drawn from the same underlying distribution. However, the assumption may not be correct in many applications for various reasons. For example, gathering training data from the test population might not be easily possible, due to its expense or rareness. Or, factors like time, place, weather, etc. can cause the difference in the distributions. I propose a method based on kernel distribution embedding and the Hilbert-Schmidt Independence Criterion (HSIC) to address this problem.
The proposed method explores a new representation of the data in a new feature space with two properties: (i) the distributions of the training and the test data sets are as close as possible in the new feature space, (ii) the important structural information of the data is preserved. The algorithm can reduce the dimensionality of the data while preserving the aforementioned properties, and therefore it can be seen as a dimensionality reduction method as well. Our method has a closed-form solution and the experimental results on various data sets show that it works well in practice.

Item Adapting to Data Drift in Encrypted Traffic Classification Using Deep Learning (University of Waterloo, 2023-01-12) Malekghaini, Navid
Deep learning models have been shown to achieve high performance in encrypted traffic classification. However, when it comes to production use, multiple factors challenge the performance of these models. The emergence of new protocols, especially at the application layer, as well as updates to previous protocols affect the patterns in input data, making the model's previously learned patterns obsolete. Furthermore, proposed model architectures are usually tested on datasets collected in controlled settings, which makes the reported performances unreliable for production use. In this thesis, we start by studying how the performances of two high-performing state-of-the-art encrypted traffic classifiers change on multiple real-world datasets collected over the course of two years from a major ISP's network, Orange telecom. We investigate the changes in traffic data patterns, highlighting the extent to which these changes, a.k.a. data drift, impact the performance of the two models in service-level and application-level classification. We propose best practices to manually adapt model architectures and improve their accuracy in the face of data drift.
We show that our best practices are generalizable to other encryption protocols and different levels of labeling granularity. However, designing efficient model architectures and manual architectural adaptations is time-consuming and requires domain expertise. Neural architecture search (NAS) algorithms have been shown to automatically discover efficient models in other domains, such as image recognition and natural language processing. However, NAS's application is rather unexplored in Encrypted Traffic Classification. We propose AutoML4ETC, a tool to automatically design efficient and high-performing neural architectures for Encrypted Traffic Classification, given a target dataset and corresponding features. We define three powerful search spaces tailored specifically for the prominent categories of features in the Encrypted Traffic Classification state-of-the-art, i.e., packet raw bytes, flow time-series, and flow statistics. We show that a simple search strategy over AutoML4ETC's search spaces can generate model architectures that outperform the state-of-the-art Encrypted Traffic Classification models on several benchmark datasets, including real-world datasets of TLS and QUIC traffic collected from a major ISP network. In addition to being more accurate, AutoML4ETC's architectures are significantly more efficient and lighter in terms of the number of parameters. We further showcase the potential of AutoML4ETC by experimenting with state-of-the-art NAS techniques and model ensembles generated from different search spaces. We also use AutoML4ETC to analyze the state of adoption of the QUIC protocol.

Item Adaptive Algorithms for Weighted Queries on Weighted Binary Relations and Labeled Trees (University of Waterloo, 2007-07-30T15:25:46Z) Veraskouski, Aleh
Keyword queries are extremely easy for a user to write.
They have become a standard way to query for information in web search engines and most other information retrieval systems whose users are usually laypersons and might not have knowledge about the database schema or contained data. As keyword queries do not impose any structural constraints on the retrieved information, the quality of the obtained results is far from perfect. However, one can hardly improve it without changing the way queries are asked and the way information is stored in the database. The purpose of this thesis is to propose a method to improve the quality of information retrieval by adding weights to the existing ways of asking keyword queries and storing information in the database. We consider weighted queries on two different data structures: weighted binary relations and weighted multi-labeled trees. We propose adaptive algorithms to solve these queries and prove bounds on the complexity of these algorithms in terms of high-level operations. We describe how these algorithms can be implemented and derive upper bounds on their complexity in two specific models of computation: the comparison model and the word-RAM model.

Item Adaptive Comparison-Based Algorithms for Evaluating Set Queries (University of Waterloo, 2004) Mirzazadeh, Mehdi
In this thesis we study a problem that arises in answering boolean queries submitted to a search engine. Usually a search engine stores the set of IDs of documents containing each word in a pre-computed sorted order, and to evaluate a query like "computer AND science" the search engine has to evaluate the intersection of the sets of documents containing the words "computer" and "science". More complex queries will result in more complex set expressions. In this thesis we consider the problem of evaluating a set expression with union and intersection as operators and ordered sets as operands. We explore properties of comparison-based algorithms for the problem.
A proof of a set expression is the set of comparisons that a comparison-based algorithm performs before it can determine the result of the expression. We discuss the properties of the proofs of set expressions and, based on how complex the smallest proofs of a set expression E are, we define a measure of how difficult it is to compute E. Then, we design an algorithm that is adaptive to the difficulty of the input expression and we show that the running time of the algorithm is roughly proportional to the difficulty of the input expression, where the factor is roughly logarithmic in the number of operands of the input expression.
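
As a rough illustration of what "adaptive" can mean for a comparison-based set operation, the sketch below intersects two sorted lists of document IDs using doubling (galloping) search, so that long runs of non-matching IDs are skipped with few comparisons. This is a generic textbook technique and not the algorithm of the thesis; the function names `gallop` and `intersect_sorted` are hypothetical, and the inputs are assumed to be strictly increasing lists.

```python
from bisect import bisect_left


def gallop(seq, target, lo):
    """Return the smallest index i >= lo with seq[i] >= target.

    Returns len(seq) if no such index exists. Probes at doubling
    distances from lo, then finishes with a binary search.
    """
    step = 1
    hi = lo
    while hi < len(seq) and seq[hi] < target:
        lo = hi + 1      # everything up to hi is smaller than target
        hi += step
        step *= 2
    return bisect_left(seq, target, lo, min(hi, len(seq)))


def intersect_sorted(a, b):
    """Intersect two strictly increasing lists (illustrative sketch)."""
    result = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i = gallop(a, b[j], i)   # skip ahead in a
        else:
            j = gallop(b, a[i], j)   # skip ahead in b
    return result


# Hypothetical posting lists for the words "computer" and "science".
computer = [1, 3, 5, 7, 9, 11]
science = [3, 4, 5, 10, 11]
print(intersect_sorted(computer, science))
```

When one list is much shorter than the other, or the lists interleave in only a few places, this performs far fewer comparisons than a linear merge, which is the kind of input-dependent behaviour an adaptive analysis tries to capture.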