Testing, Learning, Sampling, Sketching

dc.contributor.authorHarms, Nathaniel
dc.date.accessioned2022-08-22T20:11:51Z
dc.date.available2022-08-22T20:11:51Z
dc.date.issued2022-08-22
dc.date.submitted2022-08-19
dc.description.abstractWe study several problems about sublinear algorithms, presented in two parts. Part I: Property testing and learning. There are two main goals of research in property testing and learning theory. The first is to understand the relationship between testing and learning, and the second is to develop efficient testing and learning algorithms. We present results towards both goals. - An oft-repeated motivation for property testing algorithms is to help with model selection in learning: to efficiently check whether the chosen hypothesis class (i.e. learning model) will successfully learn the target function. We present in this thesis a proof that, for many of the most useful and natural hypothesis classes (including halfspaces, polynomial threshold functions, intersections of halfspaces, etc.), the sample complexity of testing in the distribution-free model is nearly equal to that of learning. This shows that testing does not give a significant advantage in model selection in this setting. - We present a simple and general technique for transforming testing and learning algorithms designed for the uniform distribution over {0, 1}^d or [n]^d into algorithms that work for arbitrary product distributions over R d . This leads to an improvement and simplification of state-of-the-art results for testing monotonicity, learning intersections of halfspaces, learning polynomial threshold functions, and others. Part II. Adjacency and distance sketching for graphs. We initiate the thorough study of adjacency and distance sketching for classes of graphs. Two open problems in sublinear algorithms are: 1) to understand the power of randomization in communication; and 2) to characterize the sketchable distance metrics. We observe that constant-cost randomized communication is equivalent to adjacency sketching in a hereditary graph class, which in turn implies the existence of an efficient adjacency labeling scheme, the subject of a major open problem in structural graph theory. Therefore characterizing the adjacency sketchable graph classes (i.e. the constant-cost communication problems) is the probabilistic equivalent of this open problem, and an essential step towards understanding the power of randomization in communication. This thesis gives the first results towards a combined theory of these problems and uses this connection to obtain optimal adjacency labels for subgraphs of Cartesian products, resolving some questions from the literature. More generally, we begin to develop a theory of graph sketching for problems that generalize adjacency, including different notions of distance sketching. This connects the well-studied areas of distance sketching in sublinear algorithms, and distance labeling in structural graph theory.en
dc.identifier.urihttp://hdl.handle.net/10012/18608
dc.language.isoenen
dc.pendingfalse
dc.publisherUniversity of Waterlooen
dc.subjectproperty testingen
dc.subjectcommunication complexityen
dc.subjectadjacency labelingen
dc.subjectdistance sketchingen
dc.subjectlearning theoryen
dc.subjectvc dimensionen
dc.subjectdistance labelingen
dc.subjecttesting monotonicityen
dc.subjecthalfspacesen
dc.titleTesting, Learning, Sampling, Sketchingen
dc.typeDoctoral Thesisen
uws-etd.degreeDoctor of Philosophyen
uws-etd.degree.departmentDavid R. Cheriton School of Computer Scienceen
uws-etd.degree.disciplineComputer Scienceen
uws-etd.degree.grantorUniversity of Waterlooen
uws-etd.embargo.terms0en
uws.contributor.advisorBlais, Eric
uws.contributor.affiliation1Faculty of Mathematicsen
uws.peerReviewStatusUnrevieweden
uws.published.cityWaterlooen
uws.published.countryCanadaen
uws.published.provinceOntarioen
uws.scholarLevelGraduateen
uws.typeOfResourceTexten

Files

Original bundle

Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
Harms_Nathaniel.pdf
Size:
1.54 MB
Format:
Adobe Portable Document Format
Description:

License bundle

Now showing 1 - 1 of 1
No Thumbnail Available
Name:
license.txt
Size:
6.4 KB
Format:
Item-specific license agreed upon to submission
Description: