Halfway to Halfspace Testing

Harms, NathanielHalfway to Halfspace TestingUniversity of Waterloo2017property testingsublinear algorithmscomputer sciencetheoretical computer sciencealgorithms and complexityhalfspaceslinear threshold functionsmachine learningMy UniversityMy UniversityBlais, Eric2017-10-182017-10-182017-10-182017-09-26enMaster Thesishttp://hdl.handle.net/10012/12557In this thesis I study the problem of testing halfspaces under arbitrary probability distributions, using only random samples. A halfspace, or linear threshold function, is a boolean function f : Rⁿ → {±1} defined as the sign of a linear function; that is, f(x) = sign(Σᵢ wᵢxᵢ - θ) where we refer to w ∈ Rⁿ as a weight vector and θ ∈ R as a threshold. These functions have been studied intensively since the middle of the 20th century; they appear in many places, including social choice theory (the theory of voting rules), circuit complexity theory, machine learning theory, hardness of approximation, and the analysis of boolean functions. The problem of testing halfspaces, in the sense of property testing, is to design an algorithm that, with high probability, decides whether an unknown function f is a halfspace function or far from a halfspace, using as few examples of labelled points (x, f (x)) as possible. In this work I focus on the problem of testing halfspaces using only random examples drawn from an arbitrary distribution, and the algorithm cannot choose the points it receives. This is in contrast with previous work on the problem, where the algorithm can query points of its choice, and the distribution was assumed to be uniform over the boolean hypercube. Towards a solution to this problem I present an algorithm that works for rotationally invariant probability distributions (under reasonable conditions), using roughly O(√n) random examples, which is close to the known lower bound of Ω(√n/ √log n) . I further develop the algorithm to work for mixtures of two such rotationally invariant distributions and provide a partial analysis. I also survey related machine learning results, and conclude with a survey of the theory of halfspaces over the boolean hypercube, which has recently received much attention.