Niceness assumptions for Learning Algorithms

Kushagra, Shrinu

Files

Kushagra_Shrinu.pdf (402.93 KB)

Date

2014-08-01

Authors

Kushagra, Shrinu

Publisher

University of Waterloo

Abstract

Machine learning algorithms such as Neural Networks, Linear Regression, and Feature Learning are employed successfully in a wide variety of applications, including computer vision, speech recognition, and bioinformatics. However, many of the underlying learning problems have been shown to be NP-Hard, and some are even NP-Hard to approximate. The intuition behind the success of these algorithms is that in practical applications the input data is not 'worst-case' and has certain 'nice' properties. In this thesis, we take a few small steps towards bridging the apparent gap between what is predicted by theory and what actually happens in practice. We consider two different niceness assumptions. The notion of Metric Distortion is fairly common for dimensionality reduction techniques: the goal is to obtain reduction techniques such that the distortion is small for all pairs of points. We show via an example that Metric Distortion is not good at modeling dimensionality reduction techniques that would perform quite well in practice. We introduce Retaining Distances, a probabilistic notion for modeling dimensionality reduction techniques that preserve most of the inter-point distances. Retaining Distances can be viewed as a relaxation of Metric Distortion. We prove that common techniques like PCA can be modeled by our notion. Another niceness assumption inherent in many machine learning algorithms is that 'close points tend to have the same labels'. A notion of Probabilistic Lipschitzness (PL) was introduced by Urner et al. [28] to capture this intuition. In this work, we propose a new definition of PL. We show that the two definitions are orthogonal to one another, in the sense that neither is implied by (or a relaxation of) the other. We give sample complexity upper bounds for Nearest Neighbor under this new definition. The crux of the thesis is combining the two notions to show that information (niceness) is preserved across dimensions.
We prove that if we have PL in a higher dimension and a dimensionality reduction technique retains distances, then we have PL in the reduced dimension as well. That is, a distance-retaining reduction preserves PL. In other words, the niceness properties that existed in the original dimension also exist in the reduced-dimension space. Towards the end, we validate both our notions experimentally. We show how our notion of retaining distances may be employed in practice to capture the 'usefulness' of a reduction technique. We also perform experiments to show how the two notions of PL compare in practice.
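
As a rough illustration of "preserving most of the inter-point distances", the sketch below projects synthetic 3-D points onto their first principal direction and reports the fraction of point pairs whose distance shrinks by at most a factor of (1 - eps). The abstract does not state the formal definition of Retaining Distances, so this criterion is an assumption made for illustration and is not the thesis's definition. The function names, the synthetic data, and the use of power iteration as a stand-in for a full PCA routine are likewise hypothetical choices.

```python
import itertools
import math
import random


def top_principal_direction(points, iters=200):
    """Approximate the first principal component by power iteration."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - mean[j] for j in range(d)] for p in points]
    v = [1.0 / math.sqrt(d)] * d  # arbitrary unit start vector (assumption)
    for _ in range(iters):
        # w = X^T (X v): the unnormalized covariance matrix applied to v
        scores = [sum(x[j] * v[j] for j in range(d)) for x in centered]
        w = [sum(scores[i] * centered[i][j] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v


def project(points, direction):
    """Map each point to its 1-D coordinate along `direction`."""
    return [sum(a * b for a, b in zip(p, direction)) for p in points]


def fraction_retained(points, reduced, eps):
    """Fraction of pairs whose reduced distance is >= (1 - eps) * original.

    Illustrative criterion only; not the definition used in the thesis.
    """
    kept = total = 0
    for i, j in itertools.combinations(range(len(points)), 2):
        orig = math.dist(points[i], points[j])
        if orig == 0:
            continue  # skip coincident points
        total += 1
        if abs(reduced[i] - reduced[j]) >= (1 - eps) * orig:
            kept += 1
    return kept / total


# Synthetic data lying close to a line, so one dimension captures most structure.
rng = random.Random(0)
data = [[rng.uniform(-1, 1), rng.gauss(0, 0.05), rng.gauss(0, 0.05)]
        for _ in range(60)]

direction = top_principal_direction(data)
reduced = project(data, direction)
frac = fraction_retained(data, reduced, eps=0.1)
print(f"fraction of pairs retained at eps=0.1: {frac:.3f}")
```

Because an orthogonal projection never increases distances, only shrinkage needs to be checked here. A worst-case measure such as Metric Distortion would be dominated by the few nearby pairs whose separation lies mostly in the discarded directions, whereas a fraction-based measure of this kind can still be high.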