Nonlinear Dimensionality Reduction by Manifold Unfolding
Khajehpour Tadavani, Pooyan
MetadataShow full item record
Every second, an enormous volume of data is being gathered from various sources and stored in huge data banks. Most of the time, monitoring a data source requires several parallel measurements, which form a high-dimensional sample vector. Due to the curse of dimensionality, applying machine learning methods, that is, studying and analyzing high-dimensional data, could be difficult. The essential task of dimensionality reduction is to faithfully represent a given set of high-dimensional data samples with a few variables. The goal of this thesis is to develop and propose new techniques for handling high-dimensional data, in order to address contemporary demand in machine learning applications. Most prominent nonlinear dimensionality reduction methods do not explicitly provide a way to handle out-of-samples. The starting point of this thesis is a nonlinear technique, called Embedding by Affine Transformations (EAT), which reduces the dimensionality of out-of-sample data as well. In this method, a convex optimization is solved for estimating a transformation between the high-dimensional input space and the low-dimensional embedding space. To the best of our knowledge, EAT is the only distance-preserving method for nonlinear dimensionality reduction capable of handling out-of-samples. The second method that we propose is TesseraMap. This method is a scalable extension of EAT. Conceptually, TesseraMap partitions the underlying manifold of data into a set of tesserae and then unfolds it by constructing a tessellation in a low-dimensional subspace of the embedding space. Crucially, the desired tessellation is obtained through solving a small semidefinite program; therefore, this method can efficiently handle tens of thousands of data points in a short time. The final outcome of this thesis is a novel method in dimensionality reduction called Isometric Patch Alignment (IPA). Intuitively speaking, IPA first considers a number of overlapping flat patches, which cover the underlying manifold of the high-dimensional input data. Then, IPA rearranges the patches and stitches the neighbors together on their overlapping parts. We prove that stitching two neighboring patches aligns them together; thereby, IPA unfolds the underlying manifold of data. Although this method and TesseraMap have similar approaches, IPA is more scalable; it embeds one million data points in only a few minutes. More importantly, unlike EAT and TesseraMap, which unfold the underlying manifold by stretching it, IPA constructs the unfolded manifold through patch alignment. We show this novel approach is advantageous in many cases. In addition, compared to the other well-known dimensionality reduction methods, IPA has several important characteristics; for example, it is noise tolerant, it handles non-uniform samples, and it can embed non-convex manifolds properly. In addition to these three dimensionality reduction methods, we propose a method for subspace clustering called Low-dimensional Localized Clustering (LDLC). In subspace clustering, data is partitioned into clusters, such that the points of each cluster lie close to a low-dimensional subspace. The unique property of LDLC is that it produces localized clusters on the underlying manifold of data. By conducting several experiments, we show this property is an asset in many machine learning tasks. This method can also be used for local dimensionality reduction. Moreover, LDLC is a suitable tool for forming the tesserae in TesseraMap, and also for creating the patches in IPA.