Interactive Visualization and Exploration of High-Dimensional Data

Date

2016-01-21

Authors

Waddell, Adrian

Advisor

Oldford, R. Wayne

Publisher

University of Waterloo

Abstract

Visualizing data is an essential part of good statistical practice. Plots are useful for revealing structure in the data, checking model assumptions, detecting outliers and finding unanticipated patterns. Post-analysis visualization is commonly used to communicate the results of statistical analyses. The availability of good statistical visualization software is key to performing data analysis effectively and to exploring and developing new methods for data visualization.
Compared to static visualization, interactive visualization adds natural and powerful ways to explore the data. With interactive visualization, an analyst can dive into the data and quickly react to visual cues by, for example, re-focusing and creating interactive queries of the data. Further, linking visual attributes of the data points, such as color and size, allows the analyst to compare different visual representations of the data, such as histograms and scatterplots.
In this thesis, we explore and develop new interactive data visualization and exploration tools for high-dimensional data. The original focus of our research was a software implementation of navigation graphs. Navigation graphs are navigational infrastructures for controlled exploration of high-dimensional data. As part of this thesis, we developed the first interactive implementation of navigation graphs, called RnavGraph. With RnavGraph we explored various features for enhancing the usability of navigation graphs. We concluded that a powerful interactive scatterplot display and methods to deal with large graphs were two areas that would add great value to the navigation graph framework. RnavGraph's scatterplot display proved to be particularly useful for data analysis, and we continued our research with the design and implementation of a general-purpose interactive visualization toolkit called loon. The core contributions of loon are as follows.
loon implements a general design for interactive statistical graphic displays that supports layering of visual information such as point objects, lines and polygons. These displays further support zooming, panning and selection, as well as modification and deactivation of plot elements and layers. Interaction with plots is provided through mouse and keyboard gestures, command line control, and inspectors. These inspectors provide graphical user interfaces for modifying and overseeing the plots. loon also implements a novel dynamic linking mechanism that can be used to assign, at run time, both the plots that are to be linked and the linking rules. Additionally, loon's design provides several different types of event bindings to add and customize functionality of loon's displays. In this thesis, we discuss loon's design and framework by giving concrete examples that show how these design choices can be used to efficiently explore and visualize data interactively. These examples revolve around loon's statistical interactive displays such as histograms, scatterplots and graph displays. We also illustrate how loon's design can be used to layer relevant statistical information and model fits onto plots, such as density estimates, contours, regression lines and geographical maps for spatial data analysis. loon is implemented in Tcl and Tk, and we explore the integration of loon's framework into a complete statistical computing environment such as R. We show examples of statistical analyses performed in R that are enhanced with interactivity using loon. loon also implements a number of new tools for high-dimensional data exploration. The serialaxes display represents the data using parallel or radial coordinates. The scatterplot display supports high-dimensional point glyphs such as serialaxes glyphs, polygons and images.
loon's navigation graphs allow for multiple navigators and for direct manipulation of a graph, which includes deactivating nodes and their adjoining edges. To deal with large graphs, we propose and implement environments for creating navigation graphs interactively by filtering the nodes with respect to relevant node-associated measures. Such measures include the correlation of variable pairs and the graph-based scagnostics measures. We use sliders, histograms and scatterplot matrices to interactively filter the nodes based on the value of their associated measure. Measures are kept generic and can be recalculated for the subset of selected data points. As another tool for exploring high-dimensional data, we introduce a setup that allows the analyst to select points and have their k-nearest neighboring points highlighted automatically. The space in which the inter-point distances that determine the k-nearest neighbors are calculated can be chosen dynamically. Finally, we propose a new high-dimensional point glyph called the spiro glyph. While some of loon's interaction features have appeared in part or in whole in statistical systems in the past 40 years (e.g. brushing, panning, zooming and linking plots), no other equally rich system has provided (or continues to provide) an interactive data visualization system integrated with a widely available and stable statistical system like R. Both Tcl and R are well suited for rapid prototyping of software and statistical methods; with loon, rapid prototyping of interactive data visualization tools and methods becomes possible as well.

Keywords

Interactive Data Visualization, High-dimensional Data, Statistical Visualization
