Feature Identification
Date
2020-08-04
Authors
Shaw, Justin
Advisor
Stastna, Marek
Publisher
University of Waterloo
Abstract
We present several methods for identifying time periods of interest (features) in a wide range of data sets.
The gamma method is a computationally inexpensive, flexible feature identification method that compares time series to identify a rank-ordered set of features in geophysically sourced data sets. Many physical phenomena perturb multiple physical variables nearly simultaneously, so features are identified as time periods in which all of the time series show local maxima of absolute deviation. Unlike other available methods, this method allows analysts to tune it using their knowledge of the physical context. The method is applied to a data set from a moored array of instruments deployed in the coastal environment of Monterey Bay, California, and to a data set from sensors placed within the submerged Yax Chen Cave System in Tulum, Quintana Roo, Mexico. These example data sets demonstrate that the method allows for the automated identification of features that are worthy of further study. The gamma method appeared in Heliyon as 'Feature identification in time series data sets' (Shaw et al. 2019).
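The idea of ranking time periods where every series deviates at once can be illustrated with a minimal Python sketch. This is not the gamma method as published in Shaw et al. (2019); the function name rank_joint_deviation_peaks, the window parameter, the standardization step, and the choice of the minimum across series as a joint score are all assumptions made for illustration.

```python
import numpy as np

def rank_joint_deviation_peaks(series, window=5):
    """Rank times at which all series deviate strongly together.

    series: array-like of shape (n_series, n_times).
    window: half-width (in samples) used to decide what counts as a local
            maximum; a hypothetical tuning parameter, not from the thesis.
    """
    x = np.asarray(series, dtype=float)
    # Standardize each series so deviations are comparable across variables.
    center = x.mean(axis=1, keepdims=True)
    scale = x.std(axis=1, keepdims=True)
    scale[scale == 0] = 1.0  # avoid division by zero for constant series
    deviation = np.abs((x - center) / scale)
    # Assumed joint score: the smallest deviation across series, so the
    # score is large only where every series deviates at the same time.
    score = deviation.min(axis=0)
    peaks = []
    for t in range(score.size):
        lo, hi = max(0, t - window), min(score.size, t + window + 1)
        neighbours = np.concatenate((score[lo:t], score[t + 1:hi]))
        # Keep t only if it is strictly larger than everything nearby.
        if neighbours.size and score[t] > neighbours.max():
            peaks.append(t)
    # Rank-order the candidate features, strongest first.
    peaks.sort(key=lambda t: score[t], reverse=True)
    return peaks, score

# Synthetic example: two shared events (t=30, t=70) and one event that
# appears in only a single series (t=50).
a = np.zeros(100)
b = np.zeros(100)
a[30] = b[30] = 5.0
a[70] = b[70] = 3.0
a[50] = 4.0
peaks, score = rank_joint_deviation_peaks([a, b])
print(peaks)
```

In this synthetic example the two shared events are returned in order of strength, while the spike present in only one series is not flagged, because the joint score stays at the background level there.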
The EOF error map method is a feature identification method for time-indexed model output. The method is used as a diagnostic to quickly focus attention on a subset of the data before further analysis methods are applied. Mathematically, the infinity norm errors of empirical orthogonal function (EOF) reconstructions are calculated for each time output. The result is an EOF reconstruction error map which identifies features as changes in the error structure over time. The ubiquity of EOF-type methods in a wide range of disciplines reduces barriers to comprehension and implementation of the method. We apply the error map method to three different Computational Fluid Dynamics (CFD) data sets as examples: the development of a spontaneous instability in a large amplitude internal solitary wave, an internal wave interacting with a density profile change, and the collision of two waves of different vertical mode. In all cases the EOF error map method identifies relevant features that are worthy of further study. The EOF error map method appeared in PLoS ONE as 'Feature identification in time-indexed model output' (Shaw and Stastna 2019). Together, the gamma and EOF error map methods allow feature identification in an extremely wide variety of data sets.
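A minimal Python sketch of an EOF reconstruction error map follows, assuming the data are arranged as a (space, time) matrix and the EOFs are obtained from a singular value decomposition of the time-mean-removed field. The function name eof_error_map, the max_modes parameter, and the layout of the map (number of modes retained by time) are assumptions for illustration and may differ from the implementation in Shaw and Stastna (2019).

```python
import numpy as np

def eof_error_map(data, max_modes):
    """Infinity-norm error of truncated EOF reconstructions at each time.

    data: array-like of shape (n_space, n_times).
    max_modes: number of leading EOFs to consider; must not exceed
               min(n_space, n_times). Hypothetical parameter name.
    Returns an array of shape (max_modes, n_times) whose row k holds the
    maximum absolute reconstruction error when k + 1 modes are retained.
    """
    x = np.asarray(data, dtype=float)
    # Remove the time mean at each spatial point before computing EOFs.
    anomalies = x - x.mean(axis=1, keepdims=True)
    # Columns of u are spatial EOFs; rows of vt are their time coefficients.
    u, s, vt = np.linalg.svd(anomalies, full_matrices=False)
    errors = np.empty((max_modes, x.shape[1]))
    reconstruction = np.zeros_like(anomalies)
    for k in range(max_modes):
        # Add the contribution of mode k to the running reconstruction.
        reconstruction += s[k] * np.outer(u[:, k], vt[k])
        # Infinity norm over space, evaluated separately for each time.
        errors[k] = np.abs(anomalies - reconstruction).max(axis=0)
    return errors

# Synthetic example: a dominant large-scale pattern that flips sign halfway
# through, plus a weak transient confined to time indices 20 and 21.
pattern = np.array([1.0, 1.0, 1.0, 1.0])
transient_shape = np.array([1.0, -1.0, 1.0, -1.0])
slow = np.where(np.arange(40) < 20, 1.0, -1.0)
burst = np.zeros(40)
burst[20], burst[21] = 1.0, -1.0
field = 5.0 * np.outer(pattern, slow) + np.outer(transient_shape, burst)
errors = eof_error_map(field, max_modes=2)
flagged = np.flatnonzero(errors[0] > 0.5).tolist()
print(flagged)
```

With one mode retained, the error in this example is concentrated at the two time indices where the transient occurs, which is the kind of change in error structure the method uses to flag a feature. Adding the second mode captures the transient and the error drops to round-off level.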
While the associated methods papers required brevity and specificity, the thesis is written from the perspective of the overarching research program. It expands the roughly twenty pages of material in the Heliyon and PLoS ONE papers into a detailed account, more than 100 pages long, of how and why the methods were developed. It includes a more comprehensive framing of the general problem both methods solve; additional motivation, discussion, and mathematical background; an entire section on ensemble data sets, which introduces another method for feature identification; examples of the methods applied to full-scale data sets; and an appendix of related work. This is the definitive guide to our methodology and results.
Keywords
PCA, EOF, SVD, eigenvalues, covariance, geophysical fluid dynamics, EOF error map, gamma method, first eigenvalue series, time series analysis, event detection, feature identification, geophysics, oceanography, atmospheric science, environmental science, geology, hydrology