Subseries Join and Compression of Time Series Data Based on Non-uniform Segmentation
MetadataShow full item record
A time series is composed of a sequence of data items that are measured at uniform intervals. Many application areas generate or manipulate time series, including finance, medicine, digital audio, and motion capture. Efficiently searching a large time series database is still a challenging problem, especially when partial or subseries matches are needed. This thesis proposes a new denition of subseries join, a symmetric generalization of subseries matching, which finds similar subseries in two or more time series datasets. A solution is proposed to compute the subseries join based on a hierarchical feature representation. This hierarchical feature representation is generated by an anisotropic diffusion scale-space analysis and a non-uniform segmentation method. Each segment is represented by a minimal polynomial envelope in a reduced-dimensionality space. Based on the hierarchical feature representation, all features in a dataset are indexed in an R-tree, and candidate matching features of two datasets are found by an R-tree join operation. Given candidate matching features, a dynamic programming algorithm is developed to compute the final subseries join. To improve storage efficiency, a hierarchical compression scheme is proposed to compress features. The minimal polynomial envelope representation is transformed to a Bezier spline envelope representation. The control points of each Bezier spline are then hierarchically differenced and an arithmetic coding is used to compress these differences. To empirically evaluate their effectiveness, the proposed subseries join and compression techniques are tested on various publicly available datasets. A large motion capture database is also used to verify the techniques in a real-world application. The experiments show that the proposed subseries join technique can better tolerate noise and local scaling than previous work, and the proposed compression technique can also achieve about 85% higher compression rates than previous work with the same distortion error.