Distance Measures for Probabilistic Patterns
Date
2020-01-07
Authors
Kennedy, Ian
Advisor
Fieguth, Paul
Publisher
University of Waterloo
Abstract
Numerical measures of pattern dissimilarity are at the heart of pattern recognition and
classification. Applications of pattern recognition grow more sophisticated every year, and
consequently we require distance measures for patterns not easily expressible as feature
vectors. Examples include strings, parse trees, time series, random spatial fields, and
random graphs [79] [117].
Distance measures are not arbitrary. They can only be effective when they incorporate
information about the problem domain; this is a direct consequence of the Ugly Duckling
theorem [37].
This thesis poses the question: how can the principles of information theory and statistics guide us in constructing distance measures? I examine distance functions
for patterns that are maximum-likelihood model estimates of systems driven by random
inputs but observed noiselessly. In particular, I look at distance measures for histograms, stationary ARMA time series, and discrete hidden Markov models.
I show that, for maximum-likelihood model estimates, the L2 distance weighted by the
information matrix evaluated at the most likely model estimate minimizes the type II classification
error for a fixed type I error. I also derive explicit L2 distance measures for ARMA(p, q)
time series and discrete hidden Markov models, based on their respective information
matrices.
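As an illustration of the kind of distance described above (a minimal sketch, not the thesis's own derivation), the following Python snippet computes an information-matrix-weighted L2 distance between two histogram (multinomial) maximum-likelihood estimates. For the multinomial case this quadratic form reduces to a familiar chi-square-type distance. The function names and the sample size n = 500 are illustrative assumptions.

```python
import numpy as np

def multinomial_fisher_information(p, n):
    """Fisher information for the first k-1 multinomial probabilities, n trials."""
    k = len(p)
    free = p[:-1]                      # free parameters p_1 .. p_{k-1}
    return n * (np.diag(1.0 / free) + np.ones((k - 1, k - 1)) / p[-1])

def info_weighted_l2(p_hat, q_hat, n):
    """sqrt of (q - p)^T I(p_hat) (q - p) over the free parameters,
    with the information matrix evaluated at the ML estimate p_hat."""
    I = multinomial_fisher_information(p_hat, n)
    d = (q_hat - p_hat)[:-1]
    return np.sqrt(d @ I @ d)

# Two histogram (multinomial ML) estimates, each from n = 500 samples.
p_hat = np.array([0.50, 0.30, 0.20])
q_hat = np.array([0.45, 0.35, 0.20])
print(info_weighted_l2(p_hat, q_hat, n=500))

# Equivalent closed form for the multinomial: sqrt(n * sum_i (q_i - p_i)^2 / p_i)
print(np.sqrt(500 * np.sum((q_hat - p_hat) ** 2 / p_hat)))
```

Both print statements give the same value, showing how the information-matrix weighting recovers a chi-square-like distance in this simple case; the ARMA and hidden-Markov-model distances in the thesis use the corresponding information matrices in the same quadratic form.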
Keywords
pattern recognition, information matrix, distance measure, maximum likelihood, time series, hidden Markov model