The k-best paths in Hidden Markov Models. Algorithms and Applications to Transmembrane Protein Topology Recognition.

Golod, Daniil

Files

thesis.pdf (1.23 MB)

Date

2009-08-26T19:18:35Z

Authors

Golod, Daniil

Publisher

University of Waterloo

Abstract

Traditional algorithms for hidden Markov model decoding seek to maximize either the probability of a state path or the number of positions of a sequence assigned to the correct state. These algorithms provide only a single answer and in practice do not produce good results. The most mathematically sound of these algorithms is the Viterbi algorithm, which returns the state path that has the highest probability of generating a given sequence. Here, we explore an extension to this algorithm that allows us to find the k paths of highest probability. The naive implementation of k best Viterbi paths is highly space-inefficient, so we adapt recent work on the Viterbi algorithm for a single path to this domain. Our algorithm uses much less memory than the naive approach. We then investigate the usefulness of the k best Viterbi paths on the example of transmembrane protein topology prediction. For membrane proteins, even simple path combination algorithms give good explanations, and if we look at the paths we are combining, we can give a sense of confidence in the explanation as well. For proteins with two topologies, the k best paths can give insight into both correct explanations of a sequence, a feature lacking from traditional algorithms in this domain.
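To make the idea of k best Viterbi paths concrete, the following is a minimal Python sketch of the naive approach mentioned in the abstract, in which every state keeps its k best partial paths at each position. It is not the memory-efficient algorithm developed in the thesis. The function name `k_best_viterbi`, its parameters, and the two-state toy model are assumptions made for illustration and do not come from the thesis.

```python
import heapq
import math


def k_best_viterbi(obs, states, start_p, trans_p, emit_p, k):
    """Naive k-best Viterbi: return up to k (log_prob, path) pairs, best first.

    Every candidate stores its full path, which is the space-hungry
    behaviour the abstract calls naive.
    """
    def log(p):
        # Guard against log(0) for impossible transitions or emissions.
        return math.log(p) if p > 0 else float("-inf")

    # beams[s] holds up to k (log_prob, path) candidates that end in state s.
    beams = {
        s: [(log(start_p[s]) + log(emit_p[s][obs[0]]), (s,))]
        for s in states
    }
    for symbol in obs[1:]:
        new_beams = {}
        for s in states:
            # Extend every stored candidate of every predecessor state into s.
            candidates = [
                (lp + log(trans_p[prev][s]) + log(emit_p[s][symbol]), path + (s,))
                for prev in states
                for lp, path in beams[prev]
            ]
            new_beams[s] = heapq.nlargest(k, candidates, key=lambda c: c[0])
        beams = new_beams

    # Merge the per-state lists and keep the k best complete paths.
    finals = [c for s in states for c in beams[s]]
    return heapq.nlargest(k, finals, key=lambda c: c[0])


# Hypothetical two-state toy model, for illustration only.
states = ("H", "F")
start_p = {"H": 0.6, "F": 0.4}
trans_p = {"H": {"H": 0.7, "F": 0.3}, "F": {"H": 0.4, "F": 0.6}}
emit_p = {
    "H": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
    "F": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6},
}
top3 = k_best_viterbi(("normal", "cold", "dizzy"), states, start_p, trans_p, emit_p, k=3)
for log_prob, path in top3:
    print("".join(path), math.exp(log_prob))
```

Keeping only the k best candidates per state is sufficient because any prefix of one of the k best complete paths must itself be among the k best partial paths ending in its last state. With k = 1 the sketch reduces to the ordinary Viterbi algorithm.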