Raymond Pang's Home

BioInformatics

UNDER CONSTRUCTION !!!

Dynamic Programming (for gene alignment)

HMM (hidden Markov Model)

In simple wordings, its a Finite Morkov Model (a FSM with probability defined on transition edges) with output (emission) for each state.

State transitions in a hidden Markov model (example)
- x — States of the Markov model
- a — Transition probabilities
- b — Output probabilities
- y — Observable outputs

Viterbi's Algorithm
It is a dynamic programming algorithm for finding the most likely sequence of hidden states – known as the Viterbi path. ie. It is used to find the most probable path in a HMM from just the observed data sequence.

As its a dynamic programming algorithm, in each step, the most probable path is computed based on the the previous best path + all possible edges, and the most probable path of each state is kept. In other words, for each source state, the probability of the Viterbi path to that state is known. This too is multiplied with the emission and transition probabilities and replaces valmax if it is greater than its current value.

The process repeats until the whole sequence is evaluated. Here is a slide from the corresponding lecture notes :

 

Maximum Likelihood (ML) Parameter Estimation for HMM

An HMM is defined by the Transition (ai) and Emission Parameteres(ei), we will let θ denote all these parameters. In most of the cases, we have no knowledge on these parameters, so we have to obtain them from a set of observed data.

To determine θ, use a training set = {x1,...,xn}, where each xj is a sequence which is assumed to fit the model. Given the parameters θ, each sequence xj has an assigned probability p(xj|θ).

As the elements of the training set {x1,...,xn} are assumed to be independent,


p(x1,..., xn|θ) = Πj p (xj|θ).
• ML parameter estimation looks for θ which maximizes the above

There are 2 cases, one is that we know the complete structure of each sequence in the
training set {x1,...,xn}. Another is we only know the values of the xi’s of the input
sequences but not the state paths.

For the first case,

For the second case, we have to use some EM (expectation maximization) algorithm like Baum-Welch introduced below.

Baum-Welch algorithm

It is used to find the unknown parameters of a hidden Markov model (HMM). It is also known as the forward-backward algorithm. It is also known to be an EM (expectation-maximization) algorithm .

The algorithm has two steps:

  1. Calculating the forward probability and the backward probability for each HMM state;
  2. On the basis of this, determining the frequency of the transition-emission pair values and dividing it by the probability of the entire string. This amounts to calculating the expected count of the particular transition-emission pair. Each time a particular transition is found, the value of the quotient of the transition divided by the probability of the entire string goes up, and this value can then be made the new value of the transition.

The algorithm was introduced in the paper:

L. E. Baum, T. Peterie, G. Souled, and N. Weiss, "A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains," Ann. Math. Statist., vol. 41, no. 1, pp. 164--171, 1970.

 

 


Home    About Myself    Publication    Links    Raybase
© 2005 Raymond Pang