The match-state clustering approach. We start with a library of N HMMs. First, the match states of all HMMs are clustered into K classes according to their amino-acid distributions. Next, the known domain occurrences in the target species and its close relatives are aligned to the states of the corresponding HMMs. These alignments are used to estimate one amino-acid distribution per state class, pooling the residues aligned to every state in that class. For example, if we suppose that class 2 contains only the second state of HMM 1 and the fifth state of HMM N, the distribution estimated from the alignments shown is 3/6 for T, 1/6 for S, 1/6 for A, and 0 for all other amino acids. Finally, the new HMM library is built by mixing the original distribution of each match state with the estimated distribution of its class.
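The estimation and mixing steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the class assignment of each match state is already available (the clustering step is skipped), and it assumes the mixing is a simple linear interpolation with a hypothetical weight `weight`, which the caption does not specify. The names `state_class`, `aligned_residues`, `class_distributions` and `mix`, as well as the toy residues, are made up for illustration and do not reproduce the figure's data.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def class_distributions(state_class, aligned_residues):
    """Pool the residues aligned to all states of a class, then normalize the counts."""
    counts = {}
    for state, residues in aligned_residues.items():
        counts.setdefault(state_class[state], Counter()).update(residues)
    dists = {}
    for cls, counter in counts.items():
        total = sum(counter[aa] for aa in AMINO_ACIDS)
        dists[cls] = {aa: counter[aa] / total for aa in AMINO_ACIDS}
    return dists

def mix(original, class_dist, weight):
    """Assumed mixing rule: linear interpolation between the two distributions."""
    return {aa: (1 - weight) * original[aa] + weight * class_dist[aa]
            for aa in AMINO_ACIDS}

# Hypothetical toy input: a state is identified by (HMM name, state index).
state_class = {("hmm1", 2): 2, ("hmmN", 5): 2}
# Residues observed at each state in the aligned domain occurrences (made up).
aligned_residues = {("hmm1", 2): "TT", ("hmmN", 5): "SA"}

dists = class_distributions(state_class, aligned_residues)

# Original emission distribution of one match state (uniform here, for simplicity).
original = {aa: 1 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
mixed = mix(original, dists[2], weight=0.5)
```

With this toy input, class 2 pools four residues, so its estimated distribution assigns 2/4 to T and 1/4 each to S and A. Because both inputs to `mix` sum to one, any `weight` between 0 and 1 yields a valid distribution.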