Figure 2 | BMC Bioinformatics

Figure 2

From: EVEREST: automatic identification and classification of protein domains in all protein sequences


The EVEREST process. Turquoise arrows represent the steps of the procedure, which are detailed in the text; the panels show the state of the data between steps. Red arrows connect two manifestations of the same object.
0 Input: a database of protein sequences.
1 A non-redundant sequence database is created.
2 Internal repeats are removed from the sequences.
3 Segments that recur in the database are identified by pairwise sequence comparison.
4 Within each protein, segments are grouped by position into putative domains.
5 Putative domains are clustered into candidate domain families.
6 Machine learning is used to select the best candidate domain families.
7 A hidden Markov model (HMM) is built for each selected domain family.
8 Each HMM scans the input database, producing a new segment database; the segments detected by each HMM form a domain family.
9 Steps 4–8 are iterated three times. The domain families defined by the third-iteration HMMs are clustered into sets of overlapping families.
10 The final domain families are defined by a vote among all the HMMs in each set.
See the section "The EVEREST Process" for further details.
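Steps 4 and 10 are the most mechanical parts of the caption, so the following is a minimal Python sketch of each under stated assumptions; it is not the EVEREST implementation. For step 4 it assumes a simple greedy rule: sorted segments of one protein are merged into the same putative domain when they overlap by at least a hypothetical fraction `min_overlap` of the shorter span. For step 10 it assumes a hypothetical per-residue majority vote: a residue belongs to a final domain when more than half of the HMMs in a set cover it. The function names `group_by_position` and `vote` are illustrative only.

```python
from typing import List, Optional, Tuple

# (start, end): 1-based, inclusive residue positions within one protein
Segment = Tuple[int, int]


def group_by_position(segments: List[Segment], min_overlap: float = 0.5) -> List[Segment]:
    """Step 4 (simplified, hypothetical rule): merge one protein's segments
    into putative domains by greedy overlap-based grouping."""
    groups: List[Segment] = []
    for start, end in sorted(segments):
        if groups:
            g_start, g_end = groups[-1]
            # Overlap length between this segment and the current group span
            overlap = min(end, g_end) - max(start, g_start) + 1
            shorter = min(end - start + 1, g_end - g_start + 1)
            if overlap >= min_overlap * shorter:
                # Extend the current putative domain to cover this segment
                groups[-1] = (g_start, max(g_end, end))
                continue
        groups.append((start, end))
    return groups


def vote(hits: List[List[Segment]], length: int) -> List[Segment]:
    """Step 10 (simplified, hypothetical rule): keep residues covered by a
    strict majority of the HMMs in one set. `hits` holds one list of
    detected segments per HMM, all on the same protein of length `length`."""
    votes = [0] * (length + 1)  # index 0 unused; positions are 1-based
    for hmm_hits in hits:
        covered = set()
        for start, end in hmm_hits:
            # Clamp to the protein so out-of-range hits cannot index past the end
            covered.update(range(max(1, start), min(end, length) + 1))
        for pos in covered:
            votes[pos] += 1  # each HMM votes at most once per residue

    threshold = len(hits) / 2
    domains: List[Segment] = []
    run_start: Optional[int] = None
    for pos in range(1, length + 1):
        if votes[pos] > threshold:
            if run_start is None:
                run_start = pos
        elif run_start is not None:
            domains.append((run_start, pos - 1))
            run_start = None
    if run_start is not None:
        domains.append((run_start, length))
    return domains
```

For example, `group_by_position([(1, 100), (10, 95), (150, 250), (160, 240)])` groups the four segments into two putative domains, and `vote([[(1, 50)], [(5, 55)], [(40, 60)]], 60)` keeps only residues 5 to 55, where at least two of the three HMMs agree. A majority threshold is just one simple voting choice; a weighted or stricter threshold would be equally plausible.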
