Fig. 4
From: In silico approach to designing rational metagenomic libraries for functional studies

Flow diagram of the classification of proteins of the GOS dataset into families and representatives. First, all protein sequences were annotated using HMM profiles obtained from PFAM and TIGRFAMs. Proteins that did not match any HMM at the selected score threshold were clustered de novo using MCL. Small MCL-based classes were excluded. For each HMM-based and MCL-based family that contained sufficient complete sequences, a representative was defined. In this way, 9,771 representatives standing in for 4,969,723 proteins were assigned. This set of representatives can then be used to create a custom expression library, which can be screened for a desired target activity.
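
The flow described in the caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `Protein` record, the `classify` function, and the parameters `score_threshold`, `min_cluster_size`, and `min_complete` are hypothetical names with placeholder values. It assumes the HMM search and the MCL clustering were run separately and their results attached to each protein, that a higher score means a better match, and that the representative is the longest complete sequence (the caption does not say how the representative is chosen).

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Protein:
    pid: str
    length: int
    complete: bool                      # True if the sequence is full length
    hmm_family: Optional[str] = None    # best PFAM/TIGRFAMs hit, if any
    hmm_score: float = 0.0              # score of that hit (assumed: higher is better)
    mcl_cluster: Optional[str] = None   # label from a separate MCL run, if any


def classify(proteins, score_threshold=25.0, min_cluster_size=3, min_complete=1):
    """Group proteins into families and pick one representative per family.

    All thresholds are placeholder values chosen for illustration only.
    """
    families = defaultdict(list)
    unmatched = []

    # Step 1: HMM-based annotation; hits at or above the threshold define families.
    for p in proteins:
        if p.hmm_family is not None and p.hmm_score >= score_threshold:
            families["HMM:" + p.hmm_family].append(p)
        else:
            unmatched.append(p)

    # Step 2: proteins without an accepted HMM hit are grouped by MCL cluster.
    clusters = defaultdict(list)
    for p in unmatched:
        if p.mcl_cluster is not None:
            clusters["MCL:" + p.mcl_cluster].append(p)

    # Step 3: small MCL-based classes are excluded.
    for name, members in clusters.items():
        if len(members) >= min_cluster_size:
            families[name] = members

    # Step 4: families with enough complete sequences get a representative
    # (assumption: the longest complete sequence).
    representatives = {}
    for name, members in families.items():
        complete = [p for p in members if p.complete]
        if len(complete) >= min_complete:
            representatives[name] = max(complete, key=lambda p: p.length).pid

    return dict(families), representatives
```

Calling `classify` on a list of `Protein` records returns the family membership and a mapping from family name to the identifier of its representative; families without enough complete sequences appear in the first result but not in the second.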