The "combine" suffix denotes Top-n-gram-combine, which combines Top-1-grams and Top-2-grams. N-grams are the set of all possible subsequences of a fixed length 3 over the 20 amino acids, so the N-gram vocabulary contains 8000 (20³) words. Patterns are extracted by TEIRESIAS; in total, 71009 patterns are extracted. Through χ² selection, 8000 patterns are selected as the characteristic words. The MEME/MAST system is used to discover motifs and search databases; in total, 3231 motifs are extracted. The optimized probability threshold of 0.13 is used to convert the protein sequence frequency profiles into binary profiles, and 1087 words are obtained. Top-1-grams and Top-2-grams have 20 and 400 words respectively, so Top-n-gram-combine has 420 (20+400) words in total.
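The vocabulary sizes quoted above can be checked with a short sketch. It is illustrative only and is not the code used in this work: the names `AMINO_ACIDS`, `vocabulary`, and `binarize` are hypothetical, the alphabet is assumed to be the 20 standard amino acids, and the strict comparison against the 0.13 threshold is an assumption, since the text does not say whether the boundary value is included.

```python
from itertools import product

# Assumption: the alphabet is the 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def vocabulary(n, alphabet=AMINO_ACIDS):
    """Return every possible length-n string over the alphabet."""
    return ["".join(p) for p in product(alphabet, repeat=n)]


def binarize(frequencies, threshold=0.13):
    """Convert one frequency-profile column into a binary profile.

    Assumption: an entry becomes 1 only when its frequency is strictly
    greater than the threshold; the source does not specify the boundary case.
    """
    return tuple(int(f > threshold) for f in frequencies)


# N-grams of fixed length 3: 20^3 possible words.
ngram_words = len(vocabulary(3))

# Top-1-gram and Top-2-gram vocabularies have 20 and 20^2 words.
top1_words = len(vocabulary(1))
top2_words = len(vocabulary(2))

# Top-n-gram-combine is the union of the two vocabularies.
combine_words = top1_words + top2_words

print(ngram_words, top1_words, top2_words, combine_words)  # 8000 20 400 420
```

The counts for patterns (71009, reduced to 8000), motifs (3231), and binary profile words (1087) depend on the data set and cannot be derived this way; only the N-gram and Top-n-gram sizes follow from the alphabet size alone.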