Outline of the analysis process, illustrating the basic steps from an unknown protein sequence towards a functional interpretation. (1) Starting with the unknown CELO sequence, a search is made for significantly homologous sequences with relatively high identity/similarity. Usually, only sequences from related avian adenoviruses are found at this step. This results in a set of homologous proteins likely to have the same or at least a similar function. The following steps are carried out for each of these sequences. This comparative approach can reveal additional information that might be missed if only one sequence were analyzed. (2) Intrinsic sequence features are investigated. This includes a statistical analysis of amino acid composition and the search for low complexity regions (LCRs), coiled coil domains, transmembrane domains (TM), amino- and carboxy-terminal signal sequences, and internal repeats. An important output of this step is the rough discrimination between globular and non-globular regions in the protein. (3) The globular regions are further analyzed. These domains represent the most useful level at which to understand protein function, and their identification is therefore one of the major issues of the whole analysis process. Comparison to different databases using various algorithms (see Material and Methods) can either find significant homologs or propose a set of candidate domains with borderline statistical significance. In the latter case (4), those hits must be further verified or excluded by additional investigations, such as conservation of critical functional or structural residues, secondary structure prediction, fold recognition, consensus of different methods, and consensus of prediction results within the group of close homologs. (5) Finally, all the results are integrated and can be interpreted in the context of the CELO infection cycle.
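
As a rough illustration of the kind of intrinsic feature analysis named in step (2), the following minimal Python sketch computes amino acid composition and flags candidate low complexity regions with a sliding-window Shannon entropy. This is an assumption made for illustration only and is not the method used in the analysis. The function names, the window length of 12, and the entropy threshold of 2.2 bits are all hypothetical choices.

```python
from collections import Counter
from math import log2


def composition(seq):
    """Return the fraction of each residue type present in seq."""
    counts = Counter(seq)
    total = len(seq)
    return {aa: n / total for aa, n in counts.items()}


def window_entropy(window):
    """Shannon entropy (in bits) of the residue distribution in one window."""
    total = len(window)
    return -sum((n / total) * log2(n / total) for n in Counter(window).values())


def low_complexity_regions(seq, window=12, threshold=2.2):
    """Return merged half-open (start, end) intervals covered by windows
    whose entropy is at or below the threshold.

    The window and threshold defaults are illustrative assumptions,
    not tuned or published parameters.
    """
    regions = []
    for i in range(len(seq) - window + 1):
        if window_entropy(seq[i:i + window]) <= threshold:
            start, end = i, i + window
            if regions and start <= regions[-1][1]:
                # Overlaps or touches the previous region: extend it.
                regions[-1] = (regions[-1][0], max(regions[-1][1], end))
            else:
                regions.append((start, end))
    return regions
```

Called on a sequence string, `low_complexity_regions` returns the intervals that a crude entropy filter would treat as candidate non-globular segments; the remaining stretches would then be the candidates for the domain searches of step (3). Dedicated tools for low complexity, coiled coil, and transmembrane prediction apply considerably more refined statistics than this sketch.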