Sequence analysis revealed a significantly conserved region, which we designate as the novel DCD domain. The DCD domain is a stretch of approximately 130 amino acids that contains several mostly invariant motifs (Fig. 1). These include an FGLP and an LFL motif at the N-terminus and a PAQV and a PLxE motif towards the C-terminus of the domain. Several amino acids are positionally conserved in all members with a DCD domain, indicating a critical role of these residues in structure and function (Fig. 1). In particular, three cysteines that are conserved either in almost all members (red asterisks in Fig. 1) or in a subfamily-specific manner (green asterisks in Fig. 1) may be involved in metal binding. The predicted secondary structure is mostly composed of beta strands, flanked by an alpha-helix at the N-terminus and at the C-terminus. Using the metaserver 3D-Jury [7], no similarity to any known structural fold could be detected. The modular nature of the DCD domain is supported by its presence in several protein families with different domain architectures (Fig. 2).
The DCD domain is found only in plant proteins and is absent from bacteria, fungi and animals. The two fully sequenced plant genomes, rice and Arabidopsis, contain 11 and 7 members with a DCD domain, respectively. At least four subgroups of proteins can be identified by phylogenetic comparison of the DCD domain, each having members in both the rice and the Arabidopsis genome (Fig. 2). A similar picture emerges from the analysis of plant EST sequences, which also cluster into the different subgroups (data not shown). The four subgroups differ in the position of the DCD domain within the protein (Fig. 2). Whereas in subgroup I the DCD domain is found at the C-terminus of the protein, in subgroup II it lies closer to the middle of the protein. Subgroup III is more variable; its proteins are mostly characterized by a DCD domain at the N-terminus, and in one case the domain follows a ParB domain. Subgroup IV proteins also have the DCD domain at the N-terminus but contain several KELCH repeats in the C-terminal part of the protein.
Whereas the majority of DCD domains (families II, III and IV) contain a second conserved cysteine directly following the N-terminal one, family I possesses a putative functional substitute in the "central loop" of the domain (Fig. 1, green asterisks at the top of the alignment).
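The motif description above can be illustrated with a short script. The following is only a minimal sketch and not the procedure used in this study: it scans a protein sequence for the four motifs named in the text with plain regular expressions, reading the "x" in PLxE as "any residue" (an assumption). The function names `find_dcd_motifs` and `n_motifs_precede_c_motifs` and the toy sequence are hypothetical and chosen for illustration only.

```python
import re

# Motifs named in the text. Reading "x" in PLxE as any single residue
# is an assumption made for this sketch.
DCD_MOTIFS = {
    "FGLP": "FGLP",
    "LFL": "LFL",
    "PAQV": "PAQV",
    "PLxE": "PL.E",
}


def find_dcd_motifs(sequence):
    """Return {motif name: [0-based start positions]} for one protein sequence."""
    seq = sequence.upper()
    hits = {}
    for name, pattern in DCD_MOTIFS.items():
        # A lookahead is used so that overlapping occurrences are reported too.
        hits[name] = [m.start() for m in re.finditer(f"(?={pattern})", seq)]
    return hits


def n_motifs_precede_c_motifs(hits):
    """True if all four motifs occur and the first FGLP and LFL hits
    lie before the first PAQV and PLxE hits."""
    if any(not positions for positions in hits.values()):
        return False
    n_side = max(hits["FGLP"][0], hits["LFL"][0])
    c_side = min(hits["PAQV"][0], hits["PLxE"][0])
    return n_side < c_side


# Made-up toy sequence for illustration only; it is not a real DCD protein.
toy = "MSAFGLPKDGLFLYNTSGGAAKPAQVRWTPLSEKK"
toy_hits = find_dcd_motifs(toy)
print(toy_hits)
print(n_motifs_precede_c_motifs(toy_hits))
```

Such an exact-match scan is far less sensitive than the profile-based searches used to define the domain, because the motifs are only "mostly" invariant; it is meant as a quick sanity check on candidate sequences, not as a detection method.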
Using PSI-BLAST, we could identify the DCD domain only in plants, but there in a wide variety of species. The domain is present in ESTs from dicots (e.g. Arabidopsis), monocots (e.g. rice), gymnosperm trees (e.g. pine), ferns, and mosses (e.g. Physcomitrella). The available sequences from algae are very limited, but the recently sequenced diatom Thalassiosira pseudonana [8] contains a distant member of this domain in a hypothetical protein (Fig. 2). The DCD domain was therefore present early in plant evolution, at least before the separation of diatoms from the green algae that gave rise to higher plants, which occurred about 1 billion years ago.
For three of the proteins with a DCD domain, all clustering into group I, some biological data have been published. These proteins include the B2 protein from carrot, which was found to be strongly induced at an early stage of the developmental shift from undifferentiated cell cultures to somatic embryogenesis [9]. Although the exact function of the protein still has to be elucidated, a role in developmental processes is supported by findings from Arabidopsis transcript profiling with microarrays. There, the DCD-containing protein At2g32910 is only weakly expressed throughout the whole life cycle of Arabidopsis, except during embryogenic development. A similar pattern is observed for the gene At5g01660, which has several KELCH repeats next to the DCD domain. This gene is most abundantly expressed in embryos, but also in the meristem of the shoot apex.
A second protein with a DCD domain was identified in pea [10]. The so-called GDA1 gene is strongly expressed in peas during the vegetative phase, but its transcript rapidly disappears after the plants are shifted into the reproductive phase. The transition is mediated by a change of the light period from short to long days. Interestingly, the GDA1 gene can be rapidly induced by the phytohormone gibberellic acid, a key player in the developmental change from the vegetative to the reproductive phase in plants. The GDA1 transcript accumulated only 15 min after application of gibberellic acid, indicating that GDA1 is a primary response gene to this phytohormone.
A third protein with a DCD domain was isolated by [6]. This protein was named N-rich protein (NRP) because of the extremely high content of asparagine (~25%) in the N-terminal half preceding the DCD domain. The NRP gene is rapidly induced during programmed cell death in soybean caused by inoculation with avirulent bacteria. Isogenic bacteria lacking a single Avr gene are not recognized by soybean cells and trigger neither programmed cell death nor the induction of the NRP gene. The gene is induced early in the cell death program, well before the cells lose control of their membrane integrity. When Phytophthora was used as a fungal pathogen to inoculate soybean plants, the same response was found as with bacteria, indicating that the NRP gene responds to the cell death program rather than to specific molecules from a particular pathogen. The putative Arabidopsis ortholog (At5g42050) is induced by several stress conditions, including ozone, osmotic and cold stress, as indicated by publicly available transcript profiling data (Genevestigator: https://www.genevestigator.ethz.ch/). Ozone treatment leads to small lesions with cell death, similar to a hypersensitive reaction caused by avirulent pathogens. A similar set of genes is activated by both inducers of programmed cell death.
The DCD domain is well conserved at the amino acid level throughout the plant kingdom. The domain is present in proteins with different architectures. Some of these proteins contain additional recognizable motifs, such as the KELCH repeats or the ParB domain. The ParB domain has been implicated in the partitioning of plasmids and chromosomes in bacteria and has nuclease activity [11].
KELCH motifs are typically composed of stretches of ~50 amino acids, each of which forms a beta sheet [12]. They occur as 5 to 7 repeats that together form a beta-propeller tertiary structure. KELCH motifs are widespread and have been identified in viruses, plants, fungi and mammals. Most of the characterized KELCH motifs are interfaces for protein-protein interactions, often with proteins of the cytoskeleton [13].