The Mycoplasma conjunctivae genome sequencing, annotation and analysis
© Calderon-Copete et al. 2009
Published: 16 June 2009
The mollicute Mycoplasma conjunctivae is the etiological agent of infectious keratoconjunctivitis (IKC) in domestic sheep and wild caprinae. Although this pathogen is relatively benign for domestic animals treated with antibiotics, in wild animals it can lead to blindness and death. It is a major cause of death in protected species in the Alps (e.g., Capra ibex, Rupicapra rupicapra).
The genome was sequenced using a combination of GS-FLX (454) and Sanger sequencing, and annotated by an automatic pipeline that we designed using several tools interconnected via Perl scripts. The resulting annotations are stored in a MySQL database.
The annotated sequence is deposited in the EMBL database (FM864216) and uploaded into the mollicutes database MolliGen (http://cbi.labri.fr/outils/molligen/), allowing for comparative genomics.
We show that our automatic pipeline can annotate a complete mycoplasma genome, and we present several examples of analysis in search of biological targets (e.g., pathogenic proteins).
Mycoplasmas (class Mollicutes) are among the smallest microorganisms capable of self-replication and autonomous life. The genus Mycoplasma includes a large number of species with highly reduced genomes, which in nature are associated with hosts either commensally or pathogenically. General features of the class Mollicutes are a small genome, the lack of a cell wall and a low GC content.
Indeed, Mycoplasma species have genomes of 0.6 to 1.3 Mbp. Weisburg et al. (1989) and Woese et al. (1980) showed that Mycoplasma evolved from more classical bacteria of the Firmicutes taxon by so-called regressive evolution, which resulted in massive genome reduction and minimal metabolic activities. Consequently, they adopted a strictly parasitic lifestyle, mainly occurring as extracellular parasites often restricted to a living host, with some species able to invade host cells, as described by Sirand-Pugnet et al. (2007), Rosengarten et al. (2000) and Citti et al. (2005). They have a predilection for mucosal surfaces, where they successfully compete for nutrients with many other organisms and establish chronic infections. They do not show specific virulence factors of the kind known in other bacteria; instead, they seem to use toxic metabolic intermediates, which they secrete and translocate to the host cells, as virulence factors. Additionally, because they lack a cell wall, they are not affected by antibiotics that target cell wall synthesis, such as penicillin and other beta-lactam antibiotics, which makes these organisms particularly interesting in medicine.
Mycoplasma conjunctivae is considered the major etiological agent of infectious keratoconjunctivitis (IKC) in both domestic and wild caprinae species. In the European Alps it affects several wild species such as alpine ibex (Capra ibex ibex), alpine chamois (Rupicapra rupicapra rupicapra) and mouflon (Ovis orientalis musimon), as well as domestic sheep and goats. In Switzerland, M. conjunctivae is known to be the primary cause of this disease.
The implied role of M. conjunctivae is based on the frequent isolation of this organism from inflamed eyes and on limited attempts to induce ocular disease experimentally, which showed that M. conjunctivae is one agent responsible for epidemic keratoconjunctivitis. Nonetheless, even though the molecular epidemiology has been well described by Belloy et al. (2003), the molecular mechanism of infection has not yet been established.
M. conjunctivae type strain HRC/581T (NCTC10147) was grown on standard mycoplasma broth medium enriched with 20% horse serum, 2.5% yeast extract and 1% glucose (Axcell Biotechnologies). The cells were harvested by centrifugation at 13,000 × g for 20 min, washed three times in TES buffer (10 mM Tris-HCl, 1 mM EDTA, 0.8% NaCl, pH 7.5), and then re-suspended in TES buffer to a concentration of approximately 10^9 bacteria/ml. DNA was extracted by the guanidinium thiocyanate method, extracted 3 times with PCIA (Phenol: CHCl3: Isoamylalcohol = 49.5: 49.5: 1) and 3 times with CIA (CHCl3: Isoamylalcohol = 99: 1), precipitated with 50% isopropanol, washed 2 times with 70% ethanol to remove salt, dried in the air for 15 min and re-suspended in double distilled H2O at a concentration of 500 μg/ml.
Sequencing and assembly of the genome were carried out by Microsynth AG. The quality of the isolated genomic DNA was verified by gel electrophoresis, which showed pure, high-molecular-weight DNA. The DNA was sheared by passing it several times through a needle in order to construct two different libraries: a plasmid library and a fosmid library. For the plasmid library (2–12 Kbp inserts), the genomic DNA was passed 30 times through a 30-Gauge needle and sonicated for 10 seconds (sonication strength 3 on a Digital Sonifier 450 from Branson Ultrasonics Corp., Danbury, CT, USA). For the fosmid library (32 Kbp inserts), the genomic DNA was passed 10 times through a 23-Gauge needle without sonication.
Small fragments were ligated with a linker and fractionated twice through 0.8% agarose gels. Fractions of 6 different sizes (from 2 to 12 Kbp) were cut out from the gel and cloned into vector pOTW12 (Sanger Institute). The large fragments were fractionated using a CHEF-DR II System (BIORAD). Fragments of 32 Kbp were cut out from the gel and ligated into pCC1Fos (Epicentre Biotechnology Inc.).
From the plasmid library 11,300 clones and from the fosmid library 384 clones were end-sequenced on an ABI 3730 capillary sequencer. A second part of the small fragments was sequenced using 454 Life Science FLX technology, yielding 263,163 reads that were reduced to 78,498 reads covering 20,569,079 bp after applying a quality cut-off filter (approx. 22× coverage).
The assembly was carried out using the SeqMan module of DNASTAR Lasergene version 7, combining classical Sanger sequences (ABI 3730) and 454 FLX reads. The assembly was checked with "amosvalidator" from the AMOS package, which identifies suspicious regions in an assembly. To support the assembly process, the paired-end reads of the 384 fosmids were aligned to the final sequence. The reads were spread at regular intervals, except for 2 clones that were absent from the results. The 2 corresponding regions were analyzed for the presence of genes potentially lethal to E. coli. The first region contains a homologue of the gene lepA, which is known to be lethal when overexpressed in E. coli. This region also contains a restriction enzyme that might cut the E. coli genome. The second region contains a transposase and some phage genes. These features might explain the toxicity of these two fosmids in E. coli.
The automatic annotation pipeline was entirely built locally using available software and linking them with Perl scripts.
Gene prediction was carried out using Glimmer 3.02 and the genetic code specific for Mycoplasma (e.g., UGA encodes tryptophan). The interpolated context models (ICM) were calculated by self-training on the long ORFs of the contigs. The RNAs were predicted using Infernal with models obtained from RFAM, tRNAscan-SE, and blastn for 16S and 23S.
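The effect of the Mycoplasma genetic code on translation can be illustrated with a minimal sketch. It is not part of the pipeline described here (which relies on Glimmer and EMBOSS); the function names are hypothetical, and the only deviation from the standard code that it models is UGA read as tryptophan.

```python
from itertools import product

BASES = "TCAG"
# Amino acids of the standard genetic code, listed in TCAG codon order
# (TTT, TTC, TTA, TTG, TCT, ...). "*" marks a stop codon.
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def build_codon_table(uga_is_trp):
    """Return a codon -> amino acid dict.

    With uga_is_trp=True the stop codon TGA (UGA) is reassigned to
    tryptophan (W), as in the Mycoplasma genetic code.
    """
    codons = ["".join(c) for c in product(BASES, repeat=3)]
    table = dict(zip(codons, STANDARD_AA))
    if uga_is_trp:
        table["TGA"] = "W"
    return table


def translate(dna, table):
    """Translate a coding sequence in frame 0 up to the first stop codon."""
    dna = dna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = table.get(dna[i:i + 3], "X")  # "X" for codons with ambiguous bases
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


standard = build_codon_table(uga_is_trp=False)
mycoplasma = build_codon_table(uga_is_trp=True)

orf = "ATGTGAAAATAA"  # toy ORF, hypothetical
print(translate(orf, standard))    # truncated at the in-frame TGA
print(translate(orf, mycoplasma))  # TGA read through as tryptophan
```

With the standard code, a gene finder would truncate every ORF at its first in-frame UGA, which is why the genetic code has to be set explicitly for mycoplasma genomes.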
Predicted coding sequences (CDS) were translated using the EMBOSS package (extractseq, transeq, revseq) and a similarity search was run by blastp against the UniProt/Swiss-Prot knowledgebase (Release 56.2 of 23-Sep-2008: 398181 entries). The CDS were also scanned against the HAMAP families to identify orthologous protein families. In addition, the CDS were searched for known domains using InterProScan, and for regions of biased composition with SEG and Marcoil.
The biological interest of an annotation project lies in identifying the gene products by assigning a descriptive common name to each protein and its function, with as much specificity as the evidence supports. We use homology-based annotation transfer to assign the name and associated information of a gene product: the gene symbol, the EC number if the protein is identified as an enzyme, and other features.
Functional assignment criteria
Blastp (Mycoplasma DB)
Evalue < e-20
>e-20 and <e-4
> e-4 or No match
Blastp (Swissprot DB)
Evalue < e-20
>e-20 and <e-4
> e-4 or No match
Confident protein family match
No confident protein family match
If no HAMAP family match. Support evidence from at least one InterPro member database
Support evidence from one or no InterPro member database
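The E-value thresholds listed in the criteria above (e-20 and e-4) can be read as three evidence tiers. The sketch below shows one way to bin a blastp E-value accordingly; the tier names, the function name and the treatment of values lying exactly on a threshold are assumptions for illustration, and the sketch does not reproduce how the pipeline combines these tiers with HAMAP and InterPro evidence.

```python
from typing import Optional

STRONG_CUTOFF = 1e-20  # "Evalue < e-20"
WEAK_CUTOFF = 1e-4     # ">e-20 and <e-4"


def evalue_tier(evalue: Optional[float]) -> str:
    """Bin a blastp E-value into one of three evidence tiers.

    None stands for "no match". The tier names are hypothetical labels;
    values exactly on a threshold fall into the weaker tier here.
    """
    if evalue is None:
        return "none"
    if evalue < STRONG_CUTOFF:
        return "strong"
    if evalue < WEAK_CUTOFF:
        return "weak"
    return "none"


for hit in (1e-30, 1e-10, 0.5, None):
    print(hit, evalue_tier(hit))
```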
Results obtained for M. hyopneumoniae 232 annotation
Total of M. hyopneumoniae 232 known genes
Total annotated genes using pipeline
Total correct predictions *
Total known genes not predicted *
Total correct gene annotations
Total predicted genes incorrectly or differently annotated
Total predicted genes additionally annotated by pipeline
Evaluation of the sensitivity and the specificity of the pipeline based on re-annotation of the M. hyopneumoniae 232 genome.
Total genes detected
A total of 734 genes have been computationally predicted. We found both the 23S and 16S ribosomal RNAs in single copies located next to each other. The 5S ribosomal RNA is located at a distance from the 23S and 16S genes. We identified 28 transfer RNAs covering all 20 amino acids. Other non-coding RNAs were also found: bacterial RNase P class B, TPP riboswitch (THI element), tmRNA (proteolysis signal) and the bacterial signal recognition particle RNA.
Summary of M. conjunctivae genome features
G + C content (mole%)
Functionally assigned protein CDSs
Putative protein CDSs
Hypothetical protein CDSs
Bacterial RNase P class B
TPP riboswitch (THI element)
Bacterial signal recognition particle RNA
Transfer RNA genes
Top 20 biological processes. Information with a biological meaning was given priority in the search. We list the top 20 biological processes carried out by the newly annotated genes.
tRNA aminoacylation for protein translation
carbohydrate metabolic process
phosphoenolpyruvate-dependent sugar phosphotransferase system
regulation of transcription
ATP synthesis coupled proton transport
nucleoside metabolic process
It is important to note that our method (homology-based annotation) cannot distinguish between close homologues that have different functions.
These findings are in contrast to the multiple dnaA-boxes found in the intergenic regions surrounding dnaA in other mollicutes. In addition to the presence of dnaA-box motifs, replication origins can also frequently be identified by looking for biases in strand composition through measures such as the cumulative GC skew [28–30]. For M. conjunctivae, we found no significant asymmetries that can be readily detected with GC skew. The lack of a clear bias in M. conjunctivae is similar to that observed for M. hyopneumoniae. Therefore, the only significant feature of the M. conjunctivae genome that provides any possible indication of the location of the origin of replication is the presence of the dnaA gene. Otherwise, there are no features that allow definitive mapping of the origin to the intergenic region upstream of the dnaA gene, as seen in other bacteria.
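The cumulative GC skew mentioned above can be illustrated with a short sketch. It assumes the common definition of skew as (G - C) / (G + C) computed per non-overlapping window and summed along the sequence; the window size and the function name are illustrative choices and are not taken from the analysis reported here.

```python
def cumulative_gc_skew(seq, window=1000):
    """Return the running sum of (G - C) / (G + C) over non-overlapping windows.

    Windows without any G or C contribute a skew of 0.
    """
    seq = seq.upper()
    running_total = 0.0
    curve = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        skew = (g - c) / (g + c) if (g + c) else 0.0
        running_total += skew
        curve.append(running_total)
    return curve


# Toy sequence (hypothetical): a G-rich half followed by a C-rich half.
toy = "G" * 10 + "C" * 10
print(cumulative_gc_skew(toy, window=5))
```

In genomes with a pronounced strand bias, the cumulative curve typically has a single clear minimum near the origin of replication and a maximum near the terminus. A flat or noisy curve, as reported here for M. conjunctivae, gives no such signal.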
Bacteria have many ways to produce virulence, which reside in their ability to adhere to, invade and damage host cells. Various strategies of pathogenicity, such as cytolysins, toxins and invasins, enable other bacteria to produce infection. In Mycoplasma species no such typical primary virulence genes have been found. Mycoplasmas seem instead to use intrinsic metabolic and catabolic functions to cause disease in the affected host and to ensure the microbe's survival. Our efforts to identify genes involved in the pathogenicity of Mycoplasma conjunctivae concentrated, on the one hand, on finding such primary virulence genes, principally toxins, which are rare in other mycoplasmas and, on the other hand, on metabolic pathways that have been proposed by studies carried out in other mycoplasmas.
Using manual blastp queries run by an expert, we found the genes for a glycerol-3-phosphate dehydrogenase (glpO), a glycerol kinase (glpK), a glycerol uptake facilitator protein (glpF) and an ABC transporter system (Sn-glycerol-3-phosphate transport system permease). These genes are implicated in the glycerol metabolism that produces cell damage, inflammation and disease in Mycoplasma mycoides subsp. mycoides Small Colony (SC).
The pathway starts with the assimilation of glycerol by the ABC glycerol transporter (gtsA, gtsB and gtsC). The glycerol is then phosphorylated into glycerol-3-phosphate, which is oxidized by GlpO in the presence of O2 into dihydroxyacetone-phosphate (DHAP), producing one molecule of H2O2. H2O2 is released directly inside the host cells by the transmembrane GlpO protein, leading to cell death. The absence of any gene with catalase or dismutase activity favors this hypothesis.
The identification of those genes in Mycoplasma conjunctivae constitutes an important discovery given that a relationship between the glycerol metabolism and cytotoxicity is established in the laboratory. Further work to validate this hypothesis in M. conjunctivae is required and has been started in collaboration with a laboratory of the Institute for Veterinary Bacteriology (University of Bern).
Toxins constitute an important type of virulence factor in several bacteria. We therefore searched for toxins in M. conjunctivae and found 3 proteins highly similar to toxins of Treponema hyodysenteriae (Brachyspira hyodysenteriae). Those proteins are Hemolysin A (hlyA), Hemolysin B (hlyB) and Hemolysin C (hlyC). The 3 genes are scattered across the genome.
Those proteins are present in other mycoplasmas, particularly M. hyopneumoniae and M. capricolum. Even if these toxins are not essential for the pathogenicity mechanisms of those species, it cannot be excluded that they contribute to the pathogenicity of M. conjunctivae.
Insertion sequences (IS) are short DNA elements that function as simple transposable elements by coding for proteins implicated in the transposition activity. A transposase and other regulatory proteins are the proteins generally encoded by IS elements: the transposase catalyses the enzymatic reaction that allows the IS to move, and regulatory proteins act by enhancing or inhibiting the transposition activity. The coding region in an insertion sequence is usually flanked by inverted repeats.
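The flanking inverted repeats described above have a simple sequence signature: the right end of the element is the reverse complement of its left end. The sketch below measures the length of a perfect terminal inverted repeat in a candidate element. It is an illustration only; the function names, the toy sequence and the restriction to perfect (mismatch-free) repeats are assumptions, and real IS detection tolerates mismatches and relies on curated IS databases.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def terminal_inverted_repeat_length(element, max_len=50):
    """Length of the longest perfect terminal inverted repeat of an element.

    If the right end is the reverse complement of the left end, then the
    element and its own reverse complement share a common prefix; the
    length of that prefix is the repeat length.
    """
    element = element.upper()
    rc = reverse_complement(element)
    limit = min(max_len, len(element) // 2)
    n = 0
    while n < limit and element[n] == rc[n]:
        n += 1
    return n


# Toy element (hypothetical): 5-bp inverted repeats around a short core.
toy_is = "GGCTA" + "AAAAAAAC" + "TAGCC"
print(terminal_inverted_repeat_length(toy_is))
```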
List of M. conjunctivae transposases detected.
Condition of the sequence
Same transposase split in two ORFs
Genome comparison. Functional classification of proteins of 5 sequenced mycoplasma genomes
Mycoplasma hyopneumoniae 232
Translation, ribosomal structure and biogenesis
DNA replication, recombination and repair
Posttranslational modification, protein turnover
Energy production and conversion
Carbohydrate transport and metabolism
Amino acid transport and metabolism
Nucleotide transport and metabolism
Coenzyme transport and metabolism
Lipid transport and metabolism
Inorganic ion transport and metabolism
No known function
Mycoplasma conjunctivae is the fourteenth genome of a mycoplasma species that has been fully sequenced. Phylogenetically, its closest relative among the sequenced mycoplasmas is M. hyopneumoniae, as reflected by the high similarity of most of the proteins identified in M. conjunctivae.
Genome comparison. Genome size of 15 sequenced genomes of species belonging to Mycoplasma genus http://www.ebi.ac.uk/genomes/bacteria.html, including M. conjunctivae.
Mycoplasma agalactiae PG2 chromosome
Mycoplasma arthritidis 158L3-1
Mycoplasma capricolum subsp. capricolum ATCC 27343
Mycoplasma gallisepticum R
Mycoplasma genitalium G37
Mycoplasma hyopneumoniae 232
Mycoplasma hyopneumoniae 7448
Mycoplasma hyopneumoniae J
Mycoplasma mobile 163 K
Mycoplasma mycoides subsp. mycoides SC str. PG1
Mycoplasma penetrans HF-2
Mycoplasma pneumoniae M129
Mycoplasma pulmonis UAB CTIP
Mycoplasma synoviae 53
Overall, the mycoplasma genomes have a characteristically low G+C content, within the range of 23.8 to 40 mol% (Table 8). The highest G+C content is found in M. pneumoniae and the lowest in M. capricolum. For Mycoplasma conjunctivae, the G+C content has a typical value of about 29%. The codon usage is similar to that of M. hyopneumoniae and opposite to that of M. capricolum and M. mycoides [31, 34, 35].
The presence of repeats across the genome was the principal difficulty in finishing the genome assembly. Insertion sequence (IS) elements are reported in the majority of mycoplasmas, and in M. conjunctivae we found transposases for IS elements in the genome. Some of those transposase genes are complete sequences, while others are fragmented, with a predicted length of less than 1000 bp. Since those insertion elements are nearly identical, they created difficulties for assembling the genome.
The findings highlighted by this project, principally the glycerol pathway, require further experimental confirmation. In particular, the hypothesis that glycerol metabolism damages the host cells needs to be confirmed by demonstrating the localization of GlpO in the membrane and the release of H2O2 outside the cell. If this hypothesis can be verified, blocking the glycerol pathway at any stage could constitute a candidate strategy for controlling the disease.
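For reference, the G+C content quoted above is the percentage of G and C among the unambiguous bases of a sequence. A minimal sketch follows; the function name and the toy sequences are hypothetical, and ambiguous bases such as N are simply ignored.

```python
def gc_content(seq):
    """Return the G+C content of a DNA sequence in percent.

    Only A, C, G and T are counted; ambiguous bases such as N are ignored.
    """
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / counted


print(gc_content("AAAATTTTGC"))  # toy AT-rich sequence
```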
In conclusion, we created an automatic pipeline to annotate a prokaryotic genome sequence using various tools for the prediction and the identification of the genes. This pipeline is customized for handling sequences of mycoplasma species.
We deposited the fully annotated Mycoplasma conjunctivae genome in the EMBL database (FM864216). Data stored in our local database can be searched, and the genome can be visualized, through our website http://myconj.vital-it.ch. Analysis of the annotated genes gives new insights into potential mechanisms of pathogenicity and deepens the knowledge of Mycoplasma conjunctivae and IKC. It also opens the way to methods for preventing M. conjunctivae infections in domestic animals, which act as a reservoir for this pathogen, and hence for preventing IKC in wild animals.
We thank the "Programmes actions intégrées PAI" – Germaine de Staël for support. We would particularly like to thank Mrs Denise Schmidheini for her kind help in initiating and supporting this sequencing project.
We are grateful to the Vital-IT platform for offering calculation time on their computing cluster, and in particular to Mr Volker Flegel for his help in installing and debugging the necessary software.
This article has been published as part of BMC Bioinformatics Volume 10 Supplement 6, 2009: European Molecular Biology Network (EMBnet) Conference 2008: 20th Anniversary Celebration. Leading applications and technologies in bioinformatics. The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/10?issue=S6.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.