- Methodology article
- Open Access
A computational approach for identifying pathogenicity islands in prokaryotic genomes
© Yoon et al; licensee BioMed Central Ltd. 2005
Received: 16 May 2005
Accepted: 21 July 2005
Published: 21 July 2005
Pathogenicity islands (PAIs), distinct genomic segments of pathogens encoding virulence factors, represent a subgroup of genomic islands (GIs) that have been acquired by horizontal gene transfer. Up to now, computational approaches for identifying PAIs have focused on detecting genomic regions that differ from the rest of the genome only in their base composition and codon usage. These approaches often identify genomic islands in general rather than PAIs specifically.
We present a computational method for detecting potential PAIs in complete prokaryotic genomes by combining sequence similarities and abnormalities in genomic composition. We first collected 207 GenBank accessions containing either part or all of the reported PAI loci. In sequenced genomes, strips of PAI-homologs were defined based on the proximity of the homologs of genes in the same PAI accession. An algorithm reminiscent of a sequence-assembly procedure was then devised to merge overlapping or adjacent genomic strips into a larger genomic region. Among the defined genomic regions, PAI-like regions were identified by the presence of homolog(s) of virulence genes. In addition, GIs were postulated by calculating G+C content anomalies and codon usage bias. Of 148 prokaryotic genomes examined, 23 pathogenic and 6 non-pathogenic bacteria contained 77 candidate PAIs that partly or entirely overlap GIs.
Supporting the validity of our method, the list of candidate PAIs included thirty-four PAIs previously identified in genome sequencing papers. Furthermore, in some instances, our method was able to detect entire PAIs for which only partial sequences are available. In our study, the method proved efficient for demarcating potential PAIs. Also, the function(s) and origin(s) of a candidate PAI can be inferred by investigating the PAI queries comprising it. Identification and analysis of potential PAIs in prokaryotic genomes will broaden our knowledge of the structure and properties of PAIs and the evolution of bacterial pathogenesis.
PAIs are distinct genetic elements of pathogens encoding various virulence factors such as protein secretion systems, host invasion factors, iron uptake systems, and toxins [1, 2]. PAIs are a subset of genomic islands that have been acquired by horizontal gene transfer (HGT) and confer virulence upon the recipient. PAIs can be identified by features such as the presence of virulence genes, biased G+C content and codon usage, carriage of mobile sequence elements, and/or association with tRNA genes or repeated sequences at their boundaries.
Identification of PAIs is essential for understanding the development of disease and the evolution of bacterial pathogenesis. As complete genome sequences rapidly accumulate, various in silico methods have been developed to detect HGT [4–7]. Most of these methods detect genomic regions with atypical G+C content, patterns of codon usage bias, or dinucleotide anomalies. However, compositional approaches may generate many false positives due to other factors such as selection and mutation bias [8, 9], and many false negatives because a transferred sequence gradually adjusts to the composition of its new genome through amelioration. In fact, these methods identify different sets of ORFs as foreign when applied to the genome of Escherichia coli K-12. Thus, combining multiple lines of evidence can help determine whether a gene or a group of genes has been acquired by HGT.
While horizontally transferred genes and GIs in genome sequences have been studied intensively, little has been reported for PAIs. Because a PAI is a GI encoding virulence factors, compositional criteria such as G+C content and codon usage are not sufficient for identifying PAIs; such approaches can only identify GIs. In this work, we designed a computational method for identifying PAIs in sequenced genomes by combining a homology-based method with detection of abnormalities in genomic composition. To do this, we collected published PAI data and checked the virulence genes on the PAI loci. We applied this approach to 148 prokaryotic genomes and identified 77 candidate PAIs. The detected regions contain virulence genes and relics of HGT events.
Genomic islands in bacterial genomes
For the 157 chromosomes examined (Table 1S [see Additional file 1]), GIs accounted on average for 10.1% of chromosome length. Nanoarchaeum equitans, which has the smallest genome of any sequenced microbe, contained the smallest proportion of GIs, only 2.9%. Leptospira interrogans, which is responsible for the worldwide water-borne zoonosis leptospirosis, contained the largest: 34.7% for chromosome I and 32.2% for chromosome II. The genome of L. interrogans was reported to have the largest number of proteins with structural similarity to eukaryal and archaeal proteins as compared to other bacteria. In general, pathogens contained larger proportions of GIs than related nonpathogenic species, e.g., 15.7% for Corynebacterium diphtheriae versus 7.6% for C. glutamicum, and 12.3% for E. coli CFT073 versus 8.9% for E. coli K-12.
Table 1. A shortened list of the PAI loci mentioned in the text (see supplementary Table 2S for the complete list of 207 collected PAI loci) [see Additional file 2].
Accession number (length in kb)a
Escherichia coli 536
Hemolysin, P fimbriae
E. coli 536
E. coli 536
Hemolysin, P fimbriae
E. coli CFT073
AF081283(10.2), AF081284, AF081285(13.7), AF081286, AF003741-2
E. coli CFT073
Attaching and effacing, TTSS, invasion
E. coli O157:H7 EDL933; E2348/69; 4797/97; 83/39; RDEC-1
AF071034(45.3)b, AF022236(35.6)b, AJ278144(37.7)b, AF453441(60.4)b, AF200363(37.9)b
TTSS, invasion into epithelial cells, apoptosis
Salmonella typhimurium SL1344
AF148689, U16278, U16303
TTSS, invasion into monocytes
S. typhimurium SL1344; LT2; RF333
AF020808, AJ224978(12.1), Z95891, X99944-5, AJ224892, U51927, Y09357
Invasion, survival in monocytes
S. typhimurium 14028s; S. enterica subsp. enterica serovar Rachaburi & serovar Dublin
AF106566(17.0)b, Y13864, M57715, AJ000509, AY144489, AY144490(10.1)
S. flexneri M90T & SA100
S. flexneri 2a YSH6000
Yersinia enterocolitica Ye 8081 & WA314
X94452, X95298, AJ132668, AJ132945(14.0), Y12527(13.6)
Y. pseudotuberculosis PB1 & IP32637; Y. pestis KIM10+
AJ236887, AJ009592, AJ009988
Toxin-coregulated pilus (Tcp) adhesin, regulator
Vibrio cholerae 395; N16961; others
AF325733(41.3)b, AF325734(41.3)b, AF034434(12.9), X64098(13.8), U39068(15.0), AF208385, AF319954, AF306795-8, AF319652-5, AF378526, AF452570-80
Type IV secretion, cytotoxin-associated gene (cag) antigen
AF282853(20.2)b, AF282852(21.3)b, U60177, AY136637-46
Pseudomonas syringae DC3000 & others
AF232004(52.5)b, AF232005(11.0), U25812-3, AF232003, AF069650-2, L41862, U03854-5, U07346, AF051694, L11582, AY147017-28
Pseudomonas aeruginosa X24509 & PA14
P. luminescens W14
PAIs in prokaryotic chromosomes (see supplementary Table 3S for the complete information) [see Additional file 3]
Δ G+C (%)a
Evidence of GIc
Bacillus halodurans C-125d
Bacillus subtilis 168d
Bordetella bronchiseptica RB50
Hemin transport system
Bordetella pertussis Tohama I
Bradyrhizobium japonicum USDA 110d
Chromobacterium violaceum ATCC 12472
Enterococcus faecalis V583
NN in E. faecalis e
Escherichia coli CFT073
tRNA, integrase, IS
tRNA, transposase, phage genes
F1C and S fimbrial protein, iron uptake
tRNA, integrase, transposase
PAI I CFT073 e
ISEc8, antigen 43 precursor, fimbrial protein
PAI II CFT073 e
Escherichia coli K12d
Integrase, putative transposase
Citrate-dependent iron transport
Escherichia coli O157:H7 EDL933
Pilin subunit, transporter and member of exoprotein
Putative transposase, IS proteins
Glycosyl transferase, IS1 proteins
tRNA, integrase, phage genes
Escherichia coli O157:H7 Sakai
Ferric enterochelin esterase
Helicobacter pylori 26695
Glutamate racemase (glr)
cag PAI e
Helicobacter pylori J99
Glutamate racemase (murI)
cag PAI e
Mesorhizobium loti MAFF303099d
TTSS, nodulation protein
Nitrosomonas europaea ATCC 19718d
Transmembrane sensors, outer membrane efflux
Photorhabdus luminescens subsp. laumondii TTO1
Putative fimbrial proteins
tRNA, IS, transposase
TTSS locus e
tc locus e
Salmonella enterica Typhi Ty2
Salmonella enterica Typhi CT18 (Salmonella enterica Typhi Typhi)
Salmonella typhimurium LT2 (S. enterica serovar Typhimurium LT2)
Flagellar synthesis, siderophore receptor protein
Shigella flexneri 2a 2457T
Shigella flexneri 2a 301
Enterochelin esterase, oxidoreductase (Fe-S subunit)
Oxidoreductases (Fe-S subunit)
tRNA, integrase, transposase
tRNA, integrase, transposase
tRNA, transposase, integrase
Staphylococcus aureus Mu50
Staphylococcus aureus MW2
Staphylococcus aureus N315
Vibrio cholerae N16961
CTX locus g
Vibrio parahaemolyticus RIMD 2210633 chromosome I
TTSS, iron transport
Vibrio parahaemolyticus RIMD 2210633 chromosome II
Xanthomonas campestris pv. campestris ATCC 33913
Hrp PAI e
Yersinia pestis CO92
Iron transport system
Fimbrial protein, secreted protein
Yersinia pestis KIM
Iron/siderophore ABC transporters, antigen chaperone
Among the 77 cPAIs, 34 matched PAIs that have been described in genome sequencing papers (Table 2, Figure 2). Twenty-seven cPAIs entirely matched known PAIs: a PAI (in Enterococcus faecalis), PAI I and II CFT073 (E. coli CFT073), LEE (E. coli O157 EDL933 and Sakai), cag PAI (Helicobacter pylori 26695 and J99), the TTSS and tc loci (Photorhabdus luminescens), SPI-2, 4, and 5 (Salmonella enterica serovar Typhi Ty2 and CT18, and serovar Typhimurium LT2), SPI-3 (S. typhimurium LT2), SHI-1 and 2 (Shigella flexneri 2a 2457T and 301), VPI (Vibrio cholerae), Hrp PAI (Xanthomonas campestris), and HPI (Yersinia pestis CO92 and KIM). For SPI-1 (in three S. enterica strains), SaPIm3 (S. aureus Mu50), and SaPIn3 (S. aureus N315), one end of the PAI was found in a cPAI (5 cPAIs in total), whereas the other end lay in seemingly backbone sequence. νSaβ in S. aureus MW2 and the CTX locus in V. cholerae N16961 were partly matched. Nine cPAIs span TTSS loci that were not annotated as PAIs in the genome sequencing data.
In most cases, regions homologous to PAIs from bacteria other than enterobacteria, such as VPI of Vibrio cholerae, cag PAI of Helicobacter pylori, and SaPI1 of Staphylococcus aureus, were restricted to their host strains. However, widespread distribution in different species was evident for PAGI-1 of Pseudomonas aeruginosa and the Hrp PAI of P. syringae, Xanthomonas spp., Burkholderia pseudomallei, and Ralstonia solanacearum. The cPAIs differed between EDL933 and Sakai, which belong to the same E. coli O157 group (Table 2). This discrepancy results from the different distribution of prophages in the two genomes. In addition, differences in ORF prediction between research groups affected the determination of GIs.
PAI-like regions that did not meet the criteria
A total of 164 PAI-like regions in 57 prokaryotes, including 16 non-pathogenic bacteria and one archaeon, did not overlap GIs (supplementary Table 4S) [see Additional file 4]. Their sizes ranged from 1.9 to 50.6 kb and averaged 9.5 kb. Most of them encoded flagellar/fimbrial biosynthesis or iron uptake systems. Among these regions, 14 were PAIs published in the genome sequencing papers. Six PAIs entirely matched: Hrp PAI (in Pseudomonas syringae pv. tomato DC3000), SPI-3 (S. enterica serovar Typhi strains Ty2 and CT18), SaPIm1 (S. aureus Mu50), SaPIn1 (S. aureus N315), and νSa3 (S. aureus MW2). In addition, these regions contained the 5 counterparts of the PAIs that only partly matched cPAIs overlapping GIs. Parts of LIPI-1 in Listeria innocua and two internalin regions in L. monocytogenes EGD were also found. In fact, the Hrp PAI and LIPI-1 have DNA compositions similar to their core genomes and are suggested to have been acquired a long time ago [15, 16].
Analyses of the structures of many microbial genomes have made it clear that HGT is an important mechanism for bacterial evolution as well as for genome complexity and plasticity. GIs, which are large genomic segments most likely transferred by HGT, contribute to the survival of the host strain in a particular environment and sometimes to virulence. These two kinds of GIs, of which the former can be referred to as 'fitness islands', are often hard to distinguish from each other because the role of a GI may vary with the ecological niche and the physiology of the bacterium. Up to now, attempts to identify PAIs [5, 6, 17] have detected genomic regions that differ from the rest of the genome only in their base composition and codon usage. In this study, we identified "candidate PAIs (cPAIs)" that reflect potential PAIs with anomalous composition, probably due to their recent acquisition. Among the 148 sequenced strains searched in this study, 17 were closely related to the hosts carrying the queried PAI loci. The reports of their genome sequencing projects describe 27 PAIs. Of these, 23 were found in the list of cPAIs, so the accuracy of our method can be estimated at 85% (23 of 27; Table 2, supplementary Table 4S [see Additional file 4]).
The presence of virulence factors could be a useful criterion for discerning PAIs from other genomic islands. Clusters consisting only of hypothetical genes and/or elements involved in the transfer mechanism (e.g. IS elements, tRNA genes, integrase, and prophage) were filtered out, leaving only the 46% of the genomic regions that contain virulence factors. The widespread distribution of conserved elements of many PAIs in different species, and even in non-pathogens, is due to their complex mosaic structures consisting of elements of different origins. PAI I536 to PAI III536 in E. coli 536 have mosaic-like structures consisting of many DNA fragments that show high similarity to the chromosomal regions of other pathogenic E. coli strains and Shigella flexneri. SPI-2 is a fusion of at least two genetic elements, a 25-kb region encoding the TTSS with a low G+C content and a 15-kb region encoding metabolic functions with a G+C content similar to the rest of the genome, and the Hrp PAI of Pseudomonas syringae has a tripartite structure.
Some virulence factors in PAIs are homologous to seemingly backbone genes. As shown in Figure 4, PAIs with extensive mosaic structures occurred frequently in various species, and clusters of seemingly backbone genes could be removed from the list of cPAIs by checking for the presence of a GI in a PAI-like region. Many Gram-negative bacterial pathogens cause disease by secreting and injecting virulence proteins (effectors) into the host cell via a specialized protein secretion mechanism, the type III secretion system (TTSS). TTSSs are evolutionarily related to flagellar systems, and the two are often hard to distinguish on the basis of homology searches alone. However, TTSSs are frequently transferred laterally between Gram-negative bacteria, while flagellar systems are mainly inherited by vertical descent. This explains why many regions encoding flagellar biosynthesis genes fall in PAI-like regions that show no anomalies in DNA composition (supplementary Table 4S) [see Additional file 4], while PAI-like regions overlapping GIs contain many TTSSs (Table 2). Iron uptake systems are important for bacterial survival as well as virulence. Many PAIs, such as HPI of Yersinia species, SHI-2 of S. flexneri, and SRL of S. flexneri 2a YSH6000, carry genes encoding siderophore systems that produce and secrete low-molecular-weight siderophores with extremely high affinities for ferric iron. Clusters of homologs of the ferric dicitrate transport system (fecABCDEIR, Fec) of SRL were widely distributed in the backbone genomic regions of various species, which suggests that Fec might be the most ancient siderophore system (Figure 4, Table 2, supplementary Table 4S [see Additional file 4]). Interestingly, a 7.1-kb fecCDE-homologous region can be found even in Halobacterium sp. NRC-1, the only archaeon possessing a PAI-like region in this study. This region is interrupted by a 6-phosphogluconate dehydrogenase gene, three hypothetical protein genes, and a tRNA-Arg gene.
One of the difficulties in designating potential PAIs in sequenced genomes is determining their boundaries. A PAI may contain a number of genes that have undergone many evolutionary stages and are thus compositionally indistinguishable from the rest of the genome [2, 23]. This might be because some parts have become highly adjusted to the base composition of the recipient genome, or because backbone genomic segments were added later in evolution. We found that the proportion of the length occupied by transferred regions in known chromosomal PAIs (28.7 kb of LEE in E. coli O157 Sakai, 36.2 kb of cag PAI in H. pylori 26695, 61.2 kb of VPI-2 in V. cholerae, and 137.5 kb of the PAI in Enterococcus faecalis) varies from 0.19 to 0.65. Thus, compositional approaches cannot predict the boundaries of a detected PAI, because they detect only atypical genomic regions. To solve this problem, we detected genomic segments homologous to each known PAI and then clumped them into a larger genomic region. This procedure resembles fragment assembly in shotgun sequencing, in which a contiguous region (contig) is built from overlapping fragments. Like the conserved sequences of TTSS structural genes, PAIs often share conserved regions. In addition, PAIs frequently carry relics of HGT events, such as mobile sequence elements and association with tRNA genes at their boundaries. Islander, a database of potential integrative islands in prokaryotic genomes, detects GIs by identifying tRNA or tmRNA genes and candidate integrase genes. Although many GIs reported in the database agreed with our results, a large portion was not annotated as cPAIs, mainly owing to the absence of homologs of virulence genes from known PAIs and to PAIs that are not located at tRNA loci. As illustrated in Figure 3, the frequent occurrence of conserved regions among PAIs allows our method to find the entire region of a PAI in a sequenced genome even when only part of a similar sequence is known.
A typical genome sequencing team uses genes in the gene cluster or the genome sequence of interest as a query to search the databases for similar genes. Homologs of pathogenicity/virulence genes are then inferred by checking whether the descriptions of the retrieved genes suggest virulence/pathogenicity or whether the genes come from pathogens. Because this approach depends on the examiner's knowledge of known PAIs or pathogenicity/virulence genes, and because entry descriptions of the retrieved genes are often not informative enough to infer function, it is never certain whether the searches picked up all the genes associated with PAIs or pathogenicity/virulence. To avoid this uncertainty about the robustness of an open-ended search, we first collected all the reported PAI loci and used them as queries to search for homologs in the complete prokaryotic genomes. Our method ensures that all potential PAIs related to the known PAIs are searched without the intervention of human interpretation.
In completely sequenced genomes, we detected cPAIs that are homologous to the published PAIs and show anomalies in DNA composition. The methodology we developed in this study has a limitation in that the detected cPAIs are limited by the query data set of known PAIs. This caveat, however, can be an advantage when researchers are concerned only with a specific set of PAIs. Furthermore, the approach can easily be extended to identify other kinds of genomic islands (e.g. fitness, metabolism, and resistance islands). Several well-known PAIs, such as the Hrp PAI of P. syringae and LIPI-1 of L. innocua, are missing from the cPAIs detected in this study because their DNA compositions are similar to those of the core genomes, which may be caused by horizontal transfer from closely related strains or by very ancient HGT events. Thus, patterns of best matches of each gene to different species, lineage-specific genes, or genes transferred from phylogenetically distant species would help improve the chance of finding GIs and PAIs. Also, accumulation of PAI sequence data from bacterial families other than the Enterobacteriaceae will lead to detection of more putative PAIs across various taxa. Finally, it should be noted that the identity of cPAIs as bona fide PAIs needs to be confirmed by experimental verification. We are currently improving the detection scheme and are developing a database for cPAIs in sequenced genomes.
We present the first computational framework combining feature-based analyses and similarity-based analyses. As shown in Figure 3, the similarity-based analysis, which is reminiscent of the sequence-assembly procedure, proved efficient for demarcating potential PAIs in our study. Also, the function(s) and origin(s) of a cPAI can be inferred by investigating the PAI queries comprising it. With the rapidly increasing availability of complete genome sequences as well as PAI data, the proposed method will be useful in identifying potential PAIs in microbial genomes.
Collection of complete genomes and PAI Data
The sequence files of 148 complete prokaryotic genomes consisting of 157 chromosomes, including 17 archaeal ones, were downloaded from the NCBI FTP server as of January 2004 (ftp://ftp.ncbi.nih.gov, supplementary Table 1S) [see Additional file 1]. We searched the GenBank database and the literature [3, 23] for any descriptions of a "pathogenicity island". Forty-five kinds of PAIs and 207 GenBank accessions containing either part or all of the reported PAI loci in 120 pathogenic bacteria are summarized in Table 1 (see supplementary Table 2S for the complete information) [see Additional file 2]. Defining virulence genes is difficult because their function may depend on growth conditions and host niches. We therefore deferred this judgment to the biologists who identified the PAI loci, and the virulence genes of the PAI loci were identified by literature survey. Many of the PAIs, 29 out of the 45 kinds, came from Enterobacteriaceae. Thirty-four PAI loci are completely sequenced, ranging from 6.8 kb to 153.6 kb (average: 41.3 kb), and the remainder are partial PAI sequences. It should be noted that the collected sets do not contain PAIs that were reported in genome sequencing papers.
Detection of GIs in genome sequences
To detect GIs in a chromosome, we first identified horizontally transferred genes (H) based on the algorithm developed by Garcia-Vallve et al. To reduce the false positives caused by applying a single criterion for identifying HGT regions, we considered a gene as H only if both its G+C content and its codon usage were aberrant. For each genome, we computed the total G+C content ([G+C]T) and the G+C contents at the first and third codon positions ([G+C]1 and [G+C]3) of every ORF. The compositional biases at the first and third positions were reported to be positively correlated with expressivity and genomic G+C content, respectively [10, 27]. A gene was considered extraneous in terms of G+C content if its [G+C]T deviated by more than 1.5 σ, or if the deviations of [G+C]1 and [G+C]3 had the same sign and at least one of them exceeded 1.5 σ. The Mahalanobis distance (dM) was used to evaluate the deviation of a gene's codon usage from the genome mean. dM is a statistic in units of standard deviation from the mean of the 61 codon frequencies and can be calculated as follows:
$$d_M(X, X_{mean}) = (X - X_{mean})^{T} S^{-1} (X - X_{mean})$$
where $X$ and $X_{mean}$ are vectors holding the relative frequencies of the 61 codons for a gene and the mean values for the genome, respectively, and $S^{-1}$ is the inverse of the variance-covariance matrix ($S$) of the 61 codon frequencies. The higher this value, the greater the deviation in codon usage. If the $X$ values are normally distributed, $d_M$ values can be converted to p-values using the χ² distribution function. We considered a gene extraneous in codon usage if its p-value was less than 0.05. It should be noted that only genes longer than 300 bp were used for calculating the mean and standard deviation (σ) of the G+C contents and the $d_M$ values. This follows from the observation that genes shorter than 300 bp have a much higher chance of showing anomalies in G+C content and codon usage.
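The two per-gene tests described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the `genome_stats` layout, and the small linear solver are hypothetical, the example uses a toy 2-dimensional covariance matrix instead of 61 codon frequencies, and the conversion of the distance to a χ² p-value is left out.

```python
SIGMA_CUTOFF = 1.5  # deviation threshold stated in the text


def gc_is_aberrant(gc_total, gc_first, gc_third, genome_stats):
    """Apply the G+C rule to one gene.

    genome_stats maps "T", "1", "3" to (mean, standard deviation) over the
    genes of the genome; this layout is an assumption of the sketch.
    """
    z = {}
    for key, value in (("T", gc_total), ("1", gc_first), ("3", gc_third)):
        mean, sd = genome_stats[key]
        z[key] = (value - mean) / sd
    if abs(z["T"]) > SIGMA_CUTOFF:
        return True
    # Otherwise require same-sign deviations at positions 1 and 3,
    # with at least one of them beyond the cutoff.
    same_sign = z["1"] * z["3"] > 0
    return same_sign and max(abs(z["1"]), abs(z["3"])) > SIGMA_CUTOFF


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def mahalanobis(x, x_mean, cov):
    """Quadratic form (X - Xmean)^T S^-1 (X - Xmean), as written in the text."""
    diff = [xi - mi for xi, mi in zip(x, x_mean)]
    # Solving S * y = diff avoids forming the inverse of S explicitly.
    return sum(d * y for d, y in zip(diff, solve(cov, diff)))


stats = {"T": (50.0, 4.0), "1": (58.0, 3.0), "3": (52.0, 6.0)}
print(gc_is_aberrant(57.0, 58.0, 52.0, stats))
print(mahalanobis([2.0, 1.0], [0.0, 0.0], [[4.0, 0.0], [0.0, 1.0]]))
```

In the text a gene counts as H only when both tests flag it, so a caller would combine `gc_is_aberrant` with a p-value cutoff on the distance.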
We scanned each genome with a 10-gene window and identified regions containing four or more H genes. This threshold frequency of 0.4 was inferred from the observation that the frequency of H genes in known PAIs, such as LEE of E. coli O157 Sakai, cag PAI of Helicobacter pylori 26695, VPI-2 of Vibrio cholerae, and a PAI of Enterococcus faecalis, averaged 0.35. Neighbouring regions were merged into larger regions, which are referred to as GIs in this study. Some genomic regions had a highly biased G+C content compared to that of the whole chromosome, while their codon usage was not biased. For example, a 46.4-kb genomic region starting at 2,647,129 bp in Yersinia pestis KIM, which contains the yersiniabactin genomic island, has a considerably higher G+C content (55.7% versus a 47.6% average for the whole genome) but shows codon usage similar to the rest of the genome. Thus, a genomic region made up of genes anomalous in G+C content was also added to the GIs if its [G+C]T deviated by more than 1.5 σ.
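The window scan described above might look roughly like the following sketch. It assumes that genes are given in chromosomal order as a list of 0/1 flags (1 for an H gene) and that "neighbouring" means overlapping or directly adjacent windows. The function name and that merging rule are assumptions made for illustration, and the extra G+C-only rule for regions such as the one in Y. pestis KIM is not covered.

```python
def find_gi_regions(is_h, window=10, min_h=4):
    """Return merged (first_gene, last_gene) index pairs, both inclusive.

    is_h: list of 0/1 flags in chromosomal order, 1 marking an H gene.
    window and min_h default to the 10-gene window and the threshold of
    four H genes given in the text.
    """
    regions = []
    last_start = max(len(is_h) - window + 1, 1)
    for start in range(last_start):
        end = min(start + window, len(is_h))  # exclusive end of the window
        if sum(is_h[start:end]) < min_h:
            continue
        if regions and start <= regions[-1][1] + 1:
            # Window overlaps or touches the previous region: extend it.
            regions[-1][1] = max(regions[-1][1], end - 1)
        else:
            regions.append([start, end - 1])
    return [tuple(region) for region in regions]


# Two clusters of four H genes on a hypothetical 40-gene chromosome.
flags = [1, 1, 1, 1] + [0] * 26 + [1, 1, 1, 1] + [0] * 6
print(find_gi_regions(flags))
```

Because every qualifying window is kept whole, the reported regions include flanking non-H genes; a stricter variant could trim each region to its first and last H gene.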
Identification of candidate PAIs
The detection scheme for cPAI regions is outlined in Figure 1. Each ORF from a PAI locus was used as the query in BLASTP searches against the set of ORFs from each of the 148 completely sequenced genomes, using PAM250 as the scoring matrix in order to retrieve homologous genes in evolutionarily distant strains. Likewise, nucleotide-level homologs of the ORFs, RNA genes, and repeat regions of each PAI locus were searched using BLAT, a BLAST-like alignment program that can stitch matched regions into a larger one. A pair of sequences was considered homologous if the identity of the hit was over 80% for DNA sequence or over 25% for protein sequence and the aligned region covered more than 70% of the length of both the query and the hit. Genomic strips corresponding to each PAI locus were then obtained by identifying the regions containing four or more homologs of genes from the same PAI accession and by merging the neighboring regions. Overlapping or adjacent genomic strips corresponding to the same or different kinds of PAI loci were fused into a larger region. Among these regions, PAI-like regions were identified by the presence of at least one gene homologous to a virulence gene on the PAI loci. We considered a PAI-like region a candidate PAI (cPAI) only if it partly or entirely spans a GI.
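The filtering and merging logic of this scheme can be sketched as follows. The sketch assumes that genomic strips, GIs, and virulence-gene homologs are already available as coordinates on one chromosome; building the strips from BLASTP and BLAT hits is not shown. All function names and the `max_gap` parameter (how far apart two strips may be and still count as adjacent, which the text does not specify) are hypothetical.

```python
def is_homolog(identity, aligned_len, query_len, hit_len, is_protein):
    """Homology test from the text: identity cutoff plus 70% coverage of both sequences."""
    min_identity = 25.0 if is_protein else 80.0
    covers_both = aligned_len > 0.7 * query_len and aligned_len > 0.7 * hit_len
    return identity > min_identity and covers_both


def merge_strips(strips, max_gap=0):
    """Fuse overlapping or adjacent (start, end) strips, like contigs from reads."""
    merged = []
    for start, end in sorted(strips):
        if merged and start <= merged[-1][1] + max_gap:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(region) for region in merged]


def overlaps(a, b):
    """True if two closed intervals share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


def candidate_pais(strips, virulence_positions, gis, max_gap=0):
    """Merged regions that hold a virulence-gene homolog and overlap a GI."""
    regions = merge_strips(strips, max_gap)
    pai_like = [r for r in regions
                if any(r[0] <= pos <= r[1] for pos in virulence_positions)]
    return [r for r in pai_like if any(overlaps(r, gi) for gi in gis)]


# Hypothetical coordinates (bp) on a single chromosome.
strips = [(100, 200), (180, 300), (500, 600), (900, 950)]
virulence_positions = [250, 920]
gis = [(290, 400), (700, 800)]
print(candidate_pais(strips, virulence_positions, gis))
```

In this toy input the first two strips fuse into one region that contains a virulence homolog and touches a GI, so it is the only region kept; the region at 900 to 950 is PAI-like but does not overlap a GI.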
We thank Drs. Seung-Hwan Park and Doil Choi for their heartfelt support of the project. This work was funded by the 21C Frontier Microbial Genomics and Applications Center Program, Ministry of Science and Technology, Republic of Korea.
- Dobrindt U, Hochhut B, Hentschel U, Hacker J: Genomic islands in pathogenic and environmental microorganisms. Nat Rev Microbiol 2004, 2(5):414–424. 10.1038/nrmicro884
- Schmidt H, Hensel M: Pathogenicity islands in bacterial pathogenesis. Clin Microbiol Rev 2004, 17(1):14–56. 10.1128/CMR.17.1.14-56.2004
- Hacker J, Kaper JB: Pathogenicity islands and the evolution of pathogenic microbes. Berlin, Springer-Verlag; 2002.
- Garcia-Vallve S, Romeu A, Palau J: Horizontal gene transfer in bacterial and archaeal complete genomes. Genome Res 2000, 10(11):1719–1725. 10.1101/gr.130000
- Karlin S: Detecting anomalous gene clusters and pathogenicity islands in diverse bacterial genomes. Trends Microbiol 2001, 9(7):335–343. 10.1016/S0966-842X(01)02079-0
- Tu Q, Ding D: Detecting pathogenicity islands and anomalous gene clusters by iterative discriminant analysis. FEMS Microbiol Lett 2003, 221(2):269–275. 10.1016/S0378-1097(03)00204-0
- Merkl R: SIGI: score-based identification of genomic islands. BMC Bioinformatics 2004, 5(1):22. 10.1186/1471-2105-5-22
- Eisen JA: Horizontal gene transfer among microbial genomes: new insights from complete genome analysis. Curr Opin Genet Dev 2000, 10(6):606–611. 10.1016/S0959-437X(00)00143-X
- Wang B: Limitations of compositional approach to identifying horizontally transferred genes. J Mol Evol 2001, 53(3):244–250. 10.1007/s002390010214
- Lawrence JG, Ochman H: Amelioration of bacterial genomes: rates of change and exchange. J Mol Evol 1997, 44(4):383–397.
- Ragan MA: On surrogate methods for detecting lateral gene transfer. FEMS Microbiol Lett 2001, 201(2):187–191. 10.1016/S0378-1097(01)00262-2
- Ren SX, Fu G, Jiang XG, Zeng R, Miao YG, Xu H, Zhang YX, Xiong H, Lu G, Lu LF, Jiang HQ, Jia J, Tu YF, Jiang JX, Gu WY, Zhang YQ, Cai Z, Sheng HH, Yin HF, Zhang Y, Zhu GF, Wan M, Huang HL, Qian Z, Wang SY, Ma W, Yao ZJ, Shen Y, Qiang BQ, Xia QC, Guo XK, Danchin A, Saint Girons I, Somerville RL, Wen YM, Shi MH, Chen Z, Xu JG, Zhao GP: Unique physiological and pathogenic features of Leptospira interrogans revealed by whole-genome sequencing. Nature 2003, 422(6934):888–893. 10.1038/nature01597
- Kaneko T, Nakamura Y, Sato S, Asamizu E, Kato T, Sasamoto S, Watanabe A, Idesawa K, Ishikawa A, Kawashima K, Kimura T, Kishida Y, Kiyokawa C, Kohara M, Matsumoto M, Matsuno A, Mochizuki Y, Nakayama S, Nakazaki N, Shimpo S, Sugimoto M, Takeuchi C, Yamada M, Tabata S: Complete genome structure of the nitrogen-fixing symbiotic bacterium Mesorhizobium loti. DNA Res 2000, 7(6):331–338.
- Dobrindt U, Agerer F, Michaelis K, Janka A, Buchrieser C, Samuelson M, Svanborg C, Gottschalk G, Karch H, Hacker J: Analysis of genome plasticity in pathogenic and commensal Escherichia coli isolates by use of DNA arrays. J Bacteriol 2003, 185(6):1831–1840. 10.1128/JB.185.6.1831-1840.2003
- Alfano JR, Charkowski AO, Deng WL, Badel JL, Petnicki-Ocwieja T, van Dijk K, Collmer A: The Pseudomonas syringae Hrp pathogenicity island has a tripartite mosaic structure composed of a cluster of type III secretion genes bounded by exchangeable effector and conserved effector loci that contribute to parasitic fitness and pathogenicity in plants. Proc Natl Acad Sci U S A 2000, 97(9):4856–4861. 10.1073/pnas.97.9.4856
- Vazquez-Boland JA, Kuhn M, Berche P, Chakraborty T, Dominguez-Bernal G, Goebel W, Gonzalez-Zorn B, Wehland J, Kreft J: Listeria pathogenesis and molecular virulence determinants. Clin Microbiol Rev 2001, 14(3):584–640. 10.1128/CMR.14.3.584-640.2001
- Lio P, Vannucci M: Finding pathogenicity islands and gene transfer events in genome data. Bioinformatics 2000, 16(10):932–940. 10.1093/bioinformatics/16.10.932
- Dobrindt U, Blum-Oehler G, Nagy G, Schneider G, Johann A, Gottschalk G, Hacker J: Genetic structure and distribution of four pathogenicity islands (PAI I536 to PAI IV536) of uropathogenic Escherichia coli strain 536. Infect Immun 2002, 70(11):6365–6372. 10.1128/IAI.70.11.6365-6372.2002
- Hensel M, Nikolaus T, Egelseer C: Molecular and functional analysis indicates a mosaic structure of Salmonella pathogenicity island 2. Mol Microbiol 1999, 31(2):489–498. 10.1046/j.1365-2958.1999.01190.x
- Hueck CJ: Type III protein secretion systems in bacterial pathogens of animals and plants. Microbiol Mol Biol Rev 1998, 62(2):379–433.
- Kim JF: Revisiting the chlamydial type III protein secretion system: clues to the origin of type III protein secretion. Trends Genet 2001, 17(2):65–69. 10.1016/S0168-9525(00)02175-2
- Luck SN, Turner SA, Rajakumar K, Sakellaris H, Adler B: Ferric dicitrate transport system (Fec) of Shigella flexneri 2a YSH6000 is encoded on a novel pathogenicity island carrying multiple antibiotic resistance genes. Infect Immun 2001, 69(10):6012–6021. 10.1128/IAI.69.10.6012-6021.2001
- Kaper JB, Hacker J: Pathogenicity islands and other mobile virulence elements. Washington, DC, American Society for Microbiology Press; 1999.
- Myers G: Whole-genome DNA sequencing. Comput Sci Eng 1999, 1:33–43. 10.1109/5992.764214
- Mantri Y, Williams KP: Islander: a database of integrative islands in prokaryotic genomes, the associated integrases and their DNA site specificities. Nucleic Acids Res 2004, 32(Database issue):D55–8. 10.1093/nar/gkh059
- Fraser CM, Eisen JA, Salzberg SL: Microbial genome sequencing. Nature 2000, 406(6797):799–803. 10.1038/35021244
- Gutierrez G, Marquez L, Marin A: Preference for guanosine at first codon position in highly expressed Escherichia coli genes. A relationship with translational efficiency. Nucleic Acids Res 1996, 24(13):2525–2527. 10.1093/nar/24.13.2525
- Deng W, Burland V, Plunkett III G, Boutin A, Mayhew GF, Liss P, Perna NT, Rose DJ, Mau B, Zhou S, Schwartz DC, Fetherston JD, Lindler LE, Brubaker RR, Plano GV, Straley SC, McDonough KA, Nilles ML, Matson JS, Blattner FR, Perry RD: Genome sequence of Yersinia pestis KIM. J Bacteriol 2002, 184(16):4601–4611. 10.1128/JB.184.16.4601-4611.2002
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25(17):3389–3402. 10.1093/nar/25.17.3389
- Kent WJ: BLAT-the BLAST-like alignment tool. Genome Res 2002, 12(4):656–664. 10.1101/gr.229202. Article published online before March 2002
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.