Proteome-wide analysis of Coxiella burnetii for conserved T-cell epitopes with presentation across multiple host species

Coxiella burnetii is the Gram-negative bacterium responsible for Q fever in humans and coxiellosis in domesticated agricultural animals. Previous vaccination efforts with whole cell inactivated bacteria or surface isolated proteins confer protection but can produce reactogenic immune responses. Therefore, a protective vaccine that does not cause aberrant immune reactions is required. The critical role of T-cell immunity in control of C. burnetii has been made clear, since either CD8+ or CD4+ T cells can mediate clearance. The purpose of this study was to identify C. burnetii proteins bearing epitopes that interact with major histocompatibility complexes (MHC) from multiple host species (human, mouse, and cattle). Of the annotated 1815 proteins from the Nine Mile Phase I (RSA 493) assembly, 402 proteins were removed from analysis due to a lack of inter-isolate conservation. An additional 391 proteins were eliminated from assessment to avoid potential autoimmune responses due to the presence of host homology. We analyzed the remaining 1022 proteins for their ability to produce peptides that bind MHCI or MHCII. MHCI and MHCII predicted epitopes were filtered and compared between species, yielding 777 MHCI epitopes and 453 MHCII epitopes. These epitopes were further examined for presentation by both MHCI and MHCII, and for proteins that contained multiple epitopes. There were 31 epitopes that overlapped positionally between MHCI and MHCII across host species. Of these, there were 9 epitopes represented within proteins containing ≥ 5 total epitopes, where an additional 24 proteins were also epitope dense. In all, 55 proteins were found to contain high scoring T-cell epitopes. Besides the well-studied protein Com1, most identified proteins were novel when compared to previously studied vaccine candidates. These data represent the first proteome-wide evaluation of C. burnetii peptide epitopes. Furthermore, the inclusion of human, mouse, and bovine data captures a range of hosts for this zoonotic pathogen plus an important model organism. This work provides new vaccine targets for future vaccination efforts and enhances opportunities for selecting multiple T-cell epitope types to include within a vaccine.


Introduction
The obligate intracellular bacterium Coxiella burnetii is the causative agent of Q fever in humans [1-3]. The Centers for Disease Control and Prevention identified this bacterium as a category B agent due to its low infectious dose, environmental stability, and aerosolized spread [2,4,5]. Humans infected with C. burnetii may present with a variety of outcomes, ranging from asymptomatic infection to acute and chronic disease [3,6]. Acute disease is typically characterized by flu-like symptoms, consisting of fever, fatigue, and chills [6]. Individuals who progress to chronic disease most commonly have endocarditis with culture-negative blood, although hepatitis and chronic fatigue syndrome have also been described. C. burnetii is endemic worldwide, except for New Zealand, and most human outbreaks are attributed to domestic agricultural animals acting as reservoirs of the bacterium [3,6,7]. Cows, sheep, and goats represent the main animals of interest, and these animals also contract disease when exposed to C. burnetii [1,5,6,8]. Coxiellosis in the small ruminant species, goats and sheep, tends to present with late-term abortions [8,9]. While cattle may present with late-term abortions, they are more frequently affected by a decrease in calf birthweight or subclinical mastitis [8]. C. burnetii is found in large numbers within the placenta of aborted neonates, but the bacterium has also been detected in the urine, milk, uterine fluid, vaginal mucus, and feces of parental animals [7,8,10,11].
The most widely accepted vaccines against Q fever, or coxiellosis, are known as Q-vax and Coxevac, where the vaccine contains either the Henzerling or Nine Mile Phase I (RSA 493) isolate of C. burnetii fixed with formalin [1,7,10,12-14]. These vaccines are not available within the United States [1,13]. Q-vax is used for human vaccination in Australia and is known to cause adverse side effects in individuals who have had previous exposure to the bacterium [12,13]. Contrastingly, Coxevac is used in Europe for vaccination of agricultural species and was deployed in an attempt to contain the 2007-2010 Netherlands outbreak [7,10]. Both vaccines require the producer to culture large amounts of a category B bacterium, a process that is both costly and hazardous [10,12]. Therefore, investigation into new vaccines has been initiated through isolation of surface antigens or identification of seroreactive proteins [15,16]. While surface isolated proteins can confer protection, their use does not eliminate the cost or safety concerns during product generation.
A clear need exists for low cost, broadly applicable vaccines and especially those that can be produced in safer biosafety level 2 conditions. Subunit vaccines can meet this need, and a new generation of work on C. burnetii vaccines has begun based on specific epitope definition. Multiple studies have identified small numbers of epitopes used in human or mouse immune responses, and a few studies have produced subunit vaccines [13,14,17-19]. The general conclusion of such work has been that multiple epitopes will be needed to achieve protective immunity [13,19]. The next challenge is to achieve comprehensive, genome-wide evaluation of potential key epitopes coupled with optimization to achieve broad protection across the multiple host species of this zoonotic pathogen.
Bioinformatic tools have been developed to more quickly and cost effectively assess proteins as host antigens [20-23]. This strategy is known as reverse vaccinology, wherein in silico methods cut down the number of initial screening experiments required to identify putative stimulants of the adaptive immune response [20,24,25]. In silico techniques assess the antigenic ability of peptides by modeling their potential immune system interactions as T- or B-cell epitopes [20,22]. Identification of T-cell epitopes typically evaluates the ability of peptides to be loaded into major histocompatibility complexes, either MHCI or MHCII, both of which play an important role in the adaptive immune response [21,22]. MHCI molecules are present on all nucleated host cells and define whether a host cell has been compromised by an invading pathogen [26]. On the other hand, MHCII molecules decorate antigen presenting cells, which function to aid in the initiation of an organized adaptive immune response [21,22,27].
Success in the use of T-cell epitope predictors has been seen in rapidly mutating viruses, like HIV and influenza, and in fastidious bacteria [1,20]. More specifically, the Brucella melitensis protein Omp31 has been a major focus of study during multi-subunit vaccine development against this bacterial agent [28-30]. Research on peptide recognition by human monoclonal antibodies isolated peptide fragments similar to those identified by B-cell epitope bioinformatic predictors [28,29]. Additionally, random peptide generation from the Omp31 amino acid sequence allowed for IFN-γ production by T-cells in sheep, wherein the major epitope of interest was later bioinformatically determined to be a T-cell epitope in humans [29,30].
For C. burnetii, addition of either CD4+ or CD8+ T lymphocytes alone to infected SCID mice was sufficient to achieve immune control of C. burnetii [31]. C. burnetii clearance by macrophages has been shown to rely on IFN-γ production by T-cells during the adaptive immune response, which requires accurate loading of antigenic peptides into MHCII molecules for T-cell presentation [13,15,21,32]. Accompanying these data are knockout mouse models that support the importance of CD8+ T-cells in controlling bacterial replication and host tissue pathology, suggesting that MHCI peptide loading also plays an important role during C. burnetii infection [27,31]. Furthermore, it is presumed that cytotoxic T-cells acting on infected host cells reduce the availability of the intracellular niche required by this bacterium [27]. While B-cell depletion suggests a role in tissue pathology during C. burnetii infection, the inability to link humoral immune responses to restricted bacterial replication suggests that B-cells are not a major player in the control of disease [31,33]. Thus, this work will focus on identification of T-cell epitopes supporting these beneficial immune responses. Many previous works investigating C. burnetii epitopes have focused on known type IV secretion system (T4SS) effectors or proteins eliciting antibody response [14,17,19]. The following work will provide the first comprehensive analysis of C. burnetii T-cell epitopes on a proteome-wide scale. This will also be one of the few applications to investigate a bacterial proteome, since most prior work has focused on smaller viral proteomes [34]. Furthermore, we will incorporate data from a range of C. burnetii isolates to identify conserved epitopes with broad utility and leverage predictions from human, mouse, and ruminant hosts to facilitate development of optimally useful vaccines for this zoonotic pathogen.

Conserved Coxiella burnetii proteome
C. burnetii isolates are genetically diverse: they secrete different type IV secretion system effectors, display antigenic variation, and form numerous genomic groups based on multiple-locus variable-number tandem repeat analysis (MLVA) [6,16,35-37]. For this reason, a proteome-wide comparison between Coxiella isolates was completed to ensure pursuit of epitopes within conserved proteins. Nine Coxiella burnetii isolates were referenced against Nine Mile Phase I (RSA 493) during proteome-wide comparison. Each strain, with its genomic grouping, tissue of isolation, characteristic of interest, and human virulence, if known, is listed in Table 1. Two genomic group four isolates were chosen based on the observation that this genomic group contains the highest amount of genomic variance between contained isolates [37].
The tested isolate with the highest percent identity to Nine Mile Phase I (RSA 493) is Ohio 314 (RSA 270) (Fig. 1). This is expected as both isolates belong to genomic group I, as indicated by Hemsley et al. [37]. The isolates demonstrating the lowest percent identity compared to Nine Mile Phase I (RSA 493) are Dugway 5J108-111, MSU Goat Q177, Schperling, and CbuG_Q212. These strains come from genomic groups IV to VI and represent more divergent isolates as compared to Ohio 314 (RSA 270). Analysis of the overall number of absent or low conservation proteins compared to Nine Mile Phase I (RSA 493) revealed variation between C. burnetii isolates (Table 2). In agreement with the pictorial representation of the proteome-wide comparison, less related genomic groups trended towards an increase in the number of absent and unconserved proteins. One exception to this trend was genomic group II-b isolate Z3055, which was missing 201 proteins when compared to Nine Mile Phase I (RSA 493), similar to genomic groups IV-VI. Previous examination of Z3055 has demonstrated that this isolate has an increase in the number of non-synonymous mutations, insertions, and deletions [38,41]. A total of 352 proteins were removed on the basis that the Nine Mile Phase I (RSA 493) protein lacked a homolog in one of the nine isolates aligned. These predominantly consisted of hypothetical proteins and transposases as opposed to better studied proteins. Overall, proteome-wide comparison between C. burnetii isolates and Nine Mile Phase I (RSA 493) resulted in the identification of 1,413 conserved proteins.

Determination of host homologs in Coxiella burnetii
During epitope identification, and future vaccine generation, it is necessary to avoid sensitizing the host's immune system against itself. Therefore, the resultant protein list was queried using Blastp analysis against the host species of interest (cow, sheep, goat, and human) and the murine disease model for C. burnetii. BlastGrabber analysis determined that 391 of 1,413 C. burnetii conserved proteins shared homology with species of interest [45]. Thus, the final list of C. burnetii proteins for further analysis consisted of 1022 proteins and an overview of the protein selection process can be seen in Fig. 2 (Additional File 1).
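The two-step protein selection described above (removal of unconserved proteins, then removal of host homologs) amounts to successive set differences. The following is a minimal sketch of that bookkeeping, not the pipeline used in this study; the function name `filter_proteome` and the toy protein identifiers are hypothetical, and only the reported totals (1815, 402, 391) come from the text.

```python
def filter_proteome(all_proteins, unconserved, host_homologs):
    """Return (conserved, candidates) after the two exclusion steps.

    conserved  = proteins retained after the inter-isolate conservation filter
    candidates = conserved proteins with no detected host homolog
    """
    conserved = set(all_proteins) - set(unconserved)
    candidates = conserved - set(host_homologs)
    return conserved, candidates


# Toy example with hypothetical identifiers (not real GenBank IDs).
all_proteins = {f"P{i}" for i in range(10)}
unconserved = {"P0", "P1"}
host_homologs = {"P1", "P2", "P3"}  # P1 is already gone after the first step

conserved, candidates = filter_proteome(all_proteins, unconserved, host_homologs)

# The totals reported in the text follow the same arithmetic.
reported_conserved = 1815 - 402
reported_candidates = reported_conserved - 391
print(len(conserved), len(candidates), reported_conserved, reported_candidates)
```

Using sets keeps the order of the two filters explicit: host homology is only assessed for proteins that survived the conservation step, as in the text.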

Human and Murine MHCII Epitopes Present in C. burnetii
Once a list of conserved C. burnetii proteins lacking host homology had been generated, NetMHCIIpan 4.0 was used to define MHCII epitopes. While every murine allele was tested, a large number of human alleles were available. To reduce the number of human alleles, allelic frequency, geographical abundance, and phylogenetic distance were considered (Methods and Additional file 2A/B). In the end, 206 human allelic pairings were chosen to represent common alleles within major clades for MHCII epitope inquiry. Proteome-wide analysis of program-derived 15mer peptides returned a total of 293,520 peptides tested. Of these, 67,528 peptides did not bind any of the human alleles. Furthermore, 184,615 peptides did not bind any of the murine alleles. After screening previously identified epitopes to harmonize quality control metrics (Additional files 3 and 4), we found that these epitopes bound an average of 186 (90%) allelic pairings, or interacted strongly with 93 (45%) allelic pairings, during human analysis. On the other hand, the corresponding comparison for murine analysis delineated an average of 8 (100%) bound alleles or 5 (65%) alleles with strong peptide interaction. Use of these cut-off values to filter the output data returned 1217 and 4072 MHCII epitopes for human and mouse, respectively (Additional file 5). A composite list highlighting MHCII epitopes recognized by both species may be found in Additional file 6, and Fig. 2 summarizes the generation of the composite list. Epitopes that were less than seven amino acids apart were treated as one epitope and the position with the highest human peptide:allele interaction value was retained.
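The human MHCII filtering and the merging of closely spaced epitopes can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually run: the `Peptide` record and both function names are hypothetical, the cut-offs are applied as "at least 186 alleles bound or at least 93 bound strongly", distance is measured between start positions, and neighbouring peptides are chained one after another into clusters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peptide:
    protein: str   # protein identifier (hypothetical labels in the example)
    pos: int       # start position of the 15mer within the protein
    n_bound: int   # number of human allelic pairings bound
    n_strong: int  # number of human allelic pairings bound strongly


def passes_cutoff(pep, min_bound=186, min_strong=93):
    """Keep peptides that bind many alleles OR bind many alleles strongly."""
    return pep.n_bound >= min_bound or pep.n_strong >= min_strong


def merge_nearby(peptides, max_gap=7):
    """Treat peptides of one protein whose starts are < max_gap apart as one
    epitope, retaining the member with the most alleles bound."""
    clusters = []
    for pep in sorted(peptides, key=lambda p: (p.protein, p.pos)):
        last = clusters[-1] if clusters else None
        if last and last[-1].protein == pep.protein and pep.pos - last[-1].pos < max_gap:
            last.append(pep)
        else:
            clusters.append([pep])
    return [max(c, key=lambda p: (p.n_bound, p.n_strong)) for c in clusters]


peptides = [
    Peptide("A", 10, 190, 50),
    Peptide("A", 14, 200, 60),   # within 7 residues of A:10, binds more alleles
    Peptide("A", 30, 100, 95),   # kept through the strong-binding criterion
    Peptide("A", 40, 100, 10),   # fails both criteria
    Peptide("B", 12, 186, 0),
]
kept = [p for p in peptides if passes_cutoff(p)]
merged = merge_nearby(kept)
print([(m.protein, m.pos) for m in merged])
```

Changing `max_gap` or the tie-breaking key would alter which position represents a cluster, so those choices should be read as placeholders.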
Overall, there were 453 peptides, corresponding to 338 total proteins, determined to bind a high number of human and murine alleles or interact with many of the tested alleles strongly. Peptides within this data set that bound to 100% of the tested alleles or proteins that contained greater than or equal to 3 epitopes were isolated to further consolidate the data. Ten peptides bound all 206 human alleles (Table 3). A total of 347 peptides bound all 8 murine alleles (Additional file 7). This is not surprising considering the initial data examination filtered the murine output by focusing on peptides that bound 100% of the alleles analyzed. Marked epitopes within Additional file 7 represent peptides that were one to seven amino acids removed from the epitope observed in Additional file 6; where human peptides with higher binding events were kept during discrepancy in Additional file 6, Additional file 7 retained epitopes that had higher numbers of peptide:allele binding events when considering murine alleles. Of the ten peptides that bound every human allelic pair tested, only one, 9-DKEIRAISDYVVNHK-23 of AAO90441.1 (prpD), did not bind all eight murine alleles analyzed.
Evaluation for epitope dense proteins consisted of data consolidation through isolation of proteins containing a high number of epitopes [24,46]. Analysis of the 338 proteins with high scoring MHCII-epitopes determined that there were 85 proteins with more than one epitope present. Examination of proteins with three or more epitopes present shortened this list to 20 proteins (Table 4). Notably, three epitope dense proteins also had epitopes that bound every human and murine allele tested; these were AAO89704.2 (ftsA), AAO90965.2, and AAO91357.1 (parC). Furthermore, AAO90965.2, along with AAO90357.1 (parC), encompassed the highest number of epitopes per protein with 5 total epitopes present in either protein.

Table 3 Human MHCII epitopes with presentation by an exceptional range of host alleles
Pos indicates the peptide/epitope starting position within the protein sequence. GenBank IDs, gene names, and locus tags are the assembly annotations given on NCBI. NB, WB, and SB represent the total number of alleles bound, the number of alleles bound weakly, and the number of alleles bound strongly by the indicated peptide, respectively. Location of the proteins was assigned based on Inmembrane, where PSE designates potentially surface exposed proteins. The bolded row indicates the protein not represented in the murine data when filtering for epitopes binding 100% of alleles tested

Human, murine, and bovine MHCI epitopes
It has become increasingly evident that CD8+ T-cells play just as important a role during resolution of C. burnetii infection as CD4+ T-cells [27,31]. While MHCII epitope prediction allows determination of antigenic peptides for CD4+ T-cells, there are also MHCI epitope prediction programs available that can help identify antigenic peptides specific for CD8+ T-cell recognition [20,21,23]. One such program is NetMHCpan 4.1, which has recently been re-trained to recognize bovine MHCI epitopes, thereby allowing study of another host species of interest [47]. The same list of conserved C. burnetii proteins without host similarity was tested against human, mouse, and bovine MHCI alleles. Similar to NetMHCIIpan 4.0, NetMHCpan 4.1 has a large number of human alleles available for testing. Therefore, phylogenetic trees and geographical frequency of alleles were used to reduce the total number of human alleles run (Methods and Additional file 2C/D), and a total of 82 human alleles were examined during NetMHCpan 4.1 analysis. In addition, we tested all 8 murine alleles and all 105 bovine alleles present on the server.

Table 4 MHCII epitope-dense proteins
The epitope count designates the number of epitopes present within a protein. NCBI defined information is present in GenBank ID, gene name, and locus tag columns. Location is interpreted from the program Inmembrane, PSE (potentially surface exposed)

GenBank ID Epitope Count Gene Name Locus Tag Location
NetMHCpan 4.1 generates 8-, 9-, 10-, and 11-mer peptides during allele binding assessment; thus, 1,196,564 peptides were generated and tested for their ability to interact with human, murine, and bovine alleles. The number of peptides that did not bind any alleles varied per species: 783,576; 1,033,923; and 842,516 for human, murine, and bovine, respectively. MHCI epitopes have been less widely studied and are therefore less represented in Additional file 4. Accordingly, there were fewer epitopes to aid in determining where the output cut-off values should reside for data filtration. Comparison of these previous epitopes with the present data output determined an average of 51 (62%) bound alleles or a strong interaction with 18 (22%) alleles. While this allowed for a relatively stringent cut-off for the number of peptides binding alleles, the output list was increased by two- to four-fold when peptides that interacted strongly with twenty percent of alleles were included. For this reason, the quantity of alleles strongly bound was restricted to the lower value from MHCII analysis, 45% of alleles. In examining peptides that bound either 60% of alleles tested or 45% of alleles strongly, there were 1,367 human peptides, 5,355 murine peptides, and 4,438 bovine peptides returned (Additional file 8). As before, the output was searched for duplicate GenBank IDs and positions. A number of returned peptides were only present in murine and bovine analyses; manual annotation therefore allowed for identification of plausible epitopes in all three species tested (Additional file 9).
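The per-species MHCI cut-offs and the search for epitopes present in all three species can be sketched as below. This is a simplified illustration: the text describes manual annotation, whereas the sketch assumes epitopes are matched across species by an identical (protein, position) key, and the names `passes`, `shared_epitopes`, and the toy entries are hypothetical. Only the allele counts (82, 8, 105) and the 60%/45% fractions come from the text.

```python
# Number of MHCI alleles tested per species, as stated in the text.
ALLELE_COUNTS = {"human": 82, "mouse": 8, "cattle": 105}


def passes(n_bound, n_strong, n_alleles, min_bound_frac=0.60, min_strong_frac=0.45):
    """Keep a peptide that binds >= 60% of alleles or binds >= 45% strongly."""
    return (n_bound / n_alleles >= min_bound_frac
            or n_strong / n_alleles >= min_strong_frac)


def shared_epitopes(results):
    """results maps species -> {(protein, pos): (n_bound, n_strong)}.

    Returns the keys that pass the cut-off in every species supplied.
    """
    kept_per_species = []
    for species, table in results.items():
        n_alleles = ALLELE_COUNTS[species]
        kept_per_species.append(
            {key for key, (nb, ns) in table.items() if passes(nb, ns, n_alleles)}
        )
    return set.intersection(*kept_per_species)


# Toy data with hypothetical protein labels.
results = {
    "human": {("X", 5): (60, 10), ("Y", 9): (10, 40), ("Z", 1): (10, 5)},
    "mouse": {("X", 5): (8, 2), ("Y", 9): (3, 1)},
    "cattle": {("X", 5): (70, 0), ("Y", 9): (100, 90)},
}
shared = shared_epitopes(results)
print(shared)
```

Expressing the cut-offs as fractions lets the same rule be applied to species with very different numbers of tested alleles.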
Data annotation to isolate epitopes represented in human, murine, and bovine species returned 777 MHCI epitopes within 489 different proteins. The data was further evaluated by looking for peptides binding a high number of alleles or for epitope dense proteins. Contrary to MHCII epitope data, there were not any peptides that bound all the bovine or human alleles tested. In order to analyze peptides that bound a high number of alleles tested, the cut-off value was lowered to 98% alleles bound. This returned 17 peptides binding 103 alleles in cattle and 171 peptides binding 8 alleles in the mouse (Table 5 and Additional file 10). This new definition of high allelic binding continued to lack peptide records within the human analysis. The stringency was therefore further lowered to look at peptides that interacted with 90% of the human alleles tested, which led to the identification of 3 human peptides (Table 5). Table 5 shows that highly bound peptides with the most extreme scores do not overlap between the human and bovine species. In comparing human peptides that show exceptional binding to those peptides binding many alleles in the murine species there is only one coinciding protein, AAO91456. Within this shared murine and human protein, the peptide is positionally located at amino acid 54 for human and 261 for the mouse. Contrastingly, the bovine highly bound peptides are predominantly identical to those found within the murine data, where only proteins, AAO89868.2, AAO89977.1, and AAO90780.1, do not coincide. Of these, AAO89868.2 and AAO90780.1 are not represented within the murine data and AAO89977.1 has an epitope present in an alternate position.
In studying MHCI epitopes for epitope dense proteins, we found a higher number of epitopes per protein (7 in AAO91182.1) was achieved as compared to a maximum of 5 MHCII epitopes (Table 6). There were 28 proteins classified as epitope dense when assessing the MHCI epitope data for proteins with four or more epitopes. Of the epitope dense proteins identified, there was one present in the human analysis, twenty-one present in mouse data, and two present in bovine analysis when comparing the proteins identified as containing epitopes with high allelic coverage (Table 5 and Additional file 10). Human analysis identified CBU_1967, where cattle analysis contained proteins CBU_0425 and CBU_1686. The epitope dense proteins that were missing in the murine high allelic output were CBU_0685, CBU_1226, CBU_1228 (qseC), CBU_1242, CBU_1489 (lpxH), CBU_1928, and CBU_1978 (ostA).
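Identifying epitope dense proteins reduces to counting epitopes per protein and applying a minimum count (four for the MHCI data here, three for the MHCII data above). A minimal sketch follows; the function name `epitope_dense` and the example identifiers are hypothetical.

```python
from collections import Counter


def epitope_dense(epitopes, min_epitopes):
    """epitopes is an iterable of (protein, position) pairs.

    Returns {protein: epitope count} for proteins with at least
    min_epitopes distinct epitopes.
    """
    counts = Counter(protein for protein, _pos in set(epitopes))
    return {prot: n for prot, n in counts.items() if n >= min_epitopes}


# Toy MHCI epitope list with hypothetical protein labels.
mhci_epitopes = [
    ("protA", 12), ("protA", 88), ("protA", 140), ("protA", 301),
    ("protB", 7), ("protB", 52),
    ("protC", 19),
]
dense = epitope_dense(mhci_epitopes, min_epitopes=4)
print(dense)
```

The same function applied to the concatenated MHCI and MHCII lists, with a higher threshold, would mirror the combined analysis described in the next section.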

Consolidation of epitopes or proteins from MHCI and MHCII data
Assessment of the C. burnetii proteome for both MHCI and MHCII epitopes enables identification of multi-use epitopes and proteins. There were 31 epitopes that had overlapping use by MHCI and MHCII (Table 7). Of these epitopes, only one has been previously studied and is present in Additional file 4; this is Com1 (CBU_1910) [9,13,14,17-19]. Other notable aspects were that some of the epitopes constituted a complete overlap whereas others were mildly overlapped. In total, eleven of the thirty-one epitopes completely overlapped between identified MHCI and MHCII epitopes. Furthermore, Inmembrane predicted that approximately fifty percent of the epitopes were cytoplasmic and that the remaining fifty percent were in some way associated with the bacterial membrane.

Table 5 Human and bovine MHCI epitopes with presentation by an exceptional range of host alleles
NetMHCpan 4.1 defined MHCI epitopes that bound 74-76 (greater than or equal to 90% total) human tested alleles or 103 (98% total) bovine tested alleles. Positions delineated with asterisks indicate that the protein associated is not found within murine data encompassing 98% of bound alleles. Total alleles bound, weak peptide interaction with alleles, and strong peptide interaction with alleles are quantified by NB, WB, and SB, respectively. Protein information is outlined in columns containing the GenBank ID, gene name, and locus tag, where this information is defined through Nine Mile Phase I (RSA 493) assembly on NCBI. Pos dictates the peptide's starting position within the protein of interest and species indicates in which species the peptide was tested for allelic interaction. Location was defined through the use of Inmembrane

Table 6 Epitope dense proteins during MHCI epitope analysis
Highly interactive MHCI epitopes that contained greater than or equal to 4 epitopes within all three species studied, human, murine, and bovine. The number of epitopes within a protein is quantified under epitope count. The protein is classified through the GenBank ID, gene name, and locus tag. Inmembrane was exploited to define the location of bacterial proteins. An asterisk next to the GenBank ID indicates that this protein has previously been studied for interaction with the immune system

GenBank IDs from MHCI and MHCII output summary tables, Additional files 6 and 9, were combined to determine if additional epitope dense proteins would be observed. The resultant proteins can be seen in Table 8, where 33 epitope dense proteins were identified with at least 5 epitopes. Seven of these proteins were not previously identified when looking at either MHCI or MHCII epitope dense proteins alone (GenBank IDs are AAO89890.1 (thiDE), AAO90155.1 (yaeT), AAO90323.2, AAO90990.2, AAO91128.1 (icmO), AAO91393.1, and AAO91455.1 (hemA)). Additionally, there were 19 proteins absent from the combined epitope dense protein list that were previously encompassed in either the MHCI or MHCII data. Many of the proteins which were lost in the combined epitope dense protein table represent proteins containing the number of epitopes near the bottom of the previous cut-off values. None of the previously studied proteins in Additional file 4 were present as an epitope dense protein in the unified MHCI and MHCII data (Table 8). Nine of the epitope dense proteins also contained overlapping epitopes; however, these epitopes were considered separate during quantification due to their binding alternate immune major histocompatibility complexes. In comparing MHCI and MHCII epitope results it was possible to elucidate epitopes or proteins that could stimulate both cytotoxic T-cells and T-helper cells.
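The positional comparison between MHCI and MHCII epitopes described in this section can be illustrated with a simple interval check. This is a sketch under an assumed definition: an overlap is called "complete" when the shorter peptide lies entirely within the longer one and "partial" when only some residues are shared; the text does not state the exact criterion used, and the function name `overlap_type` is hypothetical.

```python
def overlap_type(start1, len1, start2, len2):
    """Classify the overlap of two peptides within the same protein.

    Positions are start coordinates; peptides span start .. start + len - 1.
    Returns "complete", "partial", or "none".
    """
    end1 = start1 + len1 - 1
    end2 = start2 + len2 - 1
    shared = min(end1, end2) - max(start1, start2) + 1
    if shared <= 0:
        return "none"
    if shared == min(len1, len2):
        return "complete"
    return "partial"


# A 15mer MHCII epitope starting at residue 9 spans residues 9..23.
inside = overlap_type(12, 9, 9, 15)    # 9mer at 12..20 lies inside 9..23
edge = overlap_type(20, 9, 9, 15)      # 9mer at 20..28 shares residues 20..23
apart = overlap_type(30, 9, 9, 15)     # 9mer at 30..38 shares nothing
print(inside, edge, apart)
```

Because MHCI peptides (8 to 11 residues) are shorter than the 15mer MHCII peptides, a complete overlap under this definition means the MHCI epitope is nested within the MHCII epitope.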

Discussion
We sought to leverage both C. burnetii and host genomic diversity to predict widely useful T-cell epitopes across a range of hosts for this zoonotic pathogen. Epitopes were identified by leveraging an array of MHCII and MHCI alleles for antigen presentation, thereby capturing epitopes incorporated in both MHC systems across multiple host species. The results highlight broadly useful epitopes, including many with minimal prior study, that can be used for future work and vaccine development.

The epitope type is defined in the T-cell epitope column. Protein information is outlined in the following columns: GenBank ID, gene name, and locus tag, where this information is defined through Nine Mile Phase I (RSA 493) assembly on NCBI. Pos dictates the peptide's starting position within the protein of interest and location was defined through the use of Inmembrane
Foundational data aimed to capture broad representation of C. burnetii and focus on proteins that would avoid self-reactive antigens. In particular, we selected at least one sequence from each genomic group (Table 1), including the relatively minimal genome of virulent Nine Mile Phase I (RSA 493) as a reference. This resulted in a refined list of 1413 conserved proteins for further analysis. This list was further screened for homology within human, mouse, and ruminant host proteins to avoid stimulating potential autoimmune responses. 391 such proteins were identified, suggesting large-scale use of host protein domain structures by C. burnetii. During assembly of the protein query list, it became apparent that a substantial number of annotated genes within the Nine Mile Phase I (RSA 493) genome lack discovery work and that many underlying functions are suggested by homology to alternate bacterial proteins. This promotes analyzing the bacterial proteome in its entirety, as the importance of many C. burnetii proteins has yet to be determined.
Relatively few Gram-negative bacteria have been examined for T-cell epitopes on a proteome-wide basis [34], so most previous epitope studies have examined effector proteins or proteins residing at the cellular surface [24,48-50]. Studies examining C. burnetii proteins for host cell epitopes are no exception: previous work has focused on proteins injected into the host cytoplasm by the type IV secretion system (T4SS) or proteins which elicit an antibody response [13,14,17]. Resolution of C. burnetii infection is known to rely on a Th1 type immune response that results in the production of IFN-γ [15,32,33]. This immune response is accomplished by coordination of T-helper cells through interaction with peptide-loaded MHC class II molecules and a harmonized cytokine environment [22]. Therefore, proteome-wide analysis for C. burnetii epitopes began with identifying MHC class II interacting peptides (See Repository). The MHC class II analysis herein identified numerous epitopes with relatively high allelic interactions (Additional file 6), many with cross-species presentation (Additional file 7). Some had presentation by an exceptional range of host alleles (Table 3), and many were clustered in epitope dense proteins of special interest (Table 4). Studies looking at the importance of different immune cellular subsets during C. burnetii infection have led to increased interest in CD8+ T-cell stimulation, which requires MHC class I presentation of peptides [27,31]. As such, similar methodology was implemented to identify epitopes binding an exceptional number of host MHC class I alleles (Table 5 and Additional file 8) and epitope dense proteins characterized by MHC class I binding (Table 6).
The Dugway 5J108-111 isolate of C. burnetii represents the only known avirulent strain included in the following analysis and was included to exemplify the high degree of genomic variability contained between bacterial isolates [37,39,41]. Discarding the Dugway 5J108-111 isolate would result in the addition of thirteen proteins to the analysis, where two would be removed upon identification of host homologs (Additional file 12A). Examination of the remaining eleven proteins determined that their inclusion would minimally alter the data included herein, as only three new MHCI T-cell epitopes with cross-species representation were discerned (Additional file 12B). Notably, none of these additional epitopes bound an exceptional number of alleles tested nor did they encompass epitope dense proteins.
Examination of either the MHC class I or II datasets demonstrates the return of proteins which have not previously been studied for T-cell epitopes. As mentioned before, much of the earlier work identifying T-cell epitopes has focused on certain protein subsets [9,13,14,16,19]. Therefore, return of novel epitope-containing proteins does not preclude epitopes defined within this work; instead, these epitopes may represent more immunogenic peptides that exemplify a range of host species. For example, a group of novel epitope-containing proteins responsible for bacterial cell division can be seen within the MHC class II and I datasets, encompassing AAO89704.2 (ftsA), AAO89682.2 (ftsI), and AAO90095.2 (rodA) [51]. The MHC class I analysis for bacterial epitopes supports the addition of a ruminant species to the dataset. It is believed that many human outbreaks arise from domestic ruminants, consisting of sheep, goats, and cattle; therefore, vaccination efforts in ruminants may help in the prevention of zoonotic spread [3,6,7]. Furthermore, coxiellosis in animals does not come without consequence, as sheep and goats present most frequently with late-term abortions and cattle have decreased birthing weights and possible mastitis [8]. Consequently, Coxiella burnetii infection in these species causes clear economic losses and requires intervention.
A potential pitfall of bioinformatic analysis of T-cell epitopes is the possibility of false positives [14,21,52]. This hindrance has been largely combated through the inclusion of more MHC ligand elution data during server training [21,23,47]. During this research, alleviation of false positives was attempted by assessing a large number of different MHCI and MHCII alleles and investigating the peptides which had high allelic coverage. It is presumed that false positives arise from a lack of training data for some alleles and that analysis of many alleles would dilute false positives [21,47,52]. When considering the 8 murine alleles tested with either NetMHCpan 4.1 or NetMHCIIpan 4.0, as compared to either 82 (MHCI) or 206 (MHCII) human alleles or 105 bovine alleles, it is noticeable that a larger number of murine peptides fell within the filtered data sets (Additional files 6 and 8). These data are suspected to contain a number of false positives, but comparison with high binding peptides of human and cattle alleles is believed to lessen this burden. Previous research on C. burnetii defined T-cell epitopes has used methodologies that measure the ability to achieve host T-cell activation in response to epitopes of interest, including EliSpot, ELISA, flow cytometry, and peptide loading into MHCs [13,14,18,19]. It remains imperative to test returned T-cell epitopes for their ability to interact with the host immune system before production of vaccine candidates may begin.
Once data had been acquired for both MHC class I and II alleles, it became possible to cross-analyze outputs. Investigation into overlapping MHC class I and II epitopes defined 31 peptides of interest (Table 7). Com1, a well-studied C. burnetii protein of interest, was represented within this output. Importantly, former analysis of Com1 as a vaccine candidate against C. burnetii has demonstrated promise [13,18,19]. Specifically, mice exposed to Com1 were afforded better protection during challenge assays and produced IFN-γ during immune system stimulation. Unfortunately, Com1 was categorized as a secreted protein by Inmembrane, whereas it is a well-studied surface associated protein [16,18,36]. It is likely that there is a secondary processing step that is not recognized by Inmembrane. This does not disqualify the overall purpose for such notation, as many vaccination efforts have focused on surface proteins, where it is believed that these proteins most readily interact with the immune system during infection [1,25,53]. While care should be taken regarding protein location, proteins that reside at the membrane or are secreted would be expected to have improved immune recognition.
Com1 did not remain in the MHC class I and II cross-analysis when assessing for epitope dense proteins (Table 8). Likewise, none of the previously studied proteins present in Additional file 4 are represented in the 33 epitope dense proteins compiled from MHC class I and II data. Of these novel epitope-containing proteins, there were seven that were not returned when assessing MHC class I or II epitope dense proteins alone. These are AAO89890.1 (thiDE), AAO90155.1 (yaeT), AAO90323.2, AAO90990.2, AAO91128.1 (icmO), AAO91393.1, and AAO91455.1 (hemA), which represent epitope rich proteins that have a balanced MHC class I and II coverage. Three of these proteins are designated as secreted or membrane exposed proteins by Inmembrane: AAO90155.1 (yaeT), AAO91128.1 (icmO), and AAO91393.1. Therefore, these proteins are suggested to more readily interact with the immune system upon arrival of the bacterium within host tissues. IcmO and YaeT are significant proteins with regard to host:pathogen interaction, as IcmO is part of the multi-subunit T4SS and YaeT is responsible for assembly of beta-barrel surface proteins [54,55,56].
Cross-analysis between MHC class I and II data allows for future vaccination efforts to cover both classes of T-cell epitopes. Furthermore, the investigation herein also aids in epitope selection with regard to alternate vaccine types. For instance, identified epitope dense proteins provide a source of epitopes which can be incorporated into a vectored vaccine [20,34]. On the other hand, when looking at proteins that contain overlapping MHCI and MHCII epitopes, there is the possibility of using the epitopes in a heterologous recombinant subunit vaccine. As a result, the provided data allow vaccination efforts against C. burnetii to move forward without restrictions on the approach to be used.

Conclusions
These data represent the first comprehensive, proteome-wide examination of T-cell epitopes for C. burnetii. The use of multiple divergent C. burnetii isolates enabled the identification of widely conserved proteins and epitopes to empower future work. Furthermore, the use of multiple host species for antigen presentation analyses supports the existence of widely conserved epitopes that can be broadly useful across many host species for this zoonotic pathogen. The specific results highlight many proteins and epitopes not previously described with regard to host immune recognition, and in so doing provide useful direction for future work in developing epitope-rich vaccines.

Homolog identification in the host species
Nine Mile Phase I (RSA 493) proteins found to be conserved between C. burnetii isolates were submitted as a multi-FASTA file to the BLASTp server and analyzed for homologs present in host species. The host species tested and their taxonomic IDs are as follows: human (txid 9606), mouse (txid 10088), cow (txid 9913), goat (txid 9925), and sheep (txid 9940). BlastGrabber was used to analyze results obtained from NCBI's basic local alignment search tool (BLASTp) [45]. An E-value cut-off of 0.01 (1e-2) and a percent identity greater than 35% were set based on previous experimental methods used to remove host homologs from analysis [24,63,64].
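The homolog filter described above can be illustrated with a minimal Python sketch. It is not the BlastGrabber workflow used in this study; the `BlastHit` record, the function names, and the example accessions are hypothetical, and the sketch assumes that a hit counts as a host homolog only when it meets both criteria at once (E-value at or below 0.01 and percent identity above 35%).

```python
from dataclasses import dataclass

# Thresholds taken from the text; treating the E-value cut-off as inclusive
# and requiring both criteria jointly are assumptions of this sketch.
E_VALUE_CUTOFF = 0.01
MIN_PERCENT_IDENTITY = 35.0

# Host taxonomic IDs as listed in the text.
HOST_TAXIDS = {9606: "human", 10088: "mouse", 9913: "cow", 9925: "goat", 9940: "sheep"}


@dataclass(frozen=True)
class BlastHit:
    """Hypothetical record for one BLASTp hit (not a BlastGrabber type)."""
    query_id: str
    subject_taxid: int
    evalue: float
    percent_identity: float


def is_host_homolog(hit: BlastHit) -> bool:
    """Return True when a hit against a host species passes both thresholds."""
    return (
        hit.subject_taxid in HOST_TAXIDS
        and hit.evalue <= E_VALUE_CUTOFF
        and hit.percent_identity > MIN_PERCENT_IDENTITY
    )


def proteins_without_host_homologs(protein_ids, hits):
    """Keep proteins that have no qualifying host hit, preserving input order."""
    flagged = {hit.query_id for hit in hits if is_host_homolog(hit)}
    return [pid for pid in protein_ids if pid not in flagged]


# Made-up example values for illustration only.
example_hits = [
    BlastHit("protA", 9606, 1e-30, 52.0),   # strong human hit: removed
    BlastHit("protB", 9913, 1e-5, 30.0),    # identity too low: kept
    BlastHit("protC", 9940, 0.5, 60.0),     # E-value too high: kept
]
retained = proteins_without_host_homologs(["protA", "protB", "protC", "protD"], example_hits)
```

In this example only `protA` is excluded, because it is the only protein with a host hit that satisfies both thresholds.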

Phylogenetic analysis for human MHC alleles
The top ten most common MHCI alleles for eleven global regions were determined using the Allele Frequency Net Database (AFND) (http://www.allelefrequencies.net/default.asp) [65,66]. Duplicate alleles were removed from the resultant list and protein FASTA sequences were obtained from the International Immunogenetics Information System/Human Leukocyte Antigen (IMGT/HLA) database (https://www.ebi.ac.uk/ipd/imgt/hla/) [67]. Of the remaining MHCI alleles, there were three allelic FASTA sequences that were no longer available within the database and were therefore excluded going forward; these were A*29:25, A*29:50, and A*02:264. Phylogenetic trees were built using MEGA X, wherein 1,000 bootstraps were run during the construction of both a neighbor-joining and maximum likelihood tree [68]. Afterwards, the trees were condensed so that only bootstrap values above 80 were involved in branch generation (Additional file 2C/D). If MHCI alleles were closely related, then a representative allele was chosen based upon its representation within the annotated geographic regions denoted by the AFND. In total, 83 human MHCI alleles were chosen for epitope analysis with NetMHCpan 4.1. The MHCII DRB1 locus has annotated data for the top ten alleles for each of the eleven geographic regions on AFND. In contrast, the DPA1, DPB1, DQA1, and DQB1 loci did not have region associated data. Alleles in these alternate loci were chosen based on an allelic frequency greater than or equal to 0.05 in any one geographic region, where the database was filtered for gold and silver data that were obtained from available literature [65]. Protein FASTA sequences were again obtained from the IMGT/HLA database. Notably, DRB1*04:140, DRB1*04:155, DRB1*12:09, DPB1*26:01:01, DPB1*101:01, DQA1*05:02, and DQB1*02:03:01 MHCII alleles were partial sequences and were removed from further analysis.
MEGA X was used to make a neighbor-joining and maximum likelihood tree with the remaining MHCII alleles using a minimum of 999 bootstraps per analysis (Additional file 2A/B) [68]. The remainder of the MHCII analysis was completed as described above for the MHCI analysis. There were 28 DRB1, 4 DPA1, 27 DPB1, 10 DQA1, and 7 DQB1 alleles chosen for epitope inquiry, yielding a total of 206 allelic pairings.
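The total of 206 allelic pairings can be reproduced with a short Python sketch. The text gives only the per-locus counts and the total; the breakdown below is an inferred reading, assuming each DRB1 allele is counted once on its own while DP and DQ molecules are counted as every alpha/beta combination. The function name is hypothetical.

```python
# Per-locus allele counts as stated in the text.
chosen_alleles = {"DRB1": 28, "DPA1": 4, "DPB1": 27, "DQA1": 10, "DQB1": 7}


def count_mhcii_pairings(counts: dict) -> int:
    """Count MHCII molecules under the assumed pairing scheme.

    Assumption of this sketch: DRB1 alleles are counted singly, and
    DP and DQ molecules are all alpha x beta combinations.
    """
    dr = counts["DRB1"]
    dp = counts["DPA1"] * counts["DPB1"]
    dq = counts["DQA1"] * counts["DQB1"]
    return dr + dp + dq


total_pairings = count_mhcii_pairings(chosen_alleles)  # 28 + 108 + 70
```

Under these assumptions the sum is 28 + (4 × 27) + (10 × 7) = 206, which matches the stated total.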

Identification of human, murine, and bovine MHC epitopes
Conserved Nine Mile Phase I (RSA 493) proteins lacking homology to host species were submitted to the NetMHCpan 4.1 server for analysis across multiple host species (https://services.healthtech.dtu.dk/service.php?NetMHCpan-4.1) and (http://www.cbs.dtu.dk/services/NetMHCpan/) [23,47,69]. Of the approximately 3,000 human MHCI alleles, 83 were chosen based upon locus frequency within defined populations, representation of alleles in more than one region, and greater evolutionary distance as discerned by phylogenetic tree analysis. During this investigation it was determined that allele B*13:07N was not available for assessment on NetMHCpan 4.1, decreasing the number of human alleles assessed to 82. The 8 murine MHCI alleles present on the server were included to represent the available inbred strains of lab mice. Lastly, 105 BoLA (bovine leukocyte antigen) MHCI alleles were recently trained for server inclusion and allowed for representation of a host ruminant species. Each of these MHCI allelic groupings was evaluated over the course of multiple program runs. A complete list of tested MHCI alleles can be found in Additional file 11. The threshold values were set at 0.5 for %Rank of a strong binder and 2 for %Rank of a weak binder during the assessment. Peptide length was kept at the baseline parameters, which gave 8-, 9-, 10-, and 11-mer peptides in the output. NetMHCIIpan 4.0 was used to study peptides that can bind human or murine MHCII alleles (https://services.healthtech.dtu.dk/service.php?NetMHCIIpan-4.0) [21,23,70]. There were 8 murine MHCII alleles and 936 human MHCII alleles present on the given server, which generates thousands of human MHCII complexes. Human MHCII alleles to be tested were chosen based on the previously mentioned phylogenetic analysis. Threshold values identified a strong binder as a %Rank less than 2.0 and a weak binder as a %Rank greater than or equal to 2.0 and less than or equal to 10.0.
The standard peptide length of 15 amino acids was kept during this investigation. A complete list of tested MHCII alleles can be found in Additional file 11. Positional output differed by one amino acid position between NetMHCIIpan 4.0 and NetMHCpan 4.1 (starting positions designated as 0 versus 1); therefore, all output data were standardized to achieve consistent positional designation.
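The %Rank thresholds and the positional standardization described above can be sketched in Python as follows. This is an illustration rather than the processing used in this study: the function names are hypothetical, the MHCI boundaries are assumed to be inclusive (the text states only the threshold values 0.5 and 2), and the choice of 1-based positions as the common convention is an assumption. The MHCII boundaries follow the text directly.

```python
def classify_mhci(percent_rank: float) -> str:
    """Classify an MHCI prediction by %Rank (thresholds 0.5 and 2).

    Inclusive boundaries are an assumption of this sketch.
    """
    if percent_rank <= 0.5:
        return "strong"
    if percent_rank <= 2.0:
        return "weak"
    return "non-binder"


def classify_mhcii(percent_rank: float) -> str:
    """Classify an MHCII prediction by %Rank.

    Strong: %Rank < 2.0; weak: 2.0 <= %Rank <= 10.0, as stated in the text.
    """
    if percent_rank < 2.0:
        return "strong"
    if percent_rank <= 10.0:
        return "weak"
    return "non-binder"


def to_one_based(start_position: int, zero_based: bool) -> int:
    """Express a peptide start position in 1-based coordinates.

    The caller states which convention the input uses; standardizing on
    1-based output is an assumption made for this example.
    """
    return start_position + 1 if zero_based else start_position


# A peptide reported at position 0 by a 0-based tool and at position 1 by a
# 1-based tool refers to the same first residue after standardization.
same_start = to_one_based(0, zero_based=True) == to_one_based(1, zero_based=False)
```

Standardizing the start position before comparing outputs is what allows MHCI and MHCII epitopes to be tested for positional overlap within the same protein.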

C. burnetii proteome localization
The multi-FASTA file containing the conserved bacterial proteins that lacked host homologs was run through Inmembrane to determine each protein's localization within the bacterium [71]. The program coordinates runs for a combination of bioinformatic tools consisting of TMHMM, SignalP, LipoP, and HMMER [72,73,74,75].