Re-alignment of the unmapped reads with base quality score
© Peng et al.; licensee BioMed Central Ltd. 2015
Published: 18 March 2015
Next-generation genome sequencing technologies have enabled a variety of biological applications, and alignment is the first step once the sequencing reads are obtained. In recent years, many software tools have been developed to align short reads to the reference genome efficiently and accurately. However, many reads still cannot be mapped to the reference genome because they exceed the allowable number of mismatches. Moreover, besides the unmapped reads, reads with low mapping qualities are also excluded from downstream analysis, such as variant calling. If we can take advantage of the confident segments of these reads, not only can the alignment rates be improved, but more information will also be provided for the downstream analysis.
This paper proposes a method, called RAUR (Re-align the Unmapped Reads), to re-align the reads that cannot be mapped by alignment tools. First, it uses the base quality scores (reported by the sequencer) to find the most confident and informative segments of the unmapped reads by controlling the number of possible mismatches in the alignment. Then, combined with an alignment tool, RAUR re-aligns these segments of the reads. We run RAUR on both simulated data and real data with different read lengths. The results show that many reads which fail to be aligned by the most popular alignment tools (BWA and Bowtie2) can be correctly re-aligned by RAUR, with a similar Precision. Even compared with BWA-MEM and the local mode of Bowtie2, which perform local alignment for long reads to improve the alignment rate, RAUR shows advantages in Alignment rate and Precision in some cases. Therefore, the trimming strategy used in RAUR is useful for improving the Alignment rate of alignment tools for next-generation genome sequencing.
All source code is available at http://netlab.csu.edu.cn/bioinformatics/RAUR.html.
Next-generation genome sequencing (NGS) technologies, including Illumina/Solexa and AB/SOLiD, generate billions of short reads (25-200 bp) and have become increasingly popular. A variety of biological applications have been developed on top of NGS technologies. Resequencing and read mapping are used extensively in many large projects, such as the 1000 Genomes Project and ENCODE. Recently, various high-throughput approaches based on bisulfite conversion combined with NGS have been developed and applied to the genome-wide analysis of DNA methylation. Resequencing, disease genome studies, and the identification of genetic variants [6, 7] have also benefited greatly from NGS. For most applications and analyses, assembly or alignment is the first step once sequencing reads are obtained. When reference genomes are not available, assembly is used to construct genomes, and many assembly algorithms have been proposed. Alignment algorithms are applied when reference genomes are available. However, accurately mapping the reads to the genome remains challenging because of sequencing errors (with an overall per-base error rate of around 1-2%), repeats in the reference genome, and differences between the donor and reference genomes.
In recent years, many short read alignment algorithms have been developed to address these challenges, differing in speed, memory, accuracy, and alignment strategy [10, 11]. They adopt two main strategies. One is spaced seeds, represented by MAQ and SOAP. The other is the Burrows-Wheeler Transform, represented by BWA, Bowtie2, and SOAP2. Although these alignment algorithms are increasingly efficient and accurate, there is still a portion of reads that are either not mapped at all by the alignment tool or mapped with a mapping quality score below the threshold.
The mapping quality and the related works
Mapping quality, first proposed in MAQ, is an indicator of the likelihood that a mapping is accurate. Since then, many alignment tools have also reported mapping qualities for their alignments. The calculation of mapping quality is related to "uniqueness". An alignment is unique if it has a much higher alignment score than all the other possible alignments. In other words, the bigger the gap between the best alignment's score and the second-best alignment's score, the more unique the best alignment, and the higher its mapping quality should be.
Mapping quality is important to downstream analysis, such as variant calling. For instance, a variant caller might choose to ignore evidence from alignments with mapping quality less than 10. However, in almost all state-of-the-art alignment tools, the mapping quality scores do not correlate well with the actual likelihood that a mapping is accurate. Many accurate mappings are reported with quality 0, and many inaccurate mappings are reported with high quality scores. The RMAP algorithm was proposed to improve mapping accuracy by incorporating base-call quality scores to weight mismatches. Furthermore, Ruffalo et al. use a machine learning approach to re-calculate the mapping qualities of short read mappings, and the re-calculated qualities are more accurate than those reported by the available alignment tools.
The origin of unmapped reads
Re-calculating the mapping quality of the mappings can make the mapping quality more reliable and improve accuracy to some extent. However, it can do nothing for the reads that are reported as unmapped.
Most alignment tools limit the edit distance or the number of allowed mismatches, so a read cannot be mapped if the number of mismatches in every hit exceeds the allowable differences. Given a read of length m, BWA only tolerates at most k differences (mismatches or gaps) in a hit, where k is chosen such that < 4% of m-long reads with a 2% uniform base error rate may contain more than k differences. With this configuration, for 15-37 bp reads, k = 2; for 38-63 bp, k = 3; for 64-92 bp, k = 4; for 93-123 bp, k = 5; and for 124-156 bp reads, k = 6. That is to say, a read with more than k differences in every hit will be unmapped.
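The rule for choosing k can be illustrated with a short calculation. The following is a minimal sketch, not taken from BWA's source: it assumes the number of errors in a read follows a Poisson distribution with mean 0.02 × m, and the function name `max_differences` is hypothetical. Under that assumption it reproduces the 37/38, 63/64, and 92/93 bp boundaries quoted above; its behaviour for very short reads may differ from the tool.

```python
import math


def max_differences(read_length, error_rate=0.02, missing_fraction=0.04):
    """Smallest k with P(more than k errors) < missing_fraction.

    Assumption: the error count is Poisson-distributed with
    mean read_length * error_rate (an approximation, not BWA's code).
    """
    mean_errors = read_length * error_rate
    term = math.exp(-mean_errors)      # P(exactly 0 errors)
    cumulative = term                  # P(at most k errors), starting at k = 0
    k = 0
    while 1.0 - cumulative >= missing_fraction:
        k += 1
        term *= mean_errors / k        # P(exactly k errors)
        cumulative += term
    return k


for m in (37, 38, 63, 64, 92, 93):
    print(m, max_differences(m))
# 37 -> 2, 38 -> 3, 63 -> 3, 64 -> 4, 92 -> 4, 93 -> 5
```

The point of the sketch is that k grows slowly with read length, so a long read with a few clustered low-quality bases can easily exceed the budget.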
Some alignment programs use trimming-like strategies to handle this problem. For example, in local read alignment mode, Bowtie2 might "trim" or "clip" some read characters from one or both ends of the alignment to maximize the alignment score. Local read alignment can improve the Alignment rate to some extent. However, maximizing the alignment score also introduces false positive sites, which affects alignment accuracy, since the maximum alignment score cannot guarantee that high quality bases are involved. BWA-MEM is a newer alignment algorithm that can perform local alignment, is robust to sequencing errors, and is applicable to a wide range of sequence lengths.
Our contribution in this article
The unmapped reads also contain much information that is important to downstream analysis. Thus, in this article, we propose a method named RAUR to re-align these unmapped reads. The trimming strategy used in RAUR finds the longest, most confident and informative segment of a read based on the base quality scores. It adopts an iterative process that trims the unmapped reads until each read is either confidently mapped or cannot be mapped at any step of the process. RAUR can be combined with any alignment tool to improve the alignment rate. In our experiments, RAUR is combined with BWA and Bowtie2 separately, and run on both simulated data and real data with different read lengths. Comparing the Precision and Alignment rate shows that RAUR can greatly improve the Alignment rate of each alignment tool, while the Precision remains comparable with that of the original alignment tool. Furthermore, in some cases, it has comparable or better performance than BWA-MEM and the local read alignment mode of Bowtie2, which also adopt trimming-like strategies.
In this section, we investigate the correlation between low base quality scores and sequencing errors. Based on this investigation, the trimming strategy adopted in RAUR is presented in detail. Then, the RAUR algorithm is described.
Base quality scores distribution
The base quality score Q reported by the sequencer is a Phred-scaled score, defined as

$$Q = -10 \log_{10} e$$

where e is the estimated probability of the base call being wrong. Thus, a higher quality score indicates a smaller probability of error. A quality score of 10 represents an error rate of 1 in 10, with a corresponding call accuracy of 90%; a quality score of 20 represents an error rate of 1 in 100, with a corresponding call accuracy of 99%; a quality score of 30 represents an error rate of 1 in 1000, with a corresponding call accuracy of 99.9%. In this paper, a base quality score ≥ 20 is considered a high base quality; otherwise it is a low base quality.
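The relationship between quality score and error probability is the standard Phred scale, Q = -10 log10(e). The sketch below converts between the two and classifies bases with the threshold of 20 used in this paper. The function names are hypothetical, and the ASCII offset of 33 (the Phred+33 FASTQ encoding) is an assumption about the input files rather than something stated in the text.

```python
def phred_to_error(q):
    """Error probability e for a Phred quality score Q = -10 * log10(e)."""
    return 10 ** (-q / 10)


def decode_qualities(quality_string, offset=33):
    """Decode a FASTQ quality string into integer scores.

    Assumption: Phred+33 encoding; older Illumina files may use offset 64.
    """
    return [ord(c) - offset for c in quality_string]


def is_high_quality(q, threshold=20):
    """A base is high quality when its score is at least the threshold."""
    return q >= threshold


scores = decode_qualities("I5+")          # 'I' = 40, '5' = 20, '+' = 10
print(scores)                              # [40, 20, 10]
print([is_high_quality(q) for q in scores])  # [True, True, False]
```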
The strategy of trimming
There is a saying that the more things you do, the more likely you are to make a mistake. Similarly, the more bases are considered, the more sequencing errors will be encountered, which may ruin the alignment. When the number of mismatches or the edit distance is greater than the allowed value, some reads will be left unmapped by the alignment tools, or will be mapped with low mapping qualities. These reads are excluded from downstream analysis. However, some confident segments of these reads can be used in variant calling. The first and most important step in making use of the unmapped reads is to find the most confident and informative segment of an unmapped read, which can be aligned correctly. This step is called trimming.
The purpose of trimming is to control the number of possible mismatches in the alignment. Mismatches in an alignment can be sequencing errors or variants. Given a segment with K low quality bases, the maximum number of possible mismatches is K+b, and the minimum number is 0, where b is the number of possible variants. Figure 2 shows that the probability that all K low quality bases in the segment are sequencing errors is small. Furthermore, Sachidanandam et al. found that there is roughly one SNP per 1 kb, which indicates that in a short read b is ≤ 1. Thus, an alignment tool that allows an edit distance of K in a read can confidently align a segment with K low quality bases. Additionally, to align uniquely, the segment should be long enough. Thus, the aim of trimming is to find the longest segment with no more than K low quality bases, which can be aligned uniquely.
The details of trimming are given in Algorithm 1. The inputs are the unmapped reads and the parameter K, which is the number of low quality bases allowed in the segment. For each read, the positions of the low quality bases are stored in an array. A segment of a read is a run of successive bases. We then check the lengths of the segments in the read that contain K low quality bases. Each unmapped read undergoes trimming in RAUR and is represented by its longest segment (also called a trimmed read) under the parameter K. The longest segments are output in the same format as the original unmapped reads. The start and end positions of a trimmed read within the original unmapped read are recorded, so that the position of the original unmapped read can be deduced from them.
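The claim that K low quality bases are unlikely to all be sequencing errors can be checked with simple arithmetic on the Phred scale. The sketch below is illustrative only: it assumes base-call errors are independent, which the text does not state, and the function names are hypothetical.

```python
import math


def error_probabilities(qualities):
    """Per-base error probabilities from Phred scores (e = 10^(-Q/10))."""
    return [10 ** (-q / 10) for q in qualities]


def prob_all_wrong(qualities):
    """Probability that every listed base is wrong, assuming independence."""
    return math.prod(error_probabilities(qualities))


def expected_errors(qualities):
    """Expected number of wrong bases (sum of per-base error probabilities)."""
    return sum(error_probabilities(qualities))


low_quality_bases = [10] * 8   # hypothetical segment with K = 8 bases at Q10
print(prob_all_wrong(low_quality_bases))    # about 1e-08
print(expected_errors(low_quality_bases))   # about 0.8
```

Even with eight bases at a quality as low as 10, fewer than one error is expected on average, which is why K is a generous upper bound on the mismatches caused by sequencing errors.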
Algorithm 1 Trimming
Input: reads in fastq format, parameter K;
Output: trimmed reads in fastq format;
for each read R do
    ▹ find the positions of the low quality bases
    N_Low = 0, Low_position = [];
    Max_length = 0, Max_start = 0, Max_end = 0;
    for each base position i ∈ R do
        if base i has a low base quality then
            Low_position[N_Low++] = i;
        end if
    end for
    if N_Low ≤ K then
        output R in fastq format and continue with the next read;
    end if
    ▹ find the longest segment with K low quality bases
    for S = 0; S ≤ N_Low - K; S++ do
        j = S + K;
        if S ≥ 1 then
            start = Low_position[S-1] + 1;
        else
            start = 0;
        end if
        if j < N_Low then
            end = Low_position[j] - 1;
        else
            end = R.length - 1;
        end if
        length = end - start + 1;
        if length > Max_length then
            Max_start = start; Max_end = end; Max_length = length;
        end if
    end for
    output substr(R, Max_start, Max_end, Max_length) in fastq format;
end for
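The trimming step can be sketched in Python as follows. This is a minimal illustration of Algorithm 1 rather than the released RAUR code: the function names are hypothetical, quality strings are assumed to be Phred+33 encoded, and a base is treated as low quality when its score is below 20, as defined earlier.

```python
def longest_segment(qualities, k, low_threshold=20):
    """Return inclusive 0-based (start, end) of the longest segment
    containing at most k bases with quality below low_threshold."""
    low_positions = [i for i, q in enumerate(qualities) if q < low_threshold]
    n_low = len(low_positions)
    if n_low <= k:
        return 0, len(qualities) - 1          # the whole read qualifies
    best_start, best_end = 0, -1              # empty segment to begin with
    for s in range(n_low - k + 1):
        # The window holds low-quality bases s .. s+k-1 and stops just
        # short of the neighbouring low-quality bases on either side.
        start = low_positions[s - 1] + 1 if s >= 1 else 0
        j = s + k
        end = low_positions[j] - 1 if j < n_low else len(qualities) - 1
        if end - start > best_end - best_start:
            best_start, best_end = start, end
    return best_start, best_end


def trim_read(sequence, quality_string, k, offset=33):
    """Trim one read; also return the segment's coordinates in the read."""
    qualities = [ord(c) - offset for c in quality_string]   # assumes Phred+33
    start, end = longest_segment(qualities, k)
    return sequence[start:end + 1], quality_string[start:end + 1], start, end


# '?' encodes Q30 and '&' encodes Q5 under Phred+33.
print(trim_read("ACGTACGTAC", "??&???&?&?", k=1))
# ('ACGTAC', '??&???', 0, 5)
```

Returning the start and end coordinates alongside the trimmed read keeps the information needed to place the original read once the segment has been aligned.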
The process of RAUR is illustrated in Algorithm 2. First, the reads are aligned by an alignment program. Then the unmapped reads and the unconfidently mapped reads (with mapping quality less than 10) become the input of the loop. RAUR tries to find the longest mappable segments of these reads by decreasing the value of K in each iteration of the loop. The parameter K controls the number of low quality bases allowed in the trimmed reads. In all experiments in this paper, K is initially set to 8. In each iteration, the first step is to trim each unmapped read into its longest segment (trimmed read) containing K low quality bases. These trimmed reads are then aligned by the alignment program. When a trimmed read with K low quality bases cannot be aligned or confidently mapped, its original read becomes an input of the next iteration with K = K-1. The whole process stops when K = 0. Thus, each read is either confidently mapped with a certain value of K or cannot be mapped with any value of K.
Algorithm 2 RAUR
Input: reference sequence, Illumina Reads in fastq format, parameter K (K > 0);
Output: alignment_file in sam format;
Align Reads against reference sequence with an aligner;
Figure out the unmapped reads and the reads with mapping quality < 10 and write them into file unmapped_Reads;
for K_low = K; K_low > 0; K_low = K_low - 1 do
    ▹ Trim reads into longest segments with K_low low quality bases
    K_low_Reads = Trimming(unmapped_Reads, K_low);
    Align K_low_Reads against reference sequence with an aligner;
    Figure out the unmapped reads and the reads with mapping quality < 10 and write their original reads into file unmapped_Reads;
end for
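The iterative loop of Algorithm 2 can be sketched as below. This is only an outline under stated assumptions: `align` and `trim` are hypothetical callables supplied by the caller (for example, a wrapper that runs an aligner and parses its SAM output, and a trimming function in the style of Algorithm 1), and reads are held in memory as dictionaries rather than in fastq files. A read counts as confidently mapped when its mapping quality is at least 10.

```python
def raur(reads, align, trim, k=8, min_mapq=10):
    """Iteratively re-align unmapped or unconfidently mapped reads.

    reads: dict read_id -> (sequence, quality_string)
    align: hypothetical callable, dict of reads -> dict read_id ->
           (position, mapq), or None for an unmapped read
    trim:  hypothetical callable, (sequence, quality_string, k) ->
           (trimmed_sequence, trimmed_quality, start, end)
    Returns (confident, pending); confident maps read_id to
    (position, mapq, k_used, offset_of_segment_in_read).
    """
    confident = {}
    pending = dict(reads)               # reads still lacking a confident hit

    def keep_confident(results, k_used, offsets):
        for read_id, hit in results.items():
            if hit is not None and hit[1] >= min_mapq:
                position, mapq = hit
                confident[read_id] = (position, mapq, k_used,
                                      offsets.get(read_id, 0))
                pending.pop(read_id, None)

    # Initial pass on the full-length reads (k_used is None).
    keep_confident(align(pending), None, {})

    # Re-alignment passes with K, K-1, ..., 1 low-quality bases allowed.
    for k_low in range(k, 0, -1):
        if not pending:
            break
        trimmed, offsets = {}, {}
        for read_id, (seq, qual) in pending.items():
            t_seq, t_qual, start, _end = trim(seq, qual, k_low)
            if t_seq:                   # skip reads trimmed to nothing
                trimmed[read_id] = (t_seq, t_qual)
                offsets[read_id] = start
        keep_confident(align(trimmed), k_low, offsets)
    return confident, pending
```

Passing the aligner in as a callable mirrors the statement that RAUR can be combined with any alignment tool; the reads left in `pending` at the end are those that cannot be mapped with any value of K.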
Evaluated programs and Evaluation metrics
To demonstrate the effectiveness of RAUR, two alignment programs are used in the experiments: BWA (v0.7.5) [15, 20] and Bowtie2 (v2.0.4), both of which are BWT-based short read alignment tools. RAUR is combined with each alignment program separately to re-align the unmapped reads and the unconfidently mapped reads. RAUR(BWA) and RAUR(Bowtie2) denote RAUR combined with the respective alignment program. The two alignment programs are also run independently as the control group. The BWA-MEM algorithm and the local mode of Bowtie2 are designed to be sensitive for longer reads, such as 70 bp-1 Mbp query reads. For further comparison, BWA-MEM (denoted as BWA(mem)) and the local mode of Bowtie2 (denoted as Bowtie2(local)), which perform local alignment for long reads to improve the alignment rate, are run on the datasets with read length greater than 70. For all the alignment programs, the default options are adopted, and the value of K in RAUR is initialized to 8.
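The Alignment rate and Precision are defined as

$$\text{Alignment rate} = \frac{CN}{N} \times 100\%, \qquad \text{Precision} = \frac{CCN}{CN} \times 100\%$$

where N is the total number of reads, CN is the number of confidently mapped reads (mapping quality ≥ 10), and CCN is the number of confidently and correctly mapped reads.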
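As a small worked example of these metrics, the sketch below assumes the natural reading of the definitions above, namely Alignment rate = CN / N and Precision = CCN / CN. The function names and the counts are hypothetical and are not taken from the paper's tables.

```python
def alignment_rate(n_total, n_confident):
    """Fraction of all reads that are confidently mapped (CN / N)."""
    if n_total == 0:
        raise ValueError("n_total must be positive")
    return n_confident / n_total


def precision(n_confident, n_confident_correct):
    """Fraction of confidently mapped reads that are correct (CCN / CN)."""
    if n_confident == 0:
        raise ValueError("n_confident must be positive")
    return n_confident_correct / n_confident


# Hypothetical counts: N = 1000 reads, CN = 830, CCN = 825.
print(f"{alignment_rate(1000, 830):.1%}")   # 83.0%
print(f"{precision(830, 825):.1%}")         # 99.4%
```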
Simulated data and performance
Table 1. The alignment rate and precision of each alignment method on single-end simulated data with different read lengths.
Table 2. The alignment rate and precision of each alignment method on paired-end simulated data with different read lengths.
As shown in Table 1, for the simulated single-end reads of length 50 bp, the Alignment rates of BWA and Bowtie2 are about 74% and 79%, respectively, while the Alignment rates of RAUR(BWA) and RAUR(Bowtie2) are about 83%. This means that about 9% and 4% of the reads, respectively, can be re-aligned by RAUR. The Precision of RAUR(BWA) is comparable with that of BWA and Bowtie2, whose Precision is above 99%, while the Precision of RAUR(Bowtie2) decreases slightly. For the 75-bp and 100-bp reads, the Alignment rates of RAUR(BWA) and RAUR(Bowtie2) not only exceed those of BWA and Bowtie2, but also show advantages over BWA(mem) and Bowtie2(local). Although in theory BWA works with arbitrarily long reads, its performance degrades on long reads, especially when the sequencing error rate is high. The Alignment rates of RAUR(BWA) are about 13% and 47% higher than those of BWA on the 75-bp and 100-bp reads, and about 3% and 4% higher than those of BWA(mem). The Precision of RAUR(BWA) is above 99%, which is comparable with that of BWA and BWA(mem). Compared with Bowtie2, the Alignment rates of both RAUR(Bowtie2) and Bowtie2(local) on the 75-bp and 100-bp reads are improved; however, their Precision decreases to about 98% and 95%, respectively.
The performance of each alignment program on paired-end reads with different read lengths is compared in Table 2. Compared with single-end reads of the same length, the Alignment rate of each alignment program on paired-end reads is much higher. The Alignment rates of BWA and Bowtie2 are about 84% and 89% on 50-bp paired-end reads, and 91% and 88% on 75-bp paired-end reads, respectively. However, the Alignment rate of BWA on 100-bp paired-end reads is as low as that of BWA on 100-bp single-end reads. In contrast, the Alignment rates of RAUR(BWA) and RAUR(Bowtie2) are above 94% on paired-end reads of all read lengths. Both the Alignment rate and the Precision of RAUR(BWA) and RAUR(Bowtie2) are greater than those of Bowtie2(local) on the 75-bp and 100-bp paired-end reads. However, BWA(mem) performs slightly better than RAUR(BWA) on Alignment rate or Precision.
Table 3. The number of TP (true positive) and FP (false positive) in the re-aligned reads from single-end simulated datasets.
Table 4. The number of TP (true positive) and FP (false positive) in the re-aligned reads from paired-end simulated datasets.
Table 5. The alignment rate and precision of Bowtie2 on single-end simulated data with different initial values of K.
Real data and performance
Table 6. The alignment rate and precision of each alignment method on single-end real data with different read lengths.
Table 7. The alignment rate and precision of each alignment method on paired-end real data with different read lengths.
In Table 6, the Alignment rates of RAUR(BWA) and RAUR(Bowtie2) are significantly higher than those of BWA and Bowtie2, and are consistent with those of RAUR(BWA) and RAUR(Bowtie2) on single-end simulated data with 75-bp reads. Slightly differently from the simulated results, the Alignment rates of RAUR(BWA) and RAUR(Bowtie2) exceed those of BWA(mem) on all three datasets, while Bowtie2(local) achieves the highest Alignment rate of all the alignment programs on SRR006273 and ERR00884s3, about 2% higher than those of RAUR(BWA) and RAUR(Bowtie2).
On the three real datasets of paired-end reads, as shown in Table 7, RAUR(BWA) and RAUR(Bowtie2) outperform BWA and Bowtie2, and show significant improvement on ERR007641 and SRR019044. All the alignment programs work well on the long reads (ERR050728, 90 bp). The Alignment rate of RAUR(BWA) is comparable with those of BWA(mem) and Bowtie2(local), while the Alignment rate of RAUR(Bowtie2) is about 1-2% lower than that of Bowtie2(local).
If a read originates from a unique region and its differences from the reference sequence do not exceed the alignment tool's allowance, it will be mapped uniquely. If a read comes from a repeat region and is within the allowed number of mismatches, it has multiple hits and the alignment tool has little confidence in its mapping. However, a read is probably left unmapped if it has too many mismatches in the alignment, whether they are sequencing errors or variants. RAUR is proposed to re-align these reads that cannot be mapped by alignment tools. The trimming strategy adopted in RAUR finds the longest confident fragments of these unmapped reads, with at most K low quality bases. Therefore, compared with the original reads, the number of possible mismatches in the alignments of the trimmed reads decreases, and the chance of a successful alignment increases.
RAUR is not only effective at re-aligning the unmapped reads, but also works well on reads with low mapping quality scores. Some reads have multiple hits even though they actually come from unique regions of the genome. Even in repeat regions, two repeats of the same type have small differences. Uniquely mapping the reads in repeat regions is therefore also possible if these characteristic differences, rather than sequencing errors, are involved in the alignment. Our method can control the possible mismatches and emphasize the characteristic differences in the alignment. Thus, for reads with low mapping quality scores, RAUR can find their longest confident fragments and try to find their correct positions.
RAUR can also efficiently align long reads against a reference sequence, which is a new challenge for many alignment tools. The reads produced by new sequencing technologies are becoming longer and longer, which makes many alignment tools designed exclusively for reads no longer than 100 bp inefficient. However, RAUR can employ these short read alignment tools to align long reads.
In this paper, by analyzing the base quality distributions of sequencing errors, a method (RAUR) is proposed to re-align the unmapped reads and the reads with low mapping quality scores. The key strategy adopted in our method is to align the most reliable and informative part of the read. We evaluate the method by comparing the Alignment rates and Precision on both simulated data and real data with different read lengths. Combined with BWA or Bowtie2, RAUR can confidently align more reads than BWA and Bowtie2 alone, with comparable Precision. Furthermore, the performance of RAUR is hardly affected by increasing read length. Moreover, RAUR outperforms BWA-MEM and the local mode of Bowtie2 in some cases.
The publication costs for this article were funded in part by the National Natural Science Foundation of China under grant nos. 61232001, 61379108, and 61370172, Hunan Provincial Innovation Foundation For Postgraduate (CX2013B070), and Science and Technology Plan Projects of Science and Technology Bureau of Hengyang City (grant 2013KJ29).
This article has been published as part of BMC Bioinformatics Volume 16 Supplement 5, 2015: Selected articles from the 10th International Symposium on Bioinformatics Research and Applications (ISBRA-14): Bioinformatics. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcbioinformatics/supplements/16/S5.
- Siva N: 1000 Genomes project. Nature biotechnology. 2008, 26 (3): 256-256.
- Feingold E, Good P, Guyer M, Kamholz S, Liefer L, Wetterstrand K, Collins F, Gingeras T, Kampa D, Sekinger E, et al: The ENCODE (ENCyclopedia of DNA elements) project. Science. 2004, 306 (5696): 636-640.
- Zhang Y, Jeltsch A: The application of next generation sequencing in DNA methylation analysis. Genes. 2010, 1 (1): 85-101. 10.3390/genes1010085.
- Bentley DR: Whole-genome re-sequencing. Current opinion in genetics & development. 2006, 16 (6): 545-552. 10.1016/j.gde.2006.10.009.
- Meyerson M, Gabriel S, Getz G: Advances in understanding cancer genomes through second-generation sequencing. Nature Reviews Genetics. 2010, 11 (10): 685-696. 10.1038/nrg2841.
- Alkan C, Kidd JM, Marques-Bonet T, Aksay G, Antonacci F, Hormozdiari F, Kitzman JO, Baker C, Malig M, Mutlu O, et al: Personalized copy number and segmental duplication maps using next-generation sequencing. Nature genetics. 2009, 41 (10): 1061-1067. 10.1038/ng.437.
- Stratton M: Genome resequencing and genetic variation. Nature biotechnology. 2008, 26 (1): 65-66. 10.1038/nbt0108-65.
- Luo J, Wang J, Zhang Z, Wu F-X, Li M, Pan Y: EPGA: de novo assembly using the distributions of reads and insert size. Bioinformatics. 2014, 762-
- Bentley DR, Balasubramanian S, Swerdlow HP, Smith GP, Milton J, Brown CG, Hall KP, Evers DJ, Barnes CL, Bignell HR, et al: Accurate whole human genome sequencing using reversible terminator chemistry. Nature. 2008, 456 (7218): 53-59. 10.1038/nature07517.
- Li H, Homer N: A survey of sequence alignment algorithms for next-generation sequencing. Briefings in Bioinformatics. 2010, 11 (5): 473-483. 10.1093/bib/bbq015.
- Ruffalo M, LaFramboise T, Koyutürk M: Comparative analysis of algorithms for next-generation sequencing read alignment. Bioinformatics. 2011, 27 (20): 2790-2796. 10.1093/bioinformatics/btr477.
- Li H, Ruan J, Durbin R: Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome research. 2008, 18 (11): 1851-1858. 10.1101/gr.078212.108.
- Li R, Li Y, Kristiansen K, Wang J: SOAP: short oligonucleotide alignment program. Bioinformatics. 2008, 24 (5): 713-714. 10.1093/bioinformatics/btn025.
- Burrows M, Wheeler DJ: A block-sorting lossless data compression algorithm. Technical report 124, Palo Alto, CA, Digital Equipment Corporation. 1994.
- Li H, Durbin R: Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009, 25 (14): 1754-1760. 10.1093/bioinformatics/btp324.
- Langmead B, Trapnell C, Pop M, Salzberg SL, et al: Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009, 10 (3): 25. 10.1186/gb-2009-10-3-r25.
- Li R, Yu C, Li Y, Lam TW, Yiu SM, Kristiansen K, Wang J: SOAP2: an improved ultrafast tool for short read alignment. Bioinformatics. 2009, 25 (15): 1966-1967. 10.1093/bioinformatics/btp336.
- Smith AD, Xuan Z, Zhang MQ: Using quality scores and longer reads improves accuracy of Solexa read mapping. BMC bioinformatics. 2008, 9 (1): 128. 10.1186/1471-2105-9-128.
- Ruffalo M, Koyutürk M, Ray S, LaFramboise T: Accurate estimation of short read mapping quality for next-generation genome sequencing. Bioinformatics. 2012, 28 (18): 349-355. 10.1093/bioinformatics/bts408.
- Li H: Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. 2013, arXiv preprint arXiv:1303.3997.
- Ewing B, Green P: Base-calling of automated sequencer traces using Phred. II. Error probabilities. Genome research. 1998, 8 (3): 186-194.
- Huang W, Li L, Myers JR, Marth GT: ART: a next-generation sequencing read simulator. Bioinformatics. 2012, 28 (4): 593-594. 10.1093/bioinformatics/btr708.
- Sachidanandam R, Weissman D, Schmidt SC, Kakol JM, Stein LD, Marth G, Sherry S, Mullikin JC, Mortimore BJ, Willey DL, et al: A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature. 2001, 409 (6822): 928-933. 10.1038/35057149.
- Eid J, Fehr A, Gray J, Luong K, Lyle J, et al: Real-Time DNA Sequencing from Single Polymerase Molecules. Science. 2009, 323 (5910): 133-138. 10.1126/science.1162986.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.