Volume 14 Supplement 11
Selected articles from The Second Workshop on Data Mining of Next-Generation Sequencing in conjunction with the 2012 IEEE International Conference on Bioinformatics and Biomedicine
An efficient and scalable graph modeling approach for capturing information at different levels in next generation sequencing reads
- Julia D Warnke^{1} and
- Hesham H Ali^{2}
https://doi.org/10.1186/1471-2105-14-S11-S7
© Warnke and Ali; licensee BioMed Central Ltd. 2013
Published: 4 November 2013
Abstract
Background
Next generation sequencing technologies have greatly advanced many research areas of the biomedical sciences through their capability to generate massive amounts of genetic information at unprecedented rates. The advent of next generation sequencing has led to the development of numerous computational tools to analyze and assemble the millions to billions of short sequencing reads produced by these technologies. While these tools filled an important gap, current approaches for storing, processing, and analyzing short read datasets generally have remained simple and lack the complexity needed to efficiently model the produced reads and assemble them correctly.
Results
Previously, we presented an overlap graph coarsening scheme for modeling read overlap relationships on multiple levels. Most current read assembly and analysis approaches use a single graph or set of clusters to represent the relationships among a read dataset. Instead, we use a series of graphs to represent the reads and their overlap relationships across a spectrum of information granularity. At each information level our algorithm is capable of generating clusters of reads from the reduced graph, forming an integrated graph modeling and clustering approach for read analysis and assembly. Previously we applied our algorithm to simulated and real 454 datasets to assess its ability to efficiently model and cluster next generation sequencing data. In this paper we extend our algorithm to large simulated and real Illumina datasets to demonstrate that our algorithm is practical for both sequencing technologies.
Conclusions
Our overlap graph theoretic algorithm is able to model next generation sequencing reads at various levels of granularity through the process of graph coarsening. Additionally, our model allows for efficient representation of the read overlap relationships, is scalable for large datasets, and is practical for both Illumina and 454 sequencing technologies.
Background
Next generation sequencing has been responsible for numerous advances in the biological sciences, allowing sequencing data to be produced at rates not previously possible. It has enabled innovative research in fields such as cancer genomics [1–3], epigenetics [4, 5], and metagenomics [6, 7]. These instruments are capable of producing several millions to billions of short reads in a single run. These reads cover only a small fraction of the original genome and do not contain much information individually. The massive amount of data that next generation sequencing technologies produce has necessitated the development of efficient algorithms for short read analysis. Next generation sequencing technologies generate reads at high levels of genome coverage, causing many of the reads to overlap. Specialized software programs called assemblers utilize these overlap relationships to organize, order, and align reads to produce long stretches of continuous sequence called contigs, which can be used for downstream analysis. Graph models providing structure for the reads and their overlap relationships form the foundation of many of these assembly algorithms [8].
Metagenomics is a field of research that focuses on the sequencing of communities of organisms. This adds an additional layer of complexity to the analysis of short reads produced from metagenomics samples containing multiple sources of genetic information. Often these reads must be clustered or binned into their respective genomes before assembly or analysis of the reads can take place to avoid chimeric assembly results [9]. Multiple clustering and binning algorithms have been developed to address this issue [10–12]. While assembly results have been shown to be substantially improved by clustering metagenomics data before sequence assembly [13], overlap relationships retained by the assembly overlap graph are lost, leading to the removal of key global read overlap relationships and read similarities.
To address this issue, we previously introduced a short read analysis algorithm [14] that utilizes an overlap graph to model reads and their overlap relationships. Our algorithm utilizes Heavy Edge Matching (HEM) and graph coarsening methods [15] to efficiently reduce the overlap graph iteratively and to generate clusters of reads. At each graph coarsening iteration the algorithm outputs the reduced graph, producing a series of graphs representing the original read dataset across a spectrum of granularities. In comparison, most previous methods rely on a single graph to represent the read dataset, which may not capture all dataset features. The goal of our research is to create a multilevel approach that will allow for the extraction and analysis of dataset features at different information granularities that can be integrated into the assembly or analysis process. In our previous work, we applied our algorithm to cluster simulated reads representing a metagenomics dataset produced by the 454 technology. We then applied our algorithm to 454 bacterial datasets downloaded from NCBI to test our algorithm's ability to efficiently reduce and store the overlap graph. In this paper, we test the scalability of our algorithm and its ability to accurately cluster simulated Illumina read datasets at different genome coverages. We compare our algorithm's performance when applied to simulated 454 reads versus simulated Illumina reads. We also conduct a study using an Illumina metagenomics dataset downloaded from NCBI to evaluate the scalability of our algorithm. The obtained results show that our algorithm was able to substantially reduce the Illumina metagenomics overlap graph size and is scalable for large datasets. Results also demonstrate that our algorithm is practical for both 454 and Illumina data.
Results and discussion
In this section, we apply our graph coarsening and clustering algorithm to three Illumina metagenomics read datasets simulated at 5x, 15x, and 25x genome coverage. We evaluate our algorithm's graph coarsening and clustering results and compare them to results obtained by clustering a similar 454 metagenomics read dataset. Finally, we apply our algorithm to a large Illumina metagenomics dataset to demonstrate its scalability and ability to reduce large datasets. For all datasets, we report read overlapping and graph coarsening runtimes when run on single or multiple compute nodes in a high performance computing environment.
Metagenomics clustering of simulated Illumina and 454 reads
For this study we generated Illumina read datasets from the eight reference genomes downloaded from NCBI RefSeq [16] used to generate the 454 metagenomics dataset in [14]. These reference genomes were selected at various levels of homology. Half of the bacterial genomes were chosen from the phylum Firmicutes and the remaining half were chosen from the phylum Actinobacteria. Pairs of reference genomes were chosen from the same order.
Metagenomics 454 read dataset.
Accession # | Organism | Genome Length (bp) | Number of Reads | Avg. Read Length (bp) |
---|---|---|---|---|
NC_012472 | Bacillus cereus | 5 269 628 | 40 000 | 445 |
NC_017138 | Bacillus megaterium | 4 983 975 | 40 000 | 440 |
NC_017999 | Bifidobacterium bifidum | 2 223 664 | 40 000 | 406 |
NC_014656 | Bifidobacterium longum | 2 265 943 | 40 000 | 408 |
NC_017465 | Lactobacillus fermentum | 2 100 449 | 40 000 | 467 |
NC_017486 | Lactobacillus lactis | 2 399 458 | 40 000 | 461 |
NC_011896 | Mycobacterium leprae | 3 268 071 | 40 000 | 413 |
NC_017523 | Mycobacterium tuberculosis | 4 398 812 | 40 000 | 420 |
Metagenomics Illumina read datasets.
Accession # | Organism | Number of Reads (5x) | Number of Reads (15x) | Number of Reads (25x) |
---|---|---|---|---|
NC_012472 | Bacillus cereus | 263 480 | 790 440 | 1 317 400 |
NC_017138 | Bacillus megaterium | 249 163 | 747 507 | 1 245 833 |
NC_017999 | Bifidobacterium bifidum | 111 180 | 333 540 | 555 900 |
NC_014656 | Bifidobacterium longum | 113 295 | 339 885 | 566 475 |
NC_017465 | Lactobacillus fermentum | 105 008 | 315 035 | 525 058 |
NC_017486 | Lactobacillus lactis | 119 970 | 359 910 | 599 850 |
NC_011896 | Mycobacterium leprae | 163 392 | 490 168 | 816 953 |
NC_017523 | Mycobacterium tuberculosis | 219 940 | 659 820 | 1 099 700 |
Total Reads | | 1 345 428 | 4 036 305 | 6 727 169 |
Eight compute nodes on the commercial strength Firefly cluster located at the Holland Computing Center [18] were used for read overlapping of the simulated Illumina metagenomics datasets. After overlapping was completed, the graph coarsening algorithm was applied to the read overlaps output by the overlap algorithm, running on a single Firefly node. The minimum ratio of nodes successfully matched to total graph size was set to 0.01, and the minimum edge density threshold was set to 50.
Node counts per graph coarsening iteration.
Iteration | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|---|
5x Illumina | 1345428 | 1088207 | 1001256 | 982272 | 978758 | n/a | n/a |
15x Illumina | 4036305 | 3030374 | 2551815 | 2365898 | 2296455 | 2270702 | 2260151 |
25x Illumina | 6727169 | 4941413 | 4073476 | 3685025 | 3494872 | 3405273 | 3363256 |
454 Reads | 320000 | 182532 | 113382 | 71862 | 45991 | 29846 | 20318
Bioreactor | 9641139 | 5419384 | 4771806 | 419376 | 3690376 | 3248944 | 2866551 |
Iteration | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
5x Illumina | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
15x Illumina | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
25x Illumina | 3342376 | 3330806 | n/a | n/a | n/a | n/a | n/a |
454 Reads | 15592 | 13747 | 13215 | 13071 | 13026 | n/a | n/a |
Bioreactor | 2535701 | 2248986 | 2002686 | 1792359 | 1613895 | 1463064 | 1334821 |
Iteration | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
5x Illumina | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
15x Illumina | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
25x Illumina | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
454 Reads | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
Bioreactor | 1226384 | 1134971 | 1058853 | 996532 | 94591 | 907599 | 877863 |
Iteration | 21 | 22 | 23 | 24 | 25 | 26 | 27 |
5x Illumina | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
15x Illumina | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
25x Illumina | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
454 Reads | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
Bioreactor | 855983 | 840179 | 828999 | 821065 | 815434 | 811363 | n/a |
Read overlapping runtime (8 Nodes).
Dataset | Runtime (seconds) |
---|---|
5x Illumina | 2231 |
15x Illumina | 15130 |
25x Illumina | 42973 |
454 Reads | 3054 |
Illumina bioreactor metagenomics (30 Nodes) | 26529 |
Graph coarsening runtime (serial merge sort).
Dataset | Runtime (seconds) |
---|---|
5x Illumina | 181 |
15x Illumina | 1045 |
25x Illumina | 2646 |
454 Reads | 390 |
Illumina bioreactor metagenomics dataset
The results from the simulated metagenomics study demonstrated that the algorithm can reveal incremental levels of information in read datasets and that it can be extended to both 454 and Illumina read datasets. However, for this algorithm to be practical for a wide range of sequencing applications, we must demonstrate its scalability for large read datasets. For this purpose, we applied our algorithm to a large Illumina bioreactor metagenomics dataset and evaluated its runtime and ability to reduce a large overlap graph.
We downloaded an Illumina bioreactor metagenomics dataset from the NCBI sequence read archive [19]. Table 2 describes the characteristics of this dataset. Paired-end reads were split for a total of 9,641,139 single reads. Any low quality read ends were trimmed with a minimum quality score of twenty using the FASTQ Quality Trimmer of the FASTX-toolkit [20].
Conclusions
In this paper, we introduced a graph coarsening and clustering algorithm that is able to model reads at multiple levels across a spectrum of information granularities. We demonstrated our algorithm's ability to cluster simulated Illumina metagenomics data at different levels of genome coverage. Clustering error rates of the algorithm applied to simulated Illumina metagenomics reads are comparable to error rates for clustering a simulated 454 metagenomics dataset with similar dataset characteristics. This suggests that our algorithm can be successfully applied to both Illumina and 454 read data. Algorithm runtimes were practical for all datasets. The largest simulated Illumina read dataset required less than twelve hours to complete read overlapping on eight compute nodes, and graph coarsening was completed on a single node in less than forty-five minutes. Read overlapping of the simulated 454 read dataset required less than an hour on eight compute nodes, and graph coarsening required less than seven minutes on a single compute node. The graph coarsening algorithm reduced the higher coverage simulated Illumina datasets more effectively than the lower coverage Illumina datasets.
Finally, we applied our algorithm to a large Illumina metagenomics dataset to demonstrate its scalability. By utilizing parallel computing, our read overlapping algorithm was able to complete in less than eight hours. The graph coarsening algorithm completed within approximately seven to twelve hours, depending on the number of nodes added to the parallel merge sort portion of the graph coarsening algorithm. The algorithm was able to reduce the number of edges in the overlap graph nearly 168-fold while recording each graph level on disk, allowing a researcher to access the overlap graph at various levels of complexity. We plan to expand our graph coarsening algorithm into a full sequence assembler, which will be compared with currently available assembly methods. We also plan to conduct further in-depth studies on the impact of input parameters on the graph coarsening process. Most current approaches rely on one overlap graph to capture a single snapshot of the reads and their overlap relationships. In contrast, our proposed assembly algorithm will rely on a series of coarsened graphs to capture both local and global dataset features.
The goal of our research is to develop an analysis method that will allow us to extract features of the read dataset at multiple information granularities to incorporate into the read analysis and assembly process. In the future, we plan to configure the algorithm such that clusters or nodes can be selected at different levels of information granularity. For example, if a node in a reduced graph is over-collapsed, we can select its child nodes from the previous graph iteration instead. We can continue with this zooming-in and zooming-out process, selecting child nodes from previous graph iterations until the desired node criteria are achieved or optimized. This will facilitate a customizable, intelligent approach to the read analysis and assembly problem.
Methods
Read overlapping
Graph theoretic model
Graph theory has become an important tool in many areas of bioinformatics research. The overlap graph forms the structural foundation of our read analysis and clustering algorithm. For this graph theoretic model, there is a one-to-one correspondence between the reads in the read dataset and nodes in the overlap graph. Edges connecting nodes in the overlap graph represent overlap relationships between reads. Each edge stores its corresponding overlap's alignment length and identity score. The overlap graph shares many similarities with the interval graph. The interval graph is one of the perfect graphs in graph theory and has many defined properties, making it a robust model for many applications [24].
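The one-to-one correspondence between reads and nodes can be made concrete with a small sketch. This is an illustrative data structure, not the authors' implementation; the tuple layout for edge attributes is an assumption.

```python
# Minimal sketch of an overlap graph: nodes are read ids, and each edge
# stores its overlap's alignment length and identity score (hypothetical
# attribute layout; the paper does not specify the data structure).
from collections import defaultdict

def build_overlap_graph(overlaps):
    """overlaps: iterable of (read_u, read_v, align_len, identity) tuples."""
    graph = defaultdict(dict)
    for u, v, align_len, identity in overlaps:
        graph[u][v] = (align_len, identity)
        graph[v][u] = (align_len, identity)  # overlap relationships are symmetric
    return graph

g = build_overlap_graph([(0, 1, 180, 0.98), (1, 2, 95, 0.99)])
assert set(g[1]) == {0, 2}  # read 1 overlaps reads 0 and 2
```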
Graph coarsening
We apply Heavy Edge Matching (HEM) to produce a series of coarsened graphs [15]. Graph coarsening provides levels of information about the structure of a graph over a spectrum of graph granularities. A very simple overview of global graph features exists at the coarsest graph levels, while more complex graph details are retained in earlier coarsening iterations and the original overlap graph.
This graph coarsening process can be applied to the newly coarsened graph G_{n+1} to produce an even coarser graph G_{n+2}. This iterative process of node matching and merging produces a series of coarsened graphs G_{0}, G_{1}, G_{2} ... G_{n}, where |G_{0}| ≥ |G_{1}| ≥ |G_{2}| ≥ ... ≥ |G_{n}| in node count, representing the dataset across a spectrum of information levels. Graph coarsening is terminated when the ratio of the number of nodes successfully matched to graph size falls below a user minimum.
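One coarsening pass can be sketched as follows. This follows the generic Heavy Edge Matching scheme of Karypis and Kumar [15] rather than the authors' exact code: each unmatched node is matched to the unmatched neighbor joined by the heaviest edge, and each matched pair is merged into a supernode whose edges sum the weights of the edges crossing between groups.

```python
# Hedged sketch of one HEM coarsening iteration (generic scheme, not the
# paper's implementation). graph: {node: {neighbor: weight}}.
def hem_coarsen(graph):
    """Return (coarser_graph, node_map) after one Heavy Edge Matching pass."""
    match = {}
    for u in graph:
        if u in match:
            continue
        # heaviest edge to a still-unmatched neighbor
        candidates = [(w, v) for v, w in graph[u].items() if v not in match]
        if candidates:
            _, v = max(candidates)
            match[u], match[v] = v, u
        else:
            match[u] = u                 # unmatched node carries over alone
    node_map, label = {}, 0              # assign supernode labels
    for u in graph:
        if u not in node_map:
            node_map[u] = node_map[match[u]] = label
            label += 1
    coarse = {z: {} for z in range(label)}
    for u, nbrs in graph.items():        # merge edges into the coarser graph
        for v, w in nbrs.items():
            zu, zv = node_map[u], node_map[v]
            if zu != zv:                 # edges inside a supernode collapse away
                coarse[zu][zv] = coarse[zu].get(zv, 0) + w
    return coarse, node_map
```

Iterating `hem_coarsen` on its own output yields the series G_0, G_1, ... G_n described above.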
Four array data structures are used to hold critical information describing the graph coarsening process. For each graph G_{0}, G_{1}, G_{2} ... G_{n} in the series of coarsened graphs, there are two arrays, node_weights and edge_weights. For each supernode z in a graph G_{n}, node_weights_{ n } records the number of child nodes descended from z in G_{0} and edge_weights_{ n } records the total weight of the edges induced by the child nodes. Let z be a supernode in a graph G_{n+1} and u and v be its child nodes in G_{n}. We use these arrays to calculate edge density with the following equation:

edge_density(u, v) = (w(u, v) + ew[u] + ew[v]) / ( (nw[u] + nw[v]) (nw[u] + nw[v] − 1) / 2 )

where w(u, v) is the weight of the edge (u, v), ew[u] = edge_weights_{ n }[u], and nw[u] = node_weights_{ n }[u].
During the matching process, a node u will be matched to its neighbor v only if edge_density(u, v) is greater than a user-provided minimum.
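The matching criterion can be illustrated directly. This sketch assumes the density is the merged supernode's total internal edge weight divided by the maximum possible number of edges among its child reads, per the array conventions above; the function name and argument layout are illustrative.

```python
# Illustrative edge-density check used during matching (assumed form:
# internal edge weight of the would-be supernode over the maximum number
# of edges among its nw[u] + nw[v] child reads).
def edge_density(w_uv, ew_u, ew_v, nw_u, nw_v):
    n = nw_u + nw_v                  # child reads in the merged supernode
    max_edges = n * (n - 1) / 2      # possible edges among those reads
    return (w_uv + ew_u + ew_v) / max_edges

def can_match(w_uv, ew_u, ew_v, nw_u, nw_v, user_minimum):
    return edge_density(w_uv, ew_u, ew_v, nw_u, nw_v) > user_minimum

# two singleton reads joined by one edge of weight 1 give density 1.0
assert edge_density(1, 0, 0, 1, 1) == 1.0
```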
Each graph is also assigned two additional arrays, node_map and node_map_inverse. For each node u in a graph G_{n}, node_map_{ n } records the label of the supernode that u is mapped to in G_{n+1}. For each supernode z in a graph G_{n+1}, node_map_inverse_{ (n+1) } records the labels of its child nodes in G_{n}. Let u and v be nodes in G_{n} that are mapped to supernode z in G_{n+1}; then node_map_{ n }[u] = node_map_{ n }[v] = z. If z is a supernode in G_{n+1}, then node_map_inverse_{ (n+1) }[2*z] = u and node_map_inverse_{ (n+1) }[2*z+1] = v, where (u, v) is an edge in a matching M applied to G_{n}. If z has only one child node u, then node_map_inverse_{ (n+1) }[2*z] = u and node_map_inverse_{ (n+1) }[2*z+1] = -1.
Read cluster generation
Our algorithm uses the node_map_inverse arrays to recover read clusters from a given coarsened graph G_{n}. Recall that if z is a supernode in G_{n}, then node_map_inverse_{ n }[2*z] = u and node_map_inverse_{ n }[2*z+1] = v, where u and v are child nodes of z in G_{n-1}. The labels of the child nodes of u in G_{n-2} are therefore given by node_map_inverse_{ (n-1) }[2*node_map_inverse_{ n }[2*z]] and node_map_inverse_{ (n-1) }[2*node_map_inverse_{ n }[2*z]+1]. The child nodes of v are given by node_map_inverse_{ (n-1) }[2*node_map_inverse_{ n }[2*z+1]] and node_map_inverse_{ (n-1) }[2*node_map_inverse_{ n }[2*z+1]+1]. This nested iteration through the node_map_inverse arrays continues until all of the labels of the child nodes of z in G_{0} are recovered. Since the label of each node in G_{0} is the id of the read to which it corresponds in the read dataset, we can use the child node labels to generate the read cluster belonging to supernode z.
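The nested expansion above can be sketched as a simple level-by-level walk. This is an illustrative reimplementation of the described array convention, with the arrays passed as a dict keyed by graph level.

```python
# Sketch of read-cluster recovery: starting from supernode z in G_n, expand
# each node into its (at most two) child nodes one level down until the
# read ids in G_0 are reached. node_map_inverse[k] is the inverse-map array
# for graph G_k (k >= 1); -1 marks a missing second child.
def read_cluster(z, node_map_inverse, n):
    nodes = [z]
    for level in range(n, 0, -1):        # walk G_n -> G_{n-1} -> ... -> G_0
        inv = node_map_inverse[level]
        children = []
        for u in nodes:
            children.append(inv[2 * u])
            if inv[2 * u + 1] != -1:     # skip the placeholder child
                children.append(inv[2 * u + 1])
        nodes = children
    return nodes                         # labels in G_0, i.e. read ids
```

For example, if G_1 merges reads (0, 1) into supernode 0 and (2, 3) into supernode 1, and G_2 merges those two supernodes, the cluster of the single G_2 supernode is all four reads.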
Edge relabeling
Graph traversal of the reduced overlap graph is used to determine an ordering of the nodes in the original, full overlap graph. The end points of each edge in the original overlap graph are relabeled according to the node ordering recovered from the reduced overlap graph. The goal of the edge relabeling process is to organize the edges within the original overlap graph such that many of the edges with common endpoints are close to one another in the graph data structure, facilitating more efficient access to overlap graph information. More details on the reduced graph traversal and edge relabeling process and experiments examining its effectiveness can be found in [14].
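The relabeling idea can be sketched as follows, assuming the traversal of the reduced graph has already produced an ordered list of read clusters; the function below only illustrates the renaming-and-sorting step, not the traversal itself (see [14] for the full procedure).

```python
# Hedged sketch of edge relabeling: concatenate the read clusters in
# traversal order to get a node ordering, rename each edge endpoint with
# its position in that ordering, and sort the edge list so edges sharing
# endpoints sit near each other in the data structure.
def relabel_edges(edges, ordered_clusters):
    """edges: list of (u, v) read-id pairs; ordered_clusters: read-id lists
    in the order produced by traversing the reduced overlap graph."""
    ordering = [u for cluster in ordered_clusters for u in cluster]
    pos = {u: i for i, u in enumerate(ordering)}
    relabeled = [tuple(sorted((pos[u], pos[v]))) for u, v in edges]
    relabeled.sort()
    return relabeled
```

After this step, scanning the sorted edge list touches each relabeled endpoint's edges consecutively, which is what makes later accesses to the overlap graph more efficient.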
Declarations
Acknowledgements
We would like to thank the staff of the UNO Bioinformatics Core Facility for the support of this research. This work is partially funded by grants from the National Center for Research Resources (5P20RR016469) and the National Institute for General Medical Science (NIGMS) (8P20GM103427), a component of NIH.
Declarations
The publication costs for this article were funded by the corresponding author.
This article has been published as part of BMC Bioinformatics Volume 14 Supplement 11, 2013: Selected articles from The Second Workshop on Data Mining of Next-Generation Sequencing in conjunction with the 2012 IEEE International Conference on Bioinformatics and Biomedicine. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcbioinformatics/supplements/14/S11.
Authors’ Affiliations
References
- Meyerson M, Gabriel S, Getz G: Advances in understanding cancer genomes through second-generation sequencing. Nature Reviews Genetics. 2010, 11 (10): 685-696. 10.1038/nrg2841.
- Ding L, Wendl MC, Koboldt DC, Mardis ER: Analysis of next-generation genomic data in cancer: accomplishments and challenges. Human Molecular Genetics. 2010, 19 (R2): R188-R196. 10.1093/hmg/ddq391.
- Ross JS, Cronin M: Whole cancer genome sequencing by next-generation methods. Am J Clin Pathol. 2011, 136 (4): 527-539. 10.1309/AJCPR1SVT1VHUGXW.
- Meaburn E, Schulz R: Next generation sequencing in epigenetics: insights and challenges. Seminars in Cell & Developmental Biology. 2012, 23 (2): 192-199. 10.1016/j.semcdb.2011.10.010.
- Hirst M, Marra MA: Next generation sequencing based approaches to epigenomics. Briefings in Functional Genomics. 2010, 9 (5-6): 455-465. 10.1093/bfgp/elq035.
- MacLean D, Jones JDG, Studholme DJ: Application of next-generation sequencing technologies to microbial genetics. Nature Reviews Microbiology. 2009, 7 (4): 287-296.
- Shokralla S, Spall JL, Gibson JF, Hajibabaei M: Next-generation sequencing technologies for environmental DNA research. Molecular Ecology. 2012, 21 (8): 1794-1805. 10.1111/j.1365-294X.2012.05538.x.
- Miller JR, Koren S, Sutton G: Assembly algorithms for next-generation sequencing data. Genomics. 2010, 95 (6): 315-327. 10.1016/j.ygeno.2010.03.001.
- Pignatelli M, Moya A: Evaluating the fidelity of de novo short read metagenomic assembly using simulated data. PLoS ONE. 2011, 6 (5): e19984. 10.1371/journal.pone.0019984.
- Schloss PD, Handelsman J: Introducing DOTUR, a computer program for defining operational taxonomic units and estimating species richness. Applied and Environmental Microbiology. 2005, 71 (3): 1501-1506. 10.1128/AEM.71.3.1501-1506.2005.
- Schloss PD, Westcott SL, Ryabin T, Hall JR, Hartmann M, Hollister EB, Lesniewski RA, Oakley BB, Parks DH, Robinson CJ, Sahl JW, Stres B, Thallinger GG, Van Horn DJ, Weber CF: Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Applied and Environmental Microbiology. 2009, 75 (23): 7537-7541. 10.1128/AEM.01541-09.
- Sun Y, Cai Y, Liu L, Yu F, Farrell ML, McKendree W, Farmerie W: ESPRIT: estimating species richness using large collections of 16S rRNA pyrosequences. Nucleic Acids Res. 2009, 37 (10): e76. 10.1093/nar/gkp285.
- Bao E: SEED: efficient clustering of next-generation sequences. Bioinformatics. 2011, 27 (18): 2502-2509.
- Warnke J, Ali HH: An efficient overlap graph coarsening approach for modeling short reads. 2012 IEEE International Conference on Bioinformatics and Biomedicine Workshops (BIBMW): 4-7 October 2012. 704-711. 10.1109/BIBMW.2012.6470223.
- Karypis G, Kumar V: A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J on Scientific Comput. 1998, 20 (1): 359-392. 10.1137/S1064827595287997.
- The reference sequence (RefSeq) project. The NCBI Handbook. Bethesda: National Library of Medicine (US), National Center for Biotechnology Information, 2002, ch. 18, [http://www.ncbi.nlm.nih.gov/books/NBK21091]
- Huang W, Li L, Myers JR, Marth GT: ART: a next-generation sequencing read simulator. Bioinformatics. 2012, 28 (4): 593-594. 10.1093/bioinformatics/btr708.
- Holland Computing Center. [http://hcc.unl.edu/main/index.php]
- Leinonen R, Sugawara H, Shumway M: The sequence read archive. Nucleic Acids Research. 2011, 39 (1): D19-D21. 10.1093/nar/gkq768.
- Gordon A: FASTX-Toolkit. [http://hannonlab.cshl.edu/fastx_toolkit/index.html]
- Larsson NJ, Sadakane K: Faster suffix sorting. Tech. Rep. LU-CS-TR:99-214. 1999, Lund University, Lund, Sweden.
- Rasmussen KR, Stoye J, Myers EW: Efficient q-gram filters for finding all ε-matches over a given length. Proceedings of RECOMB 1999, 3rd Annual International Conference on Computational Molecular Biology. 1999, New York.
- Needleman SB, Wunsch CD: A general method applicable to the search for similarities in the amino acid sequences of two proteins. J Mol Biol. 1970, 48 (3): 443-453. 10.1016/0022-2836(70)90057-4.
- Golumbic MC: Algorithmic Graph Theory and Perfect Graphs. 2004, Amsterdam, The Netherlands: Elsevier B.V., 2nd edition.
- Vigna S: Broadword implementation of rank/select queries. Proceedings of the 7th International Workshop on Experimental Algorithms. 2008, Springer, 154-168.
Copyright
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.