- Poster presentation
- Open Access
Validation and quality assurance for genome browser database exports
BMC Bioinformatics volume 16, Article number: P13 (2015)
Background
FPD2GB2 (Fungal Project Database to GBrowse 2) is a genome browser transition utility designed in our lab. It exports data from a custom database used by the Fungal Endophytes Genome Project [1, 2]. Implemented as a collection of scripts, FPD2GB2 writes the contents of this locally developed genome annotation database in the standard GFF3 format, which allows bulk import of the data into the GBrowse 2 genome browser [3].
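To illustrate the target format, the following minimal sketch serializes one feature as a GFF3 line with its nine tab-separated columns. The feature dictionary, the `FPD` source label, and the function names are hypothetical and chosen for illustration; they are not taken from FPD2GB2 or from the Fungal Project Database schema.

```python
from urllib.parse import quote

# The first eight GFF3 columns; the ninth column holds the attributes.
GFF3_COLUMNS = ("seqid", "source", "type", "start", "end",
                "score", "strand", "phase")


def format_attributes(attributes):
    """Join tag=value pairs with ';', percent-encoding reserved characters."""
    # quote() leaves letters, digits and "_.-~" untouched; the extra safe
    # characters below are an illustrative choice, not a complete GFF3 rule set.
    return ";".join(
        f"{tag}={quote(str(value), safe=' :/')}"
        for tag, value in attributes.items()
    )


def format_gff3_line(feature):
    """Serialize one feature record (a hypothetical dict) as a GFF3 line."""
    fields = []
    for column in GFF3_COLUMNS:
        value = feature.get(column)
        # GFF3 uses "." for an undefined field.
        fields.append("." if value is None else str(value))
    fields.append(format_attributes(feature.get("attributes", {})) or ".")
    return "\t".join(fields)


# Hypothetical gene feature; in a real export these values would come
# from the source database.
example_feature = {
    "seqid": "contig_1",
    "source": "FPD",
    "type": "gene",
    "start": 100,
    "end": 250,
    "strand": "+",
    "attributes": {"ID": "gene0001"},
}
example_line = format_gff3_line(example_feature)
print(example_line)
```

Child features such as mRNAs or exons would carry a `Parent` attribute naming the `ID` of their parent feature, which is the relationship a validator can later check.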
Materials and methods
Any application that converts between data formats should ensure the completeness and accuracy of its output. Adding a data validator to the FPD2GB2 script collection allows independent verification of the quality and soundness of the GFF3 files being imported into a production GBrowse 2 environment.
We measure the accuracy of the output by comparing the features listed in the GFF3 files to the contents of the original database; in particular, we check that feature offsets are correct relative to their reference features. We assess the completeness of the output by comparing the parent-child inheritance structure of its features to that of the source data. The script collection is structured into a “master” script and several “worker” scripts, each of which produces its own output. The structure of the collection is shown in Figure 1. The goals and methods for the validator are described in Table 1.
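The two checks described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the FPD2GB2 validator itself: the source database is stood in for by a hypothetical in-memory dictionary, every feature is assumed to carry an `ID` attribute, and each feature has at most one parent.

```python
def parse_gff3(text):
    """Return {feature ID: record} for the feature lines of a GFF3 document."""
    features = {}
    for line in text.splitlines():
        if line.startswith("##FASTA"):
            break  # an embedded sequence section follows; no more features
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.split("\t")
        if len(cols) != 9:
            raise ValueError(f"expected 9 columns, got {len(cols)}: {line!r}")
        attrs = dict(p.split("=", 1) for p in cols[8].split(";") if "=" in p)
        feature_id = attrs.get("ID")
        if feature_id is None:
            continue  # this sketch only tracks features that have an ID
        features[feature_id] = {
            "seqid": cols[0],
            "start": int(cols[3]),
            "end": int(cols[4]),
            "parent": attrs.get("Parent"),
        }
    return features


def validate(source, exported):
    """Compare exported features with source records; return a list of problems."""
    problems = []
    for fid in sorted(source.keys() - exported.keys()):
        problems.append(f"{fid}: missing from export")
    for fid in sorted(exported.keys() - source.keys()):
        problems.append(f"{fid}: not present in source")
    for fid in sorted(source.keys() & exported.keys()):
        src, out = source[fid], exported[fid]
        # Accuracy: reference sequence and offsets must match the source.
        for key in ("seqid", "start", "end"):
            if src[key] != out[key]:
                problems.append(
                    f"{fid}: {key} differs (source {src[key]}, export {out[key]})"
                )
        # Completeness: the parent-child structure must be preserved.
        if src["parent"] != out["parent"]:
            problems.append(
                f"{fid}: parent differs (source {src['parent']}, export {out['parent']})"
            )
    return problems


# Hypothetical source records, standing in for rows read from the database.
source_records = {
    "gene1": {"seqid": "contig_1", "start": 100, "end": 250, "parent": None},
    "mrna1": {"seqid": "contig_1", "start": 100, "end": 250, "parent": "gene1"},
}
# Hypothetical export with a deliberate off-by-one start on the mRNA.
gff3_text = (
    "##gff-version 3\n"
    "contig_1\tFPD\tgene\t100\t250\t.\t+\t.\tID=gene1\n"
    "contig_1\tFPD\tmRNA\t101\t250\t.\t+\t.\tID=mrna1;Parent=gene1\n"
)
problems = validate(source_records, parse_gff3(gff3_text))
for problem in problems:
    print(problem)
```

Returning a list of problems rather than stopping at the first mismatch lets a single run report every discrepancy found in a file.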
Results
It is notoriously difficult to prove the accuracy of computational results, and in practice validation is based on testing. To validate completeness, correctness and accuracy, we use metrics that give confidence not only that the output accurately reflects the source data, but also that the algorithms used to create the output are correct. The size of some of the databases and the number of annotation tracks also make full comparison of related tracks impractical, because fully comparing tracks takes a quadratic number of runs with respect to the number of tracks. Finally, because the annotations do not carry metadata establishing relationships between them, comparisons using ParsEval have to be run manually.
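The quadratic growth can be made concrete with a short sketch. It assumes that a full comparison means one run per unordered pair of tracks, which gives n(n-1)/2 runs for n tracks; the track names are hypothetical.

```python
from itertools import combinations


def pairwise_comparisons(tracks):
    """List every unordered pair of tracks that a full comparison would cover."""
    return list(combinations(tracks, 2))


def comparison_count(n_tracks):
    """Number of pairwise runs for n tracks: n * (n - 1) / 2."""
    return n_tracks * (n_tracks - 1) // 2


# Hypothetical annotation track names.
tracks = ["genes_v1", "genes_v2", "ests", "repeats"]
for first, second in pairwise_comparisons(tracks):
    print(f"compare {first} with {second}")

# The number of runs grows quadratically with the number of tracks.
for n in (4, 10, 40):
    print(n, "tracks ->", comparison_count(n), "runs")
```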
References
1. Fungal Endophytes Genome Project: http://www.endophyte.uky.edu/
2. Schardl CL, Young CA, Hesse U, Amyotte SG, Andreeva K, Calie PJ, et al: Plant-symbiotic fungi as chemical engineers: multi-genome analysis of the Clavicipitaceae reveals dynamics of alkaloid loci. PLoS Genetics. 2013, 9 (2): e1003323.
3. Stein LD, Mungall C, Shu S, Caudy M, Mangone M, Day A, et al: The Generic Genome Browser: A Building Block for a Model Organism System Database. Genome Research. 2002, 12 (10): 1599-1610.
About this article
Cite this article
Chui, R., Jaromczyk, J.W., Moore, N. et al. Validation and quality assurance for genome browser database exports. BMC Bioinformatics 16 (Suppl 15), P13 (2015). https://doi.org/10.1186/1471-2105-16-S15-P13
DOI: https://doi.org/10.1186/1471-2105-16-S15-P13
Keywords
- Genome Browser
- Fungal Endophyte
- Annotation Database
- Quadratic Number
- Independent Verification