An Integrated Korean Biodiversity and Genetic Information Retrieval System
https://doi.org/10.1186/1471-2105-9-S12-S24
© Lim et al; licensee BioMed Central Ltd. 2008
- Published: 12 December 2008
Abstract
Background
On-line biodiversity information databases are growing quickly and are being integrated into general bioinformatics systems, driven by advances in fast gene sequencing technologies and the Internet. These databases can reduce the cost and effort of performing biodiversity surveys and genetic searches, allowing scientists to spend more time on research and less time collecting and maintaining data. This will increase the rate at which knowledge accumulates and improve conservation. The biodiversity databases in Korea have been scattered among several institutes and local natural history museums with incompatible data types. Therefore, a comprehensive database and a nationwide web portal for biodiversity information are necessary in order to integrate diverse information resources, including molecular and genomic databases.
Results
The Korean Natural History Research Information System (NARIS) was built and is operated as the central biodiversity information system that collects and integrates the biodiversity data of various institutes and natural history museums in Korea. This database aims to be an integrated resource that contains additional biological information, such as genome sequences and molecular level diversity. Currently, twelve institutes and museums in Korea are integrated through the DiGIR (Distributed Generic Information Retrieval) protocol, with the Darwin Core2.0 format as the metadata standard for data exchange. Data quality control and statistical analysis functions have been implemented. In particular, molecular and genetic information from the National Center for Biotechnology Information (NCBI) databases was recently integrated with NARIS. NARIS can also be extended to accommodate institutes abroad, and the whole system can be exported to establish local biodiversity management servers.
Conclusion
A Korean data portal, NARIS, has been developed to efficiently manage and utilize biodiversity data, which includes genetic resources. NARIS aims to be integral in maximizing bio-resource utilization for conservation, management, research, education, industrial applications, and integration with other bioinformation data resources. It can be found at http://www.naris.go.kr.
Keywords
- Biodiversity Data
- Database Connection
- Search Result Page
- Biodiversity Database
- Java Database Connectivity
Background
Plant and animal specimen data, together with the survey and observational data stored in museums and herbaria, provide a vast information resource; these data include not only historic information going back several hundred years, but also present-day information on the locations of these entities [1]. The availability of these specimen data in on-line databases is greatly improving science and reducing cost and effort by enabling more efficient and effective biological surveys, which allow scientists to spend more time on research [2].
Traditionally, collections in museums and herbaria were made with only one main purpose in mind, taxonomic study, but their long-term mission has been to document biodiversity and its distribution through time and space for research, education, and service to the public [3]. The introduction of computer databases has opened up this vast store of data to many new uses [4]. These uses include biogeographic studies [5], conservation planning [6], reserve selection [7], climate change studies [8, 9], and species translocation studies [10].
The development of the Internet has created new opportunities for data interchange. Until the Species Analyst project [11], which began in the late 1990s, there had been few successful electronic data interchange projects that utilized the Internet. Since then, a number of distributed projects have been initiated: the Mammal Networked Information System (MaNIS) [12], the World Network on Biodiversity (CONABIO) [13], the Australian Virtual Herbarium (AVH) [14], and the European Natural History Specimen Information Network (ENHSIN) [15]. Furthermore, a number of international cooperative projects (e.g., GBIF data portal, Species 2000, Consortium for the Barcode of Life (CBOL), Encyclopedia of Life (EOL), Ocean Biogeographic Information System (OBIS), Scratchpads [16], and WikiSpecies [17]) have recently begun to link and distribute biodiversity data among countries and international organizations. A general trend in bioscience is that advanced bioinformatics analysis and link-up tools are overcoming the challenges of different database types [18, 19]. For example, NCBI's databases contain information in different layers, from the molecular level to the species level [20]. These layers are now becoming linked and integrated through genetic information from fast sequencing projects such as metagenomics; this process also accelerates biotechnological applications. The most important aspect of integration is that once these systems are deployed worldwide, machines will be able to process biodiversity information automatically, assuming the data are accurate. This will also accelerate standardization and raise the efficiency of applying biodiversity studies by many orders of magnitude.
The biodiversity databases in Korea have been scattered among several institutes and local natural history museums, stored in heterogeneous databases with different formats and properties; a centralized, standardized portal providing access to biodiversity information was therefore necessary. Such a web portal can speed up the investigation of complicated biological inquiries, allow researchers to develop new knowledge by analyzing large data sets, and provide the appropriate analytical tools [21]. For example, studies that identify priority areas for conservation [22] and assess the impacts of climate change across natural systems [23] will benefit from access to large, integrated databases.
We built NARIS, a database and national website for biodiversity, to manage and utilize the biodiversity resources in Korea and beyond. This system can be used in other nations with minimal modification and language translation. At its core, NARIS is a biodiversity database, but it includes links to molecular and genetic information from the NCBI databases. NARIS aims to promote conservation, management, education, research, and industrial applications in biodiversity. In particular, with the advancements in fast genome sequencing technology, we expect that most of the common species will be completely sequenced before 2015. An integrated biodiversity management system such as NARIS will provide a platform that directly integrates the vast amount of genetic diversity information into the existing species diversity information.
Methods
System architecture
NARIS System Architecture. Data from several institutes and museums in Korea are integrated into NARIS via the DiGIR protocol, which supports data sharing among the data providers, the portal engine, GBIF, the Korean BioInformation Center, the Korea Science Technology Knowledge Information System, and the Korea Knowledge Portal.
Participants of NARIS. As of August 2008, the data generated from 12 institutes and natural history museums in Korea are standardized with the Darwin Core2.0 format and integrated with NARIS to share and exchange the data.
Organization | Number of records |
---|---|
National Science Museum | 137,680 |
Korea National Arboretum | 980,000 |
Gyeryongsan Natural History Museum | 4,000 |
Kyunghee University Natural History Museum | 6,181 |
Mokpo Natural History Museum | 17,754 |
Hannam University Natural History Museum | 5,000 |
Seodaemun Museum of Natural History | 2,010 |
Ewha Womans University Natural History Museum | 5,050 |
Jeonam Maritime & Fisheries Science Museum | 2,000 |
Folklore & Natural History Museum Jeju Special Self-Governing Province | 54,588 |
Chungnam University Natural History Museum | 1,005 |
Busan Marine Natural History Museum | 2,275 |
NCBI Data Search and Link Engine.
The NARIS system uses an XML-based database connection module. The search engine supports logical (Boolean) operations, wildcards, Processor, an API (C- and Java-based), web-based management, and database connections with the web application server (WAS) and Oracle. It is equipped with several functions: multi-database search, a document filter, a synonym/thesaurus dictionary, morpheme analysis, popular and recommended keywords, and statistical analysis. The web service is built on J2EE (Java 2 Platform, Enterprise Edition) technology and accesses the database through JDBC (Java Database Connectivity) using a database connection pool.
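As a rough illustration of how wildcards and a synonym dictionary can work together in a search of this kind, the sketch below expands a query term through a small synonym table and then matches the resulting patterns against record names. The record layout, the `SYNONYMS` table, and the function names are assumptions made for this example; they are not taken from the NARIS implementation, and Python is used only for brevity.

```python
from fnmatch import fnmatchcase

# Hypothetical synonym dictionary: common name -> scientific names.
SYNONYMS = {
    "pig": ["sus scrofa"],
}


def expand_query(term):
    """Return the lower-cased term plus any synonyms registered for it."""
    term = term.lower()
    return [term] + SYNONYMS.get(term, [])


def search(records, term):
    """Return records whose name matches the term or one of its synonyms.

    Shell-style wildcards ('*' and '?') in the term are honoured, and
    matching is case-insensitive because both sides are lower-cased.
    """
    patterns = expand_query(term)
    return [
        rec for rec in records
        if any(fnmatchcase(rec["name"].lower(), p) for p in patterns)
    ]


# Illustrative records only; not real NARIS data.
RECORDS = [
    {"name": "Sus scrofa"},
    {"name": "Sus barbatus"},
    {"name": "Bos taurus"},
]
```

Here `search(RECORDS, "pig")` finds the record through the synonym table, while `search(RECORDS, "sus*")` relies on the wildcard alone.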
Data format and protocol
Schematic Diagram of the DiGIR Protocol. The DiGIR software is installed on each data provider's web server, and the data are extracted after being matched against the provider's database. The data are saved as XML in the temporary storage of the Provider Registry and are then transmitted to each portal system, along with the corresponding metadata collected through the DiGIR software.
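The mapping step described in this caption can be pictured with a small sketch: a provider's local column names are mapped onto shared concept names, and each row is serialized as XML so that a portal can read it back without knowing the local schema. The column names in `FIELD_MAP`, the `record` wrapper element, and the sample values are assumptions for illustration; the element names are modelled on Darwin Core concepts but namespaces are omitted, and this is not the DiGIR wire format itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from one provider's local columns to
# Darwin Core-style concept names.
FIELD_MAP = {
    "inst": "InstitutionCode",
    "coll": "CollectionCode",
    "cat_no": "CatalogNumber",
    "sci_name": "ScientificName",
}


def to_record_xml(row):
    """Serialize the mapped fields of a local database row as XML text."""
    record = ET.Element("record")
    for column, concept in FIELD_MAP.items():
        value = row.get(column)
        if value is not None:
            ET.SubElement(record, concept).text = str(value)
    return ET.tostring(record, encoding="unicode")


def from_record_xml(xml_text):
    """Parse the XML back into a {concept: value} dictionary (portal side)."""
    record = ET.fromstring(xml_text)
    return {child.tag: child.text for child in record}
```

Columns that have no entry in `FIELD_MAP` are simply not exported, which keeps provider-specific fields out of the shared record.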
Results and discussion
User interface
Sample of Linked Genetic Information for Sus scrofa (pig). The integrated information for Sus scrofa, which included taxonomic data and genetic data from NCBI databases, can be accessed in a single search result page.
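One way to picture such a link is to derive an NCBI query from the scientific name stored in a specimen record. The sketch below only builds search URLs for NCBI's public E-utilities `esearch` endpoint and makes no network request. The paper does not state which NCBI interface NARIS uses, so the choice of E-utilities, the function name, and the default database are assumptions for this example.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def ncbi_search_url(scientific_name, db="taxonomy"):
    """Build an esearch URL for a species name in the given Entrez database.

    For sequence databases the name is restricted to the Organism field so
    that records merely mentioning the species are not returned.
    """
    if db == "taxonomy":
        term = scientific_name
    else:
        term = f"{scientific_name}[Organism]"
    return f"{EUTILS_ESEARCH}?{urlencode({'db': db, 'term': term})}"
```

For example, `ncbi_search_url("Sus scrofa")` yields a taxonomy query, and `ncbi_search_url("Sus scrofa", db="protein")` yields the corresponding protein query; a result page could present both next to the taxonomic record.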
Unique functionalities
Data Error Report Function. A user can report a data error in the comment box on the NARIS web site. The message is then forwarded to a NARIS administrator.
Data Error Management Procedure. An incorrect datum reported by a user on the NARIS web site is forwarded to the NARIS administrator, assessed by subject area experts, and then revised.
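The procedure above can be read as a small state machine in which a report moves from the user to the administrator, then to the experts, and finally to a revision. The following sketch models that flow; the status names, the `ErrorReport` class, and the extra `rejected` outcome (for reports the experts judge to be unfounded) are assumptions added for illustration and are not described in the source.

```python
from dataclasses import dataclass, field

# Allowed transitions between (hypothetical) report states.
ALLOWED = {
    "reported": {"forwarded"},             # user -> administrator
    "forwarded": {"assessed"},             # administrator -> experts
    "assessed": {"revised", "rejected"},   # expert decision
}


@dataclass
class ErrorReport:
    record_id: str
    comment: str
    status: str = "reported"
    history: list = field(default_factory=list)

    def advance(self, new_status):
        """Move to new_status if the workflow permits it, else raise."""
        if new_status not in ALLOWED.get(self.status, set()):
            raise ValueError(f"cannot go from {self.status!r} to {new_status!r}")
        self.history.append(self.status)
        self.status = new_status
```

Keeping the history on the report makes it possible to show who handled a correction and in what order, which supports the quality control role of the procedure.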
Statistical System. The X axis represents the survey location, and the Y axis represents the species diversity index. Each line (marked with triangles, rectangles, or circles) shows the species diversity index for one survey period.
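The caption does not say which diversity index is plotted. As one common possibility, the sketch below computes the Shannon index, H' = -Σ p_i ln p_i, where p_i is the proportion of individuals belonging to species i, for each combination of survey location and survey period. The choice of the Shannon index, the tuple layout of the observations, and the function names are assumptions for this example.

```python
import math
from collections import Counter, defaultdict


def shannon_index(counts):
    """Shannon diversity H' = -sum(p * ln p) over non-zero species counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log(n / total) for n in counts if n > 0)


def diversity_by_survey(observations):
    """Group (location, period, species) observations and compute H' per group.

    Returns a dictionary keyed by (location, period); plotting one line per
    period against location would give a chart like the one described.
    """
    groups = defaultdict(Counter)
    for location, period, species in observations:
        groups[(location, period)][species] += 1
    return {key: shannon_index(list(c.values())) for key, c in groups.items()}
```

A site with a single species has an index of zero, and a site with two equally abundant species has an index of ln 2, so higher values indicate both more species and a more even spread among them.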
Future development
Several new functions are planned for NARIS. First, a personalized web service (e.g., an individual database management system) will be included to encourage voluntary participation by individual users. Through this system, individual users will be able to upload large volumes of their own high-quality data, which will be registered in NARIS after a verification process. Second, multimedia content such as sound recordings will be provided to make the service more engaging. Two further functions will be a system for distributing the actual collections and data contents, which can be actively used in academic research or industrial applications, and support for taxonomic studies through not only morphological data but also the DNA barcode data that have recently been used to identify species. Finally, links to molecular and genetic information in other bioinformatics portals, such as the European Bioinformatics Institute (EBI) databases, will be offered.
Conclusion
With the advancement of information technology, the Internet, and rapid gene sequencers, managing biodiversity information in association with genetic information is becoming a global issue. Sharing and disseminating biodiversity information among different countries through relevant international organizations is becoming the standard practice.
In this paper, we presented NARIS, an effective data management system and centralized data portal that enables integrated retrieval of distributed biodiversity data and is linked with genetic information from NCBI. NARIS was developed specifically for Korea but is applicable to other nations. It uses the DiGIR protocol with the Darwin Core2.0 format as its metadata standard for data exchange, both to integrate and share biodiversity data among organizations in Korea and abroad and to serve as a data node of the GBIF.
Considering the growing importance of biodiversity resources as the primary source for the bio-industry, this integrated biodiversity information system is expected to be integral in protecting valuable national natural resources. The system will also be useful for the study of species and their distributions, conservation, scientific research, education, natural resource management, climate change, social and political uses, and medicinal studies.
Declarations
Acknowledgements
We thank Seunghun Baek, Hee-Yeun Lee, and Seong-Yong Yang, who provided technical assistance while constructing the database. This work was supported by a grant from the Ministry of Education, Science and Technology of Korea (M10868000001-08N6800-00100), the Korean Research Institute of Bioscience and Biotechnology (KRIBB) Research Initiative Program, Ministry of Environment of Korea (2008-05002-0065-0), and the Yeungnam University Research Program in 2008.
This article has been published as part of BMC Bioinformatics Volume 9 Supplement 12, 2008: Asia Pacific Bioinformatics Network (APBioNet) Seventh International Conference on Bioinformatics (InCoB2008). The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/9?issue=S12.
References
- Chapman AD, Busby JR: Linking plant species information to continental biodiversity inventory, climate and environmental monitoring. In Mapping the Diversity of Nature. Edited by: Miller RI. London: Springer; 1994:177–195.
- Chapman AD: Uses of primary species-occurrence data, version 1.0. Report for Global Biodiversity Information Facility. Copenhagen 2005.
- Winker K: Natural history museums in a postbiodiversity era. BioScience 2004, 54(5):455–459. 10.1641/0006-3568(2004)054[0455:NHMIAP]2.0.CO;2
- Chapman AD: Quality control and validation of point-sourced environmental resource data. In Spatial accuracy assessment: Land information uncertainty in natural resources. Edited by: Lowell K, Jaton A. Chelsea: Ann Arbor Press; 1999:409–418.
- Peterson AT, Navarro-Sigüenza AG, Benitez-Diaz H: The need for continued scientific collecting: A geographic analysis of Mexican bird specimens. Ibis 1998, 140: 288–294. 10.1111/j.1474-919X.1998.tb04391.x
- Faith DP, Walker PA, Margules CR, Stein J, Natera G: Practical application of biodiversity surrogates and percentage targets for conservation in Papua New Guinea. Pacific Conservation Biology 2001, 6: 289–303.
- Margules CR, Pressey RL: Systematic conservation planning. Nature 2000, 405: 243–253. 10.1038/35012251
- Root TL, Price JT, Hall KR, Schneider SH, Rosenzweig C, Pounds JA: Fingerprints of global warming on wild animals and plants. Nature 2003, 421: 57–60. 10.1038/nature01333
- Peterson AT, Ortega-Huerta MA, Bartley J, Sánchez-Cordero V, Soberón J, Buddemeier RH, Stockwell DRB: Future projections for Mexican faunas under global climate change scenarios. Nature 2002, 416: 626–629. 10.1038/416626a
- Peterson AT, Vieglais DA: Predicting species invasions using ecological niche modeling. BioScience 2001, 51: 363–371. 10.1641/0006-3568(2001)051[0363:PSIUEN]2.0.CO;2
- Kaiser J: Web tools: Searching museums from your desktop. Science 1999, 284(5416):888. 10.1126/science.284.5416.888a
- MaNIS – Mammal Networked Information System [http://manisnet.org/]
- CONABIO – Comisión Nacional para el Conocimiento y Uso de la Biodiversidad [http://www.conabio.gob.mx/]
- AVH – Australia's Virtual Herbarium [http://www.anbg.gov.au/avh/]
- ENHSIN – European Natural History Specimen Information Network [http://www.bgbm.org/BioDivInf/projects/ENHSIN/]
- Scratchpads [http://scratchpads.eu/about]
- WikiSpecies [http://species.wikimedia.org/]
- Stein LD: Integrating biological databases. Nature Reviews Genetics 2003, 4: 337–345. 10.1038/nrg1065
- Shao KT, Peng CI, Yen E, Lai KC, Wang MC, Lin J, Lee H, Alan Y, Chen SY: Integration of biodiversity databases in Taiwan and linkage to global databases. Data Science Journal 2007, 6: 2–10. 10.2481/dsj.6.S2
- NCBI Taxonomy [http://www.ncbi.nlm.nih.gov/Taxonomy/]
- Neale SH, Pulland MR, Watson MF: Online biodiversity resources – principles for usability. Biodiversity Informatics 2007, 4: 27–36.
- Williams P, Gibbons D, Margules CR, Rebelo A, Humphries C, Pressey R: A comparison of richness hotspots, rarity hotspots, and complementary areas for conserving diversity of British birds. Conservation Biology 1996, 10: 155–174. 10.1046/j.1523-1739.1996.10010155.x
- Parmesan C, Yohe G: A globally coherent fingerprint of climate change impacts across natural systems. Nature 2003, 421: 37–42. 10.1038/nature01286
- DiGIR – Distributed Generic Information Retrieval [http://digir.net/]
- Chapman AD: Principles and methods of data cleaning. Report for Global Biodiversity Information Facility. Copenhagen 2005.
- Chapman AD: Principles of data quality. Report for Global Biodiversity Information Facility. Copenhagen 2005.
Copyright
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.