Biodiversity research is rapidly developing into a big-data science, enabling researchers to model processes that affect entire biotas and to predict ecosystem-wide effects of environmental change. Infrastructures that provide open access to species observation data for all types of life are crucial to this development. The Living Atlas (LA) is an infrastructure for integrating biodiversity data from multiple sources with environmental and contextual information. It was originally developed by the Atlas of Living Australia, in response to growing demands from the biodiversity research community for open access to extensive databases and analysis tools [1]. It is now also supported by the Global Biodiversity Information Facility [2], and serves as the main biodiversity data hub in 27 countries and regions [3]. The software is developed in open collaboration, and more than 100 developers have contributed to the codebase.
Although the LA accommodates less traditional data types, such as images or output from animal tracking devices, it has so far offered limited functionality for DNA sequence-based observations. Meanwhile, molecular methods for species observation, in particular metabarcoding (amplicon sequencing of taxonomic marker genes) of environmental DNA (eDNA) and bulk samples, are becoming increasingly important tools for documenting the diversity of life [4], especially in the microscopic realm (prokaryotes, protists and fungi; see, e.g., [5]).
We identified three features that would make the LA platform more useful for handling occurrence data derived from metabarcoding: (1) the option to store the processed barcode sequences that underlie occurrence records in the atlas, in the form of Amplicon Sequence Variants (ASVs), and to use the Basic Local Alignment Search Tool (BLAST; [6]) to find such occurrences; (2) the possibility of searching for ASVs and occurrence records based on sequencing details, such as target genes and primers; and (3) a dynamic approach to taxonomic annotation of observed ASVs, allowing for easy updates as reference databases develop. Below, we present an application that provides these features and functions as a semi-integrated LA module.
Implementation
The ASV portal is a web interface to sequence-based biodiversity observations in the LA platform, and is implemented as five separate microservices that are defined and orchestrated with Docker Compose ([7]; Fig. 1). The main application comprises a Python Flask [8] backend, a jQuery [9] frontend, and a uWSGI [10] application server, which receives requests from the NGINX reverse proxy server [11] and passes them on to Flask. Flask, in turn, retrieves ASV and occurrence records from a PostgreSQL [12] database, which is exposed as a RESTful API by the PostgREST [13] server. In addition, the main application delegates BLAST jobs to a worker, spawning additional worker processes when needed. Finally, the service configuration includes volumes for persistent storage of, e.g., file uploads and BLAST and ASV database records.
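As an illustration of how a backend could query a database exposed through PostgREST, the following minimal sketch builds a request URL that filters ASV records by target gene and primers. It is not taken from the ASV portal codebase: the service address `http://postgrest:3000`, the relation name `asv_view`, the column names and the function `build_asv_query` are all hypothetical. Only the query syntax (`select=` for column selection, `column=eq.value` for equality filters, `limit=` for row limits) follows PostgREST conventions.

```python
from urllib.parse import quote, urlencode

# Assumed address of the PostgREST service inside the Compose network.
POSTGREST_URL = "http://postgrest:3000"


def build_asv_query(gene=None, fw_primer=None, rv_primer=None, limit=100):
    """Build a PostgREST URL filtering a hypothetical `asv_view` relation.

    All relation and column names are illustrative assumptions.
    """
    # Column selection uses PostgREST's `select` parameter.
    params = [("select", "asv_id,gene,fw_primer,rv_primer,asv_sequence")]
    # Each supplied search criterion becomes an equality filter (`eq.`).
    if gene:
        params.append(("gene", f"eq.{gene}"))
    if fw_primer:
        params.append(("fw_primer", f"eq.{fw_primer}"))
    if rv_primer:
        params.append(("rv_primer", f"eq.{rv_primer}"))
    params.append(("limit", str(limit)))
    # Percent-encode values, but keep commas readable in the select list.
    query = urlencode(params, safe=",", quote_via=quote)
    return f"{POSTGREST_URL}/asv_view?{query}"


if __name__ == "__main__":
    # Example: ASVs from the 16S rRNA gene amplified with a given forward primer.
    print(build_asv_query(gene="16S rRNA", fw_primer="CCTACGGGNGGCWGCAG"))
```

In a setup like the one described above, the Flask backend would send such a URL as an HTTP GET request to PostgREST and receive the matching rows as JSON, so that no SQL needs to be written in the web application itself.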