
Table 1 Description of database tables.

From: SMITH: a LIMS for handling next-generation sequencing workflows

Table

Description

User (not shown in Figure 2)

Includes name and surname, phone number, email, etc. Passwords are not stored in the database; they are supplied either by a Lightweight Directory Access Protocol (LDAP) server or by a file realm.

Sample

Represents the biological sample of a sequencing experiment. Most attributes are used to configure the sequencing machine. A new sample is created at each new request for a sequencing experiment.

Application

Contains the parameters characterizing a sequencing run: read length, read mode, and sequencing depth. These parameters have been combined into a set of predefined recipes. This approach makes it easier for the user to choose appropriate parameters and reduces the number of possible applications, which in turn facilitates sequencing diverse samples in the same sequencing run.

SequencingIndexes

Contains all the possible sequencing barcode indices used in the laboratory. When the users prepare their own sequencing library, they must provide information about the sequencing barcode indices.

MultipleRequest

Using the web interface, it is possible to request more than one sample at the same time. Such samples are linked by the MultipleRequest table.

Project

Groups the samples into projects. A project is associated with a list of users (collaborators). The project creator can set special permissions for collaborators to view or modify the information regarding specific samples.

AttributeValue

Connects each sample to custom attributes and values. This approach enriches each sample with specific meta-data that can be used to search for specific samples and for statistical analyses. Note that all the tables that are connected to Sample and represent the results of the sequencing and the subsequent analyses will also be connected to the meta-data.

SampleRun

Represents the run of the sequencing machine for a specific sample, connected to the sequencing reagents used. Many samples can run together and be connected by the same run_id.

RawData

Keeps track of the FASTQ files produced. It stores the paths to the files, along with the samples and runs that originated the data.

AlignedData

Stores the algorithm and the reference genome used, as well as the path to the resulting aligned data (in BAM format).

AnnotatedData

Analysis steps following the alignment are saved in this table. Many algorithms use the output of a preceding step as input. Thus, the table contains a one-to-many reference to itself.
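
Two of the patterns described in the table, the attribute/value meta-data attached to samples and the self-reference in AnnotatedData, can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module and is not the SMITH schema: the table and column names (`attribute_value`, `parent_id`, and so on), the direct link from `annotated_data` to `sample`, the example algorithm names, and the helper functions are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical, simplified schema; names are illustrative, not SMITH's own.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sample (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
-- Custom meta-data: one row per (sample, attribute, value) triple.
CREATE TABLE attribute_value (
    id        INTEGER PRIMARY KEY,
    sample_id INTEGER NOT NULL REFERENCES sample(id),
    attribute TEXT NOT NULL,
    value     TEXT NOT NULL
);
-- Post-alignment analysis steps; parent_id points at the step whose
-- output was used as input (NULL for a first step), so one step can
-- have many children.
CREATE TABLE annotated_data (
    id        INTEGER PRIMARY KEY,
    sample_id INTEGER NOT NULL REFERENCES sample(id),
    algorithm TEXT NOT NULL,
    parent_id INTEGER REFERENCES annotated_data(id)
);
""")

conn.executemany("INSERT INTO sample (id, name) VALUES (?, ?)",
                 [(1, "s1"), (2, "s2")])
conn.executemany(
    "INSERT INTO attribute_value (sample_id, attribute, value) VALUES (?, ?, ?)",
    [(1, "tissue", "liver"), (2, "tissue", "brain"), (1, "treatment", "control")],
)
conn.executemany(
    "INSERT INTO annotated_data (id, sample_id, algorithm, parent_id) "
    "VALUES (?, ?, ?, ?)",
    [
        (1, 1, "peak_calling", None),
        (2, 1, "motif_search", 1),
        (3, 1, "peak_annotation", 1),
        (4, 1, "motif_enrichment", 2),
    ],
)


def samples_with(attribute, value):
    """Return the names of samples carrying the given attribute/value pair."""
    rows = conn.execute(
        "SELECT s.name FROM sample s "
        "JOIN attribute_value a ON a.sample_id = s.id "
        "WHERE a.attribute = ? AND a.value = ? ORDER BY s.name",
        (attribute, value),
    )
    return [name for (name,) in rows]


def downstream_steps(step_id):
    """Return the algorithms of all steps derived, directly or not, from step_id."""
    rows = conn.execute(
        """
        WITH RECURSIVE chain(id, algorithm) AS (
            SELECT id, algorithm FROM annotated_data WHERE parent_id = ?
            UNION ALL
            SELECT d.id, d.algorithm
            FROM annotated_data d JOIN chain c ON d.parent_id = c.id
        )
        SELECT algorithm FROM chain ORDER BY id
        """,
        (step_id,),
    )
    return [algorithm for (algorithm,) in rows]
```

In this sketch, `samples_with("tissue", "liver")` looks samples up by their meta-data, and `downstream_steps(1)` walks the self-reference to collect every analysis step that builds on step 1. Storing attributes as rows rather than columns is what lets new attributes be added without changing the schema.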