GMO Genetic Elements Thesaurus (GMO-GET): a controlled vocabulary for the consensus designation of introduced or modified genetic elements in genetically modified organisms
BMC Bioinformatics volume 22, Article number: 48 (2021)
Various databases on genetically modified organisms (GMOs) exist, each with a specific focus on facilitating access to the information needed for, e.g., risk assessment, the development of detection and identification strategies, or inspection and control activities. Each database has its own approach to the subject, and these databases often use different terminology to describe GMOs. Adequate addressing and identification of GMOs, and the exchange of GMO-related information, require commonly agreed concepts and terminology.
We present a hierarchically structured controlled vocabulary describing the genetic elements inserted into conventional GMOs or altered in GMOs developed using gen(om)e editing: the GMO genetic element thesaurus (GMO-GET). GMO-GET can be used for GMO-related documentation, including GMO-related databases. It was initially developed on the basis of two GMO databases, i.e. the Biosafety Clearing-House and the EUginius database.
The use of GMO-GET will harmonise information, making it consistent and compatible, and will allow an accurate exchange of information between different data systems, thereby facilitating their interoperability. GMO-GET can also be used to describe genetic elements that are altered in organisms obtained through current targeted genome-editing techniques.
Since conventional genetically modified organisms (GMOs), or living modified organisms (LMOs) as defined by the Cartagena Protocol on Biosafety, and products derived from them first appeared on the global market, GMOs have been considered a distinct group of new plant, animal or microbial genotypes. Consequently, GMOs have given rise to a new vocabulary describing the introduced DNA sequences, the mechanisms used to achieve gene transfer, and the resulting organisms harbouring the newly introduced genetic constructs.
New genome-editing techniques may result in substantial alterations of an organism’s genome, but may also lead to minor modifications that are indistinguishable from untargeted or spontaneous mutations. Under European Union (EU) law, such gen(om)e-edited organisms fall under the EU GMO legislation [2, 3]. In this paper they are referred to as “GE-GMOs”.
At the molecular level, a conventional GMO is determined by the genetic elements inserted into the endogenous DNA of the recipient organism. In the case of GE-GMOs, an identifiable genetic element has been altered. The term genetic element describes a continuous DNA sequence (string of nucleotides) with a distinct function, i.e. a molecular unit, that originates from one or more donor organisms or is of synthetic origin. Genetic elements are combined in a construct that comprises one or more coding DNA sequences (translated into proteins) or silencing elements, together with appropriate regulatory DNA elements (promoters, terminators, etc.) that act at the DNA or RNA level during transcription or translation to ensure correct expression of the integrated construct (Fig. 1).
Enforcing GMO legislation requires a clear overview of the characteristics of both authorised GMOs, which have been assessed for their safety for human and animal consumption and for the environment, and unauthorised GMOs, which have not (yet) been assessed by the regulators of the respective countries. To provide such an overview of all known GMOs, authorised and unauthorised, and to support inspection services and enforcement laboratories, user-friendly databases are needed that give easy access to all required information on each known GMO, whether conventional or GE-GMO.
Various GMO databases have been compiled over recent years and decades, each with a specific focus on facilitating access to the information needed for, e.g., risk assessment, the development of detection and identification strategies, or inspection and control activities. A list of GMO-related websites is given in the EUginius database (https://euginius.eu/euginius/pages/intrestingLinks.jsf). Each database has its own approach to the subject. Although each database contains unique information, there is considerable overlap among them, as shown for selected databases in Table 1. To complicate matters, these databases use different vocabularies to describe GMOs. This can cause confusion if users interpret the data differently than the database owner intended, and it hampers the exchange of data between databases. In practice, adequate GMO identification depends on exchanging GMO-related information, and doing so effectively requires agreed concepts and terminology.
GMOs that are in the process of market approval and commercialisation in any member country of the Organisation for Economic Co-operation and Development (OECD) are assigned a unique identifier (UID) for unambiguous designation [4, 5]; this does not include the more recent GE-GMOs. However, no such controlled vocabulary based on international consensus exists for the genetic elements introduced into GMOs.
To this end, this paper presents a hierarchically structured controlled vocabulary describing the genetic elements introduced or altered in conventional GMOs or GE-GMOs, which we call the GMO genetic element thesaurus (GMO-GET). GMO-GET can be applied to any GMO-related documentation, including any GMO-related database. It was initially developed on the basis of three GMO databases, i.e. the database of the Biosafety Clearing-House (BCH; www.bch.cbd.int), the European Database of Reference Methods for GMO Analysis (gmo-crl.jrc.ec.europa.eu/gmomethods) [6, 7] and the EUginius GMO database (www.euginius.eu). The latter is a joint development of the Dutch WFSR (Wageningen Food Safety Research, formerly RIKILT Wageningen University & Research) and the German BVL (Federal Office of Consumer Protection and Food Safety). Both the BCH and the EUginius database already apply the vocabulary proposed here. The use of GMO-GET will enable a consistent and compatible formal molecular characterisation, supporting an accurate exchange of information between different data systems and thereby facilitating their interoperability (Fig. 2).
Construction and content
Many bio-ontologies are stored at http://obo.sourceforge.net and are accepted by the scientific community as authoritative. Every bio-ontology assigns an identifier (ID) to each term, which allows data to be archived, stored and accessed in databases. Ontology IDs provide a means of exchanging data with unambiguous, shared meaning between databases, an ability known as ‘semantic interoperability’.
Several bio-ontologies relate to plant genetic elements, such as the Gene Ontology, the Plant Ontology [10, 11], the Ontology of Genes and Genomes (OGG), the Sequence Ontology, and the Synthetic Biology Open Language (SBOL). However, since none of these existing ontologies is suitable for describing GMO-related genetic elements, we decided to establish a new vocabulary with a hierarchical structure, i.e. a thesaurus. The basic strategy of using an exchangeable format was adopted from the Gene Ontology, the Plant Ontology and others. The software OBO-Edit (http://oboedit.org/) [15, 16] is used to build and structure GMO-GET. GMO-GET is publicly available via EUginius (www.euginius.eu/euginius/pages/gmo_genetic_elements.jsf).
Ontologies provide a means of formalizing knowledge in complex hierarchies composed of terms and rules [18, 19]. An ontology starts with a ‘root’ term, which can be connected to ‘child’ terms via defined relations; these terms can in turn be ‘parents’ of other terms. For example, a child term can be related to its parent term via an is_a relation (apple is_a fruit) or via a part_of relation (apple_peel part_of apple). GMO-GET uses only the is_a relation (e.g. P-Cauliflower mosaic virus is_a promoter). The result is a simple hierarchical tree in which functionally or phylogenetically related genetic elements are grouped together in branches. As recommended in the international standard for thesauri and interoperability (ISO 25964), a set of descriptors is assigned to each element: one preferred term precisely characterised by one definition, several synonyms (i.e. non-preferred terms), exactly one relation to a broader term and, where applicable, one or more relations to narrower terms. In addition to these ISO descriptors, each term has its own set of properties, such as an ID, a comment for scientific references and, if applicable, a trait (see Table 2 for details, and Fig. 3 and Table 3 for examples). The thesaurus therefore defines genetic elements in detail, not only as individual entities but also in terms of their relations to each other. The entire tree can be viewed on the page ‘List with GMOs and genetic elements’ of the EUginius website (www.euginius.eu/euginius/pages/gmo_genetic_elements.jsf).
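Since the thesaurus is built with OBO-Edit and distributed in OBO format, the descriptor model above maps onto OBO `[Term]` stanzas. The following minimal Python reader is a sketch, not the GMO-GET tooling. It shows how a term's preferred name, single definition, synonyms and is_a link to its broader term could be loaded, and how the "exactly one broader term" rule could be checked. The term IDs (`EX:0` to `EX:2`), the synonym and the field names are hypothetical placeholders. Only the tags `id`, `name`, `def`, `synonym` and `is_a` are handled.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Term:
    """Descriptors of one thesaurus term (field names are illustrative)."""
    id: str = ""
    name: str = ""                                      # preferred term
    definition: str = ""                                # one definition
    synonyms: list[str] = field(default_factory=list)   # non-preferred terms
    is_a: list[str] = field(default_factory=list)       # broader term(s)


def _quoted(value: str) -> str:
    """Return the text between the first pair of double quotes, if any."""
    parts = value.split('"')
    return parts[1] if len(parts) >= 3 else value


def parse_obo_terms(text: str) -> dict[str, Term]:
    """Collect [Term] stanzas from OBO-format text, keyed by term ID."""
    stanzas: list[Term] = []
    current: Term | None = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            # A new stanza starts; other stanza types (e.g. [Typedef]) are skipped.
            current = Term() if line == "[Term]" else None
            if current is not None:
                stanzas.append(current)
            continue
        if current is None or ":" not in line:
            continue  # header lines, blank lines, skipped stanzas
        tag, value = (part.strip() for part in line.split(":", 1))
        if tag == "id":
            current.id = value
        elif tag == "name":
            current.name = value
        elif tag == "def":
            current.definition = _quoted(value)
        elif tag == "synonym":
            current.synonyms.append(_quoted(value))
        elif tag == "is_a":
            current.is_a.append(value.split("!", 1)[0].strip())  # drop "! label"
    return {term.id: term for term in stanzas}


def violates_single_broader_term(terms: dict[str, Term], root: str) -> list[str]:
    """IDs of non-root terms that do not have exactly one is_a parent."""
    return sorted(t.id for t in terms.values() if t.id != root and len(t.is_a) != 1)


# Hypothetical excerpt; IDs and synonym are placeholders, not GMO-GET content.
SAMPLE = """\
format-version: 1.2

[Term]
id: EX:0
name: genetic element

[Term]
id: EX:1
name: promoter
is_a: EX:0 ! genetic element

[Term]
id: EX:2
name: P-Cauliflower mosaic virus
def: "Promoter derived from cauliflower mosaic virus." []
synonym: "CaMV promoter" EXACT []
is_a: EX:1 ! promoter
"""

terms = parse_obo_terms(SAMPLE)
print(terms["EX:2"].is_a)                           # ['EX:1']
print(violates_single_broader_term(terms, "EX:0"))  # []
```

A full reader would also need to handle OBO escapes, qualifiers and further tags. The single-parent check is the part that mirrors the thesaurus rule.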
GMO-GET is structured in five hierarchical levels
Level_0 is the root, representing the general idea of a genetic element. Attached to the root are the level_1 terms, which describe the general functional types of genetic elements: coding sequence, enhancer, gene silencing elements, genomic sequence, intron, leader, promoter, other regulatory elements, terminator, transit peptide, unknown origin, vector fragment. The next level (level_2) consists of an abbreviation of the element type plus a generally comprehensible long name referring to the detailed biological function, e.g. CS-nopaline synthase. Below level_2, genetic elements of common/homologous origin (e.g. cry delta endotoxin genes) and/or with comparable/analogous features (e.g. variants of the epsps gene) are grouped. These level_3 terms serve as labels for the actual genetic elements and are written as abbreviations following specific syntax rules. As a result, each genetic element listed in GMO-GET has an unambiguous designation (e.g. P-nos-RHIRD).
Where information is available, a fourth level (level_4) can be added to collect variants of a genetic element, e.g. with minor sequence differences resulting from cloning strategies or from spontaneous or induced mutagenesis. In allopolyploid GE-GMOs, it could define variants of differently modified homoeologs. Level_4 records can be used to assign methods that target specific variants of a genetic element and, conversely, to exclude genetic elements that do not contain the target sequence of a method.
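Every term has exactly one broader term, so a term's level is simply its number of is_a steps from the root. The sketch below assumes the thesaurus is held as a child-to-parent mapping. The tree is a toy excerpt, not the published GMO-GET. The level_2 label `P-nopaline synthase` is a hypothetical name built by analogy with the `CS-nopaline synthase` example.

```python
from __future__ import annotations

ROOT = "genetic element"  # level_0

# Toy excerpt: child term -> its single broader (parent) term.
PARENT = {
    "promoter": ROOT,                       # level_1: functional type
    "P-nopaline synthase": "promoter",      # level_2 (hypothetical label)
    "P-nos-RHIRD": "P-nopaline synthase",   # level_3: actual genetic element
}


def lineage(term: str, parent: dict[str, str], root: str = ROOT) -> list[str]:
    """Return the path from the root down to `term`."""
    path = [term]
    while path[-1] != root:
        nxt = parent.get(path[-1])
        if nxt is None or nxt in path:  # dangling term or cycle
            raise ValueError(f"{path[-1]!r} is not connected to the root")
        path.append(nxt)
    return path[::-1]


def level(term: str, parent: dict[str, str], root: str = ROOT) -> int:
    """Level of a term = number of is_a steps between it and the root (0 to 4)."""
    depth = len(lineage(term, parent, root)) - 1
    if depth > 4:
        raise ValueError(f"{term!r} lies below level_4")
    return depth


print(lineage("P-nos-RHIRD", PARENT))
# ['genetic element', 'promoter', 'P-nopaline synthase', 'P-nos-RHIRD']
print(level("P-nos-RHIRD", PARENT))  # 3
```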
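A toy check can illustrate how level_4 records could support method assignment. A variant is kept for a method only if it contains the method's target sequence on either DNA strand, and it is excluded otherwise. The variant labels and sequences below are invented for illustration. Exact substring matching is a deliberate simplification of how real PCR primers and probes bind.

```python
from __future__ import annotations

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def split_variants(target: str, variants: dict[str, str]) -> tuple[list[str], list[str]]:
    """Partition level_4 variants into those containing the target and the rest."""
    rc = reverse_complement(target)
    covered = sorted(v for v, seq in variants.items() if target in seq or rc in seq)
    excluded = sorted(set(variants) - set(covered))
    return covered, excluded


# Hypothetical level_4 variants of one level_3 element (toy sequences).
VARIANTS = {
    "variant_1": "ATGGCTAGCTTACGGA",
    "variant_2": "ATGGCTAGCTTACGCA",  # single substitution inside the target site
}

print(split_variants("AGCTTACGG", VARIANTS))  # (['variant_1'], ['variant_2'])
```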
The general syntax for a genetic element in level_3 of GMO-GET is XX-YYYY-ZZZZZ, with a prefix (XX), a name part (YYYY) and a donor part (ZZZZZ). For gene silencing elements and elements modified using genome-editing techniques, the name part also carries a suffix after YYYY that indicates the particular nature of the element. For example, YYYY_genome_edited indicates an element modified using genome-editing techniques, YYYY_siRNAs an element in sense orientation leading to gene silencing through siRNA, YYYY_siRNAas an element in antisense orientation, and YYYY_siRNAu an element of undefined orientation.
The prefix (XX, one or two characters) indicates the element type, i.e. whether the genetic element is a coding sequence (CS-), an intron (I-), a promoter (P-) or something else. All prefixes currently in use are listed in Table 4.
The middle part of the syntax (YYYY, of no fixed length) holds an abbreviation of the element. This should be the most common abbreviation of the element. If no abbreviation exists or a common abbreviation is difficult to pinpoint, an easy-to-understand abbreviation of the element name should be used, as agreed by the developers of the thesaurus. The abbreviation should be the same for all types of an element (i.e. promoter, terminator, etc.). The following rules were established for the middle part of the syntax:
The abbreviation of the element name is written in lowercase letters unless the commonly used abbreviation uses uppercase letters. If the common abbreviation includes a species abbreviation, this is deleted from the name, since the species is indicated in the donor suffix of the element name (e.g. CS-AtAHAS becomes CS-ahas-ARATH, and P-ZmUbi1 becomes P-ubi1-MAIZE). An exception to this rule is the naming of elements from Agrobacterium tumefaciens (updated scientific name: Rhizobium radiobacter) strain CP4: as there is no specific abbreviation for this strain, “CP4” becomes part of -YYYY-, e.g. CS-CP4epsps-RHIRD.
Special characters are avoided as far as possible and are replaced by an underscore “_”; for example, “.”, “/” and “´” are treated as special characters.
Greek letters are written out rather than used as symbols (e.g. CS-beta-gal instead of CS-β-gal).
The designations 3′ UTR and 5′ UTR (untranslated region) are omitted from element names, since they are implied by the element type, terminator (T-) or promoter (P-).
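The naming rules for the middle part are mechanical enough to sketch as a function. This is an illustration that rests on several assumptions:

- The species-prefix table covers only the two examples from the text (At to ARATH, Zm to MAIZE).
- The Greek-letter table is a small subset.
- Whitespace is treated like the listed special characters.
- The case exception for commonly uppercase abbreviations must be signalled by the caller (`keep_case`), because it cannot be inferred.

The helper names are hypothetical and do not come from GMO-GET.

```python
from __future__ import annotations

import re

# Illustrative subsets only; real tables would follow the UniProt organism codes.
SPECIES_PREFIXES = {"At": "ARATH", "Zm": "MAIZE"}
GREEK = {"α": "alpha", "β": "beta", "γ": "gamma", "δ": "delta"}
UTR = re.compile(r"\s*[35][′']\s*UTR", re.IGNORECASE)
SPECIAL = re.compile(r"[./´\s]+")  # listed special characters, plus whitespace (assumption)


def normalise_name_part(abbrev: str, keep_case: bool = False) -> tuple[str, str | None]:
    """Apply the YYYY rules; return (name part, donor code implied by a species prefix)."""
    name = UTR.sub("", abbrev).strip()        # 3′/5′ UTR is implied by T-/P-
    for letter, word in GREEK.items():        # β -> beta, etc.
        name = name.replace(letter, word)
    donor = None
    for prefix, code in SPECIES_PREFIXES.items():
        rest = name[len(prefix):]
        if name.startswith(prefix) and rest[:1].isupper():
            name, donor = rest, code          # AtAHAS -> AHAS, donor ARATH
            break
    if name.startswith("CP4"):                # strain CP4 stays in the name part
        name = "CP4" + name[3:].lower()
    elif not keep_case:
        name = name.lower()
    name = SPECIAL.sub("_", name).strip("_")
    return name, donor


def build_element_name(prefix: str, abbrev: str, donor: str | None = None,
                       keep_case: bool = False) -> str:
    """Assemble XX-YYYY-ZZZZZ; an explicit donor overrides one implied by the abbreviation."""
    name, implied = normalise_name_part(abbrev, keep_case)
    donor = donor or implied
    if donor is None:
        raise ValueError(f"no donor code known for {abbrev!r}")
    return f"{prefix}-{name}-{donor}"


print(build_element_name("CS", "AtAHAS"))  # CS-ahas-ARATH
print(build_element_name("P", "ZmUbi1"))   # P-ubi1-MAIZE
```

The ordering matters. UTR text and Greek letters are handled before case folding, and special characters are replaced last. Because of this, a species prefix is still recognisable by its capitalisation when it is stripped.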
Finally, for conventional GMOs, the syntax includes an organism code as a suffix or donor part (ZZZZZ, usually four or five characters), which denotes the donor species or source of the genetic element by an abbreviation of the species in capital letters; hybrid elements have the donor code SYNTH. The organism code follows the recommendations of UniProt (http://www.uniprot.org/docs/speclist), except for viruses. Abbreviations for virus species are adopted from Plant Viruses Online: Index to Virus Acronyms (http://bio-mirror.im.ac.cn/mirrors/pvo/vide/acrindex.htm) or, if not found there, from the Ninth Report of the International Committee on Taxonomy of Viruses (2012).
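Taken together, the prefix, name, suffix and donor rules allow a level_3 label to be split back into its parts. The parser below is a sketch under stated assumptions:

- The prefix is one or two letters and ends at the first hyphen.
- The donor part follows the last hyphen, so a name part may itself contain hyphens, as in the beta-gal example.
- The recognised suffixes are exactly the four named in the text.

Labels such as `CS-beta-gal-ECOLX` and `CS-abc_siRNAas-MAIZE` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

# Suffixes named in the text for silencing and genome-edited elements.
SUFFIXES = ("_genome_edited", "_siRNAas", "_siRNAs", "_siRNAu")


@dataclass(frozen=True)
class ElementName:
    prefix: str         # XX: element type, e.g. CS, I, P, T
    name: str           # YYYY: abbreviation of the element
    suffix: str | None  # particularity of silencing/genome-edited elements
    donor: str          # ZZZZZ: organism code, virus acronym or SYNTH


def parse_element_name(label: str) -> ElementName:
    """Split a level_3 label of the form XX-YYYY-ZZZZZ into its parts."""
    prefix, sep, rest = label.partition("-")
    if not sep or not (1 <= len(prefix) <= 2 and prefix.isalpha()):
        raise ValueError(f"bad prefix in {label!r}")
    # The donor part is taken after the LAST hyphen so that hyphens may occur in YYYY.
    name, sep, donor = rest.rpartition("-")
    if not sep or not name or not donor:
        raise ValueError(f"missing name or donor part in {label!r}")
    suffix = None
    for candidate in SUFFIXES:
        if name.endswith(candidate) and len(name) > len(candidate):
            name, suffix = name[: -len(candidate)], candidate
            break
    return ElementName(prefix, name, suffix, donor)


print(parse_element_name("P-nos-RHIRD"))
# ElementName(prefix='P', name='nos', suffix=None, donor='RHIRD')
print(parse_element_name("CS-beta-gal-ECOLX").name)  # beta-gal
```

Validating the prefix against the full list in Table 4 would be a natural extension. It is omitted here because that list is not reproduced in the text.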
Utility and discussion
Genetic elements are used to identify conventional GMOs: by describing the genetic elements present, either as part of the insert or as endogenous (flanking or species-specific) sequences, a conventional GMO can be identified unambiguously. GMO-GET therefore enables the precise description of a conventional GMO by its genetic elements. The thesaurus also makes it possible to integrate data from databases that use these terms, even if the databases use terms from different hierarchical levels. For example, the BCH annotates GMOs in its database with genetic elements defined by two terms, a name and an abbreviation, but without hierarchy. EUginius adopted these terms and incorporated them into GMO-GET: the BCH name became level_2 and the BCH abbreviation level_3. EUginius uses level_3 terms of GMO-GET to describe GMOs, and uses the hierarchical structure of GMO-GET to identify related genetic elements and, subsequently, the corresponding GMOs and/or methods. When the hierarchy of the thesaurus is used, a query with a level_2 term will find the same GMOs (if present) in other databases using GMO-GET, because the query also finds GMOs annotated with children (level_3 and level_4 terms) of that level_2 term.
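The hierarchical query described above amounts to expanding the query term into itself plus all of its descendants, then matching GMOs annotated with any of them. The sketch below uses a toy hierarchy and toy annotations. The level_2 label `CS-delta endotoxin`, the level_4 label `CS-cry1F-BACTU_v1` and the event names are hypothetical, and the data structures are assumptions rather than the EUginius schema.

```python
from __future__ import annotations

from collections import deque

# Toy hierarchy: term -> narrower terms (labels are illustrative, not GMO-GET content).
CHILDREN = {
    "CS-delta endotoxin": ["CS-cry1Ab-BACTU", "CS-cry1F-BACTU"],  # level_2 -> level_3
    "CS-cry1F-BACTU": ["CS-cry1F-BACTU_v1"],                      # level_3 -> level_4
}

# Toy annotations: GMO -> elements it is annotated with (level_3 or level_4 terms).
ANNOTATIONS = {
    "event_A": {"CS-cry1Ab-BACTU", "P-nos-RHIRD"},
    "event_B": {"CS-cry1F-BACTU_v1"},
    "event_C": {"P-nos-RHIRD"},
}


def descendants(term: str, children: dict[str, list[str]]) -> set[str]:
    """The term itself plus every narrower term below it (breadth-first)."""
    found = {term}
    queue = deque([term])
    while queue:
        for child in children.get(queue.popleft(), []):
            if child not in found:
                found.add(child)
                queue.append(child)
    return found


def gmos_for(term: str, children: dict[str, list[str]],
             annotations: dict[str, set[str]]) -> list[str]:
    """GMOs annotated with the query term or any of its descendants."""
    wanted = descendants(term, children)
    return sorted(gmo for gmo, elements in annotations.items() if wanted & elements)


print(gmos_for("CS-delta endotoxin", CHILDREN, ANNOTATIONS))  # ['event_A', 'event_B']
print(gmos_for("CS-cry1Ab-BACTU", CHILDREN, ANNOTATIONS))     # ['event_A']
```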
The EUginius database uses GMO-GET to provide key information on GMOs. The database consists of four interconnected modules. (1) The GMO module lists existing conventional GMOs and genome-edited organisms and enables sorting and filtering by criteria such as trait, company or genetic elements. It also provides detailed information on the molecular characterisation, including annotated sequences. (2) The detection module contains information on detection methods, including reference materials, tools supporting the development of screening strategies, and relevant literature. (3) The analysis module provides a tool for interpreting screening test results. (4) The authorisation module offers detailed information on authorisation status and EU application details for food and feed.
GMO-GET is used (1) in the GMO module to describe the inserts in GMOs, and (2) especially for identifying non-authorised GMOs: it is the structural basis for the GMO-method matrix of the detection module and the analysis module.
By using GMO-GET, EUginius offers a way to include all available information on GE-GMOs, such as trait, affected genetic element and developer, in a structured and standardised way. This standardisation makes the information on GE-GMOs available to systems that support inspection and control activities.
Because many genetic elements have a dual relation, to the GMOs that contain them and to the methods that target them, a computer can predict which GMOs can be detected with a particular element-specific method, without manual intervention. EUginius uses this information, for example, in its detection and analysis modules, which facilitate the selection of appropriate detection methods and the interpretation of the results of GMO analysis experiments.
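In its simplest form, a GMO-method matrix of the kind mentioned above can be derived from two annotations. One records which level_3 elements each GMO contains, and the other records which element each screening method targets. The sketch assumes exactly one target element per method. The event and method identifiers are hypothetical, and the code is not the EUginius implementation.

```python
from __future__ import annotations


def build_matrix(gmo_elements: dict[str, set[str]],
                 method_targets: dict[str, str]) -> dict[str, dict[str, bool]]:
    """matrix[gmo][method] is True when the GMO contains the method's target element."""
    return {
        gmo: {method: target in elements for method, target in method_targets.items()}
        for gmo, elements in gmo_elements.items()
    }


# Hypothetical annotations using GMO-GET style labels.
GMO_ELEMENTS = {
    "event_A": {"P-nos-RHIRD", "CS-cry1Ab-BACTU"},
    "event_B": {"P-nos-RHIRD"},
    "event_C": {"CS-bar-STRHY"},
}
METHOD_TARGETS = {"m_nos": "P-nos-RHIRD", "m_cry1Ab": "CS-cry1Ab-BACTU", "m_bar": "CS-bar-STRHY"}

matrix = build_matrix(GMO_ELEMENTS, METHOD_TARGETS)
for gmo, row in matrix.items():
    # One row per GMO: "+" if the method is expected to detect it, "-" otherwise.
    print(gmo, "".join("+" if row[m] else "-" for m in METHOD_TARGETS))
# event_A ++-
# event_B +--
# event_C --+
```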
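Interpreting screening results against such a matrix can be illustrated with a common simplified exclusion logic. A GMO is ruled out if a method that should detect it tested negative, and it is kept as a candidate if at least one positive result supports it. A positive result that no candidate explains hints at an unknown, possibly unauthorised, GMO. This logic is an assumption chosen for illustration, not the documented EUginius algorithm. It treats results as purely qualitative and methods as perfectly specific, and the matrix is a toy example.

```python
from __future__ import annotations

# Toy GMO-method matrix (True = method expected to be positive for that GMO).
MATRIX = {
    "event_A": {"m_nos": True, "m_cry1Ab": True, "m_bar": False},
    "event_B": {"m_nos": True, "m_cry1Ab": False, "m_bar": False},
    "event_C": {"m_nos": False, "m_cry1Ab": False, "m_bar": True},
}


def interpret(matrix: dict[str, dict[str, bool]],
              results: dict[str, bool]) -> tuple[list[str], list[str]]:
    """Return (candidate GMOs, positive methods that no candidate explains)."""
    candidates = []
    for gmo, row in matrix.items():
        expected_pos = {m for m, hit in row.items() if hit}
        # A negative result for a method that should detect the GMO rules it out.
        ruled_out = any(m in expected_pos and not pos for m, pos in results.items())
        # Keep only GMOs supported by at least one positive result.
        supported = any(m in expected_pos and pos for m, pos in results.items())
        if supported and not ruled_out:
            candidates.append(gmo)
    unexplained = [
        m for m, pos in results.items()
        if pos and not any(matrix[g].get(m, False) for g in candidates)
    ]
    return sorted(candidates), sorted(unexplained)


print(interpret(MATRIX, {"m_nos": True, "m_cry1Ab": False, "m_bar": True}))
# (['event_B', 'event_C'], [])
print(interpret(MATRIX, {"m_nos": False, "m_cry1Ab": True}))
# ([], ['m_cry1Ab'])
```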
The genetic element thesaurus GMO-GET presented here is the first ontology to provide an unambiguous, harmonised controlled vocabulary for GMO-related genetic elements, with a predictable but flexible set of syntax rules. The thesaurus can easily be expanded if needed. GMO-GET thereby facilitates the exchange of information between databases with overlapping content by increasing the possibilities for automatic data exchange. Only the element ID (or element name) needs to be transferred, as the properties of the element are already defined in GMO-GET for the linked databases. Linking databases through GMO-GET thus allows the information in one database to be supplemented with additional information from the other(s). Users therefore do not have to formulate a new, modified query for each database, but can ask for information on the same genetic element in all linked databases. The syntax rules and clear hierarchy of GMO-GET also enable different parties to derive new terms without lengthy discussions on individual terms.
To our knowledge, GMO-GET is the only system that addresses organisms derived through gen(om)e-editing techniques, which are considered GMOs in the EU but not necessarily elsewhere in the world.
Furthermore, GMO-GET allows PCR methods to be explicitly assigned to the genetic elements they detect and to the corresponding GMOs that contain the DNA sequence of the construct used for transformation. Since GMO-GET also describes the relationships between genetic elements, its hierarchical structure makes it possible to link the properties of one genetic element to another, related element. This increases the investigative potential of existing PCR detection methods by linking them not only to GMOs containing the confirmed target elements but also to GMOs containing ‘children’ of those elements. For GE-GMOs, GMO-GET names the modified gene and thereby supports the development of specific detection methods.
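Linking databases on the shared element designation means that each database only needs to contribute records keyed by the GMO-GET label or ID. Those records can then be regrouped per element. The sketch below uses two hypothetical in-memory "databases" and an arbitrary record layout, both assumptions for illustration.

```python
from __future__ import annotations


def link_by_element(sources: dict[str, dict[str, dict]]) -> dict[str, dict[str, dict]]:
    """Regroup per-database records under the shared GMO-GET element label."""
    linked: dict[str, dict[str, dict]] = {}
    for db_name, records in sources.items():
        for element, record in records.items():
            linked.setdefault(element, {})[db_name] = record
    return linked


# Two hypothetical databases annotating their records with the same element labels.
GMO_DB = {"P-nos-RHIRD": {"gmos": ["event_A", "event_B"]}}
METHOD_DB = {"P-nos-RHIRD": {"methods": ["m_nos"]}, "CS-bar-STRHY": {"methods": ["m_bar"]}}

linked = link_by_element({"gmo_db": GMO_DB, "method_db": METHOD_DB})
print(linked["P-nos-RHIRD"])
# {'gmo_db': {'gmos': ['event_A', 'event_B']}, 'method_db': {'methods': ['m_nos']}}
```

A single lookup on the element label returns what every linked source knows about it. This reflects the point above that only the element ID or name needs to be exchanged.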
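Extending a method's reach to the ‘children’ of its target element can be sketched with a split between two groups. "Confirmed" GMOs contain the target element itself. "Potential" GMOs contain only a narrower term below it, for which detection still needs to be checked, for example against the level_4 sequence. The hierarchy, level_4 labels and event names are hypothetical.

```python
from __future__ import annotations

from collections import deque


def descendants(term: str, children: dict[str, list[str]]) -> set[str]:
    """The term itself plus every narrower term below it."""
    found, queue = {term}, deque([term])
    while queue:
        for child in children.get(queue.popleft(), []):
            if child not in found:
                found.add(child)
                queue.append(child)
    return found


def link_method(target: str, children: dict[str, list[str]],
                gmo_elements: dict[str, set[str]]) -> tuple[list[str], list[str]]:
    """Return (GMOs with the confirmed target, GMOs with only a child of it)."""
    below = descendants(target, children) - {target}
    confirmed = sorted(g for g, els in gmo_elements.items() if target in els)
    potential = sorted(g for g, els in gmo_elements.items() if target not in els and below & els)
    return confirmed, potential


# Toy data: a level_3 element with two hypothetical level_4 variants.
CHILDREN = {"CS-cry1F-BACTU": ["CS-cry1F-BACTU_v1", "CS-cry1F-BACTU_v2"]}
GMO_ELEMENTS = {
    "event_D": {"CS-cry1F-BACTU"},
    "event_E": {"CS-cry1F-BACTU_v1"},
    "event_F": {"P-nos-RHIRD"},
}

print(link_method("CS-cry1F-BACTU", CHILDREN, GMO_ELEMENTS))  # (['event_D'], ['event_E'])
```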
GMO-GET will thus facilitate harmonised enforcement strategies based on a clear overview of the characteristics of known authorised as well as unauthorised GMOs.
Availability of data
GMO-GET is available via the website www.euginius.eu. The thesaurus is also available in OBO format from the corresponding authors on reasonable request.
Abbreviations
GMO: Genetically modified organism
GE-GMO: Gen(om)e-edited genetically modified organism
GMO-GET: Genetically modified organism genetic element thesaurus
LMO: Living modified organism
PCR: Polymerase chain reaction
WHO. Frequently asked questions on genetically modified food. https://www.who.int/foodsafety/areas_work/food-technology/faq-genetically-modified-food/en/. Accessed 6 May 2020.
Directive 2001/18/EC of the European Parliament and of the Council of 12 March 2001 on the deliberate release into the environment of genetically modified organisms and repealing Council Directive 90/220/EEC. Official Journal of the European Communities 2001, L 106/1.
Case C‑528/16 Confédération paysanne and Others v Premier ministre and Ministre de l’Agriculture, de l’Agroalimentaire et de la Forêt. Curia: Court of Justice of the European Union; 2018.
Commission Regulation (EC) No 65/2004 of 14 January 2004 establishing a system for the development and assignment of unique identifiers for genetically modified organisms. Official Journal of the European Union 2004, L 10/7.
OECD. OECD guidance for the designation of a unique identifier for transgenic plants. Series on Harmonization of Regulatory Oversight in Biotechnology, No. 23. ENV/JM/MONO(2002)7/REV1.
Biosafety Clearing House. BCH Central Portal. http://bch.cbd.int/. Accessed 6 May 2020.
European Commission. GMOMETHODS: EU Database of Reference Methods for GMO Analysis. http://gmo-crl.jrc.ec.europa.eu/gmomethods. Accessed 6 May 2020.
Bonfini L, van den Bulcke MH, Mazzara M, Ben E, Patak A. GMOMETHODS: the European Union database of reference methods for GMO analysis. J AOAC Int. 2012;95(6):1713–9.
EUginius. The European GMO database. https://euginius.eu. Accessed 6 May 2020.
Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000;25(1):25–9.
The Plant Ontology Consortium. The plant ontology consortium and plant ontologies. Comp Funct Genomics. 2002;3(2):137–42.
Jaiswal P, Avraham S, Ilic K, Kellogg EA, McCouch S, Pujar A, et al. Plant Ontology (PO): a controlled vocabulary of plant structures and growth stages. Comp Funct Genomics. 2005;6(7–8):388–97.
He Y, Liu Y, Zhao B, editors. OGG: a biological ontology for representing genes and genomes in specific organisms. In: Proceedings of the 5th International Conference on Biomedical Ontologies (ICBO), Houston, Texas, USA. 8–9 October 2014; 2014. p. 13–20.
Mungall CJ, Batchelor C, Eilbeck K. Evolution of the sequence ontology terms and relationships. J Biomed Inform. 2011;44(1):87–93.
Galdzicki M, Clancy KP, Oberortner E, Pocock M, Quinn JY, Rodriguez CA, et al. The Synthetic Biology Open Language (SBOL) provides a community standard for communicating designs in synthetic biology. Nat Biotechnol. 2014;32(6):545–50.
OBO-Edit: the OBO ontology editor. http://oboedit.org/. Accessed 6 May 2020.
Day-Richter J, Harris MA, Haendel M; Gene Ontology OBO-Edit Working Group, Lewis S. OBO-Edit: an ontology editor for biologists. Bioinformatics. 2007;23(16):2198–200.
EUginius. List with GMOs and genetic elements. https://www.euginius.eu/euginius/pages/gmo_genetic_elements.jsf. Accessed 6 May 2020.
Bard JBL, Rhee SY. Ontologies in biology: design, applications and future challenges. Nat Rev Genet. 2004;5(3):213–22.
Jensen LJ, Bork P. Ontologies in quantitative biology: a basis for comparison, integration, and discovery. PLoS Biol. 2010;8(5):e1000374.
National Information Standards Organization. ISO 25964—the international standard for thesauri and interoperability with other vocabularies Baltimore, MD, USA. https://www.niso.org/schemas/iso25964. Accessed 6 May 2020.
The UniProt Consortium. UniProt: a hub for protein information. Nucleic Acids Res. 2014;43(D1):D204–12.
King AMQ, Adams MJ, Carstens EB, Lefkowitz EJ. Virus taxonomy. Ninth Report of the International Committee on Taxonomy of Viruses. Elsevier Academic Press. 2012.
Prins TW, van Dijk JP, Beenen HG, Van Hoef AMA, Voorhuijzen MM, Schoen CD, et al. Optimised padlock probe ligation and microarray detection of multiple (non-authorised) GMOs in a single reaction. BMC Genomics. 2008;9:584.
Biosafety Clearing-House. Living Modified Organism (LMO) Registry. http://bch.cbd.int/database/lmo-registry/. Accessed 6 May 2020.
FAO. FAO GM Foods Platform. http://www.fao.org/food/food-safety-quality/gm-foods-platform. Accessed 6 May 2020.
GenBit. GM Crops Database. https://www.genbitgroup.com/en/gmo/gmodatabase. Accessed 6 May 2020.
GMO Detection Laboratory in Shanghai JiaoTong University. GMO Detection method Database. http://gmdd.sjtu.edu.cn. Accessed 6 May 2020.
ISAAA. International service for the acquisition of agri-biotech applications database. http://www.isaaa.org/gmapprovaldatabase. Accessed 6 May 2020.
OECD. BioTrack Product Database. https://biotrackproductdatabase.oecd.org. Accessed 6 May 2020.
We greatly thank P. Heinze and N. Duensing (BVL) as well as T.W. Prins and L. van den Heuvel (WFSR) for contributing to the development of the thesaurus. We also express our appreciation to staff at the Secretariat of the Convention on Biological Diversity for their collaboration.
This project was financially supported by the Dutch Ministry of Economic Affairs and the German Federal Office of Consumer Protection and Food Safety. The funding bodies had no role in the design of the thesaurus or in writing the manuscript.
Competing interests
The authors PA, ED, DW, MPM, EJK and JB declare that they have no competing interests. KBT is an employee of Yield10 Bioscience, Inc.
Cite this article
Adamse, P., Dagand, E., Bohmert-Tatarev, K. et al. GMO Genetic Elements Thesaurus (GMO-GET): a controlled vocabulary for the consensus designation of introduced or modified genetic elements in genetically modified organisms. BMC Bioinformatics 22, 48 (2021). https://doi.org/10.1186/s12859-020-03880-0
- Genetic elements
- Targeted mutagenesis