EXACT2
We present a fundamentally new version of the ontology EXACT2 for recording biomedical protocols. EXACT2 aims to explicitly define the semantics of experimental protocols in order to ensure their reproducibility, and to support computer applications that assist biologists in the preparation, maintenance, submission and sharing of experimental protocols. The range of experimental procedures in biomedicine is extremely wide, and ever increasing. EXACT2 aims to cover the majority of experimental actions found in biomedical protocols; our estimate is that it currently includes 85% of typical experimental actions. This estimate is based on the processing of previously 'unseen' protocols. The scope of EXACT2 is deliberately restricted, for example by not allowing negations. Negations are rarely used in biomedical procedures, and are problematic to represent under the open world assumption. Our aim is to keep EXACT2 as simple as possible. Consequently, instructions such as 'do not smoke' cannot be represented with EXACT2, but such information can be captured as free-text notes.
Capturing and formalising information pertinent to biomedical protocols is a challenging task, and we therefore applied a modular approach to the problem. EXACT2 is focused on the formal description of experimental actions, and imports other entities participating in experimental actions from external resources, such as the ChEBI (Chemical Entities of Biological Interest) dictionary [11] for biochemical entities and the eagle-i ontology (see the eagle-i project [12]) for experimental equipment.
Upper level ontologies
The previous version of EXACT used SUMO (the Suggested Upper Merged Ontology) [13] and the Time Ontology (see [14]) as upper-level ontologies [7]. Following OBO Foundry recommendations, the new version has been constructed with the use of the top-level classes from BFO (the Basic Formal Ontology) 1.1 [15], IAO (the Information Artifact Ontology) [16], PATO (the Phenotype And Trait Ontology) [17] and OBI [2] (see Figure 1). As a result, the class SUMO: Object has been replaced with the class OBI: material entity, and the class EXACT: proposition has been replaced with the class IAO: information content entity. EXACT2 imports the PATO classes volume, speed and temperature as descriptors of experimental actions. Classes such as IAO: document title and IAO: author identification were imported into EXACT2 to enable the representation of a protocol's provenance. The IAO classes textual entity and table were imported to capture information about such important textual elements of biomedical protocols as tables, notes, cautions and troubleshooting.
The adherence to BFO, IAO, PATO and OBI enables an efficient integration of EXACT2 with other biomedical ontologies, particularly with ChEBI for the representation of biochemical entities, and eagle-i for the representation of equipment used in experiments.
Structure of EXACT2
EXACT2 has a streamlined structure in order to ease navigation through the ontology hierarchy. Underused top-level classes such as EXACT: mode of transformation and EXACT: mode of separation have been deprecated. We have also deprecated classes that had only one or two subclasses. For example, the class EXACT: shake had only one subclass, EXACT: swirl; the class EXACT: cover had only one subclass, EXACT: seal; and the class EXACT: remove had only two subclasses, EXACT: vortex and EXACT: filter. All these classes are now defined as direct subclasses of the class EXACT2: experimental action. Consequently, in order to process information, a user or a computer application does not need to identify that, for example, the action rotate is a 'subtype of the remove type' of action.
The structure of EXACT2 has been simplified further by the deprecation of roles. For example, the class EXACT: container was represented as a role played by equipment. While this is an accurate representation, and different pieces of equipment can play different roles, we judged that EXACT2 should only include the top-level class EXACT2: equipment, without specifying what roles it can play or what functionality it may have. Instead, a modular approach that enables the import of lab-specific equipment is employed for the encoding of biomedical protocols.
EXACT2 no longer directly supports commands (e.g. stop, continue) and other expressions (e.g. if-then expressions) that could be included in biomedical protocols in order to describe a sequence of experimental actions. There are other formalisms (e.g. Petri nets) that are better suited to the representation of such knowledge.
The experimental actions branch
The experimental actions branch has been significantly extended. The previously published version of EXACT contained 45 actions, including command actions, equipment setup actions, and data actions. Of these 45 actions, 33 were classified as experimental actions. We manually analysed hundreds of biomedical protocols and added 51 experimental actions that were missing from the previous version. For example, the actions EXACT2: aliquot (definition: an experimental action "to measure out (a substance) into small samples of equal size; to divide into aliquot parts, especially for use as experimental samples" [18]) and EXACT2: dilute (definition: an experimental action "to make or become less concentrated, especially by adding water or a thinner, (of a solution, suspension, mixture, etc.) having a low concentration or a concentration that has been reduced by admixture" [19]) were added to the new version.
EXACT2 imports three actions from OBI. Specialists in the area of biomedicine analysed the OBI branch planned process and identified OBI classes that represent experimental actions. This resulted in the addition to EXACT2 of such classes as OBI: elution (definition: the process of extracting one material from another by washing with a solvent to remove adsorbed material from an adsorbent (as in washing of loaded ion-exchange resins to remove captured ions)) and OBI: injection (definition: injection is process which aims at introducing a compound or a mixture into a material entity (either biological entity or instrument) by relying on devices such as syringe or injector connection, attached or forced into a vascular system (veins of an organism or tubes of a machine) or in a tissue.).
Mapping of EXACT2 experimental actions to OBI planned processes
EXACT was developed before OBI independently included semantic descriptors relevant to experimental actions. As a result, there are several EXACT and OBI classes that have similar semantic meanings. Following ontology design best practices, EXACT2 explicitly maps such classes via the annotation property has synonym. For example, the class EXACT2: wait is mapped to OBI: waiting, and the class EXACT2: store is mapped to OBI: storage. The semantic meaning of these processes is similar, but not identical. In EXACT2 these actions are defined via a set of descriptors. The experimental action EXACT2: store requires the recording of such descriptors as (storage) temperature, period (of storage), biochemical entity (to be stored), (storage) condition, and equipment (used for storage). Otherwise, according to EXACT2, this experimental action cannot be reproduced adequately. OBI has the following properties for the process storage: has specified input some material entity (this is consistent with the EXACT2 descriptors biochemical entity and equipment), achieves planned objective some 'material maintenance objective' (based on our analysis of the protocols, EXACT2 does not enforce the recording of the descriptor goal for this experimental action), and realizes some (concretizes some 'plan specification') (again, based on our analysis of the protocols, EXACT2 does not enforce the description of the plan specification). Thus, OBI lacks the representation of such essential properties of the process storage as (storage) temperature and period (of storage). Some biochemical entities must be stored at (or below) -196°C, -80°C, -20°C or +4°C, and some may be kept at room temperature. The failure to record such essential information may result in the failure to correctly follow biomedical procedures, and produce erroneous results. It is true that a storage period is frequently not specified in biomedical protocols. However, it is important information, for example for safety and reproducibility, and it is essential to record it whenever possible.
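As an informal illustration of the descriptor requirement for EXACT2: store, the following Python sketch checks a recorded storage action against the five descriptors named above. It is not part of EXACT2 or of any EXACT2-based tool: the ActionRecord structure, the missing_essential function and the example values are all hypothetical and are introduced only for this illustration.

```python
from dataclasses import dataclass, field

# Descriptors that the text lists as required for EXACT2: store.
STORE_ESSENTIAL = (
    "temperature",
    "period",
    "biochemical entity",
    "condition",
    "equipment",
)


@dataclass
class ActionRecord:
    """One recorded experimental action (hypothetical structure)."""
    action: str
    descriptors: dict = field(default_factory=dict)


def missing_essential(record, essential=STORE_ESSENTIAL):
    """Return the essential descriptors absent from the record, in listed order."""
    return [name for name in essential if record.descriptors.get(name) is None]


# Illustrative record: the storage period and condition were not written down.
record = ActionRecord(
    action="store",
    descriptors={
        "temperature": "-80 °C",
        "biochemical entity": "plasmid DNA",  # made-up example value
        "equipment": "freezer",               # made-up example value
    },
)

print(missing_essential(record))
```

Under these assumptions the record is flagged as lacking the period and condition descriptors, which is the kind of gap the text argues should be filled whenever possible.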
Optional descriptors
One of the requirements for EXACT2 is to represent which descriptors of experimental actions are essential, and which are optional. If a user submits a protocol to an EXACT2-based system and some experimental actions in that protocol lack essential descriptors, the system will request that the user specify the missing descriptors (see the next section for further explanation). Conversely, experimental actions in protocols frequently contain descriptors that are non-essential (optional). These descriptors aid the understanding of protocols, and therefore should be preserved in machine-amenable representations of protocols. However, a system supporting such representations needs to strike the right balance between ensuring that all essential information about a protocol is captured, and remaining user-friendly by not forcing users to enter non-essential information. For example, it is not essential to specify the value of the descriptor temperature for the actions EXACT2: filter and EXACT2: resuspend. These actions are typically executed at room temperature, or at the temperature of the previous step, and protocols normally state explicitly when this is not the case. EXACT2 aims to represent typical situations and, in order not to enforce the recording of the descriptor temperature for every instance of the classes EXACT2: filter and EXACT2: resuspend, EXACT2 needs to classify this descriptor as optional.
Unfortunately, the limited expressivity of OWL does not allow us to represent that an experimental action may have certain descriptors. To overcome this limitation we have introduced the class EXACT2: optional descriptor of experimental action with such subclasses as EXACT2: (optional) temperature, EXACT2: (optional) equipment, etc. An alternative solution would have been to assign probabilities to the statements 'an experimental action has a descriptor' [20]. However, we judged that the probabilistic approach would unnecessarily complicate the EXACT2 representations.
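The marker-class idea can be sketched outside OWL as follows. This Python fragment is only an analogy, under stated assumptions: the class names mirror EXACT2: optional descriptor of experimental action and EXACT2: (optional) temperature, but the validate function, the FILTER_DESCRIPTORS list, and the choice of biochemical entity as an essential descriptor of filter are hypothetical and are not taken from EXACT2.

```python
class Descriptor:
    """Base class for a descriptor of an experimental action."""
    label = "descriptor"


class OptionalDescriptor(Descriptor):
    """Marker class: subclasses may be omitted from a protocol step."""
    label = "optional descriptor of experimental action"


class BiochemicalEntity(Descriptor):
    label = "biochemical entity"


class OptionalTemperature(OptionalDescriptor):
    label = "temperature"


# Hypothetical descriptor list for the action filter. The text states only
# that temperature is optional; the essential entry is an assumption.
FILTER_DESCRIPTORS = [BiochemicalEntity, OptionalTemperature]


def validate(allowed, provided):
    """Return (missing essential labels, optional labels that were supplied)."""
    missing, optional_kept = [], []
    for descriptor in allowed:
        is_optional = issubclass(descriptor, OptionalDescriptor)
        if descriptor.label in provided:
            if is_optional:
                optional_kept.append(descriptor.label)
        elif not is_optional:
            missing.append(descriptor.label)
    return missing, optional_kept


# A step without a temperature passes; one without the entity is flagged.
print(validate(FILTER_DESCRIPTORS, {"biochemical entity": "cell lysate"}))
print(validate(FILTER_DESCRIPTORS, {"temperature": "4 °C"}))
```

In this sketch, membership of the marker class is the only thing that distinguishes an optional descriptor from an essential one, so no probabilities or additional logical constructs are needed. An optional value that is supplied is still retained rather than discarded.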