In this section we reconcile the semantics of OWL-DL and GO's DAG: we analyse how one can be translated into the other and where, in that process, problems could arise. To perform such a translation it is necessary to understand the semantics of both the source and the target language; the aim is, of course, to say the same thing in each representation.
We start by assessing a technical issue that does not affect the semantics, but is important: naming conventions. OWL-DL has its own naming conventions: white space and other non-alphanumeric characters are not allowed in the names of classes, only underscores and alphanumeric characters. This presents a problem since many GO term names include non-alphanumeric characters. A solution to this problem is to translate each non-alphanumeric character into a string that spells out the disallowed character: for example, (-)-borneol dehydrogenase activity in GO would become PAR_MINUS_PAR_MINUS_borneol_dehydrogenase_activity in OWL. There is a choice to be made as to whether the term or the GO identifier becomes the class label. The id (for example, GO:0047503) is the primary identifier, but the term is the more readable. Whatever the decision, one can be represented using the class label and the other using an assertion on an "annotation property": in OWL-DL, we can declare a property to be an annotation property, and then use such a property to attach information to classes without that information being taken into account by an OWL-DL reasoner. That is, assertions on annotation properties act as comments from a DL point of view, yet they can be displayed to the biologist as a piece of information on the class, just as in GO. The most suitable annotation property for labelling a term with its id is rdfs:label, which is already included in OWL.
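The name translation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `owl_class_name`, the symbol table beyond the parenthesis and minus sign taken from the example, and the fallback for unlisted characters are all hypothetical and not part of GO or OWL.

```python
# Hypothetical symbol table. Only "(", ")" and "-" are taken from the example
# in the text; the remaining entries are assumptions added for illustration.
SYMBOL_NAMES = {"(": "PAR", ")": "PAR", "-": "MINUS", ",": "COMMA", "+": "PLUS"}


def owl_class_name(term):
    """Spell out disallowed characters so that only [A-Za-z0-9_] remains."""
    tokens = []
    current = ""
    for ch in term:
        if ch.isascii() and ch.isalnum():
            current += ch              # keep building an alphanumeric run
            continue
        if current:
            tokens.append(current)     # close the run before the separator
            current = ""
        if ch.isspace() or ch == "_":
            continue                   # white space only separates tokens
        # Assumed fallback: unlisted characters become their code point.
        tokens.append(SYMBOL_NAMES.get(ch, "U%04X" % ord(ch)))
    if current:
        tokens.append(current)
    return "_".join(tokens)


print(owl_class_name("(-)-borneol dehydrogenase activity"))
```

As in the example from the text, opening and closing parentheses share one spelling, so the mapping is not reversible; the original term name would have to be kept in an annotation if it is to be recovered.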
We cannot translate the natural language definitions associated with a term into OWL-DL axioms. These definitions might be expressible in OWL, yet we cannot automatically generate the correct OWL-DL expressions from a piece of English text. We can, however, capture them using another assertion on an annotation property.
We can capture the synonyms and other alternative labels given for a term in a variety of ways:
1. As assertions on an annotation property;
2. Using equivalence, subclass and superclass axioms;
3. A mixture of approaches one and two.
In the first approach, we can use a series of annotation properties such as exact synonym, broad synonym, narrow synonym and related synonym.
In the second approach, if S_1,...,S_n are the exact synonyms given for a term T, then we translate this into an equivalence axiom EquivalentClasses(T S_1...S_n). Thus, each instance of S_i is also an instance of T and of each S_j and, vice versa, each instance of T is also an instance of each S_i.
In OWL-DL, an equivalence axiom EquivalentClasses(T S) means that the classes T, S involved have the same extent of instances. It can further be argued that they are therefore the same class. If the synonyms are exact, this is logically correct, though the ontologist might be presented with a plethora of classes in the user interface. It can be argued, however, that for the user this is simply a presentational issue, and that the user interface should collapse equivalent classes. Some methodologies, such as [25], suggest that a minimal number of classes should be used in an ontology. Use of equivalent classes does not break such an edict if we interpret classes with the same extent of objects as the same class (which is, after all, what is being said). It should be remembered, as is the message all through this article, that the reader should be wary of conflating presentation and the real semantics of a statement. Just as assumptions can be made about the presentation in Figure 1, so can assumptions be made about syntax showing "multiple" classes in an OWL-DL file.
A more significant argument is that this solution conflates a class-level argument with a lexical argument. It should be remembered that labels on classes can change while the class itself is unaltered. One only has to think, for instance, of the different French, German and English words for Leg that all refer to the same class of instances. Also, the equivalence axiom approach breaks when the synonyms are not exact synonyms. It could then be argued that such labels should not be treated as exact synonyms, but as narrow, broad or related ones. Hence the equivalence axiom solution is slightly sub-optimal: we would have preferred to have only a single class with more than one name, yet this would have required some expressiveness not (yet) available in OWL-DL, and the second approach has largely the same effect. In a similar manner to the equivalence axiom, if we have an alternative name S that is "broader than" a term T, then we add a statement SubClassOf(T S), and if we have an alternative name S that is "narrower than" a term T, then we add a statement SubClassOf(S T).
Please note that the second approach does not take into account related class labels that are neither exact, narrow nor broad, such as virulence and pathogenesis. For these, we can only suggest using the first approach. With approach two alone, we cannot translate all class labels into OWL-DL axioms, because the related-to tag has no reasonable representation as either a subclass axiom or a restriction upon a class; we would therefore have to use approach three, with a mixture of logical axioms and an assertion on an annotation property.
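As a minimal sketch of approach three, the following Python function turns a term's synonyms into axiom strings in the notation used in this section, and keeps related synonyms as annotation assertions. The function name `synonym_axioms`, the scope keywords, the annotation property name `related_synonym` and the tuple form used for annotations are assumptions made for illustration only.

```python
def synonym_axioms(term, synonyms):
    """Translate (name, scope) pairs for one term.

    scope is one of "exact", "broad", "narrow" or "related" (assumed keywords).
    Returns (axioms, annotations): axiom strings for the logical part, and
    (term, property, value) triples for the annotation part.
    """
    axioms, annotations = [], []
    exact = [name for name, scope in synonyms if scope == "exact"]
    if exact:
        # All exact synonyms share one extent with the term.
        axioms.append("EquivalentClasses(" + " ".join([term] + exact) + ")")
    for name, scope in synonyms:
        if scope == "broad":
            axioms.append(f"SubClassOf({term} {name})")    # S is broader than T
        elif scope == "narrow":
            axioms.append(f"SubClassOf({name} {term})")    # S is narrower than T
        elif scope == "related":
            # No logical reading: keep it as an annotation assertion.
            annotations.append((term, "related_synonym", name))
        elif scope != "exact":
            raise ValueError(f"unknown synonym scope: {scope}")
    return axioms, annotations


print(synonym_axioms("pathogenesis", [("virulence", "related")]))
```

Restricting the function to annotations only (approach one) would amount to routing every scope through the last branch.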
The use of the extra equivalence and subclass axioms has a logical justification and can be useful: when a reasoner is applied to such a translation, inconsistencies can be found. If the translator feels, however, that this approach mixes lexical and logical issues, then approach one, using only assertions on annotation properties, is the most appropriate.
Next, the DAG is-a relationship translates directly into OWL's sub-class relationship since they have the same semantics, i.e., every instance of a class is also an instance of each of its superclasses.
We can assume that subclasses in the DAG representation, like OWL subclasses, overlap by default, i.e., if C1 and C2 are subclasses of the same superclass, then we cannot exclude that there exists an object that is an instance of both C1 and C2. This will capture most of the biology in GO correctly. However, we might want to examine the GO and check, for each pair of subclasses, whether we can provide more information. For example, we should ask ourselves whether it is possible for an individual molecular function to be both function-x and function-y at the same time. If this is not the case, then we should make this knowledge explicit in the OWL ontology through the axiom DisjointClasses(function-x function-y).
In a similar manual step, we should add covering constraints where appropriate. A covering axiom means that, if an object is a member of a class, then it must be a member of one of the classes that it is asserted to "cover". That is, if Person covers Man and Woman, then any object that is a Person must be either a member of Man or a member of Woman, but it is possible not to have enough information to know to which of these classes that object belongs. For a biological example, if Enzyme activity covers all the enzyme functions, then an enzyme activity must be one of those activities; a new enzyme activity would be inconsistent with the ontology. The GO DAG representation does not allow such axioms and we believe that biologists would not use them widely even if it were possible because such axioms would require more knowledge than is usually available. An assumption of no covering is, therefore, not unreasonable.
Since the GO DAG does not capture disjointness or covering constraints, their inclusion is a matter of capturing biological knowledge, and there is no way of simply automating knowledge of disjointness. An automatic translation is possible if it is assumed that there is no "covering" and that all sibling classes can possibly overlap.
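To make the three notions concrete, the following sketch treats the extent of a class as a Python set of individuals and checks subclass, disjointness and covering on one small, invented world. The helper names and the example individuals are hypothetical; a real reasoner considers all possible worlds rather than a single one, so this only illustrates what the axioms constrain.

```python
from itertools import combinations


def is_subclass(sub, sup):
    """Every instance of sub is also an instance of sup."""
    return sub <= sup


def are_disjoint(*classes):
    """No individual is an instance of two of the given classes."""
    return all(a.isdisjoint(b) for a, b in combinations(classes, 2))


def is_covered(parent, *children):
    """Every instance of parent is an instance of at least one child."""
    return parent <= set().union(*children)


# One invented world: extents of Person, Man and Woman.
person = {"alice", "bob", "carol"}
man = {"bob"}
woman = {"alice", "carol"}

print(is_subclass(man, person), are_disjoint(man, woman), is_covered(person, man, woman))
```

Adding a fourth individual to `person` alone would violate the covering check while leaving the subclass and disjointness checks untouched, which mirrors the remark that a new enzyme activity would be inconsistent with a covered Enzyme activity class.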
5.1 Capturing the GO DAG part-of in OWL
OWL-DL provides a language that allows us to use as many properties as we want, and we can constrain their interpretation in a number of ways using existential, universal, or cardinality restrictions, and we can make statements about them such as one property being implied by another one or that a property is transitive. In Section 4, we have discussed four possible readings of the GO DAG's part-of links, and we show here how these different interpretations can be captured via translations to OWL-DL axioms. The advantage here is that, rather than using a single construct which may be read in a number of different ways, OWL-DL allows us to distinguish between these different readings. We can then use different readings of the part-of relationship (e.g., those discussed in Section 4), without any danger of confusion. In the following examples, we consider how we capture the particular semantics of the assertion P part-of W.
Reading 1 does not impose any restrictions on an instance of P or W, as it only deals with the potential for the relationship. If one insists, one can translate this reading into an OWL-DL axiom
SubClassOf (P UnionOf (restriction(part-of someValuesFrom W)
ComplementOf (restriction(part-of someValuesFrom W)))),
yet this statement does not impose any constraints: indeed, it is equivalent to saying that P is a subclass of owl:Thing or saying nothing. In contrast, impossibilities do impose constraints, and we can express them in OWL-DL: to express that a P can never be part of a W, we can add the OWL-DL axiom
SubClassOf (P ComplementOf (restriction(part-of someValuesFrom W)))
Reading 2 Whenever a P exists, it is part of a W. This can be represented through the following axiom:
SubClassOf(P restriction(part-of someValuesFrom W)),
stating that, for each and every instance of P, there must be an instance of W of which it is a part. For example, every instance of SemiNiferousTubule is a part of an instance of Testis.
Reading 3 Whenever a W exists, it has some P as a part. This can be represented through the following axiom:
SubClassOf (W restriction(has-part someValuesFrom P)),
provided that we have declared that the property has-part is the inverse of part-of, as in Figure 4 (many description logics allow the definition and use of inverse relationships; in OWL there is no inverse property operator for use in expressions, but we can introduce and define properties as inverses). Inverse properties are interpreted as one would expect: two individuals a and b are related via a property P if and only if b and a are related via the inverse of P. For example, we can use such an axiom to state that every instance of Testis has a part that is an instance of SemiNiferousTubule. Please note that this statement and the one given as an example for the second reading are independent in the sense that they do not imply each other.
Reading 4 This is simply a conjunction of 2 and 3, and we can thus encode it by including both axioms introduced above.
As mentioned before, GO employs reading 2 for part-of links. Hence we translate each such link into the corresponding OWL statement. Additionally, we can then manually add more statements, e.g., in cases where our biological knowledge tells us that reading 4 would be more precise. These various semantics for the part-of relationship used in the GO DAG pre-date the OBO relationships described below in Section 6. In the OBO relationships, as we shall see, the semantics are more strictly defined and the translation to an existential restriction on a class, as in reading 2 above, is clear.
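The four readings can be summarised as a small generator of axiom strings in the notation used above. This is a sketch only: the function names `restriction` and `part_of_axioms` are hypothetical, reading 1 is rendered as "no axiom" because its translation imposes no constraint, and the has-part axioms presuppose that has-part has been declared as the inverse of part-of.

```python
def restriction(prop, filler):
    """Existential restriction in the notation of this section."""
    return f"restriction({prop} someValuesFrom {filler})"


def part_of_axioms(p, w, reading=2):
    """Axioms for the assertion 'p part-of w' under the chosen reading.

    The default is reading 2, the one GO employs according to the text.
    """
    every_p_in_some_w = f"SubClassOf({p} {restriction('part-of', w)})"
    every_w_has_some_p = f"SubClassOf({w} {restriction('has-part', p)})"
    if reading == 1:
        return []                      # potential only: nothing to assert
    if reading == 2:
        return [every_p_in_some_w]
    if reading == 3:
        return [every_w_has_some_p]    # requires has-part = inverse of part-of
    if reading == 4:
        return [every_p_in_some_w, every_w_has_some_p]
    raise ValueError(f"unknown reading: {reading}")


for axiom in part_of_axioms("SemiNiferousTubule", "Testis", reading=4):
    print(axiom)
```

An automatic translation would call this with the default reading for every part-of link, leaving upgrades to reading 4 as a manual step.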
Recall that, in GO, orphan nodes are those that do not have any outgoing is-a link. In OWL-DL, the corresponding classes do not cause any problems since they will automatically be placed in the class hierarchy directly under the most general class, owl:Thing. There are, therefore, no orphan nodes in an OWL-DL ontology, and any modelling that adds biological assertions in order to avoid classes sitting directly under owl:Thing must be part of a process independent of the translation between representations.
That completes our discussion of the translation of GO's DAG into OWL-DL. We can see, therefore, that it is possible to represent what is captured in the GO in OWL-DL while making only two assumptions, both of which are reasonable. The OWL-DL representation will capture the same knowledge as the GO DAG. In addition, we can even distinguish between the uses of readings two and four in the part-of relationship in GO.
5.2 Translating OWL-DL back into DAG
As we have observed above, the DAG's is-a relationship and the subclass relationship in an OWL-DL ontology have the same reading. Hence we can take an OWL-DL representation of a DAG ontology, ask a reasoner such as FaCT++, Pellet or Racer [26–28] to infer all subclass relationships, and then translate the resulting class hierarchy back into DAG format. To be more precise, for each class name A, we first ask the reasoner to return all classes that are equivalent to A (there will be such classes if we have used the translation of synonyms using equivalence axioms described above). Then we choose a "main" node label A' from A and the reasoner's answer, and create a node labelled A' whose exact synonyms are set to A (in case A' is different from A) and the reasoner's answer (possibly minus A'). As a result of this step, we obtain a set of nodes labelled with terms and exact synonyms. Next, for each pair of node labels A, B, we ask the reasoner whether A is a subclass of B. If this is the case, we add an is-a link from A to B; otherwise we do nothing. Similarly, for each pair of node labels A, B, we ask the reasoner whether A is a subclass of restriction(part-of someValuesFrom B). If this is the case, we add a part-of link from A to B; otherwise we do nothing. Narrow and broad synonyms can be obtained by looking for subclasses and superclasses, respectively, yet this would be exactly the same information as represented in the is-a structure and thus redundant. Finally, those features of the GO DAG that we have translated to assertions on annotation properties can be retrieved and back-translated appropriately.
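The back-translation procedure can be sketched as follows. The reasoner is represented by three callables that a caller would have to implement on top of a real reasoner; their names and signatures are assumptions, not the API of FaCT++, Pellet or Racer. Choosing the first class name encountered as the "main" label is likewise an arbitrary illustrative policy.

```python
def back_translate(class_names, equivalents, is_subclass, is_part_of):
    """Rebuild nodes, is-a links and part-of links from reasoner answers.

    equivalents(a)   -> iterable of class names equivalent to a
    is_subclass(a,b) -> True if a is inferred to be a subclass of b
    is_part_of(a,b)  -> True if a is a subclass of
                        restriction(part-of someValuesFrom b)
    """
    nodes = {}        # main label -> set of exact synonyms
    seen = set()      # class names already absorbed into some node
    for a in class_names:
        if a in seen:
            continue
        group = set(equivalents(a)) | {a}
        nodes[a] = group - {a}         # assumed policy: first name is the main label
        seen |= group

    is_a, part_of = set(), set()
    for a in nodes:
        for b in nodes:
            if a != b and is_subclass(a, b):
                is_a.add((a, b))
            if is_part_of(a, b):       # a == b is allowed: it would be a cycle
                part_of.add((a, b))
    return nodes, is_a, part_of


# Toy "reasoner" backed by lookup tables; Testicle and Organ are invented names.
EQUIV = {"Testis": {"Testicle"}, "Testicle": {"Testis"}}
SUB = {("Testis", "Organ"), ("Testicle", "Organ")}
PART = {("SemiNiferousTubule", "Testis"), ("SemiNiferousTubule", "Testicle")}

nodes, is_a, part_of = back_translate(
    ["SemiNiferousTubule", "Testis", "Testicle", "Organ"],
    lambda a: EQUIV.get(a, set()),
    lambda a, b: (a, b) in SUB,
    lambda a, b: (a, b) in PART,
)
print(nodes, is_a, part_of)
```

Following the text, the sketch records every inferred link, including those implied by transitivity; it makes no attempt to reduce the result to the links originally asserted.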
As a result, we obtain a graph whose nodes are labelled with names and sets of synonyms, and whose edges are labelled with is-a and part-of. If any axioms have been added to the GO in OWL, such as disjointness or covering axioms, these are retrieved through calls to the reasoner. Disjointness can be represented in the OBO format (see Section 6 below), but covering cannot. So the back-translation of an augmented GO in OWL might well be lossy: covering axioms are lost in translation, as are all other features of OWL-DL that cannot be expressed in the OBO format. In general, this graph is not necessarily acyclic. Since the GO DAG only allows part-of and not has-part relationships, however, common sense tells us that we should obtain an acyclic graph: a cycle would need to involve a part-of link since pure is-a cycles have been collapsed into a single node by construction. Now a cycle involving a part-of link, say from a node labelled A, would mean that, in every world conforming to our ontology, we have an infinite chain of instances a_i of A with a_1 part-of a_2 part-of a_3 part-of ..., which clearly clashes with our intuition. However, if other relationships are used in the DAG, such as has-location or interacts-with, a cycle could easily arise (e.g. a protein that interacts with itself). As we will see below (Section 6), the wider OBO language allows cycles.
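Whether a back-translated graph is in fact acyclic can be checked mechanically. The following sketch uses a standard depth-first search over the directed edges; the function name `find_cycle` and the example node names are hypothetical, and the recursive form is only suitable for small graphs.

```python
def find_cycle(edges):
    """Return one cycle as a list of nodes (first node repeated), or None."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])

    WHITE, GREY, BLACK = 0, 1, 2       # unvisited, on current path, finished
    colour = dict.fromkeys(graph, WHITE)

    def visit(node, path):
        colour[node] = GREY
        path.append(node)
        for nxt in graph[node]:
            if colour[nxt] == GREY:    # edge back into the current path
                return path[path.index(nxt):] + [nxt]
            if colour[nxt] == WHITE:
                found = visit(nxt, path)
                if found:
                    return found
        path.pop()
        colour[node] = BLACK
        return None

    for node in graph:
        if colour[node] == WHITE:
            cycle = visit(node, [])
            if cycle:
                return cycle
    return None


print(find_cycle([("SemiNiferousTubule", "Testis"), ("Testis", "Organ")]))
print(find_cycle([("ProteinX", "ProteinX")]))
```

The second call models a self-interacting protein under an interacts-with relationship, the kind of link that makes a cycle unavoidable.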