Semantic representation of monogenean haptoral Bar image annotation
© Abu et al; licensee BioMed Central Ltd. 2013
Received: 18 September 2012
Accepted: 8 February 2013
Published: 12 February 2013
Digitised monogenean images are usually stored in file system directories in an unstructured manner. In this paper we propose a semantic representation of these images in the form of a Monogenean Haptoral Bar Image (MHBI) ontology, in which the images are annotated with taxonomic classification, diagnostic hard part and image properties. The data we used are of monogenean species found in fish, so we built a simple Fish ontology to demonstrate how a host (fish) ontology can be linked to the MHBI ontology. This enables linking of information from the monogenean ontology to the host species found in the Fish ontology without changing the underlying schema of either ontology.
In this paper, we utilized the Taxonomic Data Working Group Life Sciences Identifier (TDWG LSID) vocabulary to represent our data and defined a new vocabulary specific to annotating monogenean haptoral bar images, in order to develop the MHBI ontology and the merged MHBI-Fish ontologies. These ontologies were successfully evaluated using five criteria: clarity, coherence, extendibility, ontology commitment and encoding bias.
In this paper, we show that unstructured data can be represented in a structured form using semantics. In the process, we have come up with a new vocabulary for annotating the monogenean images with textual information. The proposed monogenean image ontology will form the basis of a monogenean knowledge base to assist researchers in retrieving information for their analysis.
Over the years, we have been collating information on monogeneans found in Malaysian waters, digitizing the images and storing them as unstructured data. These images, which were extracted from journal publications, are meaningless without their textual annotations. Contemporary approaches to organizing image data and the corresponding textual descriptions use either relational database technologies or XML technologies. For example, Biota, InsideWood and MonoDb use relational database technology, while the Open Microscopy Environment (OME) Data Model and XML File, the knowledge-based grid services for high-throughput biological imaging and PLAZi use XML technology. Annotations of images in a relational database are confined by the number of columns available for describing the images. The number of characters allowed in a cell of a database table is also fixed. Any new inclusion into an existing relational model with fixed tables and a fixed set of fields may require a new schema to be developed and existing queries to be revised. Migrating to a new schema and revising queries can be very cumbersome and time consuming. Storing large numbers of images in a database takes up a lot of space and creates a huge database file, which affects retrieval time. Storing the images outside the database file in a directory and linking them via identifiers in a database column was a possible solution, but here again any new inclusion of data requires a change in identifiers [7, 8]. XML is normally used to describe and structure data. Annotations of images in XML are not linked, and hence the relationships between objects are not expressed.
Semantics is needed to organize data by focusing on the meaning of objects and expressing the relationships between them. It provides the necessary vocabularies to link different data entities to their properties.
The data used in this paper are images of the monogenean haptoral bars along with textual information consisting of (1) the taxonomic classification and (2) a description of the properties of an image found in publications (see Figure 1). The data are analysed and structured into main concepts. Defining these concepts using a standard structured vocabulary is necessary to make sure the meaning of the data is clear and explicit, thus facilitating data sharing and maximizing reusability in a wide variety of contexts.
The Taxonomic Data Working Group (TDWG) strongly suggests deploying Life Science Identifiers (LSID), the preferred Globally Unique Identifier technology, and transitioning to RDF-encoded metadata as defined by a set of simple vocabularies. The TDWG LSID vocabulary has been widely used in biodiversity and offers a wide coverage of concepts that are suitable for annotating the taxonomic information of an organism. The nomenclature used in this research is taken from the TDWG LSID vocabulary and, where necessary, appropriate vocabularies specific to the monogeneans are formed (see Additional file 1). Specific vocabularies (for example DiagnosticPartTerms) are needed because monogeneans are parasitic platyhelminths and are distinguished based on both soft reproductive anatomical features and the shapes and sizes of sclerotised hard parts such as the haptoral bar, anchor, hook and the male and female copulatory organs.
Seven concepts are described from the monogenean data used in this paper. Six of them (Specimen, TaxonName, PublicationCitation, KindofSpecimenTerm, TaxonRankTerms and PublicationTypeTerms) are defined using the TDWG LSID controlled vocabulary, whereas DiagnosticPartTerms is a new concept. Specimen represents the illustrated images of the haptoral bars of the monogeneans. TaxonName represents a single scientific name. PublicationCitation represents a reference to the publication of the monogenean species. KindofSpecimenTerm represents specimen terms such as illustration, digital object and still image. TaxonRankTerms represents the taxon rank terms for taxonomic classification. PublicationTypeTerms represents the type of publication, for example an article in a journal or a book. DiagnosticPartTerms represents the names of the monogenean hard parts.
Defining properties and relationships
There are two types of properties in the semantic representation: object properties and datatype properties. Object properties are relationships between two individuals (they link an individual to an individual), whereas datatype properties describe relationships between an individual and data values. The properties defined for the seven concepts are listed here and their descriptions are available in Additional file 1.
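The distinction between the two kinds of property can be sketched with plain (subject, property, value) triples. This is only an illustration: the property and instance names come from this paper, but the pairing of each subject with each value is assumed for the example, and classifying a property by looking at its value is a simplification (in OWL the kind of a property is declared explicitly).

```python
# Individuals of the ontology (instance names follow the paper's examples).
individuals = {"BifBaungi", "Bifurcohaptor", "LimFurtado1983", "bif-baungi-vb-i1"}

# Each statement is a (subject, property, value) triple.
# The subject/value pairings are illustrative assumptions.
triples = [
    ("bif-baungi-vb-i1", "isCitedIn", "LimFurtado1983"),
    ("BifBaungi", "isBelong", "Bifurcohaptor"),
    ("BifBaungi", "nameComplete", "Bifurcohaptor baungi"),
    ("BifBaungi", "authorship", "Lim & Furtado"),
]


def property_kind(triple):
    """Return 'object' if the value is another individual, else 'datatype'."""
    _, _, value = triple
    return "object" if value in individuals else "datatype"


for subject, prop, value in triples:
    print(prop, "->", property_kind((subject, prop, value)))
```

Here isCitedIn and isBelong link one individual to another, while nameComplete and authorship attach a literal value to an individual.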
Properties for specimen concept
Four object properties are defined under the Specimen concept: kindOfSpecimen, isHaptorBar, isCitedIn and typeForName. Three datatype properties are also defined: specimenId, imgDir and imgDescription.
Properties for taxon name concept
Eight object properties are defined under the TaxonName concept: rank, isBelong, part, hasSpecies, hasGenus, hasFamily, hasOrder and isHostedIn. Four datatype properties are also defined: nameComplete, authorship, year and locality.
Properties for publication citation concept
Two object properties are defined under the PublicationCitation concept: pubType and lists. Five datatype properties are also defined: author, year, title, parentPublicationString and number.
Properties for diagnostic part terms, kind of specimen terms, taxon rank terms, publication type terms concepts
One datatype property, definedTerm, is defined for the DiagnosticPartTerms, KindofSpecimenTerms, TaxonRankTerms and PublicationTypeTerms concepts. This property is given a generic name because it is shared by all four term concepts.
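As a quick consistency check, the properties listed above can be tallied per concept. The sketch below simply transcribes the lists from this section; the dictionary layout and the label used for the four term concepts are assumptions made for the example. Counting year once for TaxonName and once for PublicationCitation, and definedTerm once, gives the total of 27 properties reported in this paper.

```python
# Object properties per concept, transcribed from the text above.
object_properties = {
    "Specimen": ["kindOfSpecimen", "isHaptorBar", "isCitedIn", "typeForName"],
    "TaxonName": ["rank", "isBelong", "part", "hasSpecies", "hasGenus",
                  "hasFamily", "hasOrder", "isHostedIn"],
    "PublicationCitation": ["pubType", "lists"],
}

# Datatype properties per concept; definedTerm is shared by the four
# term concepts and is therefore counted once.
datatype_properties = {
    "Specimen": ["specimenId", "imgDir", "imgDescription"],
    "TaxonName": ["nameComplete", "authorship", "year", "locality"],
    "PublicationCitation": ["author", "year", "title",
                            "parentPublicationString", "number"],
    "term concepts (shared)": ["definedTerm"],
}

n_object = sum(len(props) for props in object_properties.values())
n_datatype = sum(len(props) for props in datatype_properties.values())

print(n_object, n_datatype, n_object + n_datatype)  # prints: 14 13 27
```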
Semantic representation of data using the Web ontology language
Seven concepts, 27 properties and the relationships between them represent the conceptualization of the data used in this paper. This conceptual framework needs to be converted into a machine-readable formal specification in order to reason about the identified concepts and eventually describe the data. This formal specification of a shared conceptualization is called an ontology.
Naming of instances and number of instances for each concept
Naming of instance | Name of instance
Instance for species is named according to genus and species name | Instance of species Bifurcohaptor baungi is labelled as BifBaungi
The full name of genus is used for naming the genus instance | Instance of genus Bifurcohaptor is labelled as Bifurcohaptor
The full name of family is used for naming the family instance | Instance of family Ancylodiscoididae is labelled as Ancylodiscoididae
The full name of order is used for naming the order instance | Instance of order Dactylogyridea is labelled as Dactylogyridea
Instance for publication is named according to author and year | Instance of publication Lim, L. H. S. & Furtado, J. I. (1983). Ancylodiscoidins (Monogenea: Dactylogyridae) from two freshwater fish species of Peninsular Malaysia. Folia Parasitologica. 30, 377 - 380 is labelled as LimFurtado1983
The full name of diagnostic part is used for naming the instance | Instance of haptor sclerotised parts bar is labelled as HaptorSclerotisedpartsBar
The full name is used for naming the instance | Instance of illustration is labelled as Illustration
The full name is used for naming the instance | Instance of species is labelled as Species
The name of publication type is used for naming the instance | Instance of journal article is labelled as JournalArticle
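The naming rules above can be sketched as small helper functions. These helpers are hypothetical and are not part of the ontology tooling: the three-letter genus prefix is inferred from the single example BifBaungi, and the term rule is only checked against Illustration and JournalArticle (the diagnostic part label HaptorSclerotisedpartsBar is capitalised differently in the table and is not reproduced here).

```python
def species_instance_name(scientific_name):
    """Genus prefix plus capitalised species epithet.

    Assumption: the genus is abbreviated to its first three letters,
    as in the BifBaungi example.
    """
    genus, epithet = scientific_name.split()
    return genus[:3].capitalize() + epithet.capitalize()


def publication_instance_name(surnames, year):
    """Concatenate the author surnames and the publication year."""
    return "".join(surnames) + str(year)


def term_instance_name(term):
    """Capitalise each word of a term and join the words without spaces."""
    return "".join(word.capitalize() for word in term.split())


print(species_instance_name("Bifurcohaptor baungi"))
print(publication_instance_name(["Lim", "Furtado"], 1983))
print(term_instance_name("journal article"))
```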
Concepts, instances, object or data type properties
Example of data
Haptor Sclerotised parts Bar
Lim & Furtado
Tasek Bera, Pahang; Bukit Merah Reservoir, Perak
Linking data from other ontologies
The ontology evaluation process can be considered either from the technical point of view (quality of the designed ontology) or from the practical point of view (usability of the designed ontology). To evaluate the quality of the designed ontologies, we adopted the five criteria suggested by Gruber, against which these ontologies are evaluated. This methodology was previously used successfully to evaluate the Protein Ontology. The five criteria are clarity, coherence, extendibility, ontology commitment and encoding bias. A discussion of how these criteria are applied to the concepts and properties in the MHBI ontology is presented in the Results section.
We introduce some level of formality into this discussion by adopting the criteria suggested by Gruber, against which the ontology needs to be evaluated.
No Cardinality Restriction on Transitive Properties
No Classes or Properties in Enumerations
No Import of System Ontologies
No Properties with Class as Range
No Sub Classes of RDF Classes
No Super or Sub Properties of Annotation Properties
Transitive Properties cannot be Functional
Test 1: Domain of a Property should not be empty
Test 2: Domain of a Property should not contain redundant Classes
Test 3: Range of a Property should not contain redundant Classes
Test 4: Domain of a Sub Property can only narrow Super Property
Test 5: Range of a Sub Property can only narrow Super Property
Test 6: Inverse of Functional must be Inverse Functional
Test 7: Inverse of Inverse Functional must be Functional
Test 8: Inverse of Sub Property must be Sub Property of Inverse of Super Property
Test 9: Inverse of Symmetric Property must be Symmetric Property
Test 10: Inverse of Top Level Property must be Top Level Property
Test 11: Inverse of Transitive Property must be Transitive Property
Test 12: Inverse Property must have matching Range and Domain
The results of Test 1 to Test 3 are presented in Additional file 1. As the results show, the domain and range of every property are assigned and contain no redundant classes.
Test 6 and Test 7 were applicable to the typeForName and part properties. If a property is inverse functional, then its inverse property is functional. For example, as illustrated in Figure 4, in this ontology typeForName is a functional property while part is an inverse functional property. Thus, if we state that BifBaungi is typeForName for bif-baungi-vb-i1, then because of the inverse property we can infer that bif-baungi-vb-i1 is part of BifBaungi.
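The relationship between a functional property and its inverse can be sketched with sets of (subject, object) pairs. This is a minimal illustration of the inference described above, not the check performed by the ontology editor; the helper names are hypothetical and the single pair is the example from the text.

```python
def inverse(pairs):
    """Swap subject and object in every (subject, object) pair."""
    return {(obj, subj) for subj, obj in pairs}


def is_functional(pairs):
    """True if every subject is linked to at most one object."""
    subjects = [subj for subj, _ in pairs]
    return len(subjects) == len(set(subjects))


def is_inverse_functional(pairs):
    """True if every object is linked from at most one subject."""
    objects = [obj for _, obj in pairs]
    return len(objects) == len(set(objects))


# Stated fact, as in the example above.
type_for_name = {("BifBaungi", "bif-baungi-vb-i1")}

# Inferred facts for the inverse property.
part = inverse(type_for_name)

print(part)
print(is_functional(type_for_name), is_inverse_functional(part))
```

For any set of pairs, is_functional(pairs) and is_inverse_functional(inverse(pairs)) return the same value, which is the condition behind Test 6 and Test 7.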
An example of the result of Test 11 is also illustrated in Figure 4. It shows the transitive property isBelong. Since BifBaungi isBelong to Bifurcohaptor, and Bifurcohaptor isBelong to Ancylodiscoididae, we can infer that BifBaungi isBelong to Ancylodiscoididae. As for the inverse of the transitive property, hasSpecies, we can infer that Ancylodiscoididae hasSpecies BifBaungi. Furthermore, as presented in Additional file 1, the inverse property in this example fulfilled Test 12, in that its range and domain matched.
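The transitive inference in this example can be reproduced with a small closure computation over (subject, object) pairs. This is a sketch of the reasoning step only, assuming the two stated isBelong facts from the text; a reasoner attached to the ontology editor would derive the same pair from the OWL property declarations.

```python
def transitive_closure(pairs):
    """Add (a, d) whenever (a, b) and (b, d) are present, until stable."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# Stated facts for the transitive property isBelong.
is_belong = {
    ("BifBaungi", "Bifurcohaptor"),
    ("Bifurcohaptor", "Ancylodiscoididae"),
}

inferred = transitive_closure(is_belong)

# Reading the inferred pairs in the opposite direction gives the
# inverse statements, e.g. the family linked back to the species.
inverse_pairs = {(obj, subj) for subj, obj in inferred}

print(("BifBaungi", "Ancylodiscoididae") in inferred)
print(("Ancylodiscoididae", "BifBaungi") in inverse_pairs)
```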
It should be possible to extend the ontology without altering the existing definitions. Easy extension is an important requirement, as new knowledge emerges each day and may need to be added to an existing ontology. To make the MHBI ontology extendable, the design consists of a hierarchical classification of concepts represented as classes, from general to specific. In the MHBI ontology the notions of classification, reasoning and consistency are applied by defining new concepts from already defined generic concepts. The concepts derived from generic concepts are placed precisely into the class hierarchy of the MHBI ontology to completely represent the information defining a specimen.
The ontology representation language should be as independent as possible from the use of the ontology. In developing the MHBI ontology, the choice of OWL as the representation language keeps the encoding bias to a minimum, as the MHBI ontology is intended to be used by all stakeholders in the taxonomy domain, such as domain experts, pharmaceutical companies, researchers and students.
In this paper, we have used the TDWG LSID vocabulary to represent our data using semantics and we have also defined new vocabulary which is specific for annotating monogenean haptoral bar images (see Additional file 1 for the list and description).
MHBI and MHBI-fish ontologies
Discussion and conclusions
Semantic annotations of morphological descriptions that have been proposed to date have no information on the actual annotation of morphological descriptions or morphological images. In this paper, we have annotated the monogenean images semantically and have developed an MHBI ontology, which was eventually merged with a Fish ontology to form the MHBI-Fish ontologies. This enables linking of information from the monogenean ontology to the host species found in the Fish ontology without changing the underlying schema of either ontology.
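The idea of linking the two ontologies without touching either schema can be sketched with sets of (subject, property, value) triples. Everything host-related in this sketch is a placeholder: HostFishA and its name are hypothetical and do not come from the data, and reusing nameComplete on the fish side is an assumption. Only isHostedIn, isBelong, nameComplete and the BifBaungi instance are taken from this paper.

```python
# Triples from the monogenean (MHBI) side.
mhbi = {
    ("BifBaungi", "isBelong", "Bifurcohaptor"),
    ("BifBaungi", "nameComplete", "Bifurcohaptor baungi"),
}

# Triples from the host (Fish) side; HostFishA is a hypothetical instance.
fish = {
    ("HostFishA", "nameComplete", "Hypothetical host fish"),
}

# The link is one extra statement; neither original set is modified.
links = {("BifBaungi", "isHostedIn", "HostFishA")}

merged = mhbi | fish | links


def values(graph, subject, prop):
    """Return all values stated for a subject and property."""
    return {v for s, p, v in graph if s == subject and p == prop}


# Follow the link from the monogenean to the host, then read the host name.
hosts = values(merged, "BifBaungi", "isHostedIn")
host_names = {name for h in hosts for name in values(merged, h, "nameComplete")}

print(host_names)
```

Because the merge is a plain union, every statement of both source ontologies is still present unchanged in the merged set.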
To semantically represent our data we have used the vocabularies in TDWG LSID, which is the standard semantic naming convention for biodiversity information. We have also defined a new vocabulary (Additional file 1) because this is the first time that images of the monogenean diagnostic hard parts have been annotated semantically. In this paper, we have identified 7 concepts and 27 properties (object and datatype properties in the ontology) to represent descriptions of 159 images (instances) (see Table 2).
In the future, we intend to develop a semantic query model through which a researcher can search using any word or phrase related to monogeneans and their hosts. We also intend to annotate images of other diagnostic hard parts to build a complete monogenean ontology, and to build specific ontologies for all the monogenean hosts such as fish, amphibians and reptiles. These ontologies will form the basis of a monogenean knowledge base to assist researchers in retrieving information for their analysis.
Furthermore, query results from the MHBI ontology presented in this paper are used as training set images for Content Based Image Retrieval (CBIR). We have used this ontology to improve the efficiency of CBIR for biodiversity [20, 21]. The relevancy rate of the results provided by CBIR increases because the training set becomes smaller and most of its images are relevant to the query image. In addition, the retrieved images in the CBIR results are annotated, providing more information to the researcher.
This project was supported by the University of Malaya’s Postgraduate Research Fund (PS284/2009B) to the first author and the University of Malaya Research Grant (RG053/09SUS) to the second and fourth authors.
1. Biota: The Biodiversity Database Manager. http://viceroy.eeb.uconn.edu/Biota
2. Inside Wood - Search the Inside Wood Database. http://insidewood.lib.ncsu.edu
3. MonoDb Homepage. http://www.monodb.org/index.php
4. Goldberg IG, Allan C, Burel JM, Creager D, Falconi A, Hochheiser H, Johnston J, Mellen J, Sorger PK, Swedlow JR: The Open Microscopy Environment (OME) Data Model and XML File: Open Tools for Informatics and Quantitative Analysis in Biological Imaging. Genome Biol 2005, 6: R47. 10.1186/gb-2005-6-5-r47
5. Ahmed WM, Lenz D, Jia L, Robinson JP, Ghafoor A: XML-Based Data Model and Architecture for a Knowledge-Based Grid-Enabled Problem-Solving Environment for High-Throughput Biological Imaging. Information Technology in Biomedicine, IEEE Transactions on 2008, 12(2):226-240. 10.1109/TITB.2007.904153
6. Plazi: Access to Taxonomic Literature. http://plazi.org/
7. Arpah A: The use of information classification in face recognition and identification using eigenfaces. Master thesis. Kuala Lumpur: University of Malaya; 2007.
8. Arpah A, Sarinder KKS, Lim LHS: A Database Management System (DBMS) for Monogenean Taxonomy. In Proceedings of 2010 International Conference on Environmental Science and Technology: 23-24 April 2010; Bangkok, Thailand. Edited by: Saji B, Parvinder Singh S. Research Publishing Services, Singapore; 2010.
9. Taniar D, Rusu LI: Strategic Advancements in Utilizing Data Mining and Warehousing Technologies: New Concepts and Developments. Hershey, Pennsylvania (USA): IGI Global; 2010.
10. Toby S, Colin E, Jamie T: Programming the Semantic Web. Sebastopol, CA: O’Reilly Media; 2009.
11. Taxonomic Data Working Group. http://tdwg.org
12. Lim LHS: Bravohollisia bychowsky and Nagibina, 1970 and Caballeria bychowsky and Nagibina, 1970 (Monogenea, Ancyrocephalidae) from Pomadasys-Hasta (Bloch) (Pomadasyidae), with the description of a new attachment mechanism. Syst Parasitol 1995, 32(3):211-224. 10.1007/BF00008830
13. Gruber TR: Towards Principles for the Design of Ontologies Used for Knowledge Sharing. Int J Hum Comput Stud 1995, 43: 907-928. 10.1006/ijhc.1995.1081
14. Deborah LM, Frank van H: OWL Web Ontology Language Overview. http://www.w3.org/2004/OWL/
15. Sidhu AS, Dillon TS, Chang E, Sidhu BS: Protein ontology: vocabulary for protein data. In 3rd International IEEE Conference on Information Technology and Applications. Edited by: He X, Hintz T, Piccardi M, Wu Q, Huang M, Tien D. Sydney: IEEE CS Press; 2005:465-469.
16. Sidhu AS, Dillon TS, Chang E: Protein Ontology. In Biological Database Modeling. Edited by: Chen J, Sidhu AS. New York: Artech House; 2007:63-80.
17. Protégé. http://protege.stanford.edu/
18. Michael KS, Chris W, Deborah LM: OWL Web Ontology Language Guide. http://www.w3.org/TR/owl-guide/
19. Cui H: Semantic Annotation of Morphological Descriptions: An Overall Strategy. BMC Bioinforma 2010, 11: 278. 10.1186/1471-2105-11-278
20. Arpah A: Architecture for Biodiversity Image Retrieval Using Ontology and Content Based Image Retrieval (CBIR). PhD thesis (unpublished). Kuala Lumpur: University of Malaya; 2012.
21. Arpah A, Lim SLH, Amandeep SS, Sarinder KD: Biodiversity image retrieval framework for monogeneans. Systematics and Biodiversity 2013. 10.1080/14772000.2012.761655
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.