An ontology for representing hematologic malignancies: the cancer cell ontology
BMC Bioinformatics volume 20, Article number: 181 (2019)
Within the cancer domain, ontologies play an important role in the integration and annotation of data in order to support numerous biomedical tools and applications. This work seeks to leverage existing standards in immunophenotyping cell types found in hematologic malignancies to provide an ontological representation of them to aid in data annotation and analysis for patient data.
We have developed the Cancer Cell Ontology according to OBO Foundry principles as an extension of the Cell Ontology. We define classes in the Cancer Cell Ontology using a genus-differentia approach, with logical axioms capturing the expression of cellular surface markers, in order to represent types of hematologic malignancies. By adopting conventions used in the Cell Ontology, we have created human and computer-readable definitions for 300 classes of blood cancers, based on the EGIL classification system for leukemias and relying upon additional classification approaches for multiple myelomas and other hematologic malignancies.
We have demonstrated a proof of concept for leveraging the built-in logical axioms of the ontology in order to classify patient surface marker data into appropriate diagnostic categories. We plan to integrate our ontology into existing tools for flow cytometry data analysis to facilitate the automated diagnosis of hematologic malignancies.
We live in an age of ever-increasing troves of data comprising genomic, imaging and clinical information. Within the field of medicine, oncology leads in this regard by using large, varied datasets to refine therapies and stratify patient populations into meaningful subgroups. The explosion of data heralds new challenges for researchers and technologists who struggle to keep track of these mountains of information while maintaining interoperability between systems. One of the methods to manage, sort, and analyze these data comes in the form of ontologies, which are representational artifacts composed of universals and the relations between them that designate entities in reality. Briefly, ontologies at their core are semantic terminologies that exist as two types: reference ontologies, which embody established knowledge via rich, precise meanings for terms in a domain, and application ontologies, which are designed for a specific purpose and weave together sets of related classes from reference ontologies in order to represent the entities of complex domains. [3, 4]
The cell ontology
A description of past ontologies that represent cells is warranted, as the work presented here builds directly upon these artifacts. The Cell Ontology (CL) was originally developed in 2005 with the goal of representing a variety of cell types from the prokaryotic, fungal, animal and plant worlds. As interest and support have shifted over the years, the scope of the CL has narrowed to focus primarily on vertebrate cell types, with special attention to hematopoietic cell types. [6, 7] The CL links to other ontologies within the Open Biological and Biomedical Ontology (OBO) Foundry via relations from the Relation Ontology. [8, 9] These relations often take the form of has_plasma_membrane_part, which connects cell types to the appropriate surface markers found in the Protein Ontology. Similarly, the relations has_high_plasma_membrane_amount and has_low_plasma_membrane_amount are used within computable definitions to denote surface protein expression that is above or below the mean of a population of cells. They were originally described in Masci et al. and generally relate to relative expression values determined by flow cytometry. Lastly, negative criteria are also implemented in definitions using the lacks_part and lacks_plasma_membrane_part relations.
Within the cancer domain, ontologies are an important component of numerous biomedical tools and applications. The most successful ontology in cancer research, judged by its widespread use, is the Gene Ontology (GO). A PubMed search using the search terms “cancer”, “oncology” and “gene ontology” reveals hundreds of articles published within the last five years. Even after excluding the GO, it is apparent that a number of diverse ontologies have seen varied application in cancer research. Longstanding ontologies have been used to annotate and integrate oncologic data. The Foundational Model of Anatomy has been used to annotate biomarkers for brain tumors, while the Disease Ontology (DO) has been used to integrate several databases into a cohesive set. [11, 12] Newer ontologies have appeared in recent years to represent the many facets of cancer care, including ontologies representing staging systems (TNM ontology), cancer treatments, brachytherapy (ENT COBRA ontology), and after-care treatment plans that enhance patient engagement (Profile Ontology for Adolescent and Young Adult Cancer Survivors). [13,14,15,16] Additionally, ontologies support text mining applications, clinical decision support systems, the analysis of adverse events, and the targeting of cancer drugs. [17,18,19,20] The National Cancer Institute (NCI) Thesaurus is one of the largest and most widely used resources within the field of cancer. Although technically a terminology with ontology-like features, the NCI Thesaurus covers 110,000 terms in 36,000 concepts in the cancer research domain and arose from a need to integrate varied data systems through a unified coding system. [21, 22]
Diagnosing blood cancers
The diagnosis of hematologic malignancies like acute lymphoblastic leukemia (ALL) and acute myeloid leukemia (AML) involves a myriad of tests and clinical examinations, ranging from a simple history and physical to advanced genetic testing. In terms of pathological assessment, patient blood samples or bone marrow aspirates are stained and examined for the morphology of cancer cells. Karyotyping provides information on chromosomal translocations. More recently, genetic analysis, through either microarrays or sequencing, has provided detailed insight into the molecular characteristics of cell populations. Flow cytometry is another mainstay of blood cancer diagnosis, whereby a laser examines the emission properties of cells in suspension that have been labeled with fluorochrome-conjugated antibodies. These antibodies are specific for cell markers of interest and are commercially available as a product that contains an attached fluorochrome. The flow cytometer is capable of assessing the expression of markers, which are typically surface proteins, on millions of cells in real time. The output of this process is referred to as the immunophenotype of a cell. Each antibody-fluorochrome conjugate can emit a different range of light wavelengths, allowing for simultaneous assessment of multiple markers. Laboratories typically apply antibody panels to patient specimens to examine four to eight different markers at a time, although the use of larger panels is increasingly possible as technology improves. The definition of positivity for markers has changed over time as technical sensitivity has improved. In the past, a simple cutoff was applied, with a marker labeled as positive when 20% of the population bore it. Increasingly, this method has been superseded by comparing the fluorescence shift and distribution pattern of cancer cell populations to appropriate controls.
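The historical percentage cutoff described above can be illustrated with a short sketch. This is not code from the authors or from any flow cytometry package: the function names, the gate value and the intensity values are hypothetical, and treating "20% of the population" as "at least 20%" is an assumption.

```python
from typing import Sequence


def fraction_positive(intensities: Sequence[float], gate: float) -> float:
    """Fraction of cells whose fluorescence exceeds a (hypothetical) gate value."""
    if not intensities:
        raise ValueError("no cells were measured")
    return sum(1 for value in intensities if value > gate) / len(intensities)


def is_marker_positive(intensities: Sequence[float], gate: float,
                       cutoff: float = 0.20) -> bool:
    """Apply the historical rule: positive if the cutoff fraction of cells bears the marker.

    Interpreting the 20% rule as ">= 0.20" is an assumption of this sketch.
    """
    return fraction_positive(intensities, gate) >= cutoff


# Made-up fluorescence intensities for ten cells stained for two markers.
marker_a = [120.0, 950.0, 80.0, 1100.0, 60.0, 720.0, 70.0, 30.0, 55.0, 40.0]
marker_b = [20.0, 35.0, 610.0, 15.0, 40.0, 25.0, 30.0, 45.0, 10.0, 50.0]

print(is_marker_positive(marker_a, gate=500.0))  # 3 of 10 cells above the gate
print(is_marker_positive(marker_b, gate=500.0))  # 1 of 10 cells above the gate
```

A fixed cutoff of this kind ignores the shape of the intensity distribution, which is one reason comparison against controls has largely replaced it.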
In the current work, we have created an ontology, the Cancer Cell Ontology (CCL), that represents cancer cell types in the domain of hematologic malignancies, namely acute lymphoblastic leukemia, acute myeloid leukemia and multiple myeloma, using immunophenotypes as differentia.
The CCL was created with the latest version of Protégé (5.2.0), developed by the Stanford Center for Biomedical Informatics Research. Our ontology imports the entirety of the CL, which indirectly imports modules from the Protein Ontology (PRO), the Chemical Entities of Biological Interest ontology (CHEBI), the Phenotypic Quality Ontology (PATO), the Cell Line Ontology (CLO), the Relation Ontology (RO), the National Center for Biotechnology Information taxonomy (NCBITaxon), the Uber Anatomy Ontology (UBERON) and the GO. The CCL also directly imports small OWL modules from PRO and CHEBI. [9, 27,28,29,30,31,32,33] The total size of the CCL as viewed in Protégé is 6900 classes, of which 6600 have been imported from the CL and 300 are newly added terms. Roughly three dozen terms, primarily PRO terms, were reused from the CL. The ELK 0.4.3 reasoner was used for inferential reasoning. All terms added by the CCL have been manually reviewed for inconsistencies. Additionally, Protégé’s built-in debugging tool found no errors. A GitHub page with the latest version of the ontology is available for viewing at: https://github.com/LucasSerra1/CCL.git
The CCL was constructed according to published guidelines of best practices in ontology development and adheres to the principles put forth by the OBO Foundry, such as openness, a common format, textual definitions and well-defined relations. [2, 8] A genus-differentia approach was taken to construct the new classes, using surface marker expression as the main axis. Many schemas exist for classifying leukemia, including the WHO classification, the French–American–British classification system, the St. Jude’s system and the European Group for Immunophenotyping Leukemia (EGIL) system. [35,36,37] The EGIL system was selected as the backbone hierarchy for the ontology for several reasons. EGIL does not simply represent a single institution’s idea of leukemia. The EGIL system is the continuously developing result of years of discussion and consensus from what is now known as the EuroFlow Consortium, a group that began in 2006 and is composed of more than forty researchers spanning eight nations. As a consequence, this system has seen widespread use in laboratories, and the initial proposals have been cited very heavily. Additionally, this leukemia classification system relies exclusively on the immunophenotypes of leukemic cells, which is precisely the information we wish to capture in our ontology. Table 1 shows the overarching structure of the EGIL system.
Unfortunately, there exists no such unified classification system for other blood cancers like multiple myeloma. Instead, we pooled several antibody panels geared towards the diagnosis of multiple myeloma that had been examined in recent literature. The consensus of four separate studies, which included review articles, was used to create the classes of multiple myeloma within our ontology. [40,41,42,43]
Definitions, relations, and structure
The CCL exists as an extension of the CL. The root term of the CCL, ‘hematologic malignant cell’, resides under the CL parent term ‘malignant cell’, which exists under the terms ‘neoplastic cell’ and ‘abnormal cell’. Our ontology contains human and machine-readable definitions for every added class. Many of these definitions follow a format similar to that of definitions found in the CL by linking necessary surface markers to the corresponding elements in the Protein Ontology. The textual definitions of the upper level CCL terms are written according to the cell lineage from which the aberrant cell is derived. For instance, ‘acute lymphoblastic leukemic cell’ is defined as a “hematologic malignant cell whose precursor is of lymphoid lineage”. Lower level child terms are instead textually defined according to marker expression, as seen in the definition of ‘pre-B CD19-positive, CD22-positive acute lymphoblastic leukemic cell’, which states that this entity is a “B lineage CD19-positive, CD22-positive acute lymphoblastic leukemic cell that is cytoplasmic Ig mu positive”. The axiomatic definitions follow the same structure and use relations such as has_plasma_membrane_part, has_cytoplasm_part and lacks_part to denote marker positivity status (Fig. 1). The CCL also includes two new relations, has_cytoplasm_part and lacks_cytoplasm_part. These relations are absent from the imported ontologies and were needed to represent the handful of cases that use cytoplasmic markers to distinguish classes. The definitions of these relations mirror their plasma membrane counterparts from the CL: has_cytoplasm_part is defined as holding “between a cell c and a protein complex or protein p if and only if that cell has as part a cytoplasm, and that cytoplasm has p as part,” which means that if a protein is found in the cytoplasm of a cell, then that cell has the protein as a cytoplasmic part.
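To make the genus-differentia pattern concrete, the following minimal sketch models a definition as a parent class (genus) plus a list of relation and marker pairs (differentiae), and renders it as a class expression in the style of OWL Manchester syntax. The `Differentia` and `ClassDefinition` types are hypothetical helpers invented for illustration, the filler label 'Ig mu' is a placeholder rather than an actual Protein Ontology term, and the rendered string is not claimed to match the axioms stored in the CCL.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Differentia:
    relation: str  # e.g. "has_plasma_membrane_part" or "has_cytoplasm_part"
    filler: str    # marker label; a placeholder here, not a real PRO identifier


@dataclass(frozen=True)
class ClassDefinition:
    label: str
    genus: str
    differentiae: Tuple[Differentia, ...]

    def to_expression(self) -> str:
        """Render 'genus and (relation some marker) ...' as a readable string."""
        parts = [f"'{self.genus}'"]
        parts += [f"({d.relation} some '{d.filler}')" for d in self.differentiae]
        return " and ".join(parts)


# The pre-B class from the text: its genus is the B lineage class, and its
# single differentia is cytoplasmic Ig mu positivity.
pre_b = ClassDefinition(
    label="pre-B CD19-positive, CD22-positive acute lymphoblastic leukemic cell",
    genus="B lineage CD19-positive, CD22-positive acute lymphoblastic leukemic cell",
    differentiae=(Differentia("has_cytoplasm_part", "Ig mu"),),
)

print(pre_b.to_expression())
```

Because the child class only states what distinguishes it from its genus, the CD19 and CD22 conditions are inherited from the parent rather than repeated.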
The CCL contains over 300 classes representing cancer cell types found in acute lymphoblastic leukemia, acute myeloid leukemia and multiple myeloma, which are based upon widely used classification systems and pooled studies. Each class is differentiated based upon surface marker expression and has human and machine-readable definitions that are composed of necessary and sufficient conditions. We have also created test instances of patient data, each with a series of positive or negative marker expression values. After running the reasoner, the test data were sorted into matching CCL classes, which demonstrates the utility of our ontology for automatically sorting patient cell data into relevant diagnostic groups.
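The idea of sorting test instances into classes defined by necessary and sufficient conditions can be sketched without an OWL reasoner. The example below is a simplified stand-in, not the ELK-based workflow used for the CCL: the two toy definitions are reduced to marker and status pairs inferred from the class labels quoted earlier, the `classify` function is hypothetical, and a marker missing from a profile is treated as not satisfying a condition (a real OWL reasoner works under an open-world assumption).

```python
from typing import Dict, List

# Toy definitions: marker -> required status (True = positive, False = negative).
# These are simplifications for illustration, not the CCL's actual axioms.
DEFINITIONS: Dict[str, Dict[str, bool]] = {
    "B lineage CD19-positive, CD22-positive acute lymphoblastic leukemic cell": {
        "CD19": True,
        "CD22": True,
    },
    "pre-B CD19-positive, CD22-positive acute lymphoblastic leukemic cell": {
        "CD19": True,
        "CD22": True,
        "cytoplasmic Ig mu": True,
    },
}


def classify(profile: Dict[str, bool]) -> List[str]:
    """Return every class whose conditions the profile meets, most specific first."""
    matches = [
        label
        for label, required in DEFINITIONS.items()
        if all(profile.get(marker) == status for marker, status in required.items())
    ]
    # More conditions means a more specific class in this toy hierarchy.
    return sorted(matches, key=lambda label: len(DEFINITIONS[label]), reverse=True)


# Two made-up test instances of patient marker data.
patient_1 = {"CD19": True, "CD22": True, "cytoplasmic Ig mu": True}
patient_2 = {"CD19": True, "CD22": True, "cytoplasmic Ig mu": False}

for name, profile in (("patient_1", patient_1), ("patient_2", patient_2)):
    print(name, "->", classify(profile))
```

In this sketch the first instance matches both classes, with the pre-B class listed first, while the second matches only the B lineage parent class.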
The CCL is a natural evolution of the work started on hematopoietic cells in the CL and is the first ontology to represent the cell types of hematologic malignancies. This ontology contributes to the process of cancer research in a number of ways. It is new work that can complement what already exists in the NCIT and DO. Our ontology enables sophisticated queries on patient data and allows researchers to efficiently examine surface marker expression. This approach could allow for easier stratification of novel subgroups of patients. The CCL also enables easier integration of disparate data sources by providing a structured semantic representation and explicit, well-crafted definitions that are human and computer readable. With respect to the antibody panels themselves, our ontology lends itself to the creation of a separate antibody panel ontology that could allow researchers to determine whether there is agreement across laboratories with regard to antibody panel composition and the definition of cancer cell types based on reactivity to the panels. We realize that our focus on immunophenotypes paints an incomplete picture of blood cancer cells and omits important diagnostic and prognostic information regarding these malignancies. We plan for future development of the CCL to include cytogenetic and morphological logical definitions and to allow for the use of patient-specific information. This would, in turn, allow for a more holistic description of blood cancer cells and enable more accurate classification of malignancies to further subtype patients.
The sheer number and diversity of surface markers also pose a challenge for future development of the CCL. Because each class is defined by some combination of surface marker statuses, each additional marker multiplies the number of possible classes, so the work of adding classes grows exponentially with the number of markers. Fortunately, current panels use at most eight antibody conjugates, but with additional markers we face a combinatorial explosion in class definitions. In the same vein, every cancer patient is unique and each disease phenotype is unique. However, there appear to be common patterns of marker expression for the various cancer types, and deep phenotyping may reveal important subgroups that relate to speed of disease progression and responsiveness to treatments.
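The growth described above can be quantified with a small sketch. It assumes, for illustration only, that every marker independently takes one of a fixed number of statuses (two for positive or negative, or three if "not assessed" is also allowed), which gives an upper bound on distinct marker profiles rather than the number of classes actually present in the CCL. The function names are hypothetical.

```python
from itertools import product
from typing import Dict, List, Sequence


def count_profiles(n_markers: int, n_states: int = 2) -> int:
    """Upper bound on distinct profiles: n_states raised to the number of markers."""
    return n_states ** n_markers


def enumerate_profiles(markers: Sequence[str]) -> List[Dict[str, bool]]:
    """List every positive/negative assignment for a small set of markers."""
    return [
        dict(zip(markers, states))
        for states in product((True, False), repeat=len(markers))
    ]


for n in (4, 8, 12):
    print(f"{n} markers: {count_profiles(n)} binary profiles, "
          f"{count_profiles(n, n_states=3)} with a third status")

# Explicit enumeration is only practical for small panels.
print(len(enumerate_profiles(["CD19", "CD22", "cytoplasmic Ig mu"])))
```

Under these assumptions, moving from a four-marker to an eight-marker panel raises the bound from 16 to 256 binary profiles, which illustrates why curating classes by hand does not scale to much larger panels.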
Perhaps most importantly, we have shown a proof of concept for leveraging the built-in logical axioms of the ontology in order to classify patient surface marker data into appropriate diagnostic categories. The current work will eventually be part of a larger framework that combines FLOCK clustering analysis of the raw flow cytometry data with the flowCL tool, which will be modified to match cell types to corresponding entities in the CL and CCL. [44, 45] By incorporating the CCL into this software system, we hope to facilitate the automated diagnosis of blood cancers. Future iterations of the CCL will incorporate additional classification systems and represent a broader range of blood cancers. We also plan to relate each cancer cell type to its immediate normal precursor in the style of the CL.
The CCL is the first ontology to represent hematologic malignancies solely via their immunophenotypes and succeeds as a first step towards increased automation in the diagnosis of blood cancers. We plan to integrate our ontology into existing tools for flow cytometry data analysis to facilitate the automated diagnosis of hematologic malignancies.
Kadadi A, Agrawal R, Nyamful C, Atiq R. Challenges of data integration and interoperability in big data. In: 2014 IEEE International Conference on Big Data (Big Data); 27–30 Oct 2014.
Arp R, Smith B, Spear AD. Building ontologies with basic formal ontology. The MIT Press; 2015. 248 p.
Zemmouchi-Ghomari L, Ghomari AR. Reference ontology. In: 2009 Fifth International Conference on Signal Image Technology and Internet Based Systems; 29 Nov–4 Dec 2009.
Brinkley JF, Suciu D, Detwiler LT, Gennari JH, Rosse C, Structural Informatics Group. A framework for using reference ontologies as a foundation for the semantic web. AMIA Ann Symp Proc. 2006;2006:96–100.
Bard J, Rhee SY, Ashburner M. An ontology for cell types. Genome Biology. 2005;6(2):R21.
Diehl AD, Augustine AD, Blake JA, Cowell LG, Gold ES, Gondré-Lewis TA, et al. Hematopoietic cell types: prototype for a revised cell ontology. J Biomed Inform. 2011;44(1):75–9.
Diehl AD, Meehan TF, Bradford YM, Brush MH, Dahdul WM, Dougall DS, et al. The cell ontology 2016: enhanced content, modularization, and ontology interoperability. Journal of Biomedical Semantics. 2016;7(1):44.
Smith B, Ashburner M, Rosse C, Bard J, Bug W, Ceusters W, et al. The OBO foundry: coordinated evolution of ontologies to support biomedical data integration. Nat Biotechnol. 2007;25:1251.
Smith B, Ceusters W, Klagges B, Köhler J, Kumar A, Lomax J, et al. Relations in biomedical ontologies. Genome Biology. 2005;6(5):R46.
Masci AM, Arighi CN, Diehl AD, Lieberman AE, Mungall C, Scheuermann RH, et al. An improved ontological representation of dendritic cells as a paradigm for all cell types. BMC Bioinformatics. 2009;10:70.
Wu T-J, Schriml LM, Chen Q-R, Colbert M, Crichton DJ, Finney R, et al. Generating a focused view of disease ontology cancer terms for pan-cancer data integration and analysis. Database: The Journal of Biological Databases and Curation. 2015;2015:bav032.
Butler WE, Atai N, Carter B, Hochberg F. Informatic system for a global tissue–fluid biorepository with a graph theory–oriented graphical user interface. Journal of Extracellular Vesicles. 2014;3. https://doi.org/10.3402/jev.v3.24247
Boeker M, França F, Bronsert P, Schulz S. TNM-O: ontology support for staging of malignant tumours. Journal of Biomedical Semantics. 2016;7:64.
Gao M, Warner J, Yang P, Alterovitz G. On the Bayesian derivation of a treatment-based Cancer ontology. AMIA Summits on Translational Science Proceedings. 2014;2014:209–17.
Tagliaferri L, Kovács G, Autorino R, Budrukkar A, Guinot JL, Hildebrand G, et al. ENT COBRA (consortium for brachytherapy data analysis): interdisciplinary standardized data collection system for head and neck patients treated with interventional radiotherapy (brachytherapy). Journal of Contemporary Brachytherapy. 2016;8(4):336–43.
Myneni S, Amith M, Geng Y, Tao C. Towards an ontology-driven framework to enable development of personalized mHealth solutions for Cancer survivors’ engagement in healthy living. Studies in health technology and informatics. 2015;216:113–7.
Spasic I, Livsey J, Keane JA, Nenadic G. Text mining of cancer-related information: review of current status and future directions. Int J Med Inform. 2014;83(9):605–23.
Shen Y, Colloc J, Jacquet-Andrieu A, Lei K. Emerging medical informatics with case-based reasoning for aiding clinical decision in multi-agent system. J Biomed Inform. 2015;56:307–17.
Wang C, Zimmermann MT, Prodduturi N, Chute CG, Jiang G. Adverse drug event-based stratification of tumor mutations: a case study of breast Cancer patients receiving aromatase inhibitors. AMIA Ann Symp Proc. 2014;2014:1160–9.
Regan K, Raje S, Saravanamuthu C, Payne PRO. Conceptual knowledge discovery in databases for drug combinations predictions in malignant melanoma. Studies in health technology and informatics. 2015;216:663–7.
Fragoso G, de Coronado S, Haber M, Hartel F, Wright L. Overview and utilization of the NCI thesaurus. Comparative and Functional Genomics. 2005;5(8):648–54.
Ceusters W, Smith B, Goldberg L. A terminological and ontological analysis of the NCI thesaurus. Methods Inf Med. 2005;44(4):498–507.
Onciu M. Acute lymphoblastic leukemia. Hematol Oncol Clin North Am. 2009;23(4):655–74.
Pockley AG, Foulds GA, Oughton JA, Kerkvliet NI, Multhoff G. Immune cell phenotyping using flow cytometry. Current Protocols in Toxicology. 2015;66(1):18.8.1–18.8.34.
Dworzak MN, Buldini B, Gaipa G, Ratei R, Hrusak O, Luria D, et al. AIEOP-BFM consensus guidelines 2016 for flow cytometric immunophenotyping of pediatric acute lymphoblastic leukemia. Cytometry B Clin Cytom. 2017;94(1):82–93.
Noy NF, Crubézy M, Fergerson RW, Knublauch H, Tu SW, Vendetti J, et al. Protégé-2000: an open-source ontology-development and knowledge-acquisition environment: AMIA 2003 Open Source Expo. AMIA Annual Symposium Proceedings. 2003;2003:953.
Natale DA, Arighi CN, Barker WC, Blake JA, Bult CJ, Caudy M, et al. The protein ontology: a structured representation of protein forms and complexes. Nucleic Acids Res. 2011;39(Database issue):D539–D45.
Degtyarenko K, de Matos P, Ennis M, Hastings J, Zbinden M, McNaught A, et al. ChEBI: a database and ontology for chemical entities of biological interest. Nucleic Acids Res. 2008;36(Database issue):D344–D50.
Meehan TF, Masci AM, Abdulla A, Cowell LG, Blake JA, Mungall CJ, et al. Logical development of the cell ontology. BMC Bioinformatics. 2011;12:6.
Sarntivijai S, Lin Y, Xiang Z, Meehan TF, Diehl AD, Vempati UD, et al. CLO: the cell line ontology. Journal of Biomedical Semantics. 2014;5:37.
Federhen S. The NCBI taxonomy database. Nucleic Acids Res. 2012;40(Database issue):D136–D43.
Mungall CJ, Torniai C, Gkoutos GV, Lewis SE, Haendel MA. Uberon, an integrative multi-species anatomy ontology. Genome Biol. 2012;13(1):R5.
The Gene Ontology Consortium, Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, et al. Gene ontology: tool for the unification of biology. Nat Genet. 2000;25(1):25–9.
Kazakov Y, Krötzsch M, Simancik F. ELK: a reasoner for OWL EL ontologies (technical report). Conference Proceedings. 2012.
Arber DA, Orazi A, Hasserjian R, Thiele J, Borowitz MJ, Le Beau MM, et al. The 2016 revision to the World Health Organization (WHO) classification of myeloid neoplasms and acute leukemia. Blood. 2016.
van Eys J, Pullen J, Head D, Boyett J, Crist W, Falletta J, et al. The French-American-British (FAB) classification of leukemia. The pediatric oncology group experience with lymphocytic leukemia. Cancer. 1986;57(5):1046–51.
FG B. Classification of acute leukemias. In: C-H P, editor. Treatment of acute leukemias. Totowa, NJ: Humana Press; 2003. p. 43–58.
van Dongen JJM, Orfao A. EuroFlow: resetting leukemia and lymphoma immunophenotyping. Basis for companion diagnostics and personalized medicine. Leukemia. 2012;26(9):1899–907.
Bene M, Castoldi G, Knapp W, Ludwig W-D, Matutes E, Orfao A, et al. Proposals for the immunological classification of acute leukemias. European Group for the Immunological Characterization of Leukemias (EGIL). 1995. p. 1783–6.
Kumar S, Kimlinger T, Morice W. Immunophenotyping in multiple myeloma and related plasma cell disorders. Best Pract Res Clin Haematol. 2010;23(3):433–51.
Paiva B, Almeida J, Perez-Andres M, Mateo G, Lopez A, Rasillo A, et al. Utility of flow cytometry immunophenotyping in multiple myeloma and other clonal plasma cell-related disorders. Cytometry B Clin Cytom. 2010;78(4):239–52.
Pojero F, Casuccio A, Parrino MF, Cardinale G, Colonna Romano G, Caruso C, et al. Old and new immunophenotypic markers in multiple myeloma for discrimination of responding and relapsing patients: the importance of "normal" residual plasma cell analysis. Cytometry B Clin Cytom. 2015;88(3):165–82.
Raja KR, Kovarova L, Hajek R. Review of phenotypic markers used in flow cytometric analysis of MGUS and MM, and applicability of flow cytometry in other plasma cell disorders. Br J Haematol. 2010;149(3):334–51.
Courtot M, Meskas J, Diehl AD, Droumeva R, Gottardo R, Jalali A, et al. flowCL: ontology-based cell population labelling in flow cytometry. Bioinformatics. 2015;31(8):1337–9.
Qian Y, Wei C, Lee FE-H, Campbell J, Halliley J, Lee JA, et al. Elucidation of seventeen human peripheral blood B cell subsets and quantification of the tetanus response using a density-based method for the automated identification of cell populations in multidimensional flow cytometry data. Cytometry B Clin Cytom. 2010;78(Suppl 1):S69–82.
We thank Paul Wallace of Roswell Park Comprehensive Cancer Center for providing background on flow cytometry of hematologic malignancies.
AD was supported by 5UL1TR001412 (NCATS). WD was supported in part by Cancer Center Support Grant NCI P30CA16056. Publication costs are funded by the Department of Biomedical Informatics, University at Buffalo.
Availability of data and materials
Project name: The Cancer Cell Ontology.
Project home page: https://github.com/LucasSerra1/CCL.git
Other requirements: Protégé version 5.2.0 or higher is recommended.
No restrictions to use.
About this supplement
This article has been published as part of BMC Bioinformatics Volume 20 Supplement 5, 2019: Selected articles from the Second International Workshop on Cells in ExperimentaL Life Sciences (CELLS-2018) at the 2018 International Conference on Biological Ontology (ICBO-2018). The full contents of the supplement are available online at https://bmcbioinformatics.biomedcentral.com/articles/supplements/volume-20-supplement-5.
The authors declare that they have no competing interests.
Cite this article
Serra, L.M., Duncan, W.D. & Diehl, A.D. An ontology for representing hematologic malignancies: the cancer cell ontology. BMC Bioinformatics 20 (Suppl 5), 181 (2019). https://doi.org/10.1186/s12859-019-2722-8
- Blood cancer
- Cell ontology
- Cancer cell ontology