BioCause: Annotating and analysing causality in the biomedical domain
© Mihăilă et al.; licensee BioMed Central Ltd. 2013
Received: 4 October 2012
Accepted: 29 December 2012
Published: 16 January 2013
Biomedical corpora annotated with event-level information represent an important resource for domain-specific information extraction (IE) systems. However, bio-event annotation alone cannot cater for all the needs of biologists. Unlike work on relation and event extraction, most of which focusses on specific events and named entities, we aim to build a comprehensive resource, covering all statements of causal association present in discourse. Causality lies at the heart of biomedical knowledge, such as diagnosis, pathology or systems biology, and, thus, automatic causality recognition can greatly reduce the human workload by suggesting possible causal connections and aiding in the curation of pathway models. A biomedical text corpus annotated with such relations is, hence, crucial for developing and evaluating biomedical text mining.
We have defined an annotation scheme for enriching biomedical domain corpora with causality relations. This schema has subsequently been used to annotate 851 causal relations to form BioCause, a collection of 19 open-access full-text biomedical journal articles belonging to the subdomain of infectious diseases. These documents have been pre-annotated with named entity and event information in the context of previous shared tasks. We report an inter-annotator agreement rate of over 60% for triggers and of over 80% for arguments using an exact match constraint. These increase significantly using a relaxed match setting. Moreover, we analyse and describe the causality relations in BioCause from various points of view. This information can then be leveraged for the training of automatic causality detection systems.
Augmenting named entity and event annotations with information about causal discourse relations could benefit the development of more sophisticated IE systems. These will further influence the development of multiple tasks, such as enabling textual inference to detect entailments, discovering new facts and providing new hypotheses for experimental work.
Due to the ever-increasing number of innovations and discoveries in the biomedical domain, the amount of knowledge published daily in the form of research articles is growing exponentially. This has resulted in the need to provide automated, efficient and accurate means of retrieving and extracting user-oriented biomedical knowledge [1-4]. In response to this need, the biomedical text mining community has accelerated research and the development of tools. Text is being enriched via the addition of semantic metadata and thus supports tasks such as analysing molecular pathways and semantic searching.
Reviews show that, over the last decade, biomedical text mining has seen significant advancements, ranging from semantically foundational tasks, such as named entity recognition, coreference resolution [9, 10] and relation [11, 12] and event extraction [13-17], to more complex tasks, e.g., automatic summarisation [18, 19], question answering [20, 21], multimedia and even multi- and cross-lingual information retrieval and extraction [23, 24]. The heterogeneous tools resulting from this research can also be combined into workflows, using systems such as U-Compare and Argo. Furthermore, there has been much interest recently in studying the intentions expressed in text, also known as meta-knowledge [27, 28]. This includes, amongst others, recognising sentences which contain speculation [29-32], negation [31-33] or manner. Other researchers who have looked at biomedical articles noticed significant differences between abstracts and full papers regarding structural, morpho-syntactic and discourse features, as well as event and meta-knowledge aspects. Others define various discourse zones and try to determine automatically to which zone a sentence belongs.
One of the most important outcomes of the recent research undertaken into biomedical text mining is the large number of newly created, manually annotated corpora. Examples of such resources are the widely used GENIA corpus, GENETAG and other corpora from shared tasks, such as BioNLP ST 2009 and 2011 [16, 17]. Although these resources have been designed for their target tasks, they are not necessarily restricted to those tasks and can provide support for others as well. Data reuse is both in high demand and frequent, as it saves considerable human effort, time and money. For instance, the GENIA corpus, which initially contained only named entity annotations, has been extended, partially or fully, by various researchers and groups, to include event annotations and meta-knowledge information.
However, until now, comparatively little work has been carried out on discourse relations in the biomedical domain. The notion of discourse can be defined as a coherent sequence of clauses and sentences. These are connected in a logical manner by discourse relations, such as causal, temporal and conditional, which characterise how facts in text are related. In turn, these help readers infer deeper, more complex knowledge about the facts mentioned in the discourse. These relations can be either explicit or implicit, depending on how they are expressed in text - using overt discourse connectives (also known as triggers) or not, respectively.
Statements regarding causal associations have long been studied in general language, mostly as part of more complex tasks, such as question answering [40, 41] and textual entailment. Despite this, a single, unified theory of causality has not yet emerged, be it in general or specialised language. There are several pieces of work which characterise how annotators perceive causality and the mechanisms they employ to identify it. For instance, some researchers have shown that causality cannot be identified using intuitive testing techniques in a conscious manner. Therefore, they devise an experiment to select features which allow annotators to coherently identify causality, such as rewording, temporal asymmetry, counterfactuality and various linguistic tests. Other, independent results are similar and show that necessary and sufficient conditions are not enough to achieve satisfactory inter-annotator agreement and that paraphrasing is a much more useful method.
In the case of PmrB, a normal response to mild acid pH requires not only a periplasmic histidine but also several glutamic acid residues. Therefore, regulation of PmrB activity may involve protonation of one or more of these amino acids.
This medium lacked Fe3+ or Al3+, the only known PmrB ligands (Wosten et al., 2000), and contained 10 mM MgCl2, which represses expression of PmrA-activated genes (Soncini and Groisman, 1996; Kox et al., 2000).
Amongst the large number of corpora that have been developed for biomedical text mining purposes, several include the annotation of statements regarding causal associations, such as BioInfer, GENIA and GREC. However, these corpora do not include an exhaustive coverage of causal statements. Furthermore, the granularity of the annotation of such statements is limited in several respects, which are described below. Since such corpus resources underlie most currently existing methods for the automatic analysis of biomedical text, there is an opportunity to advance the state of the art in domain-specific IE and text mining through the improvement of annotation schemata, resources and methods in the area of causal association statements.
The development of tools and resources for the automatic analysis of statements of causality is thus of key importance to information extraction and text mining in domain-specific scientific text. In this paper, we provide an overview of how causality is captured in three types of biomedical research efforts, namely biocuration efforts, pathway models and biomedical corpora. We then describe guidelines for the annotation of statements associated with causal relationships in biomedical texts and present BioCause, a corpus that has been created according to these guidelines. Finally, we analyse the causality annotations and the agreement achieved between the annotators.
Causality in biocuration efforts
General, non-specific physical causation is of obvious interest in biocuration efforts such as the assignment of Gene Ontology (GO) terms to genes to characterise gene functions, in part because detailed molecular-level interactions are rarely known when a phenomenon is first observed. For example, an effect due to P1 positively regulating the expression of P2 through activation of a transcription factor of P2 by catalysing its phosphorylation may be first observed, reported and curated simply as P1 having a positive effect on the activity of P2. Yet, general terms of causality such as “cause” rarely appear in biomedical domain ontologies or other formalisations of the ways in which entities, processes and events are associated with each other. Instead, such formalisations frequently apply terms such as “regulation”, “stimulation” and “inhibition”. Whilst such terms also carry specific senses in biology, their definitions in domain ontologies and use in biocuration efforts show that, typically, their scope effectively encompasses any general causal association.
GO regulation definitions
Regulation of a biological process
Any process that modulates the frequency, rate or extent of a biological process. Biological processes are regulated by many means; examples include the control of gene expression, protein modification or interaction with a protein or substrate molecule.
Positive regulation of a biological process
Any process that activates or increases the frequency, rate or extent of a biological process. Biological processes are regulated by many means; examples include the control of gene expression, protein modification or interaction with a protein or substrate molecule.
Negative regulation of a biological process
Any process that stops, prevents or reduces the extent of a biological process. Biological processes are regulated by many means; examples include the control of gene expression, protein modification or interaction with a protein or substrate molecule.
The GO definition of regulation of biological process is thus broadly equivalent to the explicitly comprehensive definition “any process that has any effect on another biological process”. Furthermore, in a neutral biological context, the following pairs of statements are roughly synonymous according to the GO definitions:
“A affects B” → “A regulates B”
“A has a positive effect on B” → “A positively regulates B”
“A has a negative effect on B” → “A negatively regulates B”
and the following hold:
“A causes B” ≈ “A positively regulates B”
“A prevents B” ≈ “A negatively regulates B”
One should also consider the exact GO synonyms of positive regulation (up regulation, up-regulation, upregulation of biological process and positive regulation of physiological process) and negative regulation (down regulation, down-regulation, downregulation of biological process and negative regulation of physiological process). Thus, whilst “causation” is rarely considered in general terms in domain curation, text annotation or IE, most of its scope, namely physical causation, is covered by the many efforts that involve the general concept of regulation.
Causality in pathway models
Pathway model curation is a specific biocuration task of particular interest to systems biology. Pathway curation efforts seek to characterise complex biological systems involving large numbers of entities and their reactions in detail using formal, machine-readable representations. The Systems Biology Markup Language (SBML) standard (http://sbml.org) for pathway representation has been applied to a large number of curation efforts.
In particular, the SBML version used by the CellDesigner software (http://celldesigner.org/) has been adopted by major efforts, such as PANTHER (http://www.pantherdb.org/). As such, the SBML/CellDesigner reaction semantics are of significant interest to domain IE efforts seeking to support automatic pathway curation.
SBML/CellDesigner reaction modifications
Causality in biomedical corpora
Thus, general physical causality is broadly included in the scope of many domain resources annotated with structured representations for information extraction. However, the scopes of these annotations do exclude a variety of statements potentially involving causal associations. Restrictions include limitation to specific forms of expression such as only verbal and nominalised forms, annotation of explicit statements only and exclusion of statements that only suggest possible causal connections (“A happened after B”). Such limitations imply gaps between the full set of statements of interest and those annotated in domain resources and leave open a number of opportunities for further improvement of resources and tools for the analysis of causality in biomedical text.
Several other more discourse-oriented resources have also been created. The work most similar to ours is the BioDRB corpus, which is a collection of 24 open-access full-text biomedical articles selected from GENIA, containing annotations of 16 types of discourse relations, one of which is causality. It was created by adapting the framework of the Penn Discourse TreeBank, which annotates the argument structure, semantics and attribution of discourse relations and their arguments. The number of purely causal relations annotated in this corpus is 542. There are another 23 relations which are a mixture between causality and one of either background, temporal, conjunction or reinforcement relations. For machine learning purposes, this dataset is considered relatively small, as it might not capture sufficient contextual diversity to perform well on unseen data. Thus, a detailed comparison of this resource with the one described in this article, and their combination, represent an interesting opportunity for future work.
This section is concerned with the preparatory work required prior to the annotation of the causality corpus. We describe the data that we used and present an overview of the annotation scheme, the annotation tool and an evaluation of inter-annotator agreement.
It has been shown that there are significant differences between various biomedical sublanguages at the levels of syntax and discourse structure, as well as deeper semantics, such as named entity types [60, 61]. Therefore, observations made on one sublanguage may not necessarily be valid on another. We thus believe that attempting to train a machine-learning causality detection system on a mixture of subdomains would be detrimental to the learning process. Although we recognise that this choice is associated with high domain specificity, it is preferable to obtain a higher performance in a specific subdomain than a lower performance in a more general domain or a mixture of subdomains. Nevertheless, considering these differences, switching to a different subdomain should be simply a matter of re-training the classifier and re-creating the causality model. Of course, one can extend existing causality models by adding features that have not been encountered before. These would most probably be semantic features, such as a new typology for named entities and events, since these are specific to subdomains.
Furthermore, discourse causality is dependent on the named entities and events present in text. Therefore, in order to isolate the task of recognising causality from that of recognising entities and events, gold standard named entity and event annotations are needed.
Finally, it has been shown that although the information density is highest in abstracts, information coverage is much greater in full texts than in abstracts and thus these may be a better source of biologically relevant data [62, 63]. For these reasons, causality annotation is added on top of existing event annotations from the BioNLP Shared Task (ST) on Infectious Diseases (ID). Whilst in other document sets, such as in those used for subdomain analysis, entity and event annotations are automatically created by NER and event extraction systems such as NERsuite (http://www-tsujii.is.s.u-tokyo.ac.jp/nersuite/) or EventMine, the BioNLP ST ID task has manually created annotations. Furthermore, the BioNLP ST ID corpus has a large size (19 documents) and is comprised of full-text articles.
The existing entity and event annotations have not been modified in this causality annotation effort, even where annotators spotted mistakes.
Conceptually, the annotation involves two basic annotation primitives, spans and relations. Spans represent continuous portions of text with an assigned type, whilst relations are directed, typed, binary associations between two spans. Spans mark both the specific statements in text that play the roles of Cause and Effect in statements of causality, as well as expressions that explicitly state the existence of a causal relation.
A occurred. Thus, B happened.
On the other hand, relations identify connections between the various spans of text. The relation types identify the roles that the spans of text play in the association. The annotation involves three relation types: Effect, Cause and Evidence. Effect always marks the statement that is stated as the result, whilst Cause or Evidence mark the statement that leads to that result. All of these concepts are detailed below.
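The two primitives described above can be sketched as a small data model. This is only an illustration under stated assumptions: the class and field names are hypothetical, and attaching each relation from the trigger span to an argument span is one plausible reading of the scheme, not a description of the corpus files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A continuous portion of text with an assigned type."""
    start: int  # character offset of the first character
    end: int    # character offset one past the last character
    type: str   # e.g. "Argument" or "Trigger"


@dataclass(frozen=True)
class Relation:
    """A directed, typed, binary association between two spans."""
    type: str     # "Cause", "Evidence" or "Effect"
    source: Span  # assumed here to be the trigger span
    target: Span  # the argument span playing the role named by `type`


def make_span(text: str, phrase: str, span_type: str) -> Span:
    """Locate the first occurrence of `phrase` in `text` and wrap it as a Span."""
    start = text.index(phrase)
    return Span(start, start + len(phrase), span_type)


text = "A occurred. Thus, B happened."
cause = make_span(text, "A occurred", "Argument")
trigger = make_span(text, "Thus", "Trigger")
effect = make_span(text, "B happened", "Argument")

# One Cause and one Effect relation tie the two arguments to the trigger.
relations = [
    Relation("Cause", trigger, cause),
    Relation("Effect", trigger, effect),
]
```

Keeping spans as character offsets rather than copied strings means the same source text can carry several independent annotation layers.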
The sense type “Cause” is used when the two arguments of the relation are related causally and are not in a conditional relation. As previously mentioned, this definition is rather vague, so annotators must also use other methods in order to recognise causality. Thus, considering previous research [43, 44], they were asked to check for temporal asymmetry and counterfactuality, and to try rewording and other linguistic tests.
Causality annotations are defined with reference to the following two discourse relation subtypes, in a similar manner to the BioDRB corpus. The relation subtype pair Reason/Result represents physical causality, whilst the other pair, Claim/Justification, represents causality within the discourse, rather than in the physical world it describes. Reason/Result holds when the situation described in one of the arguments is the cause of the situation described in the other argument. The other subtype, Claim/Justification, holds when the situation described by one of the arguments is the cause, not for the situation described by the other argument, but rather for the truth or validity of the proposition described by the argument.
All statements of causality falling within the scope of the annotation target should be marked. Consequently, any two possible spans that are not connected by causality annotations (implicitly) represent a “negative” example.
Argument and Trigger annotations should be created only as required for annotating associations between them, e.g., statements that are not part of any annotated association should not be marked. Thus, Argument and Trigger annotations are not exhaustive.
Statements of association other than those annotated as Causality are not in the scope of the annotation and are only defined for the reference of the annotators. Consequently, the primary purpose of permitting annotations other than Causality is to provide annotators a way to communicate the reason why a specific candidate pair was not marked as Causality. This annotation is entirely optional and does not need to be exhaustive. Thus, it has not been included in the final version of the corpus and its consistency is not considered in determining inter-annotator agreement.
All discourse relation types, as defined in BioDRB, are tentatively defined in the annotation tool as relation types. They are represented as relations directly associating Arguments; Triggers do not need to be marked to identify these associations. Some of these relation types, such as Background and Purpose, could be potential candidates for extending the scope of the Causality annotation.
Annotation software and format
The original event annotation of the BioNLP ID Shared Task corpus was performed using brat. This is a web-based annotation tool aimed at enhancing annotator productivity by simplifying and automating parts of the annotation process. Customising the settings of brat is reasonably straightforward, allowing users to change the information to be annotated and the way it is displayed. Furthermore, brat is freely available under the open-source MIT licence from its homepage (http://brat.nlplab.org). As such, we decided to continue to use this tool for our task of annotating causality relations in text.
This simple, yet highly efficient format allows for easy processing and full transformation into other formats (e.g., XML), thus increasing the portability between various annotation systems. Furthermore, since this schema is not very specific, it can be reused and easily applied to other datasets, not necessarily belonging to the biomedical domain. Moreover, being represented in an offset stand-off format, the schema can allow the existence of other annotations over the same source text without creating annotation conflicts, such as overlapping in XML. In this case, the text is already annotated with named entity and event information. Other types of annotation are allowed and can be successfully integrated (e.g., part-of-speech and dependency).
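To illustrate why an offset stand-off format is easy to process, the following minimal sketch reads text-bound annotation lines in the general brat stand-off layout (an identifier, then type and character offsets, then the covered text, separated by tabs). The sample lines and the type labels `Argument` and `Trigger` are illustrative assumptions, not excerpts from the corpus, and only continuous spans are handled.

```python
from typing import Dict, Iterable, NamedTuple


class TextBound(NamedTuple):
    id: str
    type: str
    start: int
    end: int
    text: str


def parse_text_bound(lines: Iterable[str]) -> Dict[str, TextBound]:
    """Parse text-bound ("T") stand-off lines; other line kinds are skipped."""
    spans: Dict[str, TextBound] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.startswith("T"):
            continue
        ann_id, type_and_offsets, covered = line.split("\t", 2)
        # Assumes a continuous span: "<type> <start> <end>".
        ann_type, start, end = type_and_offsets.split(" ")
        spans[ann_id] = TextBound(ann_id, ann_type, int(start), int(end), covered)
    return spans


# Hypothetical source text and annotation lines, for illustration only.
DOC = "A occurred. Thus, B happened."
SAMPLE = [
    "T1\tArgument 0 10\tA occurred",
    "T2\tTrigger 12 16\tThus",
    "T3\tArgument 18 28\tB happened",
]

spans = parse_text_bound(SAMPLE)
for span in spans.values():
    # The offsets index directly into the untouched source text.
    print(span.id, span.type, repr(DOC[span.start:span.end]))
```

Because the annotations only point into the text, further layers such as part-of-speech tags can be stored in separate files without touching the source document.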
Annotators and training
Although it has been shown that linguists are able to identify certain aspects in biomedical texts reliably, such as negation and speculation, they could be overwhelmed in trying to understand the semantics. Identifying which events affect which events, especially when a causal trigger is not explicitly stated, is an extremely difficult task, as it requires vast, domain-specific background knowledge and an almost complete understanding of the topic. Therefore, due to the specificity of the biomedical domain, it is necessary for the annotators to be experts in this field of research. Furthermore, the annotators must have near-native competency in English.
For the purpose of this task, two human experts have been employed to create the annotations in the corpus. One of the annotators is the second author of this article.
Besides their biomedical expertise, the two selected annotators also have extensive experience in annotating text from the biomedical domain for text mining purposes. They have previously participated in other annotation efforts focussing on creating gold standard corpora of named entities, events and meta-knowledge. The annotators undertook a period of training prior to commencing the annotation task proper. During this time, they were given a small set of documents to practise on. As a result, they became accustomed to both the annotation tool and the guidelines.
Both annotators were given the same subset of articles to annotate, independently of each other. This allowed the detection of annotation errors and disagreements between annotators. They produced annotations in small sets of documents, which were then analysed and in response to which the annotators obtained feedback detailing their errors. Also, the annotators offered feedback regarding the annotation tool and guidelines, in order to increase the speed of the process. This led to noticing potential problems with the guidelines, which were addressed accordingly. The final guidelines were produced after the training period finished and these were used for the actual annotation.
Evaluating inter-annotator agreement
Due to the complexity of the annotation task and the variety of types of spans and relations, inter-annotator agreement (IAA) cannot be computed using standard means. For instance, the Kappa statistic cannot be used in our case, as this requires classifications to correspond to mutually exclusive and discrete categories. Instead, we have chosen to follow similar cases in selecting F-measure to calculate IAA [47, 66].
F-measure is usually used to combine the precision and recall in order to compare the performance of an information retrieval or extraction system against a gold standard. In our case, precision and recall can be computed by considering one set of annotations as the gold standard. The resulting F-score will be the same, regardless of which set is considered gold.
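The symmetry claim can be checked with a short sketch. Swapping which annotator is treated as gold swaps precision and recall, and their harmonic mean is unchanged. The span sets below are made-up values for illustration, and the function name is hypothetical.

```python
def precision_recall_f1(gold, predicted):
    """Compute precision, recall and F-score of `predicted` against `gold`."""
    gold, predicted = set(gold), set(predicted)
    matched = len(gold & predicted)
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Made-up (start, end) spans marked by two annotators.
annotator_a = {(0, 10), (12, 16), (18, 28)}
annotator_b = {(0, 10), (12, 16), (30, 40), (50, 60)}

p_ab, r_ab, f_ab = precision_recall_f1(gold=annotator_a, predicted=annotator_b)
p_ba, r_ba, f_ba = precision_recall_f1(gold=annotator_b, predicted=annotator_a)

# Precision and recall trade places; the F-score stays the same.
print(p_ab, r_ab, f_ab)
print(p_ba, r_ba, f_ba)
```

Equivalently, the F-score equals twice the number of agreed items divided by the total number of items marked by both annotators, an expression in which neither annotator has a privileged role.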
Because of the various angles of annotation, we have split the evaluation methodology into several subtasks of the annotation process. For each subtask, we calculated the inter-annotator agreement in terms of F-score. Initially, we computed the number of identical and overlapping triggers. For these triggers only, we then continued by counting the arguments, using both the exact match criterion and the relaxed match criterion introduced below. This is done separately for the Cause argument and for the Effect argument.
Trigger identification - how many causal associations have the same trigger. Two separate values are computed here:
‒ Exact match - trigger text spans match exactly.
‒ Relaxed match - trigger text spans overlap with each other, but do not necessarily match exactly.
Argument identification - for agreed triggers, how many have the same arguments. Four separate values are computed here, two for each argument:
‒ Exact match - argument text spans match exactly.
‒ Relaxed match - argument text spans overlap with each other, but do not necessarily match exactly.
Relation subtype assignment - for agreed arguments, how often do they have the same relation subtype.
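The exact and relaxed match criteria listed above can be sketched as two predicates over character-offset spans. The counting procedure shown here (a span counts as agreed if at least one span from the other annotator matches it) is an assumption made for illustration, as are all names and sample values.

```python
def exact_match(a, b):
    """Spans match only if both boundaries are identical."""
    return a == b


def relaxed_match(a, b):
    """Spans match if they share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def agreement_f1(spans_a, spans_b, match):
    """F-score between two annotators' span lists under a match predicate."""
    if not spans_a or not spans_b:
        return 0.0
    matched_a = sum(any(match(a, b) for b in spans_b) for a in spans_a)
    matched_b = sum(any(match(b, a) for a in spans_a) for b in spans_b)
    recall = matched_a / len(spans_a)      # treating annotator A as gold
    precision = matched_b / len(spans_b)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up trigger spans as (start, end) character offsets.
triggers_a = [(0, 5), (10, 20), (30, 40)]
triggers_b = [(0, 5), (12, 20), (50, 60)]

exact_f1 = agreement_f1(triggers_a, triggers_b, exact_match)
relaxed_f1 = agreement_f1(triggers_a, triggers_b, relaxed_match)
print(exact_f1, relaxed_f1)
```

In this toy input the second pair of spans differs by two characters at the left boundary, so it counts as agreement only under relaxed matching.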
Results and discussion
In this section, we firstly provide some key statistics regarding the causality annotation produced, together with a discussion of the characteristics of the corpus. Subsequently, we examine the explicit trigger phrases on which the causal relation is centred, followed by an analysis of causality arguments and the distribution of relation subtypes. Finally, we report on the inter-annotator agreement scores on the doubly annotated section of the corpus and investigate the disagreements between the two experts that were found in this part.
Corpus characteristics and statistics
The causality corpus is freely available under the Creative Commons Attribution Share-Alike Non-Commercial (CC BY-SA-NC) licence from the site of the National Centre for Text Mining (NaCTeM) (http://www.nactem.ac.uk/biocause). The corpus contains a total of 851 causal relation annotations spread over 19 open-access biomedical journal articles regarding infectious diseases.
[Table: corpus statistics. Rows: no. of articles; no. of causal associations; no. of implicit associations; no. of unique explicit triggers; no. of unique lemmatised explicit triggers; tokens per trigger; tokens per Cause arg.; tokens per Effect arg.]
On the other hand, all tokens forming triggers were lemmatised prior to counting. This means that both suggest and suggests are counted for the same trigger type. There are 347 unique lemmatised triggers in the corpus, corresponding to an average usage of 2.30 times per trigger. Both count settings show the diversity of causality-triggering phrases that are used in the biomedical domain.
Furthermore, the causal argument of the relation is, on average, almost 1.32 times longer than the other argument, the effect. This is due to the specificity of the biomedical domain and also the nature of research articles, where usually a causal argument that leads to an effect is complex and is composed of several, concatenated causes. This is exemplified below.
[Table: most frequent explicit triggers, with count (relative frequency). Triggers: these results suggest that; the results indicate that; these results indicate that; which suggests that; these data indicate that; these observations suggest that]
[Table: most frequent lemmatised explicit triggers, with count (relative frequency). Triggers: these result suggest that; the results indicate that; these result indicate that; which suggest that; these observation suggest that; our finding indicate that]
This acid pH-promoted increase appears to be specific to a subset of PhoP-activated genes that includes pmrD because expression of the PhoP-regulated slyA gene and the PhoP-independent corA gene was not affected by the pH of the medium.
Mlc is a global regulator of carbohydrate metabolism and controls several genes involved in sugar utilisation. Therefore Mlc also affects the virulence of Salmonella.
There was residual pbgP expression in the pmrB mutant induced with mild acid pH, which was in contrast to the absence of pbgP transcription in the pmrA mutant. This suggested that PmrA could become phosphorylated from another phosphodonor(s) when PmrB is not present.
In order to simplify the explanation we give below and avoid misunderstandings, we will use the following convention: the first argument will always refer to the Cause or Evidence argument of a relation, whereas the second argument will always correspond to the Effect.
Since HilD activates the transcription of hilA (14), which in turn can activate HilA-dependent invFA expression (10), and directly activates HilC/D-dependent invFD expression, these results establish that the mlc mutation exerts a negative effect on SPI1 gene expression, mainly by increasing the level of hilE expression.
There are no restrictions on how far the two arguments can be from each other in text. In other words, they may or may not be adjacent. Therefore, we have looked at the distance between the two arguments and show in Figure 14 the frequency of the various distances measured by the number of tokens. The average distance between the two arguments is 13.5 tokens. It should be noted that this distance also includes the trigger if this is placed in between the two arguments. There are more than one hundred cases where the distance is two or three tokens (116 and 177, respectively). For the distance of four to six tokens, there are between 50 and 100 instances. It can be observed that the graph has a flat, yet long tail. There are almost 200 cases where the distance is greater than or equal to 10 tokens.
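As a rough illustration of how such a distance can be measured, the sketch below counts the tokens lying strictly between two argument spans, so that an intervening trigger is included in the count. The whitespace tokenisation, the token-index representation of arguments and the sample pairs are all assumptions; the tokeniser actually used for the corpus statistics is not stated here.

```python
from collections import Counter


def token_distance(arg_a, arg_b):
    """Number of tokens between two arguments given as (start, end) token
    indices with exclusive end; overlapping or adjacent arguments give 0."""
    first, second = sorted([arg_a, arg_b])
    return max(0, second[0] - first[1])


tokens = "A occurred . Thus , B happened .".split()
cause = (0, 3)   # "A occurred ."
effect = (5, 8)  # "B happened ."

# The tokens in between are the trigger "Thus" and the comma.
print(tokens[cause[1]:effect[0]], token_distance(cause, effect))

# Made-up argument pairs, used to build a frequency table of distances.
pairs = [(cause, effect), ((0, 2), (2, 4)), ((10, 12), (3, 6))]
histogram = Counter(token_distance(a, b) for a, b in pairs)
```

Sorting the two spans first makes the measure independent of whether the Cause or the Effect comes first in the text.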
[Table: General IAA statistics for the two annotators. Rows: no. of causal associations; no. of Evidence arguments; no. of Cause arguments; no. of implicit triggers; no. of explicit triggers; no. of tokens per trigger; no. of tokens per Cause arg.; no. of tokens per Effect arg.]
As can be observed from the table, there is little difference between the two annotators in terms of the different comparison criteria. The second annotator has identified 16 more causal associations than the first annotator. Nevertheless, the percentage of evidence arguments, cause arguments and implicit triggers remains rather stable over the two sets of annotations. This is also true with respect to the length in tokens of the triggers and the two arguments.
[Table: Subtask IAA statistics. Rows, in two groups: Exact Cause arg.; Relaxed Cause arg.; Exact Effect arg.; Relaxed Effect arg.]
The two annotators agreed only on two thirds of the total number of triggers using an exact match criterion. The agreement increases by a small amount when relaxed matching is used. This demonstrates that identifying causal discourse relations is a relatively difficult task, even for experienced human judges.
The agreement on argument spans, nevertheless, is extremely high. This strongly suggests that once the annotators decide to mark a causal relation, finding the arguments is a rather straightforward task to accomplish. The F-score for identifying the Cause argument with an exact match rule is just over 80%, whilst the Effect argument is around 94%. This is due to the difficulty in recognising the exact cause in a causal relation. When the relaxed matching is used, the F-score increases significantly, to 90% for the Cause argument and 98% for the Effect argument.
These agreement values are in line with similar semantic annotation efforts for which F-score has been computed. For instance, in the BioNLP ST ID task, the partial-match inter-annotator agreement for event annotation is approximately 75%. However, the arguments of these events were already given as gold standard, so that task is significantly simpler than the one described in this article. Even so, the best performing system participating in the shared task obtained an F-score of 56%.
After performing the double annotation and computing the agreement scores, the annotators discussed the cases on which they disagreed and decided upon the correct annotations. Specifically, one of the two annotations was determined to be correct, an alteration was made, or the annotation was removed completely. We also computed the agreement of each annotator with respect to the resulting gold standard corpus. In an exact-match setting, the F-scores of the two annotators against the gold standard are 78.26% and 64.68%, respectively. Using a relaxed-match criterion, the F-scores increase to 86.17% and 87.73%, respectively.
We also looked at the differences between the two annotators. A number of these differences were simply annotation errors, where the selected spans contained extra characters from surrounding words or missed characters from the words on the boundaries. These have been corrected. The other differences relate to actual disagreements between the two annotators. As with the subtasks on which we computed the agreement scores, the differences can be categorised into those relating to triggers and those relating to either of the two arguments.
Further bioinformatics analysis of the 89K island revealed a distinct two-component signal transduction system (TCSTS) encoded Ann1[Ann2[therein ]Ann2 appears to be ]Ann1 orthologous to the SalK/SalR system of S. salivarius, a salivaricin regulated TCSTS.
In all other cases, the triggers are either exactly agreed upon or completely distinct. The distinct triggers are not linguistically realised in a different manner than those which were agreed upon. The annotators simply did not agree on considering those cases as suggesting causality.
(9) Ann1[Results of real-time quantitative RT-PCR also confirmed that, Ann2[in the complemented strain CDeltasalKR, only partial genes identified as down-regulated in the mutant rebounded to comparative transcript levels of the wild-type strain. ]Ann2]Ann1 Those unrecovered genes were probably irrelevant to the bacterial virulence of SS2.
(10) The acid tolerance response of Salmonella results in Ann1[Ann2[the synthesis of over 50 acid shock proteins (Bearson et al., 1998) that are likely to function primarily when variations in internal pH occur ]Ann2, i.e. when Salmonella experiences severe acidic conditions (pH approximately 3). ]Ann1
In example (9), the Cause arguments chosen by the two annotators overlap. Whilst one annotator considered the entire first sentence as the Cause argument, the other expert did not include the first clause, related to the results. Thus, their argument was annotated as “in the complemented strain CDeltasalKR, only partial genes identified as down-regulated in the mutant rebounded to comparative transcript levels of the wild-type strain”. After discussions, the two annotators agreed to exclude the clause related to the results, as this is not necessary for the correct interpretation of the stated facts.
On the other hand, example (10) shows a case of overlapping Effect arguments. One annotator considered the effect to be “the synthesis of over 50 acid shock proteins (Bearson et al., 1998) that are likely to function primarily when variations in internal pH occur”. The other annotator, however, also included the span of text that further explains and describes the context, “i.e. when Salmonella experiences severe acidic conditions (pH approximately 3)”. The extended version was selected as the final argument, mainly because only the specification of the mentioned condition provides biologists with sufficient detail to correctly understand the biochemical processes that occur in the described situation.
Besides overlapping arguments, there are several cases of completely different arguments. More specifically, there are seven cases of disagreed Cause arguments and only one case of a disagreed Effect argument. As we mentioned above, identifying the Cause argument is a much more difficult task than that of identifying the Effect argument. Since this subtask depends on the background knowledge, expertise and interpretation of each annotator, they might have different biomedical points of view on how events connect to each other causally.
Ann1[In the animal model, attenuation of virulence has been noted for Salmonella strains that carry mutations in the pts, crr, cya or crp genes, which encode the general energy-coupling enzymes of the PTS, enzyme IIAGlc of the PTS, adenylate cyclase and cyclic AMP receptor protein, respectively. ]Ann1 Ann2[Mlc is a global regulator of carbohydrate metabolism and controls several genes involved in sugar utilization. ]Ann2 Therefore, it seemed possible that Mlc also affects the virulence of Salmonella.
This is due to the fact that Mlc is closely related functionally to the mentioned list of genes (pts, crr, cya and crp). On the one hand, the first sentence provides a more detailed explanation of the cause without mentioning Mlc, together with the observation of the attenuation of virulence. On the other hand, the second sentence mentions Mlc and the genes in general, but it is not linked to the virulence of Salmonella. Thus, the final decision in this case has the first sentence as the cause, since it includes the virulence of Salmonella and the genes that produce it.
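The three situations discussed in this section (identical spans, overlapping spans and completely different spans) can be told apart mechanically once the annotations are available as character offsets. The following is a minimal sketch under that assumption. The function name and the offsets are hypothetical, chosen only to mirror the kinds of disagreement described above.

```python
from collections import Counter
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def compare_spans(a: Span, b: Span) -> str:
    """Label a pair of spans as 'exact', 'overlap' or 'distinct'."""
    if a == b:
        return "exact"
    if a[0] < b[1] and b[0] < a[1]:
        return "overlap"
    return "distinct"


# Hypothetical argument offsets chosen by two annotators for four relations.
pairs: List[Tuple[Span, Span]] = [
    ((0, 40), (0, 40)),     # both annotators selected the same span
    ((0, 120), (35, 120)),  # one annotator left out the first clause
    ((10, 60), (10, 95)),   # one annotator added a trailing explanation
    ((0, 50), (51, 110)),   # the annotators chose different sentences
]

summary = Counter(compare_spans(a, b) for a, b in pairs)
for label in ("exact", "overlap", "distinct"):
    print(label, summary[label])
```

A breakdown of this kind only locates the disagreements. Deciding which span is correct still requires the discussion between annotators described above.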
Comparison to the BioDRB
[Table: Comparison between BioDRB and BioCause, by number of causal associations, number of implicit triggers, number of explicit triggers, and number of tokens per trigger, per Cause argument and per Effect argument]
Given these similarities and differences, we regard the BioCause and BioDRB corpora as complementary. Thus, a future combination of these two resources could prove useful for training a machine learning system capable of recognising causality.
We have designed and described an annotation scheme for biomedical causality. This scheme captures relevant information regarding causality as it is expressed in biomedical scientific articles, which is of key importance in many text mining tasks undertaken by biologists and biochemists. The scheme is designed to be portable, so that it can be integrated with the various existing schemes for named entity and event annotation. Furthermore, the scheme is reusable and extensible, making it possible to apply it to different datasets and to extend it if necessary.
Moreover, we have produced BioCause, a gold standard corpus in which documents from existing bio-event corpora have been manually annotated according to our causality annotation scheme. The annotation was performed by two biomedical experts with extensive experience in producing resources for text mining purposes. We reported a high inter-annotator agreement rate, using both exact match and relaxed match evaluation criteria.
Finally, we have conducted an analysis of the nature of causality as it is expressed in biomedical research articles by examining the annotated corpus. We have described the characteristics of causal triggers and their arguments, looking at distributions of length, frequency and distance.
This corpus will serve as a useful resource for the development of automatic causality recognition systems in the biomedical domain.
This work was partially supported by the Engineering and Physical Sciences Research Council [grant number EP/P505631/1]. We would like to thank our external annotator, Dr. Syed Amir Iqbal (The University of Manchester), for their hard work and dedication to the task. Finally, we are grateful for the helpful comments of our anonymous reviewers.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.