The biomedical discourse relation bank
© Prasad et al; licensee BioMed Central Ltd. 2011
Received: 14 October 2010
Accepted: 23 May 2011
Published: 23 May 2011
Identification of discourse relations, such as causal and contrastive relations, between situations mentioned in text is an important task for biomedical text-mining. A biomedical text corpus annotated with discourse relations would be very useful for developing and evaluating methods for biomedical discourse processing. However, little effort has been made to develop such an annotated resource.
We have developed the Biomedical Discourse Relation Bank (BioDRB), in which we have annotated explicit and implicit discourse relations in 24 open-access full-text biomedical articles from the GENIA corpus. Guidelines for the annotation were adapted from the Penn Discourse TreeBank (PDTB), which has discourse relations annotated over open-domain news articles. We introduced new conventions and modifications to the sense classification. We report reliable inter-annotator agreement of over 80% for all sub-tasks. Experiments for identifying the sense of explicit discourse connectives show the connective itself as a highly reliable indicator for coarse sense classification (accuracy 90.9% and F1 score 0.89). These results are comparable to results obtained with the same classifier on the PDTB data. With more refined sense classification, there is degradation in performance (accuracy 69.2% and F1 score 0.28), mainly due to sparsity in the data. The size of the corpus was found to be sufficient for identifying the sense of explicit connectives, with classifier performance stabilizing at about 1900 training instances. Finally, the classifier performs poorly when trained on PDTB and tested on BioDRB (accuracy 54.5% and F1 score 0.57).
Our work shows that discourse relations can be reliably annotated in biomedical text. Coarse sense disambiguation of explicit connectives can be done with high reliability by using just the connective as a feature, but more refined sense classification requires either richer features or more annotated data. The poor performance of a classifier trained in the open domain and tested in the biomedical domain suggests significant differences in the semantic usage of connectives across these domains, and provides robust evidence for a biomedical sublanguage for discourse and the need to develop a specialized biomedical discourse annotated corpus. The results of our cross-domain experiments are consistent with related work on identifying connectives in BioDRB.
Biomedical literature is a rich resource of biomedical knowledge. The desire to retrieve, organize, and extract biomedical knowledge from literature and then analyze the knowledge has boosted research in biomedical text mining. As described in recent reviews [1–4], the past 10 years have shown significant research developments in named entity recognition [5–7], relation extraction [8, 9], information retrieval [10, 11], hypothesis generation, summarization [13–16], multimedia [17–21], and question answering [22, 23]. Garzone and Mercer [24, 25] and Mercer and DiMarco have explored how to connect a citing paper and the work cited. Light et al. have identified the use of speculative language in biomedical text. Wilbur et al. [28, 29] defined five qualitative dimensions (i.e., focus, polarity, certainty, evidence and directionality) for categorizing the intention of a sentence.
Looking at larger units of text, Mullen et al. and Yu et al. [20, 31] defined discourse zones of biomedical text including introduction, method, result, and conclusion, and developed supervised machine-learning approaches to automatically classify a sentence into the rhetorical zone category. Biber and Jones adapted unsupervised TextTiling methods to segment biomedical text into different discourse units on the basis of lexical similarities among the units. "BioContrasts" is an information extraction system that extracts contrastive information between proteins from texts on the basis of manually curated rules and regular expressions that focus on negation as an expression of contrast. Castano et al. built a system for anaphora resolution in biomedical literature. Szarvas et al. annotated negation, speculation and scope in biomedical text. Agarwal and Yu [37, 38] have investigated the detection of hedges, negation, and their scopes in biomedical literature.
One important output of this research on biomedical text has been the creation of new annotated resources specific to the biomedical domain. For example, the GENIA corpus is a collection of biomedical literature, annotated with various levels of linguistic and semantic information, including coreference . The ART corpus [40, 41] contains sentence-wise annotations of scientific papers (covering topics in physical chemistry and biochemistry) with core scientific concepts (e.g. goal, hypothesis, experiment, method, result, conclusion, motivation, observation). These resources are valuable because they can be used to evaluate the effectiveness of text-mining methods developed for the biomedical domain. They can also be used to evaluate whether methods developed for the open domain can generalize to biomedical literature, which then determines whether new biomedical-specific training data needs to be created.
To date, there has been little work on processing or annotating discourse relations in biomedical text. A discourse is considered to be a coherent sequence of clauses, sentences or propositions. Discourse relations, such as causal, temporal, and contrastive relations, are relations between eventualities and propositions mentioned in a text, from which we can draw deep or complex inferences about the text. Often, discourse relations are realized in text by explicit words and phrases, called discourse connectives, but they can also be implicit.
Our studies suggest that MRL631 is not able to access intracellular γ-secretase for APP processing and APP trafficking. However, it interacts with γ-secretase residing at the cell surface for Notch processing. From .
In both the rat otoconia and the Xenopus utricular (calcitic) otoconia, the presence of a major 90-to 100-kDa protein of unknown sequence has been reported . Therefore, calcitic otoconia probably contain a similar 90- to 100-kDa major protein, regardless of the species. In contrast, the Xenopus saccular (aragonitic) otoconia contain a major 22-kDa protein (otoconin-22) , which is a sPLA2-related 127-aa glycoprotein with two N-glycosylation sites. From .
Discourse relations can also be useful for categorizing citations and the relations between citations to enhance information retrieval: the connective in contrast in Example (2) signals a contrast relation between two cited articles, "3" and "5", mentioned in two different sentences.
The synovial membrane of rheumatoid arthritis (RA) is characterized by an infiltrate of a variety of inflammatory cells, such as lymphocytes, macrophages, and dendritic cells, together with proliferation of synovial fibroblast-like cells. Numerous cytokines are overproduced in the inflamed joint.
The challenge of processing discourse relations involves several subtasks, which have been tackled in the open (non-specialized) domain.
• Identifying discourse connectives. Many of the lexical items that can function as explicit connectives also have other non-connective functions [44, 45]. Thus, connectives need to be functionally disambiguated.
• Identifying the arguments of discourse connectives. In addition to identifying the connectives themselves, it is also important to accurately identify the two situations (called arguments) that the connectives relate, since they are not necessarily adjacent to each other [46–50].
• Identifying the senses (i.e., semantics) of the relation. While detecting the senses of explicit connectives has met with a good degree of success [44, 51, 52], owing to the observation that explicit connectives are not very ambiguous, implicit relations, on the other hand, have proved to be much more challenging [53–58].
• Deriving Composite Discourse Structures. Once the elementary relation structures (i.e., a relation and its two arguments) have been identified, the task of combining these elementary structures into more complex structures has important ramifications for tasks such as summarization .
The largest effort at annotating discourse relations is the Penn Discourse Treebank, or the PDTB , which contains annotations of discourse relations on the open-domain Wall Street Journal corpus . To facilitate discourse processing research in the biomedical domain, we have adopted the PDTB framework to annotate discourse relations, their arguments, and their senses in biomedical literature. The corpus we have selected is a 24-article subset of the GENIA corpus , which is a collection of articles from the biomedical literature. It has been compiled and annotated within the scope of the GENIA project, and the 24 articles (with a total of approx. 112000 word tokens and approx. 5000 sentences) that form our Biomedical Discourse Relation Bank (BioDRB) have also been annotated for coreference relations and citation relations .
In this article, we describe our work towards the creation of the BioDRB. We show that the PDTB framework can be successfully adapted to the biomedical domain, and that discourse relations can be reliably annotated. We present classification experiments for sense disambiguation of explicit connectives, showing that the BioDRB sense classifier performs as well as the PDTB classifier. We also present experiments to show that the current size of the BioDRB corpus may be sufficient for this task. Finally, we explored whether NLP methods developed using the PDTB can be generalized to the biomedical domain. For the same task of explicit connective sense detection, we show that a classifier trained on the PDTB performs poorly on BioDRB. These results highlight the discourse-level differences between the open domain and the biomedical domain, and support the need for developing a specialized corpus of biomedical texts annotated with discourse relations. The results of our cross-domain experiments are consistent with our related work on identifying connectives in the BioDRB .
She hasn't played any music since the earthquake hit. (Temporal:Succession)
PDTB contains 100 distinct types of discourse connectives. Of the total 40,600 tokens in the corpus, 19,053 are realized by explicit expressions, either connectives or alternative lexicalizations. Over the years, the PDTB research group has developed an effective set of discourse annotation tools, guidelines, work flows, and validation methodologies that we have used as a basis for our work.
The PDTB annotation framework has several important advantages over alternative approaches. First, the framework focuses on identifying individual relations and their arguments, which are important for text mining, while remaining neutral on the higher-level discourse organization. This is important because there is little agreement among researchers on the specification of the most descriptively adequate data structure for representing discourse. The structures proposed so far range from tree structures (e.g., Rhetorical Structure Theory (RST), Linguistic Discourse Model (LDM), and RST-based binary trees) to more complex forms that incorporate multiple inheritance (D-LTAG and Segmented Discourse Representation Theory (SDRT)), to full-fledged graphs (Discourse Graphbank). The PDTB is, therefore, a particularly attractive framework since it aims to remain neutral with respect to higher-level discourse organization, and instead focuses on annotating the more local discourse relations. Higher-level structures in this approach are left to "emerge" from the annotations of low-level relations. Some recent investigations on the combinatorial possibilities of discourse relations in the PDTB suggest that directed acyclic graphs (DAGs), and not trees, may be the most appropriate structural representation for discourse [71, 72].
Second, discourse relations in the PDTB are lexically anchored, for both explicit and implicit connectives. In the latter case, annotators "insert" a connective expression to express the implicit relation, and then proceed to annotate the sense of the inserted connective. Such a lexically-grounded approach substantially increases the inter-annotator agreement , as confirmed in our pilot annotation study [74, 75].
Finally, since its release, the PDTB has been successfully used by many researchers for both linguistic and computational studies [44, 46–48, 50–52, 54–57, 71, 72, 76–84], which shows that there is much to be gained from adopting this approach. The PDTB framework has also been adopted for discourse annotation in other languages (e.g., Turkish , Hindi [86, 87], Chinese , Czech  and Italian ) as well as other domains such as conversational dialogues .
Results and Discussion
Biomedical Discourse Relation Bank: BioDRB
In the BioDRB, we have annotated all explicit and implicit discourse relations, the arguments of discourse relations, and the senses of discourse relations. In keeping with the theory-neutral approach of PDTB, we annotate only individual relations and do not attempt to show dependencies across relations. We have adapted the PDTB guidelines to better incorporate discourse-level features specific to biomedical texts. Here we present some salient aspects of the BioDRB annotation guidelines. Further details are provided in the complete documentation of the guidelines , available from http://spring.ims.uwm.edu/uploads/biodrb_guidelines.pdf
Discourse Relations and their Realization
Relations realized by Explicit discourse connectives,
Relations realized by alternatively lexicalized expressions (AltLex),
Absence of a discourse relation, or No Relation (NoRel).
Because RA PBMC include several cell types in addition to T cells, some inflammatory cytokines released from macrophages and other lymphocytes might have affected the production of IL-17 from T cells. (Cause:Reason)
IL-17 was also detected in the PBMC of patients with osteoarthritis, but their expression levels were much lower than those of RA PBMC. (Concession:Contra-expectation)
IL-17 production by activated RA PBMC is completely or partly blocked in the presence of the NF-κB inhibitor pyrrolidine dithiocarbamate and the PI3K/Akt inhibitor wortmannin and LY294002, respectively. However, inhibition of activator protein-1 and extracellular signal-regulated kinase 1/2 did not affect IL-17 production. (Contrast)
Recent observations demonstrated that IL-17 can also activate osteoclastic bone resorption by the induction of RANKL (receptor activator of nuclear factor κ B [NF- κ B] ligand), which is involved in bony erosion in RA . (Purpose:Enablement)
Annotation of an explicit connective proceeds by first identifying and marking the connective text span, then identifying and annotating the text spans associated with its two arguments, and finally, labeling the sense of the relation. Thus, for Example (5), the following information is annotated:
Relation type: Explicit
Connective span: "Because"
Arg1 span: "some inflammatory cytokines released from macrophages and other lymphocytes might have affected the production of IL-17 from T cells"
Arg2 span: "RA PBMC include several cell types in addition to T cells"
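The fields annotated for an explicit relation can be pictured as a small record. The following is a minimal sketch, not the BioDRB file format: the class name `DiscourseRelation`, its field names, and the validation rules are illustrative assumptions; only the relation types, the spans, and the sense label of Example (5) come from the text.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Relation types named in the text; the set itself is an illustrative container.
RELATION_TYPES = {"Explicit", "Implicit", "AltLex", "NoRel"}


@dataclass(frozen=True)
class DiscourseRelation:
    """Hypothetical record for one annotated relation (not the BioDRB format)."""
    relation_type: str
    connective: Optional[str]      # connective span; None when there is no span
    arg1: str                      # text span of Arg1
    arg2: str                      # text span of Arg2
    senses: Tuple[str, ...] = ()   # one or more "Type:Subtype" labels

    def __post_init__(self) -> None:
        if self.relation_type not in RELATION_TYPES:
            raise ValueError(f"unknown relation type: {self.relation_type}")
        if self.relation_type == "Explicit" and not self.connective:
            raise ValueError("an Explicit relation needs a connective span")


# Example (5) from the text, with the sense label given for that example.
example_5 = DiscourseRelation(
    relation_type="Explicit",
    connective="Because",
    arg1=("some inflammatory cytokines released from macrophages and other "
          "lymphocytes might have affected the production of IL-17 from T cells"),
    arg2="RA PBMC include several cell types in addition to T cells",
    senses=("Cause:Reason",),
)
```

Storing the senses as a tuple leaves room for relations that carry more than one sense label.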
These data show that ITK is required for IL-2 production induced by SEB in vivo, and may regulate signals leading IL-2 production, in part by regulating phosphorylation of c-jun. The data also suggest that perturbing T cell activation pathways leading to IL-2 does not necessarily lead to improved responses to SEB toxicity. (Conjunction)
To determine whether CD123+ cells in synovial tissue were also nuclear RelB+, formalin-fixed tissue was double-stained for RelB and CD123 without hematoxylin counterstaining.
Expression of the brca1 mutant in a p21-null background caused little rescue of the cells in the thymus, but provided a recovery in the lymph nodes that was equivalent to that produced in the p53-null background. Implicit = On the other hand
Introduction of the brca1 gene in cells carrying an antiapoptotic Bcl2 transgene induced significant rescue of cells in the thymus, but produced little recovery of cells in peripheral (lymph node) compartments. (Contrast)
For implicit relations, it is crucial that the annotator does not perceive any "redundancy" in the expression of the relation after inserting the connective. A redundancy effect would instead lead to the annotation of the AltLex relation type, discussed next.
As shown in Figure 3a,3b, the intensity of IL-10R1 expression on CD4+ T cells was significantly increased in RA patients compared with in healthy controls.
These results suggest that the intracellular signal transduction pathway of IL-10 may be impaired in CD4+ T cells of active RA. (Cause:Claim)
Syntactically, AltLex expressions are open class lexical items that cannot be defined as explicit connectives . In particular, while explicit connective expressions are fixed, or lexically invariant, AltLex expressions result from a more productive and compositional process. They often appear as subject-verb sequences (Example 12), although other syntactic patterns are found as well, such as prepositional phrases and verb phrases. Semantically, they are typically composed of two elements - one that denotes the relation, and the other that refers anaphorically to Arg1. In Example (12), the verb suggest denotes the relation, whereas the subject These results refers anaphorically to Arg1.
Background: CC Chemokine Receptor 3 (CCR3), the major chemokine receptor expressed on eosinophils, binds promiscuously to several ligands including eotaxins 1, 2, and 3. (...) It is therefore important to elucidate the molecular mechanisms regulating receptor expression. Implicit = NoRel Results: In order to define regions responsible for CCR3 transcription, a DNAse hypersensitive site was identified in the vicinity of exon 1.
The second kind of NoRel was annotated for typographical errors that led, for example, to some sentences being duplicated in the article. Since we did not want to admit a non-semantic repetition relation, these were annotated as NoRel. Such cases are rare in the corpus.
For NoRel, Arg1 and Arg2 are, by convention, the immediately adjacent and complete sentences.
Arguments of Discourse Relations
She was originally considered to be at high risk due to the familial occurrence of breast and other types of cancer, (Cause:Reason)
We show here that mice lacking ITK have much reduced IL-2 production and T cell expansion in response to SEB in vitro and in vivo. We also show that SEB induced the activation of the JNK MAPK pathway in responding T cells in vivo, and that ITK null T cells were defective in the activation of this pathway in vivo. However, toxicity analysis indicated that both WT and ITK null animals were similarly affected by SEB exposure. Our data suggest that ITK is required for full IL-2 secretion following SEB exposure, and that this may be due to the regulation of the JNK pathway by ITK in vivo. However, reducing T cell signals does not necessarily lead to better physiological responses to SEB exposure. (Restatement:Generalization)
The studies concerning the functional interaction between the NF-κ B pathway and members of the steroid hormone receptor family, and their role in synovial inflammation, have advanced significantly, although with controversial results [10, 11]. In particular, after binding with E2, oestrogen receptors have been shown to interact with NF-κ B factors, via transcriptional co-factors, resulting in mutual or non-mutual antagonism. Other studies hypothesize that, since oestrogen receptors may repress both constitutive and inducible NF-κ B, the overexpression of NF-κ B-inducible genes in oestrogen receptor-negative cells might contribute to malignant cell growth and chemotherapeutic resistance [12, 13]. On the contrary, further studies report that E2 blocks the transcriptional activity of p65 in macrophages . However, these opposite observations arise using different cell lines (human/animals) and culture conditions as well as different hormone concentrations . ...
Senses of Discourse Relations
BioDRB sense classification for discourse relations
For any relation, the sense annotation consists of selecting a sense subtype label whenever subtypes are available for a type. Thus, for the "Cause" sense, the annotator is required to select one of its four subtypes, i.e., the type level label cannot be chosen. Type-level labels can only be selected when the sense does not have any subtypes available, for example "Contrast". Refinements at the subtype level are of two kinds. One kind specifies refinements of the semantics, while the other kind specifies the directionality of the arguments. Thus, for example, the three subtypes of the "Condition" sense type specify in more detail the nature of the conditional dependence between Arg1 (antecedent) and Arg2 (consequence), by indicating whether the antecedent describes a hypothetical situation ("Hypothetical"), an assumed fact ("Factual"), or a non-fact ("Non-Factual"). On the other hand, the two subtypes of the "Concession" sense (in which one argument creates an expectation denied by the other argument) indicate the directionality of the concession: In the "Contra-expectation" subtype, Arg1 raises the expectation that Arg2 denies, while in the "Expectation" subtype, Arg2 raises the expectation that Arg1 denies.
Tumors detected by this new technology could have unique etiologies and/or presentations, and may represent an increasing proportion of clinical practice as new screening methods are validated and applied. (Temporal:Synchronous/Cause:Justification)
The BioDRB sense classification was adapted from the PDTB sense classification . Below, we first define the BioDRB senses, and then discuss the major differences with PDTB.
The sense type "Cause" is used when the two arguments of the relation are related causally and are not in a conditional relation. There are four subtypes for this sense. "Reason" and "Result" hold when the situation described in one of the arguments is the cause of the situation described in the other argument. They differ from each other only in the directionality of the causality. "Reason" is used when Arg2 is the cause and Arg1 the effect, while "Result" is used when Arg1 is the cause and Arg2 the effect. The other two subtypes, "Claim" and "Justification", hold when the situation described by one of the arguments is the cause, not for the situation described by the other argument, but rather for the truth or validity of the proposition described by the argument. The difference between the two is again in directionality, with "Claim" used when Arg1 presents the evidence for the truth of Arg2, and "Justification" used when Arg2 presents the evidence for the truth of Arg1.
The sense type "Condition" is used to describe all subtypes of conditional relations. There are three subtypes. The subtype "Hypothetical" holds when, if Arg2 holds true, Arg1 is caused to hold at some instant in all possible futures. However, Arg1 can be true in the future independently of Arg2. The subtype "Factual" is a special case of the subtype "Hypothetical", and applies when Arg2 is a situation that has either been presented as a fact in the prior discourse or is believed by somebody other than the speaker/writer. The subtype "Non-Factual" applies when Arg2 describes a condition that either does not hold at present or did not hold in the past. Arg1 then describes what would also hold if Arg2 were true. (There were no occurrences of the Non-Factual conditionals in the corpus.)
The sense type "Purpose" is used when one argument presents a situation and the other argument presents an action, and the engagement of the action enables the situation to occur. The two subtypes "Goal" and "Enablement" capture difference in directionality: "Goal" applies when Arg1 presents an action that enables the situation in Arg2 to obtain, whereas "Enablement" applies when Arg2 presents an action that enables the situation in Arg1 to obtain.
The sense type "Temporal" is used when the events described in the arguments are related temporally. There are three subtypes, which reflect the ordering of the arguments. "Precedence" is used when the Arg1 event precedes the Arg2 event; "Succession" applies when the Arg1 event follows the Arg2 event; and "Synchronous" applies when the Arg1 and Arg2 events overlap.
The sense type "Concession" applies when one of the arguments describes a situation A that creates an expectation for a situation C, while the other asserts (or implies) ¬C. Two "Concession" subtypes capture a difference in the roles of the arguments. "Expectation" is used when Arg2 creates an expectation that Arg1 denies, while "Contra-expectation" is used when Arg1 creates an expectation that Arg2 denies.
The sense type "Contrast" is used when the values for some shared property in Arg1 and Arg2 are in opposition to each other. These oppositions need not be at opposite ends of a graded scale and can be context-dependent. There are no subtypes for this sense.
The sense type "Similarity" is like "Contrast" in that it involves the comparison of the values for some shared property of Arg1 and Arg2. The compared values in this case are similar to each other (and not in opposition).
The sense type "Alternative" is used when the two arguments denote alternative situations. There are three subtypes. The "Conjunctive" subtype is used when both alternatives hold or are possible. The "Disjunctive" subtype is used when two situations are evoked in the discourse but only one of them holds. The "Chosen Alternative" subtype is used when multiple alternatives are evoked in the discourse, and one argument asserts that one of the alternatives was chosen.
The sense type "Instantiation" is used when Arg1 evokes a set and Arg2 instantiates one or more elements of the set. What is evoked may be a set of events, a set of reasons, or a generic set of events, behaviors, attitudes, etc. There are no subtypes for this sense.
The sense type "Restatement" is used when the situation described by Arg2 restates the situation described by Arg1. The three subtypes "Specification", "Generalization", and "Equivalence" further specify the ways in which Arg2 restates Arg1. "Specification" applies when Arg2 describes the situation described in Arg1 in more detail. "Generalization" applies when Arg2 summarizes Arg1, or in some cases expresses a conclusion based on Arg1. "Equivalence" applies when Arg1 and Arg2 describe the same situation from different perspectives. (There are no occurrences of the "Equivalence" sense in the corpus.)
The sense type "Conjunction" is used when Arg1 and Arg2 are members of a list, defined in the prior discourse, explicitly or implicitly. No subtypes are defined for this sense.
The sense type "Exception" applies when Arg2 specifies an exception to the generalization specified by Arg1. In other words, Arg1 is false because Arg2 is true, but if Arg2 were false, Arg1 would be true. No subtypes are defined for this sense.
The sense type "Reinforcement" is used when Arg2 is provided as fact to support claims or effects associated with Arg1. No subtypes are defined for this sense.
The sense type "Continuation" applies when Arg2 expands the discourse by identifying an entity (concrete or abstract) in Arg1 and saying something about it. Crucially, for this relation, it must be the case that no other discourse relation holds. "Continuation" occurs frequently as an implicit relation, but it can also be associated with the explicit connective "and".
The sense type "Circumstance" is used when one argument provides the circumstances under which the situation in the other argument was obtained. No causal relation is implied here. In BioDRB, this relation was introduced specifically to capture the circumstantial relation between an experimental set-up and the observations and results obtained from the experiments. Two subtypes capture difference in directionality. In "Backward Circumstance", Arg1 describes the circumstance and Arg2 describes the resulting situation. In "Forward Circumstance", Arg2 describes the circumstance and Arg1 describes the resulting situation.
The sense type "Background" is used when one argument provides information that is deemed necessary or desirable for interpreting the other argument. Two subtypes capture difference in directionality. In "Backward Background" Arg1 provides the background information for Arg2, while in "Forward Background", Arg2 provides the background information for Arg1. No further subtypes are specified for this sense.
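The two-level classification defined above, together with the rule that a subtype must be chosen whenever a type has subtypes, can be sketched as a lookup table plus a validity check. This is an illustrative sketch only: the names `SENSE_HIERARCHY`, `is_valid_label`, and `is_valid_annotation` are assumptions, as are the "Type:Subtype" string format and the "/" separator for multiple senses, which simply mirror how the labels are printed in the examples.

```python
# Type-level senses mapped to their subtypes, as defined in the text.
# An empty list means the type-level label is selected directly.
SENSE_HIERARCHY = {
    "Cause": ["Reason", "Result", "Claim", "Justification"],
    "Condition": ["Hypothetical", "Factual", "Non-Factual"],
    "Purpose": ["Goal", "Enablement"],
    "Temporal": ["Precedence", "Succession", "Synchronous"],
    "Concession": ["Expectation", "Contra-expectation"],
    "Contrast": [],
    "Similarity": [],
    "Alternative": ["Conjunctive", "Disjunctive", "Chosen Alternative"],
    "Instantiation": [],
    "Restatement": ["Specification", "Generalization", "Equivalence"],
    "Conjunction": [],
    "Exception": [],
    "Reinforcement": [],
    "Continuation": [],
    "Circumstance": ["Forward Circumstance", "Backward Circumstance"],
    "Background": ["Forward Background", "Backward Background"],
}


def is_valid_label(label: str) -> bool:
    """Check one "Type" or "Type:Subtype" label against the hierarchy."""
    sense_type, _, subtype = label.partition(":")
    if sense_type not in SENSE_HIERARCHY:
        return False
    subtypes = SENSE_HIERARCHY[sense_type]
    if not subtypes:
        # No subtypes defined: only the bare type-level label is allowed.
        return subtype == ""
    # Subtypes defined: one of them must be selected (compared case-insensitively).
    return subtype.casefold() in {s.casefold() for s in subtypes}


def is_valid_annotation(annotation: str) -> bool:
    """Check a possibly multi-sense annotation such as "A:x/B:y"."""
    return all(is_valid_label(part) for part in annotation.split("/"))
```

Under these assumptions, `is_valid_label("Cause")` is rejected because "Cause" has subtypes, while `is_valid_label("Contrast")` is accepted.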
The BioDRB sense classification reflects the following changes from the PDTB classification:
First, in the PDTB, the sense classification consists of three tiers, with four sense classes at the top tier: "Contingency", "Temporal", "Comparison", and "Expansion". We eliminated three of these four class-level senses because we felt they were too broadly defined to be useful. The only class-level sense we retained is "Temporal", but this has been reassigned as a type-level sense in the two-level BioDRB hierarchy.
Second, we have collapsed some of the subtype-level senses. For the "Condition" sense type, for example, we do not maintain the PDTB distinction between the subtypes "Present-Factual" and "Past-Factual", and label both as "Factual". A similar reduction is done for "Non-Factual".
Third, we have introduced some new senses, namely "Purpose", "Similarity", "Continuation", "Background", and "Reinforcement". "Continuation" and "Background" are reformulations of the PDTB EntRel (Entity Relation) relation type, whereas "Purpose", "Similarity", and "Reinforcement" are senses that we believe were confounded with other senses in PDTB. For example, "Purpose" relations were annotated as "Result", "Similarity" relations were annotated as "Conjunction", and "Reinforcement" relations were annotated as either "Conjunction" or "Restatement".
Finally, we have eliminated the separate type-level representation of pragmatic senses and have instead listed them as subtypes. These apply to the current subtypes for "Cause", namely "Claim" and "Justification". We did not find instances of the other pragmatic senses listed in PDTB.
Table: Grouping of BioDRB sense types into PDTB generalized classes
BioDRB Type-level Senses | PDTB Class-level Sense
Cause, Condition, Purpose | Contingency
Concession, Contrast | Comparison
Temporal | Temporal
Alternative, Background, Circumstance, Conjunction, Continuation, Exception, Instantiation, Reinforcement, Restatement, Similarity | Expansion
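The grouping of BioDRB type-level senses under the four PDTB classes can be written as a lookup that reduces a refined label to a coarse class. This is a sketch under stated assumptions: the names `PDTB_CLASS_OF` and `coarse_sense` are illustrative, and the class attached to each group (in particular the placement of "Concession", "Contrast", and "Temporal") is inferred from the sense definitions and the PDTB class names given earlier, not copied from a released resource.

```python
# Assumed grouping of BioDRB type-level senses into PDTB class-level senses.
_GROUPS = {
    "Contingency": ["Cause", "Condition", "Purpose"],
    "Comparison": ["Concession", "Contrast"],   # inferred placement
    "Temporal": ["Temporal"],                   # inferred placement
    "Expansion": [
        "Alternative", "Background", "Circumstance", "Conjunction",
        "Continuation", "Exception", "Instantiation", "Reinforcement",
        "Restatement", "Similarity",
    ],
}

# Invert the grouping so each type-level sense points at its class.
PDTB_CLASS_OF = {
    sense_type: pdtb_class
    for pdtb_class, sense_types in _GROUPS.items()
    for sense_type in sense_types
}


def coarse_sense(label: str) -> str:
    """Map a "Type" or "Type:Subtype" label to its assumed PDTB class."""
    sense_type = label.split(":", 1)[0]
    return PDTB_CLASS_OF[sense_type]
```

A mapping of this kind is what allows refined BioDRB labels to be compared with coarse, class-level results.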
Summary of BioDRB Annotations
Table: BioDRB distribution of relation types. Column header: No. of Tokens (%).
Distribution of senses in BioDRB.
Contextual ambiguity of explicit connectives
2: Cause, Conjunction
2: Concession, Contrast
6: Cause, Concession, Conjunction, Continuation, Purpose, Temporal
3: Cause, Purpose, Temporal
2: Circumstance, Temporal
2: Concession, Contrast
3: Cause, Purpose, Temporal
2: Conjunction, Temporal
2: Concession, Contrast
in part by
2: Cause, Purpose
2: Instantiation, Restatement
in response to
3: Cause, Circumstance, Temporal
3: Cause, Conjunction, Temporal
2: Circumstance, Purpose
2: Circumstance, Reinforcement
on the other hand
2: Concession, Contrast
2: Circumstance, Temporal
2: Conjunction, Temporal
2: Cause, Temporal
2: Cause, Restatement
2: Restatement, Temporal
2: Cause, Restatement
2: Cause, Restatement
2: Circumstance, Temporal
3: Circumstance, Condition, Temporal
4: Concession, Conjunction, Contrast, Temporal
2: Concession, Contrast
Column 2 provides the number and names of different senses associated with the connectives, while column 3 provides the total number of tokens for the connective. The total number of tokens for all these ambiguous connectives is 1328, which constitutes 50.4% (1328/2636) of the total number of explicit connective tokens.
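The proportion reported above is the share of explicit connective tokens whose connective occurs with more than one sense type. A minimal sketch of that computation follows; the function names and the toy token list are hypothetical, and only the figures 1328 and 2636 come from the text.

```python
from collections import defaultdict


def ambiguity_summary(tokens):
    """tokens: list of (connective, sense_type) pairs, one per annotated token.

    Returns (ambiguous connectives with their sense types,
             number of tokens of ambiguous connectives,
             total number of tokens).
    """
    senses = defaultdict(set)
    counts = defaultdict(int)
    for connective, sense_type in tokens:
        key = connective.lower()   # treat "However" and "however" alike
        senses[key].add(sense_type)
        counts[key] += 1
    ambiguous = {c: sorted(s) for c, s in senses.items() if len(s) > 1}
    ambiguous_tokens = sum(counts[c] for c in ambiguous)
    return ambiguous, ambiguous_tokens, len(tokens)


def percentage(part: int, whole: int) -> float:
    """Share of `part` in `whole`, as a percentage rounded to one decimal."""
    return round(100.0 * part / whole, 1)


# Hypothetical toy data, not taken from the corpus.
toy_tokens = [
    ("however", "Contrast"),
    ("However", "Concession"),
    ("however", "Contrast"),
    ("because", "Cause"),
]
toy_ambiguous, toy_ambiguous_tokens, toy_total = ambiguity_summary(toy_tokens)

# Figures reported in the text: 1328 of 2636 explicit connective tokens.
reported_share = percentage(1328, 2636)  # 50.4
```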
Annotation Task Procedure
For the task of annotating discourse relations, each annotator was given an article and instructed to read the article from beginning to end while marking up relations. No pre-defined lists of connectives were provided to annotators, although the connective list from PDTB was provided as an example of what to look for. Annotators were strongly encouraged to identify additional connectives when they were observed. At a high level, the annotation procedure can be summarized as follows:
First determine if there is an explicit connective that relates the sentence to the prior context via a discourse relation. If so, mark this explicit connective, its arguments, and its sense(s). Label the relation type as Explicit.
If there is no explicit connective present to relate the sentence with the prior context, try to insert an implicit connective to express the inferred implicit relation, annotate its sense, and mark its arguments. In case the inferred relation is one of the senses of "Continuation", "Background", or "Circumstance", no connective can be inserted, so use the dummy label "NONE" in place of an implicit connective. Label the relation type as Implicit.
If insertion of an implicit connective leads to redundancy in the expression of the relation, identify and mark the AltLex expression that expresses the relation, annotate its sense, and mark its arguments. Label the relation type as AltLex.
If the sentence does not seem to relate coherently to any sentence in the prior text, label the relation type as NoRel, mark the current sentence as Arg2 and the previous sentence as Arg1.
After annotating the relation of the sentence with the previous context, identify and annotate any sentence-internal explicit connectives that have both their arguments in the same sentence.
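The sentence-level part of this procedure can be read as a small decision function. The following is only an illustrative sketch: the boolean inputs stand in for annotator judgments, and all names (`SentenceJudgment`, `relation_type`, `implicit_connective`) are hypothetical and not part of the BioDRB tools.

```python
from dataclasses import dataclass
from typing import Optional

# Senses for which, per the guidelines above, no implicit connective can be inserted.
NO_CONNECTIVE_SENSES = {"Continuation", "Background", "Circumstance"}


@dataclass
class SentenceJudgment:
    """Hypothetical container for one annotator's judgments about a sentence."""
    has_explicit_connective: bool   # an explicit connective links it to the prior context
    insertion_is_redundant: bool    # inserting a connective would be redundant (AltLex case)
    inferred_sense: Optional[str]   # sense of the inferred relation, or None if none is found


def relation_type(j: SentenceJudgment) -> str:
    """Return the relation type label for the sentence's link to the prior context."""
    if j.has_explicit_connective:
        return "Explicit"
    if j.inferred_sense is None:
        return "NoRel"              # no coherent relation to any prior sentence
    if j.insertion_is_redundant:
        return "AltLex"
    return "Implicit"


def implicit_connective(sense: str, inserted: str) -> str:
    """Return the connective string to record for an Implicit relation."""
    return "NONE" if sense in NO_CONNECTIVE_SENSES else inserted
```

Sentence-internal explicit connectives (the last step) would be handled in a separate pass and are not modeled here.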
While we believe that the scope of discourse relations captured in BioDRB is larger than that of the framework from which it was adapted, there are two types of relations that are currently not handled. We describe these below. The main reason for their exclusion is the challenge associated with their annotation. We plan to address these challenges in future extensions to the corpus.
First, we have not annotated implicit or AltLex relations between events and situations mentioned within a single sentence. For example, in the sentence "In particular, after binding with E2, oestrogen receptors have been shown to interact with NF-κB factors, via transcriptional co-factors, resulting in mutual or non-mutual antagonism.", an AltLex "Result" relation can be inferred between the "interaction of oestrogen receptors with NF-κB factors" and "mutual or non-mutual antagonism", anchored in the verb resulting. Such relations were excluded because it is challenging to identify the clausal boundary "sites" where they are inferred. Although the syntactic parse of a sentence can be used for this purpose, we did not have a sufficiently accurate sentence parser for our texts.
Second, coordinating conjunctions (e.g., and, or) that conjoin verb phrases in a sentence can potentially indicate discourse relations between two situations. Moreover, the conjunction and can often express more than the sense of "Conjunction", including at least the "Temporal" and "Result" senses. For example, the conjunction and in the sentence "Thus SEB can interact directly with MHC class II molecules on APCs and activate T cells bearing the proper TcR Vβ chains." can be taken to express a conjunction of two independent situations, namely "SEB interacting with MHC class II molecules on APCs" and "SEB activating T cells bearing the proper TcR Vβ chains". In addition, either a causal, temporal or enablement relation might be inferred here. While such conjunctions appear often in the BioDRB, we decided to exclude them because it is difficult to distinguish them from conjunctions that do not have a discourse function.
Evaluation of Annotation Reliability
Each article was annotated by two annotators who were premed students at the University of Pennsylvania. The domain expertise of the annotators is crucial for allowing them to identify the correct sense of discourse connectives and to identify the existence of implicit relations. The annotators were extensively trained (by the first author) with regard to knowledge of linguistic syntax, semantics, and discourse, following which they were given a tutorial on the biomedical discourse annotation guidelines. The annotation was carried out over a period of three years, with annotators annotating at an average speed of 7 minutes per relation.
We computed agreement for connective identification, argument identification and sense labeling. Explicit and AltLex relations were treated separately from implicit relations.
For agreement on the identification of explicit connectives and AltLex expressions, we calculated the percentage of overlapping tokens identified by the annotators, since one annotator could have selected some connectives or AltLex expressions that the other did not. For example, if one annotator identified 20 connectives and the other identified 30 connectives, it could be that 15 tokens were common to both, so that 35 distinct tokens (20 + 30 - 15) were identified by at least one annotator. The agreement was then reported as the percentage of common tokens over all common and uncommon tokens (i.e., 43% (15/35) for the artificial case illustrated above). We achieved 82% agreement. The major sources of mismatch were subordinators, which are harder to identify than conjunctions and adverbials, and AltLex expressions.
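The agreement measure described here is the number of tokens marked by both annotators divided by the number of distinct tokens marked by either. A minimal sketch follows, assuming each marked token is represented by a hashable identifier (in practice this could be a character-offset span); the function name and the integer ids are illustrative only.

```python
def identification_agreement(tokens_a: set, tokens_b: set) -> float:
    """Common tokens divided by all distinct tokens marked by either annotator."""
    union = tokens_a | tokens_b
    if not union:
        return 1.0  # assumption: nothing marked by either annotator counts as agreement
    return len(tokens_a & tokens_b) / len(union)


# The artificial case from the text: 20 and 30 connectives, 15 of them in common.
annotator_a = set(range(0, 20))   # token ids 0..19
annotator_b = set(range(5, 35))   # token ids 5..34, sharing ids 5..19 with annotator_a
agreement = identification_agreement(annotator_a, annotator_b)
print(f"{agreement:.0%}")         # 15/35, about 43%
```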
For agreement on argument spans, we used both the exact match criterion and the more relaxed partial match criterion. With the exact match criterion, annotators are taken to agree on an argument only when their respective selections are identical or fully overlapping, whereas the partial match criterion allows agreement even in the case of partial overlap. Argument agreement was computed only on the connectives where the annotators agreed. For Explicit and AltLex relations, we achieved an exact match of 88% and 81% on Arg2 and Arg1, respectively. This difference is understandable, since Arg1s are generally harder to identify than Arg2s. With partial match, we achieved an agreement of 93% and 86% for Arg2 and Arg1, respectively. Agreement on implicit relations was lower, at 88% and 75% for Arg2 and Arg1, respectively. The most likely reason for lower agreement for implicits is that non-adjacent arguments were allowed in the BioDRB, which makes the task of identifying the arguments harder.
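The two criteria can be illustrated on character-offset spans. This sketch makes simplifying assumptions that are not stated in the text: each argument is a single contiguous span, offsets are end-exclusive, and "exact match" is taken to mean identical spans. The toy span pairs are invented.

```python
from typing import Tuple

Span = Tuple[int, int]  # (start, end) character offsets; end-exclusive is an assumption


def exact_match(a: Span, b: Span) -> bool:
    """Annotators agree only if the two selections are identical."""
    return a == b


def partial_match(a: Span, b: Span) -> bool:
    """Annotators agree if the two selections share at least one character."""
    return max(a[0], b[0]) < min(a[1], b[1])


def agreement_rate(pairs, criterion) -> float:
    """Fraction of span pairs on which the given criterion holds."""
    pairs = list(pairs)
    return sum(criterion(a, b) for a, b in pairs) / len(pairs)


# Invented example: one identical pair, one overlapping pair, one disjoint pair.
toy_pairs = [((0, 10), (0, 10)), ((20, 40), (25, 40)), ((50, 60), (70, 80))]
exact_rate = agreement_rate(toy_pairs, exact_match)      # 1 of 3 pairs
partial_rate = agreement_rate(toy_pairs, partial_match)  # 2 of 3 pairs
```

As in the reported figures, the partial criterion can never yield lower agreement than the exact one, since every exact match is also a partial match.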
Since sense guidelines allow an annotator to select multiple senses for a given connective, we took annotators to agree on sense labeling if at least one sense for a connective was the same across both annotators. Furthermore, since the sense labeling task involved classifying a given set of connectives into multiple nominal categories, namely 31 sense categories in total (see Table 1), we report the agreement by computing the kappa score. For explicit and AltLex relations, the kappa score was 0.71, with the observed agreement at 0.85 and the expected agreement at 0.48. For implicit relations, the kappa score was 0.63, with the observed agreement at 0.82 and the expected agreement at 0.52. The kappa scores for both explicit and implicit relations are therefore in the range generally accepted as substantial agreement.
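The reported kappa scores follow from the observed and expected agreement values through the standard chance-corrected formula, kappa = (observed - expected) / (1 - expected). The short check below uses only the figures given above; the function name is illustrative.

```python
def kappa(observed: float, expected: float) -> float:
    """Chance-corrected agreement from observed and expected agreement."""
    return (observed - expected) / (1.0 - expected)


explicit_altlex_kappa = kappa(0.85, 0.48)  # (0.85 - 0.48) / 0.52, about 0.71
implicit_kappa = kappa(0.82, 0.52)         # (0.82 - 0.52) / 0.48, about 0.63
```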
Following the double-blind annotation and agreement calculations, the disagreements were adjudicated by an expert. We also made further reviews of the corpus to correct for any remaining guideline-related errors.
BioDRB Data, Tools and Representation
The source corpus over which the BioDRB has been annotated consists of 24 full-text articles from the GENIA corpus. The GENIA corpus is a collection of articles from the biomedical literature. It has been compiled and annotated within the scope of the GENIA project (http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/).
The 24 GENIA articles were selected by the GENIA group in 2006 by searching the PubMed entries with two MeSH terms, "Blood cells" and "Transcription factors". Among the returned entries, the 24 selected articles were open-access and are considered representative of the scientific text style of this domain. This full-text data collection has been annotated with coreference (by the GENIA group) and citation relations, and therefore represents one of the most comprehensively annotated full-text biomedical corpora. Our annotation of discourse relations on this corpus will further enrich the data resource, and will assist future text mining applications.
Altogether, the articles have a total of 112483 words and 4911 sentences. Sentence counts were obtained with the UIUC sentence segmentation tool http://cogcomp.cs.illinois.edu/page/tools_view/2.
Annotation Tools and Representation
We used a recently released version of the discourse annotation tool, called "Annotator", distributed by the PDTB group. It is freely available from http://www.seas.upenn.edu/~pdtb/PDTBAPI, and differs from earlier versions primarily with respect to its simpler data representation. The tool allows for the annotation of relations, their arguments, as well as senses, all within the same interface.
Annotation fields in the BioDRB data representation
Relation type (Explicit, Implicit, AltLex, NoRel)
(Sets of) Span offsets for connective (when explicit)
Connective string "inserted" for Implicit relation
Sense1 of Explicit Connective (or Implicit Connective)
Sense2 of Explicit Connective (or Implicit Connective)
(Sets of) Span offsets for Arg1
(Sets of) Span offsets for Arg2
Implicit||||||||as a result|Cause.Result||||||3418..3655||||||3657..3714||||||
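The example record is a pipe-delimited line, so it can be read with ordinary string splitting. The sketch below is a rough illustration, not the official reader: the exact position of each field is not given in the text, so it simply picks out the non-empty fields of this one record, and the use of `;` to separate multiple spans within a field is an assumption.

```python
from typing import List, Tuple

# The example record, with "|" as the field separator.
record = "Implicit||||||||as a result|Cause.Result||||||3418..3655||||||3657..3714||||||"


def parse_spans(field: str) -> List[Tuple[int, int]]:
    """Parse 'start..end' offsets; ';' between multiple spans is an assumption."""
    spans = []
    for part in field.split(";"):
        start, end = part.split("..")
        spans.append((int(start), int(end)))
    return spans


fields = record.split("|")
non_empty = [f for f in fields if f]

# This unpacking holds for the example record only (Implicit relation, one sense).
relation_type, connective, sense, arg1_field, arg2_field = non_empty
arg1 = parse_spans(arg1_field)
arg2 = parse_spans(arg2_field)
```

A general reader would index fields by position according to the released format description rather than by filtering out empty fields.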
Sense Detection of Explicit Connectives
Predicting the sense of discourse relations is an important subtask of discourse parsing. Prior work on discourse relation sense detection has tackled the task of identifying the senses of explicit connectives separately from implicit relations. Sense prediction for explicit connectives in the open-domain PDTB has been shown to be an easy task, with most connectives being unambiguous [44, 52]. As a result, the connectives themselves serve as highly reliable predictors of their sense.
In this section, we describe our preliminary experiments for classifying the senses of explicit connectives in BioDRB. Similar to prior work with the PDTB, one of our goals here is to establish a baseline for this task by using just the (case-insensitive) connective text string as the predictive feature. We also carried out the same experiments with the PDTB data, in order to compare the results across the two domains, as well as to explore how well a classifier trained on the open-domain PDTB data generalizes to the domain-specific data of the BioDRB (described in the next section). For all experiments, we used SLIPPER, a learning system that generates rulesets based on confidence-rated boosting.
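To make the connective-only setup concrete, the following sketch predicts, for each connective string, the sense it most often carried in training. It is a simple stand-in and not SLIPPER: no rule learning or boosting is involved, and the fallback to the overall majority sense for unseen connectives is an assumption. The toy examples are invented, not drawn from BioDRB.

```python
from collections import Counter, defaultdict


def train_connective_baseline(examples):
    """examples: iterable of (connective, sense) pairs. Returns a predict function."""
    by_connective = defaultdict(Counter)
    overall = Counter()
    for connective, sense in examples:
        by_connective[connective.lower()][sense] += 1  # case-insensitive feature
        overall[sense] += 1
    default = overall.most_common(1)[0][0]             # fallback for unseen connectives
    table = {c: counts.most_common(1)[0][0] for c, counts in by_connective.items()}

    def predict(connective):
        return table.get(connective.lower(), default)

    return predict


# Toy data, invented for illustration only.
train = [("because", "Contingency"), ("Because", "Contingency"),
         ("but", "Comparison"), ("and", "Expansion"), ("and", "Expansion"),
         ("and", "Temporal"), ("since", "Temporal"), ("since", "Contingency"),
         ("since", "Contingency")]
predict = train_connective_baseline(train)
```

With a single string feature, any learner can do no better than a per-connective lookup of this kind on the training data, which is why ambiguous connectives bound the achievable accuracy.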
To effectively compare BioDRB and PDTB, we need to group the BioDRB sense types into the 4 generalized classes in the PDTB (Table 2), and perform 4-way classification for these generalized senses. The main reason for designing the comparative study at the class-level instead of the type-level is that sense annotation in the PDTB follows a "flexible" approach, wherein annotators are allowed to back off to the most general class-level in the hierarchical classification. As a result, many connectives in PDTB are labeled with only class-level senses, which makes their comparison difficult with the type-level senses in BioDRB.
Since explicit connectives can have up to two senses (see Table 4), we allowed for three scenarios. In the first scenario, only the first sense of a connective was considered, yielding a total of 2636 sense instances. In the second scenario, only the second sense was considered. There are 195 such instances (7.4%) in the BioDRB. Selecting the second sense also yielded a total of 2636 sense instances. Finally, in the third scenario, we allowed for both senses to be selected, so that the data set consists of new sense instances for the 195 multiple-sense connectives. This yielded a total of 2831 (2636+195) sense instances. Our hypothesis was that the third scenario increases sense ambiguity in the data, and that the classifier performance should therefore decrease.
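The three scenarios differ only in how sense instances are derived from connective tokens. A sketch of that construction is given below; the tuple layout and function name are hypothetical, and the choice to keep the only available sense for single-sense tokens in the second scenario is an assumption that matches the unchanged instance count reported above.

```python
def build_instances(tokens, scenario):
    """tokens: list of (connective, sense1, sense2_or_None) tuples."""
    instances = []
    for connective, sense1, sense2 in tokens:
        if scenario == "first":
            instances.append((connective, sense1))
        elif scenario == "second":
            # Assumption: tokens with a single sense keep that sense.
            instances.append((connective, sense2 if sense2 is not None else sense1))
        elif scenario == "both":
            instances.append((connective, sense1))
            if sense2 is not None:
                instances.append((connective, sense2))  # one extra instance per token
        else:
            raise ValueError(f"unknown scenario: {scenario}")
    return instances


# Invented toy tokens: four connective tokens, one of which has two senses.
toy_tokens = [("and", "Conjunction", None), ("since", "Cause", "Temporal"),
              ("but", "Contrast", None), ("when", "Temporal", None)]
first = build_instances(toy_tokens, "first")    # 4 instances
second = build_instances(toy_tokens, "second")  # 4 instances
both = build_instances(toy_tokens, "both")      # 4 + 1 = 5 instances
```

The toy counts mirror the pattern in the text, where the first two scenarios give 2636 instances each and the third gives 2636 + 195 = 2831.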
For the PDTB experiments, we used the same data set used in other previous work, and considered the same three scenarios described above for connectives with two senses. Of the 18459 explicit connectives in PDTB, 999 (5.4%) appear with two senses.
Ten-fold cross validation accuracies for explicit connective sense classification in BioDRB and PDTB.
In all remaining experiments here, we use the data from the first sense scenario, for which the classifier performs best. Macro average F1 score for both corpora was 0.91.
Explicit sense classification in BioDRB: Class-wise Precision, Recall and F1.
Explicit sense classification in PDTB: Class-wise Precision, Recall and F1.
Next, we considered whether the size of the BioDRB corpus is sufficient for sense detection. Given that the accuracy of the BioDRB classifier is at the same level as that trained on the more than 8 times larger PDTB, this suggests that the BioDRB corpus size may be sufficient for this task. We tested our conjecture by partitioning the data into a training set (2360 instances) and test set (276 instances), and incrementally increasing the size of the training examples, in order to see if the classifier performance stabilizes as the training size reaches the maximum, n = 2360. We used 8 increments (236 examples in each increment), using the same test set of 276 examples with each incremented training set. The results show that the performance of the classifier improves up to n = 1888, achieving an accuracy of 90.6%, but further increments up to n = 2360 do not significantly improve the performance. We therefore conclude that the size of the BioDRB corpus is sufficient for the task of explicit connective sense identification. Furthermore, these results are consistent with our related work on connective identification in BioDRB, where we show that the performance of the classifier becomes stable when the training size reaches over 5000 words.
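The incremental experiment amounts to a learning curve over a fixed test set. The sketch below shows the general shape of such a loop; the starting training size is not stated in the text, so stepping from one block of 236 upward is an assumption, and `fit_majority` is a toy stand-in learner rather than the classifier used in the experiments.

```python
from collections import Counter


def training_sizes(total, step):
    """Training-set sizes from one block up to the full training set."""
    return list(range(step, total + 1, step))


def fit_majority(examples):
    """Toy stand-in learner: always predicts the most frequent training label."""
    label = Counter(y for _, y in examples).most_common(1)[0][0]
    return lambda feature: label


def accuracy(model, examples):
    return sum(model(x) == y for x, y in examples) / len(examples)


def learning_curve(train, test, step, fit=fit_majority):
    """Accuracy on the same test set for each incrementally larger training prefix."""
    return [(n, accuracy(fit(train[:n]), test))
            for n in training_sizes(len(train), step)]


sizes = training_sizes(2360, 236)  # 236, 472, ..., 1888, ..., 2360

# Invented toy data to show the call shape.
toy_train = [("and", "Expansion")] * 3 + [("but", "Comparison")]
toy_test = [("and", "Expansion"), ("but", "Comparison")]
curve = learning_curve(toy_train, toy_test, step=2)
```

Performance is judged to have stabilized when the accuracy values at the largest sizes stop changing appreciably, as reported for n = 1888 and above.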
Finally, since the BioDRB sense classification was designed to provide more refined and, therefore, more informative sense distinctions, we performed classification with the 15 type-level senses for explicit connectives. (Note that the 16th sense, "Background", does not appear for explicit connectives.)
The majority class (the "Purpose" sense) baseline accuracy for the type-level senses was 23.5%. Again, we performed a ten-fold cross-validation on the full data set of 2636 connectives, considering only the first sense of the connective where multiple senses were provided. Not surprisingly, the accuracy of the classifier for more refined classification is lower, at 69.2%, although still significantly higher than the baseline. The macro average F1 score was 0.28, mainly because many senses are too sparse for rules to be learned reliably. Examination of class-wise scores shows that rules were reliably learned for three senses - "Temporal" (F1 score 0.94), "Conjunction" (F1 score 0.97), "Cause" (F1 score 0.81) - all of which have more than 300 instances each in the corpus (see Table 4). While these results suggest that we may need more annotated training data for reliable refined sense classification, our immediate goal is to first explore the use of richer features for the classifier. We conjecture that for more refined sense classification, the connective is not sufficient as the sole predictive variable.
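The gap between accuracy and macro-averaged F1 arises because macro averaging weights every sense equally, so senses that are never predicted contribute an F1 of zero regardless of how rare they are. The sketch below illustrates this effect on invented labels (not BioDRB data); averaging over the classes present in the gold labels is an assumption about how the macro average is formed.

```python
def per_class_f1(gold, predicted):
    """F1 score for each class that occurs in the gold labels."""
    scores = {}
    for label in set(gold):
        tp = sum(g == label and p == label for g, p in zip(gold, predicted))
        fp = sum(g != label and p == label for g, p in zip(gold, predicted))
        fn = sum(g == label and p != label for g, p in zip(gold, predicted))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores[label] = (2 * precision * recall / (precision + recall)
                         if precision + recall else 0.0)
    return scores


def macro_f1(gold, predicted):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(gold, predicted)
    return sum(scores.values()) / len(scores)


# Invented toy labels: one frequent sense, three rare senses that are never predicted.
gold = ["Conjunction"] * 7 + ["Exception", "Alternative", "Similarity"]
predicted = ["Conjunction"] * 10
toy_accuracy = sum(g == p for g, p in zip(gold, predicted)) / len(gold)
toy_macro_f1 = macro_f1(gold, predicted)
```

In this toy case the accuracy is 0.7 while the macro F1 is only about 0.21, the same qualitative pattern as the 69.2% accuracy and 0.28 macro F1 reported above.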
Lessons to be Learned from a New Domain
A natural question that arises in the context of our work is whether it is necessary to develop an independently annotated biomedical corpus of discourse relations, instead of using tools that have already been developed for the open domain. In this section, we present two studies showing that developing an independent domain-specific corpus is indeed beneficial. Our conclusions are consistent with sublanguage theories [96–98] for technical domains such as the biomedical domain.
Cross-domain sense classification: Class-wise Precision, Recall and F1.
Second, given that texts from the biomedical literature are typically segmented into the rhetorical categories of Introduction, Methods, Results and Discussion (IMRAD) [99–102], we explored whether discourse relations within each of these segments exhibit regular patterns.
Sense distributions in IMRAD segments
It is revealing to see that the Methods segments contain "Temporal" relations more frequently than the other segments, since these segments describe the various steps of experiments that have been conducted. The segments from Methods also have negligible "Concession" relations, suggesting that these sections lack reasoning or argumentation. Indeed, "Contrast" and "Concession" relations are found more frequently in the Results and Discussion segments, where comparisons are made with related work, and arguments are made about the presented work. Also frequent in the Discussion section are "Cause", "Instantiation", and "Reinforcement" relations, since authors give justifications, reasons, and, in general, reinforcing arguments for their experiments and conclusions. There is a high proportion of "Circumstance" relations in the Results section, where outcomes of experiments are presented. "Background" relations are, curiously, not more frequent in the Abstract and Introduction sections, as one would expect, but rather in the Results and Discussion sections. Overall, these results show several useful patterns in the distribution of senses across the different IMRAD segments, suggesting that biomedical literature contains a highly domain-specific distribution of relations that can benefit text-mining applications. In future work, we plan to explore the feasibility of using the IMRAD segment type as a feature for classifying the senses of explicit connectives.
We have developed the Biomedical Discourse Relation Bank (BioDRB), which contains discourse-level annotations of explicit and implicit discourse relations and their abstract object arguments, and the senses of discourse relations. Starting with the Penn Discourse Treebank (PDTB) as the underlying discourse annotation framework because of its theory-neutral and lexically grounded approach, we have successfully adapted the PDTB annotation guidelines for the biomedical discourse annotation, while introducing some features specific to, and necessary for, the biomedical domain. We have also carried out experiments on sense detection of explicit connectives. Our results show that using the connective as the only feature for the classification creates a very high baseline for the task, as in the open domain. At the same time, there are significant differences in the semantic usage of connectives across the two domains, since a sense classifier trained on the PDTB data does not generalize to the BioDRB. Together with similar results that we have obtained in our related work on identifying explicit connectives, we conclude that it is beneficial to take a "sublanguage" approach for discourse processing of biomedical literature, and develop an independent biomedical corpus of discourse annotations. Finally, we have also found that while the size of the BioDRB corpus is sufficient for coarse-sense classification, more training data might be needed for more refined sense classification, although future research should first explore the use of richer features. One such additional feature may be the IMRAD segments of these articles, which show some useful patterns of sense distributions.
Availability and Requirements
Project name: Biomedical Discourse Relation Bank Project
Project home page: http://www.biodiscourserelation.org
Operating system(s): Platform independent
Programming language: None
Other requirements: Java 1.5 or higher (for annotation tools)
Any restrictions to use by non-academics: None
This work was partially supported by a seed grant from University of Wisconsin-Milwaukee Graduate School to Hong Yu, and NSF grant IIS-07-05671 (PIs: Aravind Joshi, Rashmi Prasad). We thank Geraud Campion for tool support. We are grateful to the anonymous reviewers for their helpful and insightful comments.
- Jensen L, Saric J, Bork P: Literature mining for the biologist: from information retrieval to biological discovery. Nature Reviews Genetics 2006, 7: 119–129. 10.1038/nrg1768
- Krallinger M, Valencia A: Text-mining and information-retrieval services for molecular biology. Genome Biol 2005, 6: 224. 10.1186/gb-2005-6-7-224
- Shatkay H, Feldman R: Mining the biomedical literature in the genomic era: an overview. J Comput Biol 2003, 10: 821–855. 10.1089/106652703322756104
- Zweigenbaum P, Demner-Fushman D, Yu H, Cohen KB: Frontiers of biomedical text mining: current progress. Briefings in Bioinformatics 2007, 8: 358–375. 10.1093/bib/bbm045
- Fukuda K, Tamura A, Tsunoda T, Takagi T: Toward information extraction: identifying protein names from biological papers. Proceedings of the Pacific Symposium on Biocomputing 1998, 707–718.
- McDonald R, Pereira F: Identifying gene and protein mentions in text using conditional random fields. BMC Bioinformatics 2005, 6(Suppl 1):S6. 10.1186/1471-2105-6-S1-S6
- Liu J, Huang M, Zhu X: Recognizing Biomedical Named Entities Using Skip-Chain Conditional Random Fields. Proceedings of the Workshop on Biomedical Natural Language Processing, Uppsala, Sweden 2010, 10–18.
- Friedman C, Kra P, Yu H, Krauthammer M, Rzhetsky A: GENIES: a natural-language processing system for the extraction of molecular pathways from journal articles. Bioinformatics 2001, 17(Suppl 1):S74–82. 10.1093/bioinformatics/17.suppl_1.S74
- Li Z, Liu F, Antieau L, Yu H: Lancet: a high precision medication event extraction system for clinical text. Journal of the American Medical Informatics Association (JAMIA) 2010, 17(5):563–567. 10.1136/jamia.2010.004077
- Wilbur WJ: A thematic analysis of the AIDS literature. Proceedings of Pacific Symposium on Biocomputing 2002, 386–397.
- Cao Y, Li Z, Liu F, Agarwal S, Zhang Q, Yu H: An IR-aided machine learning framework for the BioCreative II.5 Challenge. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2010, 7(3):454–461.
- Srinivasan P, Libbus B: Mining MEDLINE for implicit links between dietary substances and diseases. Bioinformatics 2004, 20(Suppl 1):I290-I296. 10.1093/bioinformatics/bth914
- Ling X, Jiang J, He X, Mei Q, Zhai C, Schatz B: Automatically generating gene summaries from biomedical literature. Proceedings of the Pacific Symposium on Biocomputing, Maui, Hawaii 2006, 40–51.
- Agarwal S, Yu H: FigSum: automatically generating structured text summaries for figures in biomedical literature. Proceedings of the 2009 AMIA Annual Symposium, San Francisco, CA 2009, 6–10.
- Naderi N, Witte R: Ontology-Based Extraction and Summarization of Protein Mutation Impact Information. Proceedings of the ACL Workshop on Biomedical Natural Language Processing, Uppsala, Sweden 2010, 128–129.
- Plaza L, Stevenson M, Diaz A: Improving Summarization of Biomedical Documents Using Word Sense Disambiguation. Proceedings of the ACL Workshop on Biomedical Natural Language Processing, Uppsala, Sweden 2010, 55–63.
- Chen SC, Zhao T, Gordon GJ, Murphy RF: Automated image analysis of protein localization in budding yeast. Bioinformatics 2007, 23(13):i66–i71. 10.1093/bioinformatics/btm206
- Shatkay H, Chen N, Blostein D: Integrating image data into biomedical text categorization. Bioinformatics 2006, 22: e446–453. 10.1093/bioinformatics/btl235
- Yu H, Lee M: Accessing bioscience images from abstract sentences. Bioinformatics 2006, 22: e547–556. 10.1093/bioinformatics/btl261
- Yu H, Agarwal S, Johnston M, Cohen A: Are figure legends sufficient? Evaluating the contribution of associated text to biomedical figure comprehension. Journal of Biomedical Discovery and Collaboration 2009, 4: 1. 10.1186/1747-5333-4-1
- Yu H, Liu F, Ramesh BP: Automatic Figure Ranking and User Interfacing for Intelligent Figure Search. PLoS ONE 2010, 5(10):e12983. 10.1371/journal.pone.0012983
- Yu H, Lee M, Kaufman D, Ely J, Osheroff JA, Hripcsak G, Cimino J: Development, implementation, and a cognitive evaluation of a definitional question answering system for physicians. Journal of Biomedical Informatics 2007, 40: 236–251. 10.1016/j.jbi.2007.03.002
- Cao YG, Cimino JJ, Ely J, Yu H: Automatically extracting information needs from complex clinical questions. Journal of Biomedical Informatics 2010, 43: 962–971. 10.1016/j.jbi.2010.07.007
- Garzone M: Automated classification of citations using linguistic semantic grammars. PhD thesis. The University of Western Ontario, Ontario, Canada; 1996.
- Garzone M, Mercer R: Towards an automated citation classifier. Proceedings on 13th Biennial Conference of the Canadian Society for Computational Studies of Intelligence 2000, 337–346.
- DiMarco C, Mercer R: Toward a catalogue of citation-related rhetorical cues in scientific texts. Proceedings of Pacific Association for Computational Linguistics (PACLING 2003), Halifax, Canada 2003.
- Light M, Qiu X, Srinivasan P: The language of bioscience: fact, speculations, and statements in between. Proceedings of the HLT-NAACL 2004 Workshop: BioLINK, Linking Biological Literature, Ontologies and Databases, Boston, MA 2004, 17–24.
- Shatkay H, Pan F, Rzhetsky A, Wilbur WJ: Multi-Dimensional Classification Of Biomedical Text: Toward Automated, Practical Provision of High-Utility Text to Diverse Users. Bioinformatics 2008, 24(18):2086–2093. 10.1093/bioinformatics/btn381
- Wilbur WJ, Rzhetsky A, Shatkay H: New directions in biomedical text annotation: definitions, guidelines and corpus construction. BMC Bioinformatics 2006, 7: 356. 10.1186/1471-2105-7-356
- Mullen T, Mizuta Y, Collier N: A baseline feature set for learning rhetorical zones using full articles in the biomedical domain. ACM SIGKDD Explorations Newsletter 2005, 7: 52–58. 10.1145/1089815.1089823
- Agarwal S, Yu H: Automatically classifying sentences in full-text biomedical articles into Introduction, Methods, Results and Discussion. Bioinformatics 2009, 25(23):3174–3180. 10.1093/bioinformatics/btp548
- Biber D, Jones JK: Merging corpus linguistic and discourse analytic research goals: Discourse units in biology research articles. Corpus Linguistics and Linguistic Theory 2005, 1(2):151–182.
- Hearst MA: TextTiling: Segmenting text into multi-paragraph subtopic passages. Computational Linguistics 1997, 23: 33–64.
- jae Kim J, Zhang Z, Park JC, Ng SK: BioContrasts: extracting and exploiting protein-protein contrastive relations from biomedical literature. Bioinformatics 2006, 22(5):597–605. 10.1093/bioinformatics/btk016
- Castano J, Zhang J, Pustejovsky J: Anaphora resolution in biomedical literature. International Symposium on Reference Resolution 2002.
- Szarvas G, Vincze V, Farkas R, Csirik J: The BioScope corpus: annotation for negation, uncertainty and their scope in biomedical texts. Proceedings of BioNLP 2008: Current Trends in Biomedical Natural Language Processing, Columbus, Ohio 2008, 38–45.
- Agarwal S, Yu H: Detecting Hedge Cues and their Scope in Biomedical Literature with Conditional Random Fields. Journal of Biomedical Informatics 2010, 43(6):953–961. 10.1016/j.jbi.2010.08.003
- Agarwal S, Yu H: Biomedical Negation Scope Detection with Conditional Random Fields. Journal of the American Medical Informatics Association (JAMIA) 2010, 17: 696–701. 10.1136/jamia.2010.003228
- Kim J, Ohta T, Tateisi Y, Tsujii J: GENIA corpus - semantically annotated corpus for bio-textmining. Bioinformatics 2003, 19(Suppl 1):i180–182. 10.1093/bioinformatics/btg1023
- Liakata M, Soldatova L: Guidelines for the annotation of General Scientific Concepts. 2008. [http://ie-repository.jisc.ac.uk] [JISC Project Report]
- Liakata M, Q C, Soldatova LN: Semantic Annotation of Papers: Interface & Enrichment Tool (SAPIENT). Proceedings of the BioNLP 2009 Workshop, Boulder, Colorado: Association for Computational Linguistics 2009, 193–200. [http://www.aclweb.org/anthology/W09-1325]
- Tarassishin L, Yin YI, Bassit B, Li YM: Processing of Notch and amyloid precursor protein by gamma-secretase is spatially distinct. Proceedings of the National Academy of Sciences USA 2004, 101(49):17050–17055. 10.1073/pnas.0408007101
- Verpy E, Leibovici M, Petit C: Characterization of otoconin-95, the major protein of murine otoconia, provides insights into the formation of these inner ear biominerals. Proceedings of the National Academy of Sciences USA 1999, 96(2):529–534. 10.1073/pnas.96.2.529
- Pitler E, Nenkova A: Using Syntax to Disambiguate Explicit Discourse Connectives in Text. Proceedings of the 47th Annual Meeting of the ACL and the 4th IJCNLP of the AFNLP (ACL-IJCNLP 2009: Short Papers), Suntec, Singapore 2009, 13–16.
- Ramesh BP, Yu H: Identifying Discourse Connectives in Biomedical Text. Proceedings of the AMIA 2010 Symposium, Washington, D.C 2010, 657–661.
- Dinesh N, Lee A, Miltsakaki E, Prasad R, Joshi A, Webber B: Attribution and the (Non)-Alignment of Syntactic and Discourse Arguments of Connectives. Proceedings of the ACL Workshop on Frontiers in Corpus Annotation II: Pie in the Sky, Ann Arbor, MI 2005, 29–36.
- Wellner B, Pustejovsky J: Automatically Identifying the Arguments of Discourse Connectives. Proceedings of EMNLP-CoNLL, Prague, Czech Republic 2007, 92–101.
- Elwell R, Baldridge J: Discourse connective argument identification with connective specific rankers. Proceedings of the IEEE International Conference on Semantic Computing (ICSC), Santa Clara, CA 2008, 198–205.
- Prasad R, Dinesh N, Lee A, Miltsakaki E, Robaldo L, Joshi A, Webber B: The Penn Discourse TreeBank 2.0. Proceedings of 6th International Conference on Language Resources and Evaluation (LREC), Marrakech, Morocco 2008.
- Prasad R, Joshi A, Webber B: Exploiting Scope for Shallow Discourse Parsing. Proceedings of the Seventh International Conference on Language Resources and their Evaluation (LREC), Valletta, Malta 2010, 2076–2083.
- Miltsakaki E, Dinesh N, Prasad R, Joshi A, Webber B: Experiments on Sense Annotation and Sense Disambiguation of Discourse Connectives. Proceedings of the Fourth Workshop on Treebanks and Linguistic Theories (TLT), Barcelona, Spain 2005.
- Pitler E, Raghupathy M, Mehta H, Nenkova A, Lee A, Joshi A: Easily Identifiable Discourse Relations. Proceedings of the 22nd International Conference on Computational Linguistics (COLING 2008: Posters), Manchester, U.K 2008, 87–90.
- Marcu D, Echihabi A: An Unsupervised Approach to Recognizing Discourse Relations. Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, Philadelphia, PA 2002, 368–375.Google Scholar
- Lin Z, Kan MY, Ng HT: Recognizing Implicit Discourse Relations in the Penn Discourse Treebank. Proceedings of the Conference on Empirical Methods in Natural Language Processing, Suntec, Singapore 2009, 343–351.Google Scholar
- Pitler E, Louis A, Nenkova A: Automatic sense prediction for implicit discourse relations in text. Proceedings of the 47th Annual Meeting of the ACL and the 4th IJCNLP of the AFNLP, Suntec, Singapore 2009, 683–691.Google Scholar
- Wellner B: Sequence Models and Re-ranking Methods for Discourse Parsing. PhD thesis, Brandeis University, Boston, MA 2009.Google Scholar
- Zhi-Min Z, Man L, Yu X, Zheng-Yu N, Jian S: Predicting Discourse Connectives for Implicit Discourse Relation Recognition. Proceedings of the 23rd International Conference on Computational Linguistics (COLING 2010: Posters), Beijing, China 2010, 1507–1514.Google Scholar
- Louis A, Joshi A, Prasad R, Nenkova A: Using Entity Features to Classify Implicit Discourse Relations. Proceedings of the SIGDIAL Conference, Tokyo, Japan 2010, 59–62.Google Scholar
- Marcu D: The rhetorical parsing, summarization and generation of natural language texts. PhD thesis, University of Toronto 1997.Google Scholar
- Marcus MP, Santorini B, Marcinkiewicz MA: Building a Large Annotated Corpus of English: The Penn Treebank. Computational Linguistics 1993, 19(2):313–330.Google Scholar
- Agarwal S, Choubey L, Yu H: Automatically Classifying the Role of Citations in Biomedical Articles. Proceedings of American Medical Informatics Association Fall Symposium (AMIA), Washington, D.C 2010, 11–15.Google Scholar
- Webber B, Joshi A: Anchoring a Lexicalized Tree-Adjoining Grammar for Discourse. In Discourse Relations and Discourse Markers: Proceedings of the Conference. Edited by: Stede M, Wanner L, Hovy E. Somerset, New Jersey: Association for Computational Linguistics; 1998:86–92.Google Scholar
- Webber B, Joshi A, Stone M, Knott A: Anaphora and Discourse Structure. Computational Linguistics 2003, 29(4):545–587. 10.1162/089120103322753347
- Asher N: Reference to Abstract Objects. Dordrecht: Kluwer; 1993.
- Knott A: Review of 'coherence in natural language: data structures and applications'. Computational Linguistics 2007, 33: 591–595. 10.1162/coli.2007.33.4.591
- Mann W, Thompson S: Rhetorical Structure Theory. Toward a Functional Theory of Text Organization. Text 1988, 8(3):243–281.
- Polanyi L: The Linguistic Discourse Model: Towards a Formal Theory of Discourse Structure. Tech. Rep. 6409, Bolt Beranek and Newman, Inc., Cambridge, Mass; 1987.
- Clegg A, Shepherd A: Evaluating and integrating treebank parsers on a biomedical corpus. Proceedings of the Workshop on Software, Ann Arbor, Michigan 2005, 14–33.
- Asher N, Lascarides A: Logics of conversation. Cambridge University Press; 2003.
- Wolf F, Gibson E: Representing Discourse Coherence: A corpus-based study. Computational Linguistics 2005, 31(2):249–288. 10.1162/0891201054223977
- Lee A, Prasad R, Joshi A, Dinesh N, Webber B: Complexity of Dependencies in Discourse: Are Dependencies in Discourse More Complex Than in Syntax? Proceedings of the 5th International Workshop on Treebanks and Linguistic Theories (TLT), Prague, Czech Republic 2006.
- Lee A, Prasad R, Joshi A, Webber B: Departures from Tree Structures in Discourse: Shared Arguments in the Penn Discourse Treebank. Proceedings of the Constraints in Discourse III Workshop, Potsdam, Germany 2008.
- Miltsakaki E, Prasad R, Joshi A, Webber B: Annotating discourse connectives and their arguments. Proceedings of the HLT/NAACL Workshop on Frontiers in Corpus Annotation, Boston, MA 2004, 9–16.
- Yu H, Frid N, McRoy S, Prasad R, Lee A, Joshi A: A Pilot Annotation to Investigate Discourse Connectivity in Biomedical Text. Proceedings of the ACL:HLT 2008 BioNLP Workshop, Columbus, Ohio 2008, 92–93.
- Yu H, Frid N, McRoy S, Simpson P, Prasad R, Lee A, Joshi A: Exploring Discourse Connectivity in Biomedical Text for Text Mining. Proceedings of the 16th Annual International Conference on Intelligent Systems for Molecular Biology BioLINK SIG Meeting, Toronto, Canada 2008.
- Blair-Goldensohn S, McKeown KR, Rambow O: Building and Refining Rhetorical-Semantic Relation Models. Proceedings of NAACL-HLT, Rochester, NY 2007, 428–435.
- Webber B, Prasad R: Sentence-Initial Discourse Connectives, Discourse Structure and Semantics. Proceedings of the Workshop on Formal and Experimental Approaches to Discourse Particles and Modal Adverbs, Hamburg, Germany 2008.
- Webber B: Genre distinctions for discourse in the Penn TreeBank. Proceedings of the 47th Annual Meeting of the ACL and the 4th IJCNLP of the AFNLP, Suntec, Singapore 2009, 674–682.
- Prasad R, Joshi A: A Discourse-based Approach to Generating Why-Questions from Texts. Proceedings of the Workshop on the Question Generation Shared Task and Evaluation Challenge, Arlington, VA 2008.
- Robaldo L, Miltsakaki E, Hobbs J: Refining the Meaning of Sense Labels in PDTB: "Concession". Proceedings of Symposium on Semantics in Text Processing (STEP), Venice, Italy 2008, 207–219.
- Prasad R, Joshi A, Webber B: Realization of Discourse Relations by Other Means: Alternative Lexicalizations. Proceedings of the 23rd International Conference on Computational Linguistics (COLING 2010: Posters), Beijing, China 2010, 1023–1031.
- Hernault H, Bollegala D, Ishizuka M: A Semi-Supervised Approach to Improve Classification of Infrequent Discourse Relations Using Feature Vector Extension. Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP-2010), Cambridge, MA 2010, 399–409.
- Louis A, Joshi A, Nenkova A: Discourse Indicators for Content Selection in Summarization. Proceedings of the SIGDIAL Conference, Tokyo, Japan 2010, 147–156.
- Lin Z, Ng HT, Kan MY: A PDTB-Styled End-to-End Discourse Parser. Tech. Rep. TRB8/10, School of Computing, National University of Singapore 2010.
- Zeyrek D, Webber B: A Discourse Resource for Turkish: Annotating Discourse Connectives in the METU Corpus. Proceedings of the 6th Workshop on Asian Language Resources, Hyderabad, India 2008, 65–71.
- Oza U, Prasad R, Kolachina S, Sharma DM, Joshi A: The Hindi Discourse Relation Bank. Proceedings of the Third Linguistic Annotation Workshop (LAW-III), ACL-IJCNLP-2009, Suntec, Singapore 2009, 158–161.
- Oza U, Prasad R, Kolachina S, Meena S, Sharma DM, Joshi A: Experiments with Annotating Discourse Relations in the Hindi Discourse Relation Bank. Proceedings of the 7th International Conference on Natural Language Processing (ICON-2009), Hyderabad, India 2009.
- Xue N: Annotating Discourse Connectives in the Chinese Treebank. Proceedings of the ACL Workshop on Frontiers in Corpus Annotation II: Pie in the Sky, Ann Arbor, MI 2005, 84–91.
- Mladova L, Zikanova S, Hajicova E: From Sentence to Discourse: Building an Annotation Scheme for Discourse Based on Prague Dependency Treebank. Proceedings of the Sixth International Language Resources and Evaluation (LREC'08) 2008.
- Tonelli S, Riccardi G, Prasad R, Joshi A: Annotation of Discourse Relations for Conversational Spoken Dialogs. Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC 2010), Valletta, Malta 2010, 2084–2090.
- Prasad R, McRoy S, Frid N, Yu H: The Biomedical Discourse Relation Bank (BioDRB) Annotation Guidelines. 2010. [http://spring.ims.uwm.edu/uploads/biodrb_guidelines.pdf]
- Karttunen L: Presupposition and Linguistic Context. Theoretical Linguistics 1974, 1: 181–94. 10.1515/thli.1974.1.1-3.181
- Miltsakaki E, Robaldo L, Lee A, Joshi A: Sense Annotation in the Penn Discourse Treebank. Computational Linguistics and Intelligent Text Processing, Lecture Notes in Computer Science 2008, 4919: 275–286. 10.1007/978-3-540-78135-6_23
- Verspoor K, Cohen KB, Hunter L: The textual characteristics of traditional and Open Access scientific journals are similar. BMC Bioinformatics 2009, 10: 183. 10.1186/1471-2105-10-183
- Cohen WW, Singer Y: A simple, fast, and effective rule learner. Proceedings of the Sixteenth National Conference on Artificial Intelligence and the Eleventh Innovative Applications of Artificial Intelligence Conference (AAAI '99/IAAI '99), Orlando, FL 1999, 335–342.
- Harris Z: A Grammar of English on mathematical principles. New York: Wiley; 1982.
- Harris Z: A theory of language and information: a mathematical approach. Oxford: Clarendon Press; 1991.
- Friedman C, Kra P, Rzhetsky A: Two biomedical sublanguages: A description based on the theories of Zellig Harris. Journal of Biomedical Informatics 2002, 35(4):222–235. 10.1016/S1532-0464(03)00012-1
- Gabbay I, Sutcliffe R: A qualitative comparison of scientific and journalistic texts from the perspective of extracting definitions. Proceedings of the ACL Workshop on Question Answering in Restricted Domains, Barcelona, Spain 2004, 16–22.
- Salager-Meyer F: Discoursal movements in medical English abstracts and their linguistic exponents: A genre analysis study. INTERFACE: Journal of Applied Linguistics 1990, 4(2):107–124.
- Swales J: Genre Analysis: English in Academic and Research Settings. Cambridge, England: Cambridge University Press; 1990.
- Sollaci LB, Pereira MG: The introduction, methods, results, and discussion (IMRAD) structure: a fifty-year survey. Journal of the Medical Library Association 2004, 92(3):364–371.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.