Biomedical relation extraction via knowledge-enhanced reading comprehension

Background In biomedical research, chemical and disease relation extraction from unstructured biomedical literature is an essential task. Effective context understanding and knowledge integration are two main research problems in this task. Most work of relation extraction focuses on classification for entity mention pairs. Inspired by the effectiveness of machine reading comprehension (RC) in the respect of context understanding, solving biomedical relation extraction with the RC framework at both intra-sentential and inter-sentential levels is a new topic worthy to be explored. Except for the unstructured biomedical text, many structured knowledge bases (KBs) provide valuable guidance for biomedical relation extraction. Utilizing knowledge in the RC framework is also worthy to be investigated. We propose a knowledge-enhanced reading comprehension (KRC) framework to leverage reading comprehension and prior knowledge for biomedical relation extraction. First, we generate questions for each relation, which reformulates the relation extraction task to a question answering task. Second, based on the RC framework, we integrate knowledge representation through an efficient knowledge-enhanced attention interaction mechanism to guide the biomedical relation extraction. Results The proposed model was evaluated on the BioCreative V CDR dataset and CHR dataset. Experiments show that our model achieved a competitive document-level F1 of 71.18% and 93.3%, respectively, compared with other methods. Conclusion Result analysis reveals that open-domain reading comprehension data and knowledge representation can help improve biomedical relation extraction in our proposed KRC framework. Our work can encourage more research on bridging reading comprehension and biomedical relation extraction and promote the biomedical relation extraction.

representations into a standard CNN-based relation extraction model. Verga et al. [3] forms pairwise predictions over entire abstracts using a self-attention encoder. Zheng et al. [4] uses CNN and LSTM to learn the document semantic formation and integrated knowledge representation. Li et al. [5] utilizes recurrent piecewise convolutional neural networks integrating knowledge features. Sahu et al. [2] proposes to build a labeled edge graph convolutional neural network on a document to capture local and non-local context dependency information for inter-sentence biomedical relation extraction. Zhou et al. [1] proposes a knowledge-guided convolution network to leverage prior knowledge representation on the SDP sequence for CID extraction.
Machine reading comprehension (MRC) aims to answer a query according to its corresponding contexts, one of which is to extract answer spans from contexts. The task is formulated as a multi-classification task to classify the start index and the end index of the answer over its contexts. Inspired by the performance and the comprehension ability, MRC has been a trend to solve other natural language processing (NLP) tasks. Levy et al. [12] reduces the zero-shot relation to the problem of answering simple reading comprehension questions to potentially extract facts of new types that were neither specified nor observed a priori. Li et al. [13] casts the entity-relation task as a multi-turn question answering problem and identifies the answer spans from the context. Li et al. [14] proposes to formulate the flat and nested named entity recognition problems as a machine reading comprehension task instead of a sequence labeling task. Additionally, tasks such as summarization, machine translation and so on are framed as question answering by making task specifications to take the form of a question, a context and an answer [15].
Motivated by the capability of context understanding on documents, we regard biomedical relation extraction as a reading comprehension problem. We utilize a question formulated by the chemical and relation description to query the context for diseases or chemicals, hence acquiring the relation between chemical and disease entities or the relation between chemical entities. In this paper, we are interested in handling biomedical relation extraction with the reading comprehension framework based on the efficient pretrained language model (LM), effectively integrating knowledge with context together and distinguishing different knowledge in this framework. Hence, we propose a knowledge-enhanced RC (KRC) framework for biomedical relation extraction, which integrates knowledge by effective two-step attention layers. The proposed method was evaluated on the BioCreative V CDR dataset and the CHR dataset respectively. Experiments show that our proposed model achieved competitive performance on both datasets compared with other state-of-the-art methods. Our contributions are as follows: • To the best of our knowledge, this paper first proposes a novel reading comprehension (RC) framework to address the biomedical relation extraction from the literature. Our work may encourage more research on bridging MRC and biomedical relation extraction so as to take advantage of MRC. • To make full use of the pretrained language model (LM) and knowledge representation, this paper proposes a knowledge-enhanced RC model based on pretrained LMs to improve biomedical relation extraction.
• Through experiments, we demonstrate the effectiveness of using open-domain reading comprehension data and knowledge information in our proposed RC framework for biomedical relation extraction. We show that our method can achieve competitive performance on two document-level datasets.

Problem formulation
Given a context sequence C = (w 1 c , w 2 c , . . . , w n c ) and two entities c ) in the context, relation extraction(RE) aims to clarify the relation r between e 1 and e 2 , where r ∈ R is selected from a predefined relation list R. For the chemical-induced disease (CID) relation extraction task, the relation r is 'induce' . Here, we reduce the relation extraction task as a reading comprehension task with unanswerable answers. We transform the annotated RE data (Context, Entity e 1 , Entity e 2 , relation r) to the RC data (Context, Query(e 1 , r), Answer e 2 ) . Given the context sequence C = (w 1 c , w 2 c , . . . , w n c ) , the entity e 1 = (w c ) from the context by answering a query Q = (w 1 q , w 2 q , . . . , w m q ) constructed by e 1 and relation r description. s e2 and e e2 respectively denote the start index and the end index, where s e2 ∈ [1, n] , e e2 ∈ [1, n] and s e2 ≤ e e2 .
Given context C and question Q, our method either returns an answer span or indicates that there is no answer.

Methods
We adopt a competitive pretrained language model BERT [16] as our backbone that deals with SQuAD [17] and suits the condition of no answer. Our model consists of three major layers: (1) BERT encoder layer; (2) knowledge-enhanced attention layer; (3) prediction layer. Details are described as follows.

BERT encoder layer
To be in line with BERT, given the context sequence C = (w 1 c , w 2 c , . . . , w n c ) and the query sequence Q = (w 1 q , w 2 q , . . . , w m q ) , the input is formulated as a sequence indicates the start token of Q and [SEP] separates Q and C. Then, the word sequence input is tokenized to token sequence s = [s i ] k i=1 concatenating with their position embedding and segment embedding. Denote the BERT encoder which consists of L stacks of transformers as BERT (·) as follows: for the token sequence obtained from BERT is h = BERT (s).

Knowledge-enhanced attention layer
To obtain the knowledge-enhanced context representation, this layer is designed to integrate knowledge representation with the context representation of BERT. Here, we describe the details of this layer based on CID relation extraction. In the (1) knowledge base, the same entity pair in different documents may contain different relation types. This layer shows how to integrate noisy knowledge representation into the context representation simply and effectively. It takes the BERT hidden representation h and the knowledge representation r as inputs, and outputs the knowledgeenhanced representation h ′ .
To integrate prior knowledge representation, we first extract chemical-disease triples from the Comparative Toxicogenomics Database(CTD) [18] and employ TransE [19] to learn knowledge representation. Following [1], we extract (chemical, disease, relation) triples from both the CDR corpus and the CTD knowledge base. In the CTD base, there are three types of relations, including 'marker/mechanism' , 'therapeutic' and 'inferredassociation' , where only 'marker/mechanism' indicates the CID relation. For those pairs in CDR but not in CTD, we set their relations to a specific symbol 'null'. Thus, there are four types of relations among all the triples and we finally obtain 2577184 triples for knowledge representation learning. Then, all the generated triples are regarded as correct examples to learn lower-dimension chemical representation, disease representation and relation representation r t by TransE, where r t ∈ R d 2 , and d 2 denotes the representation dimension. Here, the chemical, disease and relation representations are initialized randomly with the normal distribution for training. It is worth noting that there may be more than one relation type between an entity pair (Fig. 2). Then, the probable relation representation r = [r t ] 3 t=1 between candidate entities of each instance is introduced into the RC model to provide evidence. Here, we take two-step attention to combine the knowledge information and context information. First, we adopt an attention mechanism to select the most relevant KB relation representation for hidden representation of each token. A bilinear [20] operation is employed to calculate the attention weights between hidden representation h i ∈ R d 1 and relation representation r t ∈ R d 2 , where W 1 ∈ R d 1 ×d 2 and b 1 ∈ R d 2 are trainable parameters.
Then, each relation representation r t is aligned to each hidden state h i . Here, k i is regarded as the weighted relation representation corresponding to each token.
Second, we adopt a knowledge-context attention mechanism between the token's knowledge representation k i on each position of the token sequence and the hidden representation h j . A bilinear operation is employed between k i and h j to achieve weights on the hidden representation, while W 2 ∈ R d 2 ×d 1 and b 2 ∈ R d 1 are parameters.
Finally, the hidden representation h of tokens is aligned to the weighted knowledge representation k and weighted to each position i. Here, we denote the output after our twostep attention as Here, h ′ i is the context representation enhanced with knowledge representation.

Prediction layer
To obtain the final representation for prediction, the hidden representation h i and the knowledge enhanced representation h ′ i are first combined with a linear operation to achieve the weighted representation  [21] activation is applied to the knowledge attention result, which works in some existing work. Finally, the output is applied to predict the start and end indexes of answers. For the situation where there is a null answer, its start and end indexes are both zero for the optimization of the objective function. Actually, the index of zero indicates the start token '[CLS]' . It is not the real text in the context and does not influence the optimization of the model for the indexes of non-null answers.
Here, W 4 , b 4 , W 3 and b 3 are trainable parameters. Along the sequence dimension, the start probability distribution and the end probability distribution for each token s i are calculated as: After answer prediction, a predicted disease text or a null answer can be achieved. If the predicted disease text matches the gold disease name, a CID relation will be detected between the disease and the chemical which is described in its corresponding question.
After relation extraction on intra-sentential and inter-sentential data, two sets of prediction results are merged. Since all the candidate instances with respect to mention pairs are extracted, we judge that an entity pair has a CID relation as long as at least one instance was detected in which the CID relation exists. Since several documents may have no candidate CID relations after data preprocessing, similar to many other systems, we take the following heuristic rules to find the likely CID pairs in them: All chemicals in the title are associated with all diseases in the abstract.

Objective function
To predict the start and end index of answer spans, the optimization objective is to maximize the conditional probability p(y s , y e |s) of start index y s and end index y e on the given input sequence s . The loss is defined as the average of log probabilities of the ground truth start and end position based on the predicted distributions. N is the number of examples. The answer span index by (i, j) with maximum p s i p e j is chosen as the answer span.

Dataset
We evaluated our model on two document-level biomedical relation extraction datasets, including the BioCreative V CDR dataset and the CHR dataset. Table 1 shows the overall statistics of the two datasets.
The BioCreative V CDR dataset 1 was derived from the Comparative Toxicogenomics Database (CTD) for CID relation extraction. The position and MeSH IDs of chemicals and diseases are annotated by experts. It contains 1500 titles and abstracts of PubMed articles, where the training, development and test sets each consist of 500 abstracts. Following [1], we combine the training set and development set together as a set due to the limited number of CDR examples, 80% of which is used as training and 20% of which is used as validation.
Experiments are also conducted on the CHR dataset [2]. It was created by distant supervision and is a document-level dataset with relations between chemicals. It contains 12,094 titles and abstracts of PubMed articles, 7298, 3158 and 9578 each for training, development and test datasets.
The experimental results are evaluated by comparing the set of annotated chemicaldisease relations in the document with the set of predicted chemical-disease through precision (P), recall (R) and F1-measure (F1).

Data preprocessing
To transform data to RC format instances, data was preprocessed as follows.
Instance Construction We extracted entity pair candidate instances from the original data, including intra-sentential and inter-sentential instances. For intra-sentential instances, all the entity pairs existing in the same sentence are extracted. For the inter-sentential instances, we follow some of the rules [1] to extract candidates: (1) In a document, all intra-sentential chemical-disease instances will not be considered as inter-sentential instances. (2) The sentence distance of all the inter-sentential instances will not be more than 3. Thus, the chemical-disease entity candidate pairs and their corresponding contexts are extracted. To be in line with the RC model, we will remove (mask) the other disease mentions except for the disease in the current pairs when multiple diseases occur in the context.
Hypernym Filtering According to the annotation guideline of the CID task, it is to extract the most specific chemical-disease pair. Following [1], we remove the instances containing hyper entities that have more specific entities in the document according to the entity index in the Medical Subject Headings (MeSH) [22]. Some of positive chemical-disease entity pairs may be filtered by this strategy and are treated as false negative instances.
Query Construction After extracting the chemical-disease candidate instances, we format a natural language query combining entity e 1 mention and relation r description, here r is the chemical-induced disease relation. Taking the candidate instance (flunitrazepan, pain) in S 7 in Fig. 1 as an example, we formulate a query "what disease does flunitrazepan induce" to ask the context expecting the answer is pain. Also, we adopt another strategy to format a pseudo query for comparison with the natural language query. We concatenate the entity e 1 , the relation r description and the type of entity e 2 to construct the pseudo query.
On the CHR dataset [2], all the chemical-chemical entity pairs and their full titles and abstracts are extracted for instance construction. After extracting the chemical candidate instances, queries were also constructed with the entity e 1 mention and relation r description. Here, r is the chemical reaction relation description.

Implementation details
We trained our knowledge on TransE 2 with 1000 epochs and the relation embedding size was set to 256. For the CDR dataset, we tuned the hyperparameters on the new development set to optimize our proposed model. We use the uncased BioBERT(base) as the pretrained language model. We set the batch size to 12, 32 respectively on the CDR dataset and the CHR dataset. The learning rates of the bert encoder and the downstream structure are set to 3e-5 and 1e-4 on the experiments without KBs, while their learning rates are set to 2e-5 and 3e-5 on the experiments with KBs.

Compared models
To evaluate our approach, we compared the proposed model with the existing relevant models. As shown in Table 2, the comparison models for the CDR dataset were divided into two categories: methods with knowledge bases (KBs) and methods without knowledge bases. Each category includes traditional machine learning (ML) based methods and neural network (NN) based methods. Here, we mainly introduce the NN-based methods.
• CNN+SDP [6] proposed using CNN to learns features on the shortest dependency path between a disease and a chemical for CID relation extraction. • BRAN (Transformer) [3] utilized an efficient transformer to encode abstracts and form pairwise predictions using a bi-affine operation to score all pairs of mentions and aggregating over mention pairs. • Bio-Seq (LSTM+CRF) [23] proposed a sequence labeling framework for biomedical relation extraction and extended it with multiple specified feature extractors, especially for inter-sentential level relation extraction. • LSTM+CNN [4] utilized LSTM and CNN to hierarchically extract from documents integrating relation knowledge of CTD. • RPCNN (PCNN) [5] proposed using PCNN and RNN to extract document-level representations integrating the knowledge features of four medical KBs for CID relation classification. • KCN (GCNN) [1] combined the shortest dependency path (SDP) sequence and knowledge representation for CID relation classification. It adopted the gated convolutional network (GCNN) with attention pooling combining entity and relation knowledge representations. • Ours Different from other models, we propose a new RC framework for biomedical relation extraction and utilize the pretrained LM combined with the knowledge of relation representation between the possible chemical and disease. It is worth noting that our model can automatically distinguish different types of relation knowledge from CTD.

Performance comparison with previous methods
We compare our proposed model with previous work respectively on the CDR and CHR datasets. For the CDR dataset, we divide previous models into two categories: models without knowledge bases and models with knowledge bases. Here, the compared models are rich and diverse, such as heuristic rules, joint training with NER, relation classification, sequence labeling and so on.
Among the systems without KBs, much work is based on the neural networks (NNs). As shown in Table 2, the graph kernel-based SVM is competitive among the traditional ML methods, but most of NN-based methods outperform the traditional ML methods which indicates the NN's more effective ability of context modeling. They almost use entity pair relation classification methods except that Bio-Seq [23] adopts sequence labeling to deal with the CID extraction task. Under the condition of no extra knowledge, our RC-based model outperforms the previous work and achieves an improvement of 0.19%. Bio-Seq [23] is a sequence labeling framework adopting LSTM and CRF. Compared with Bio-Seq [23], there is an improvement of 2.57%. CNN+SDP [6] is a relation classification method. It extracted the CID relation with CNN over the shortest dependency path (SDP) of context to deal with the long sequence and achieved an F1 score of 65.88%. Our method transforms the relation classification to reading comprehension, no matter on intra-sentential data or intersentential data. Different from CNN+SDP [6], our model does not need to transform the context sequence into SDP and just uses the pretrained LMs to extract context information directly from the context sequences. We conduct pointer prediction over the context sequences instead of classification and achieve the state-of-the-art performance of 66.07% on the CDR extraction data under the condition of no KBs.
In addition to the context information, the knowledge information also plays an important role in our RC-based model for CID relation extraction. Among the systems with KBs, it mainly includes feature-based traditional ML methods and neural network (NN)-based methods. Most of the feature-based traditional ML methods adopt support vector machines (SVMs) and other kernel-based models. More details and differences of the compared NN-based methods can be seen in Section 4.4. Inspired by [1], we utilize the knowledge low-dimension representation to guide our RC-based model for chemical-induced disease extraction and then derive the CID relation. Different from [1], our method does not need to extract the SDP of the sequence and can integrate more than one type of relation from CTD into the RC model through the knowledge attention mechanism. On the NN-based models with knowledge representation, our KRC model achieves competitive performance. Compared with the only KB method and our RC model without KBs (RC in Table 2), our RC model with KBs (KRC in Table 2) performs better, which indicates that our KRC model can discern different knowledge and incorporate the knowledge with our attention layers effectively. To test our KRC model on other KBs, we also conduct experiments with the disease-drug association network (DCh-Miner) 3 of the Stanford SNAP database. Compared with the performance without KBs, it is slightly better. Compared with the performance on CTD knowledge, it performs not so well. It may be caused by the difference between the two KBs. DCh-Miner only contains a relation that means the drug is associated with disease and we name it "association", while the CTD knowledge contains three relations including "therapeutic", "inferred-association", "marker /mechanism" and only "marker /mechanism" indicates the relation of chemical-induced disease. It is worth noting that DCh-Miner is extracted from the CTD and the relation "association" covers three relations of CTD. The knowledge in DCh-Miner may mislead the model and decrease the performance.
For the CHR dataset, there are only models without knowledge bases for comparison. Since it was created by distant supervision with the graph database Biochem4j and chemical relation labels in the dataset are the same as those in Biochem4j, we did not add this knowledge to our proposed model for prediction. As shown in Table 3, three previous NN-based methods are compared with our proposed model. GCNN [2] built a labeled edge graph convolutional neural network model on a document graph for document-level biomedical relation extraction. Different from GCNN [2], we transform the entity pair classification over a document to reading comprehension over a document. Compared with GCNN [2], we observe that our proposed RC model is 5.8 percentage points higher than the best F1 score and achieves the stateof-the-art performance.

The effect of different pretrained models
As shown in Table 4, we compared models finetuned on extra open-domain reading comprehension data. Since our data is biomedical text, we choose the biobert [28] as the base model which is a language model named bert pretrained on a large scale of biomedical text. To investigate the effect of reading comprehension pretrained tasks, we further utilize the biobert model finetuned on the SQuAD [17] data by [29]. We named this model biobert+SQuAD in Table 4   the CID relation extraction on the CDR data and there is an improvement of 1.87%. Benefitting from this RC framework, we can utilize large-scale data from an open domain reading comprehension task to help biomedical relation extraction especially when the biomedical relation extraction data is not enough.

The effect of query construction
As described in Section 4.2, two kinds of question construction methods are designed for our RC model. To select a better query construction strategy, we investigate the effect of these construction methods on performance. The experiments are conducted on our RC model based on the SQuAD finetuned biobert model under the condition of no knowledge. The performance on the pseudo query and the natural language query is shown in Table 5. The results show that using the natural language query achieves a higher document-level F1 of 66.07%. The reason may be that the natural query provides fine semantic hints for CID relation extraction while the pseudo query is just the simple concatenation of one entity mention, a relation type and another entity type which may confuse the model.

The effect of knowledge representation
As shown in Table 6, we investigate the effect of knowledge combination, including only KB, no KB(RC), and adding KB. In the part of adding KB, we compared two kinds of methods for knowledge-enhanced attention in our RC model. Only KB means we just directly match the relation of entity pairs in the CDR test set with the triples in CTD. From the result of only KB, we can see that the recall is high and the precision is not so well. It indicates that there is noise in triples extracted from CTD. Also, these triples can not fully cover CDR data. Thus, it is necessary to combine the CDR data and CTD knowledge in the RC model. Compared with the results of only KB, our proposed RC  Table 6 Ablation study over our proposed KRC model on the CDR dataset 1,2 metrics on intra and inter sentential levels, we used the calculation methods in [1]. If using the calculation methods in [24],  model performs better which indicates that the context information can be somehow captured by our RC model for CID relation extraction. As described in Section 3.2, there may be more than one KB relation representation between an entity pair. To further investigate the effect of our knowledge-enhanced RC model, we compared two operations of KB relation representation in the first step of our knowledge-enhanced attention layer. RC+KB(atten2) means we adopt the average operation for different KB relation representations. RC+KB(atten1+atten2) means we adopt the attention mechanism to automatically select the relevant KB relation representation in the model. The results on RC+KB(atten2) and RC+KB(atten1+atten2) show that the automatic attention selection works when more than one relation representation occurs and achieves a higher F1 of 71.18%. To further analyze the results of instances accompanied by different types of relations extracted from CTD, a detailed case study of no KBs and using KBs can be seen in the following section.
To investigate the robustness of our approach dealing with knowledge, we selected knowledge from the CTD knowledge base (KB) respectively for intra sentential and inter sentential data by four ratios, including 0.25, 0.5, 0.75, and 1.0 to conduct the knowledge-enhanced experiments on the CID task. The performance gradually increases with the increase of the knowledge proportion. When the selection ratio is 1.0, there is about 74% of data guided by CTD knowledge. When the selection ratio is 0.75, there is about 55% of data guided by CTD knowledge. As shown in Table 7, the performance of our KRC model surpasses that of our model without KBs and improves obviously when the selection ratio is more than 0.75, that is to say, the proportion of data guided by knowledge is more than 55%. Otherwise, it will decrease the performance.  . 3 The proportion of relation combination types extracted from CTD in correctly predicted cases on the intra sentential and inter sentential level

Case study of knowledge effect
This section analyzes relation complexity of integrating knowledge, good and bad cases with and without knowledge. According to the CTD guideline, there are three relation types in the knowledge base. Therefore, more than one candidate relation type can be extracted from CTD for an entity pair in some cases and they may be consistent or inconsistent with the true relation type of the entity pair.
To discuss the relation complexity when integrating knowledge, we counted the proportion of different relation combinations extracted from CTD for corrected extracted CID pairs as shown in Fig. 3. Here, 'inferred-association' means chemicals are inferred associated with diseases via CTD-curated chemical-gene interactions. 'marker/mechanism' indicates that a chemical may cause a disease. 'therapeutic' means a chemical that has a known or potential therapeutic role in a disease. '#' is a separator. The extracted relations to guide prediction for each CID pair are complex. Some of them are composed of more than one relation. 'inferred-association#marker/mechanism' contains two potentially related relation types. 'marker/mechanism#therapeutic' contains two conflict relation types. 'inferred-association#marker/mechanism#therapeutic' contains both related and conflict relation types. Except for the related relation types, the statistics in Fig. 3 show that our method can also deal with the cases with some noisy relation knowledge and extract the correct relations for entity pairs.
Additionally, we analyze some cases with different relation types extracted from CTD. As shown in Table 8, in the first example, the two relation types in CTD are potentially related. In this case, the CTD knowledge is consistent with the true relation type of the candidate entity pair. According to the CTD knowledge, 'inferred-association' and 'marker/mechanism', our KRC model extracted the answer 'syndrome of inappropriate secretion of antidiuretic hormone' for the vincristine-induced disease correctly and predicted that vincristine induces the syndrome of inappropriate secretion of antidiuretic hormone, while our RC model without KBs extracted no answer and could not detect the CID relation in this case.  In the second example, we explore the case with two conflict relation types in CTD. In this case, 'therapeutic' is inconsistent with the true relation type of the candidate entity pair. Integrating two different relation types, our KRC model learned from the context and successfully picked the most relevant relation, predicting that the answer to the xylometazoline-induced disease was 'cataleptic'. While integrating no KBs, our RC model predicted that there was no answer in the context for the naphazoline-induced disease.
There are still implicit instances that are difficult for our model to extract disease spans, although the relation types from CTD knowledge indicate that the chemical may cause the disease. Take the context "Switching the immunosuppressive regimen from tacrolimus to cyclosporine did not improve the clinical situation. The termination of treatment with any calcineurin inhibitor resulted in a complete resolution of that complication. CONCLUSIONS: Posterior reversible encephalopathy syndrome after liver transplant is rare. " of the third case as an example. When asked by a query "what disease does tacrolimus induce?", an answer "Posterior reversible encephalopathy syndrome" is expected to be extracted to indicate the CID relation between the chemical and the disease. However, no answer ("null") was predicted. Here, the context did not obviously reveal the CID relation between "tacrolimus" and "Posterior reversible encephalopathy syndrome". It just implies that the termination of the calcineurin inhibitor "tacrolimus" results in a complete resolution of that complication "Posterior reversible encephalopathy syndrome", which indicates the CID relation between the two entities.

Error analysis
To detect the main error sources, we performed error analysis on the final results of our proposed model on the CDR data as shown in Fig. 4. Among these errors, 293 negative chemical-disease entity pairs were wrongly classified as positive, accounting for 48.19%. In these instances, disease mentions were extracted by our KRC model while actually no answer should be predicted. Some inconsistent annotations may lead to some incorrectly annotated instances. Some curated CTD knowledge may mislead the predictions. Some instances containing complex context may make it difficult for our KRC model to extract the correct chemical-induced disease. Taking the sentence "Ethambutol-induced toxic optic neuropathy was suspected and tablet ethambutol was withdrawn. " as an example, 'induced' appears near 'Ethambutol' and 'toxic optic neuropathy' which may mislead the model to extract 'optic neuropathy' as the ethambutol-induced disease instead of the null answer. 315 positive chemical-disease entity pairs were wrongly classified as negative, accounting for 51.81%. Some positive instances were removed by the instance construction rules and hypernym filtering and they did not appear to be predicted by the model, which resulted in 132 errors accounting for 21.71%. In some other positive instances, no answer was extracted while actually disease mentions should be extracted. Taking the sentence "METHODS: Newborn piglets received levobupivacaine until cardiovascular collapse occurred. " as an example, there is no obvious trigger and expression about the relation 'induce' which may make it difficult to extract the levobupivacaine-induced disease 'cardiovascular collapse'. Besides, a few cases with the relation type 'null' of CTD knowledge led to incorrect prediction.

Conclusions
In this paper, we propose a novel knowledge-enhanced reading comprehension framework for biomedical relation extraction, incorporated with an effective knowledgeenhanced attention mechanism to combine noisy knowledge. In the RC framework, it shows open-domain reading comprehension data and knowledge representation can significantly improve the performance of biomedical relation extraction. The experiments on the CDR data and the CHR dataset show that our proposed model achieved competitive F1 values of 71.18% and 93.3%, respectively, compared with other methods. In the future, we would like to design a more sophisticated reading comprehension model for biomedical relation extraction and apply it to other more complex biomedical tasks.