A resource-saving collective approach to biomedical semantic role labeling

Background: Biomedical semantic role labeling (BioSRL) is a natural language processing technique that identifies the semantic roles of the words or phrases in sentences describing biological processes and expresses them as predicate-argument structures (PASs). Currently, a major problem of BioSRL is that most systems label every node in a full parse tree independently; however, some nodes always exhibit dependency. In general SRL, collective approaches based on the Markov logic network (MLN) model have been successful in dealing with this problem. However, in BioSRL such an approach has not been attempted because it would require more training data to recognize the more specialized and diverse terms found in biomedical literature, increasing training time and computational complexity.
Results: We first constructed a collective BioSRL system based on MLN. This system, called collective BIOSMILE (CBIOSMILE), is trained on the BioProp corpus. To reduce the resources used in BioSRL training, we employ a tree-pruning filter to remove unlikely nodes from the parse tree and four argument candidate identifiers to retain candidate nodes in the tree. Nodes not recognized by any candidate identifier are discarded. The pruned annotated parse trees are used to train a resource-saving MLN-based system, which is referred to as resource-saving collective BIOSMILE (RCBIOSMILE). Our experimental results show that our proposed CBIOSMILE system outperforms BIOSMILE, which is the top BioSRL system. Furthermore, our proposed RCBIOSMILE maintains the same level of accuracy as CBIOSMILE using 92% less memory and 57% less training time.
Conclusions: This greatly improved efficiency makes RCBIOSMILE potentially suitable for training on much larger BioSRL corpora over more biomedical domains. Compared to real-world biomedical corpora, BioProp is relatively small, containing only 445 MEDLINE abstracts and 30 event triggers. It is not large enough for practical applications, such as pathway construction. We consider it of primary importance to pursue SRL training on large corpora in the future.



Background
Biomedical semantic role labeling (BioSRL)
Biomedical semantic role labeling (BioSRL) is an important natural language processing technique for life scientists who are interested in uncovering information related to biological processes within the literature. In BioSRL, sentences are represented by one or more predicate-argument structures (PASs), also known as propositions [1]. Each PAS is composed of a predicate (a verb) and several arguments (e.g., noun phrases) that have different semantic roles, including main arguments such as agent and patient, as well as adjunct arguments, such as time, manner, and location. Here, the term argument refers to a syntactic constituent of the sentence related to the predicate, and the term semantic role refers to the semantic relationship between a predicate and an argument of a sentence. For example, in Figure 1, the sentence "IL4 and IL13 receptors activate STAT6, STAT3, and STAT5 proteins in the human B cells" describes a molecular activation process. It can be represented by a PAS in which "activate" is the predicate, "IL4 and IL13 receptors" and "STAT6, STAT3, and STAT5 proteins" comprise ARG0 (agent) and ARG1 (patient), respectively, with "in the human B cells" as the location. Thus, the agent, patient, and location are the arguments of the predicate.
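To make the notion of a PAS concrete, the Figure 1 example can be written down as a small data structure. This is only an illustrative sketch: the class name `PAS` and the helper `core_arguments` are hypothetical and are not part of BIOSMILE or BioProp.

```python
from dataclasses import dataclass, field


@dataclass
class PAS:
    """A predicate-argument structure: one verb predicate plus labeled text spans."""
    predicate: str
    arguments: dict = field(default_factory=dict)  # role label -> text span


# The Figure 1 sentence from the text, encoded by hand.
pas = PAS(
    predicate="activate",
    arguments={
        "ARG0": "IL4 and IL13 receptors",
        "ARG1": "STAT6, STAT3, and STAT5 proteins",
        "ARGM-LOC": "in the human B cells",
    },
)


def core_arguments(p):
    """Return the main arguments only, i.e. roles whose label is not an ARGM adjunct."""
    return {role: span for role, span in p.arguments.items()
            if not role.startswith("ARGM")}
```

Here `core_arguments(pas)` keeps the agent and patient and drops the location adjunct.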
Given a sentence, the SRL task executes two steps: predicate identification and argument recognition. The first step can be achieved by using a part-of-speech (POS) tagger with some filtering rules. The second step recognizes all arguments, which includes grouping words into arguments and classifying the arguments into semantic role categories. Some studies refer to these two sub-steps as argument identification and argument classification, respectively [2,3]. In the second step, it is often difficult to determine the word boundaries and semantic role of an argument, as they depend on many factors, such as the argument's position and the predicate's voice (active or passive) and sense (usage). The second step can be formulated as a sentence tagging problem. A sentence can be represented by a sequence of words, a sequence of phrases, or a parse tree; the basic units of these representations are words, phrases, and constituents (nodes in a full parse tree), respectively. Hacioglu et al. [4] showed that tagging phrase-by-phrase (P-by-P) is better than word-by-word (W-by-W). However, Punyakanok et al. [3] showed that constituent-by-constituent (C-by-C, or node-by-node) tagging is better than P-by-P. Based on Punyakanok et al.'s findings, BIOSMILE [5] also adopted C-by-C tagging for BioSRL and achieved an accuracy close to that of top general SRL systems.
C-by-C approaches can be called "discrete approaches" because they do not consider dependencies among constituents/nodes. For example, they cannot enforce the constraint that a parent node and any of its child nodes must not both be labeled with semantic roles. In SRL, collective approaches, which label several or all nodes simultaneously, have been proposed and outperform discrete approaches [6]. The Markov logic network (MLN) model is a good representative collective approach. It offers the flexibility to model dependencies with first-order-logic formulae.
In this paper, we explore the collective approach to BioSRL by building an MLN-based system. Despite the convenience of modeling dependencies and the high accuracy of MLN, we have observed that it requires more memory and longer training times on a large corpus. This is an obstacle to applying MLN to BioSRL, which requires a large amount of training data to cover the wide variety of specialized biomedical subdomains. To reduce the resources used in BioSRL training, we employ a tree-pruning filter to remove unlikely nodes from the parse tree and four argument candidate identifiers to retain candidate nodes in the tree. Nodes not recognized by any candidate identifier are discarded. The pruned annotated parse trees are used to train a resource-saving MLN-based system, which is referred to as resource-saving collective BIOSMILE (RCBIOSMILE).

Methods
In this section, we first describe our main contribution: resource-saving preprocessing. Then, we illustrate the Markov-logic-network-based collective learning approach. Before explaining our methods, we define the terms used in this section. Given a sentence s and its full parse tree p, every node $n_i$ in p corresponds to some substring of s, referred to as $sub_i$. For convenience, $sub_i$ is referred to as the span of $n_i$. Taking the sentence in Figure 2 as an example, the span of the node NP[ARG1] is "HTLV-1 transcription". ARG1 is this node's semantic role or argument type. All nodes with semantic roles are referred to as "argument nodes". In SRL, a predicate is a verb, whereas in MLN, a predicate represents a relationship among objects in first-order logic (FOL). To avoid ambiguity between the predicates in FOL and SRL, we follow Punyakanok et al. [3] and refer to the predicates in SRL as "verb predicates (VP)" hereafter.

Resource-saving preprocessing
To reduce the resources used in BioSRL training, we employ a tree-pruning filter to remove unlikely nodes from the parse tree and four argument candidate identifiers to retain candidate nodes in the tree. Nodes not recognized by any candidate identifier are discarded. In the following subsections, we describe these components.

Tree pruning filter (TPF)
We employ four rules to prune nodes unlikely to be arguments. The first rule (r1) is based on the intuition that if a node's span overlaps with the verb predicate, the node is unlikely to have a semantic role. Such nodes are located on the same path as the node whose span is the verb predicate and must be removed. An example is shown in Figure 3a. Figure 3b shows the application of rules 2, 3, and 4. The second rule (r2) removes all leaf nodes, which are enclosed in a dotted box. The third rule (r3) removes all nodes without any siblings. These nodes are enclosed in dotted circles and labeled with "r3". The fourth rule (r4) removes all nodes whose spans are stop-words. These nodes are enclosed in dotted circles and labeled with "r4".
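The four pruning rules can be sketched on a toy parse tree as follows. This is a minimal illustration rather than the authors' implementation: the `Node` class, the `prune` function, the stop-word list, and the simplified tree are all assumptions made for the example, and leaves are taken to be POS-tagged word nodes.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False: nodes are compared by identity, not by content
class Node:
    label: str
    children: list = field(default_factory=list)
    word: str = ""  # set only on leaf (word-level) nodes

    def leaves(self):
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]

    def span(self):
        return " ".join(leaf.word for leaf in self.leaves())


def prune(root, verb_leaf, stop_words):
    """Return the nodes that survive all four tree-pruning rules."""
    kept = []

    def visit(node, has_sibling):
        r1 = verb_leaf in node.leaves()         # span overlaps the verb predicate
        r2 = not node.children                  # leaf node
        r3 = not has_sibling                    # node without any siblings
        r4 = node.span().lower() in stop_words  # span is a stop-word
        if not (r1 or r2 or r3 or r4):
            kept.append(node)
        for child in node.children:
            visit(child, len(node.children) > 1)

    visit(root, False)
    return kept


STOP_WORDS = {"the", "a", "of", "in"}  # hypothetical stop-word list

# Simplified tree for "T3 induced differentiation in these cells".
verb = Node("VBD", word="induced")
tree = Node("S", [
    Node("NP", [Node("NN", word="T3")]),
    Node("VP", [
        verb,
        Node("NP", [Node("NN", word="differentiation")]),
        Node("PP", [
            Node("IN", word="in"),
            Node("NP", [Node("DT", word="these"), Node("NNS", word="cells")]),
        ]),
    ]),
])

survivors = [n.span() for n in prune(tree, verb, STOP_WORDS)]
```

In this toy tree the S and VP nodes fall to rule 1 and every word-level node to rule 2, leaving only phrase-level nodes as argument candidates.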

Association rule candidate identifier (ARCI)
Rules containing several predicates are effective for identifying argument candidates. For example, if a node i starts with "in" and ends with "cell", then it is very likely to be a location argument (ARGM-LOC). This rule can be translated into the following first-order logic formula:
$$\text{firstword}(i, \text{"in"}) \wedge \text{lastword}(i, \text{"cell"}) \Rightarrow \text{role}(p, i, \text{"ARGM-LOC"})$$
We can see that this rule is composed of three predicates: firstword(i, "in"), lastword(i, "cell"), and role(p, i, "ARGM-LOC").
However, compiling the set of rules is labor-intensive. To automatically generate a rule, the main task is to decide which predicates must be included. We select several basic predicate types listed as follows.

The pool of predicates
• Node type
• First word and last word stem
• POS of the first word and last word
• POS of the verb predicate
• Voice of the verb predicate
• Before or after the verb predicate
• Semantic role
• The verb predicate
• Syntactic path from the verb predicate
These features are selected from basic features used in the BIOSMILE [5] system. We formulate the rule-generation problem as a problem of association rule mining [7].
To formulate the rule generation problem as an association-rule-mining task, it is necessary to define four things: item, transaction, support, and confidence. An item is a predicate appearing in a rule, such as the predicates firstword(i, "in"), lastword(i, "cell"), and role(p, i, "ARGM-LOC") from the above example. To acquire I, the set of all items, we process the training set and extract predicates from arguments. A transaction is a collection of items. In this work, all arguments (nodes with semantic roles) in the training set are treated as transactions. For instance, the two sentences shown in Figure 4 can be transformed into the transactions listed in the table of transactions extracted from those sentences (the full names of all abbreviations can be found in Table 1). The support supp(X) of an itemset X is defined as the proportion of transactions in the data set which contain X. The confidence of a rule X ⇒ Y is defined as
$$\mathrm{conf}(X \Rightarrow Y) = \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)}$$
For example, Rule 1 has a confidence of 0.021/0.022 in the database, which means that about 95% of the transactions that satisfy the antecedent of the rule also satisfy its consequent. Using the Apriori algorithm [8], such rules can be generated automatically.
Figure 4 Examples of association rule mining. a. T3 efficiently induced erythroid differentiation in these cells, thus overcoming the v-erbA-mediated differentiation arrest. b. In contract mRNA representing pAT 591/EGR2 was not expressed in these cells.
The generated association rules are employed to identify argument candidates. Nodes not matching any rule are discarded.
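The support and confidence definitions used for rule mining can be checked on a handful of toy transactions. The sketch below is illustrative only: the `key=value` item encoding and the four transactions are invented for the example and do not come from BioProp, and a real system would use an Apriori implementation to enumerate frequent itemsets rather than scoring one rule by hand.

```python
def support(itemset, transactions):
    """Proportion of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """conf(X => Y) = supp(X union Y) / supp(X)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


# Each transaction is the set of items (predicates) extracted from one argument node.
transactions = [
    frozenset({"firstword=in", "lastword=cell", "role=ARGM-LOC"}),
    frozenset({"firstword=in", "lastword=cell", "role=ARGM-LOC"}),
    frozenset({"firstword=in", "lastword=cell", "role=ARGM-TMP"}),
    frozenset({"firstword=the", "lastword=cell", "role=ARG1"}),
]

antecedent = {"firstword=in", "lastword=cell"}
consequent = {"role=ARGM-LOC"}

rule_support = support(antecedent | consequent, transactions)      # 2 of 4 transactions
rule_confidence = confidence(antecedent, consequent, transactions)  # 2 of the 3 matching
```

The same arithmetic applied to the figures quoted in the text gives 0.021/0.022 ≈ 0.95.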
Table: The transactions extracted from the sentences in Figure 4. The full names of all abbreviations can be found in Table 1.

Word-based candidate identifier (WCI)
Some types of arguments can be identified by checking whether they exactly match specific words under certain conditions. We compile lists of words corresponding to three types of arguments. The word lists for ARGM-DIS, ARGM-MOD and ARGM-NEG are shown in Table 2. These argument types are:
• Discourse Argument (ARGM-DIS): Discourse arguments connect sentences to preceding sentences. If a node's span can be found in the word list for ARGM-DIS, the node is regarded as an ARGM-DIS candidate.
• Modal Argument (ARGM-MOD) and Negation Argument (ARGM-NEG): If a node's span appears right before a verb predicate and can be found in the word list for either ARGM-MOD or ARGM-NEG, it is regarded as an ARGM-MOD or ARGM-NEG candidate, respectively.
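The word-based identifier amounts to a lookup with one positional condition. A minimal sketch follows, assuming placeholder word lists: the entries in `WORD_LISTS` are hypothetical stand-ins, since the actual lists are those given in Table 2, and the function name `word_candidates` is likewise invented.

```python
# Hypothetical word lists; the lists actually used are given in Table 2.
WORD_LISTS = {
    "ARGM-DIS": {"however", "thus", "in contrast"},
    "ARGM-MOD": {"can", "may", "will"},
    "ARGM-NEG": {"not", "never"},
}


def word_candidates(span, right_before_verb):
    """Return the candidate roles of a node given its span text and position."""
    span = span.lower()
    roles = []
    # ARGM-DIS only requires the span to appear in the word list.
    if span in WORD_LISTS["ARGM-DIS"]:
        roles.append("ARGM-DIS")
    # ARGM-MOD and ARGM-NEG also require the span to sit right before the verb predicate.
    if right_before_verb:
        for role in ("ARGM-MOD", "ARGM-NEG"):
            if span in WORD_LISTS[role]:
                roles.append(role)
    return roles
```

For instance, "not" yields an ARGM-NEG candidate only when it directly precedes the verb predicate.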

Pattern-based candidate identifier (PCI)
Extent Marker (ARGM-EXT): The extent marker indicates the amount of change caused by an action, such as "approximately 12-fold" in Figure 2. We have observed that extent markers are usually siblings of the verb-predicate node (the VBD[Verb Predicate] node in Figure 2). For each sibling sib, the identifier checks whether the subtree with root sib has any nodes whose spans match the extent marker pattern (shown in Table 3). Matching nodes are regarded as ARGM-EXT candidates.
Temporal Marker (ARGM-TMP): The temporal marker indicates when an action takes place. Like extent markers, temporal markers are usually siblings of the verb-predicate node and are identified in the same manner. For each sibling sib of the verb-predicate node, the identifier checks whether the subtree with root sib has any nodes whose spans match the temporal marker pattern. In addition, temporal markers sometimes appear at the beginning of a sentence. Therefore, the identifier also checks whether any nodes whose spans start with the first word of a sentence match the temporal marker pattern (shown in Table 3). Such nodes are also considered ARGM-TMP candidates.
Table residue (feature definitions):
• Position: whether the phrase is located before or after the verb predicate
• Voice: passive if the verb predicate has a POS tag VBN, and its chunk is not a VP, or it is preceded by a form of "to be" or "to get" within its chunk; otherwise, it is active
• Head word: calculated using the head word

Parse-tree-based candidate identifier (PTCI)
If a node n is not identified as a candidate by the above components, this module will check if the path from the verb-predicate node to n is equal to any path from the verb-predicate node to an argument node m in the training set. If yes, n will be treated as a candidate of m's argument type.
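The parse-tree-based check reduces to looking up a node's path in an index built from the training set. The sketch below illustrates that idea under assumptions: the path encoding (`^` for a step up, `/` for a step down), the function names, and the training examples are all hypothetical.

```python
from collections import defaultdict


def build_path_index(training_arguments):
    """Map each verb-predicate-to-argument path seen in training to its argument types."""
    index = defaultdict(set)
    for path, role in training_arguments:
        index[path].add(role)
    return index


def path_candidates(path, index):
    """Return the candidate argument types for a node reached via `path`."""
    return sorted(index.get(path, ()))


# Hypothetical (path, role) pairs collected from annotated training trees.
training_arguments = [
    ("VBD^VP^S/NP", "ARG0"),
    ("VBD^VP/NP", "ARG1"),
    ("VBD^VP/PP", "ARGM-LOC"),
    ("VBD^VP/PP", "ARGM-TMP"),
]
index = build_path_index(training_arguments)
```

A node whose path never occurs in training receives no candidate roles and is therefore discarded by this identifier.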

Markov logic
First-order logic
MLN combines first-order logic (FOL) and Markov networks. In FOL, formulae consist of four types of symbols: constants, variables, functions, and predicates. Constant symbols represent objects in a specific domain (e.g., Annie, Bob, Cathy, etc.). Variable symbols range over the objects in the domain. Function symbols (e.g., MotherOf) represent mappings from tuples of objects to objects. Predicates represent relationships among objects (e.g., Friends), or attributes of objects (e.g., Smokes). Constants and variables may belong to specific types. An atom is a predicate symbol applied to a list of arguments, which may be constants or variables. A ground atom is an atom whose arguments are all constants. A world is an assignment of truth values to all possible ground atoms. A knowledge base (KB) is a partial specification of a world; each atom in it is true, false, or unknown.

Markov networks
A Markov network represents the joint distribution of a set of variables $X = \{X_1, \ldots, X_n\} \in \mathcal{X}$ as a product of factors:
$$P(X = x) = \frac{1}{Z} \prod_k f_k(x_k)$$
where each factor $f_k$ is a non-negative function of a subset of the variables $x_k$, and $Z$ is the normalization constant. The distribution is usually equivalently represented in log-linear form:
$$P(X = x) = \frac{1}{Z} \exp\left(\sum_i w_i g_i(x)\right)$$
where the features $g_i(x)$ are arbitrary functions of (a subset of) the variables' states.

Markov logic networks
An MLN is a set of weighted first-order formulae. Together with a set of constants representing objects in the domain, it defines a Markov network with one variable per ground atom and one feature per ground formula. The probability distribution over possible worlds is given by
$$P(X = x) = \frac{1}{Z} \exp\left(\sum_{i \in F} w_i \sum_{j \in G_i} g_j(x)\right)$$
where $Z$ is the partition function, $F$ is the set of all first-order formulae in the MLN, $w_i$ is the weight of the i-th formula, $G_i$ is the set of groundings of the i-th first-order formula, and $g_j(x) = 1$ if the j-th ground formula is true, and $g_j(x) = 0$ otherwise. Markov logic enables us to compactly represent complex models in non-i.i.d. domains. General algorithms for inference and learning in Markov logic are discussed in Richardson and Domingos [9]. We use the 1-best MIRA online learning method [10] for learning weights and employ cutting plane inference [11] with integer linear programming as its base solver for inference at test time as well as during the MIRA online learning process. As mentioned above, to avoid ambiguity between the predicates in FOL and SRL, we refer to SRL predicates as "verb predicates".
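The MLN distribution can be made concrete by brute-force enumeration of a tiny ground network. The following is a toy sketch under stated assumptions: the two ground atoms, the three ground formulae, and their weights are invented for illustration, and exhaustive enumeration stands in for the cutting plane inference used in the paper, which it does not reproduce.

```python
import itertools
import math

# Two ground atoms: does node n1 (resp. n2) play ARG0 for verb predicate p?
atoms = ["role(p,n1,ARG0)", "role(p,n2,ARG0)"]

# Ground formulae as (weight, truth function over a world); weights are made up.
ground_formulas = [
    (1.5, lambda w: w["role(p,n1,ARG0)"]),
    (0.5, lambda w: w["role(p,n2,ARG0)"]),
    # Soft version of "ARG0 is assigned to at most one node".
    (2.0, lambda w: not (w["role(p,n1,ARG0)"] and w["role(p,n2,ARG0)"])),
]


def unnormalized(world):
    """exp of the summed weights of all ground formulae that are true in `world`."""
    return math.exp(sum(weight for weight, g in ground_formulas if g(world)))


# Enumerate every possible world (truth assignment to the ground atoms).
worlds = [dict(zip(atoms, values))
          for values in itertools.product([False, True], repeat=len(atoms))]
Z = sum(unnormalized(w) for w in worlds)  # partition function
probs = {tuple(w.values()): unnormalized(w) / Z for w in worlds}
most_probable = max(probs, key=probs.get)
```

With these weights the most probable world assigns ARG0 to n1 only, because the third formula penalizes labeling both nodes.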

Formulae
Local formulae (L)
As shown in Table 1, local formulae are derived from the features used in SRL systems [2,12-14] based on the maximum entropy (ME) model and the support vector machine (SVM) model. We used these features in BIOSMILE [5], and we have transformed them into formulae here to employ them in our MLN model. In the headword formula, for instance, w is the headword of the node i. If the "+" symbol appears before a variable, it indicates that each different value of the variable has its own weight.

Collective formulae (C)
Collective classification is a methodology that simultaneously classifies related instances. It can improve classification accuracy over non-collective methods when instances are interrelated [15][16][17]. MLN performs well in many collective classification tasks such as entity linking [18][19][20], coreference resolution [21,22] and biomedical event extraction [23]. In node-by-node SRL, related instances are nodes having linguistic dependencies. There are two main types of linguistic dependencies in SRL: tree dependency and path dependency. They have been shown to be effective in improving the consistency of SRL results [3]. Nodes with tree dependencies and path dependencies can be treated as tree collectives and path collectives, respectively. In MLN-based SRL, collectives can be implemented with collective formulae that model dependencies among nodes.
Given a sentence sen and a verb predicate p, a tree collective is composed of all nodes in sen's parse tree. In this tree collective, there is a constraint that each core semantic role of p can only be assigned to one node, which can be expressed in the following formula:
$$\text{verb-predicate}(p) \wedge \text{core-arg}(+r) \Rightarrow |\text{role}(p, i, +r)| \le 1$$
In addition, a path collective is composed of all nodes in a path. Spans of nodes in the same path collective overlap. Therefore, only one node in a path can play a semantic role. This dependency is formulated analogously, as a constraint that at most one node in each path is assigned a semantic role.
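The two collective constraints can be expressed as simple checks over a candidate labeling. This sketch is an illustration, not the MLN formulation itself: the function names and the token-index span representation are assumptions, and it relies on the fact that constituent spans in a parse tree either nest or are disjoint.

```python
import itertools
from collections import Counter

CORE_ROLES = {"ARG0", "ARG1", "ARG2", "ARG3", "ARG4", "ARG5"}


def violates_tree_collective(labeled):
    """True if any core role is assigned to more than one node.

    `labeled` is a list of ((start, end), role) pairs with inclusive token indices.
    """
    counts = Counter(role for _, role in labeled if role in CORE_ROLES)
    return any(n > 1 for n in counts.values())


def on_same_path(a, b):
    """Two constituents lie on one path when one span contains the other."""
    return (a[0] <= b[0] and b[1] <= a[1]) or (b[0] <= a[0] and a[1] <= b[1])


def violates_path_collective(labeled):
    """True if two labeled nodes lie on the same path of the parse tree."""
    spans = [span for span, _ in labeled]
    return any(on_same_path(a, b) for a, b in itertools.combinations(spans, 2))


# "Isolation of a rel-related human cDNA" (tokens 0-5) and the nested
# "a rel-related human cDNA" (tokens 2-5), both labeled ARG0.
overlapping = [((0, 5), "ARG0"), ((2, 5), "ARG0")]
consistent = [((0, 1), "ARG0"), ((3, 6), "ARG1")]
```

A node-by-node labeler can emit the `overlapping` assignment, whereas a collective model scores whole assignments and can rule it out.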

Candidate identification formulae (CI)
In our resource-saving preprocessing step, candidate identifiers recognize the most likely semantic roles for each node. The information can be transformed into formulae to improve the accuracy of MLN inference. For a node i retained as a candidate node by our resource-saving preprocessing, an observed predicate candidate_role(p, i, r) is added to our MLN-based inference system.

Dataset
We use BioProp [24] as our evaluation dataset. BioProp is a semantic role labeling corpus of 445 biomedical abstracts containing 1,982 PASs labeled with the 30 most common biomedical verb predicates and their semantic roles. Table 4 shows the statistics of the BioProp corpus. Core arguments such as ARGX, R-ARGX and C-ARGX play the main semantic roles in a PAS. ARGX (ARG0-ARG5, ARGX) are the most necessary arguments of a given verb predicate. C-ARGX is used to represent multi-node arguments. A node labelled C-ARGX is assumed to be a continuation of the closest node to the left labelled ARGX. A node labelled R-ARGX is assumed to be a relative pronoun of the closest node to the left labelled ARGX. Adjunctive arguments (ARGM-X) play the semantic roles of location, manner, time, or extent in a PAS.

Evaluation metric
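The argument-wide results are given as the F-score computed with the CoNLL-05 [25] evaluation script and defined as
$$F = \frac{2 \times P \times R}{P + R}$$
where P denotes the precision and R denotes the recall. The formulae for calculating P and R are as follows:
$$P = \frac{\text{the number of correctly recognized arguments}}{\text{the number of recognized arguments}}$$
$$R = \frac{\text{the number of correctly recognized arguments}}{\text{the number of arguments}}$$
Furthermore, we also evaluate the results in terms of the PAS-wide F-score ($F_{PAS}$), which is defined as
$$F_{PAS} = \frac{2 \times P_{PAS} \times R_{PAS}}{P_{PAS} + R_{PAS}}$$
The formulae for calculating $P_{PAS}$ and $R_{PAS}$ are as follows:
$$P_{PAS} = \frac{\text{the number of correctly recognized PASs}}{\text{the number of recognized PASs}}$$
$$R_{PAS} = \frac{\text{the number of correctly recognized PASs}}{\text{the number of PASs}}$$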
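The difference between the argument-wide and PAS-wide scores is easiest to see on a small example. The sketch below is not the CoNLL-05 evaluation script: the function names, the `(role, start, end)` argument encoding, and the toy data are assumptions, and a PAS is counted as correct only when its full argument set matches the gold standard.

```python
def prf(n_correct, n_predicted, n_gold):
    """Precision, recall and F-score from raw counts."""
    p = n_correct / n_predicted if n_predicted else 0.0
    r = n_correct / n_gold if n_gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def argument_wide(gold, predicted):
    """gold/predicted: dict mapping a PAS id to its set of (role, start, end) arguments."""
    gold_args = {(k,) + a for k, args in gold.items() for a in args}
    pred_args = {(k,) + a for k, args in predicted.items() for a in args}
    return prf(len(gold_args & pred_args), len(pred_args), len(gold_args))


def pas_wide(gold, predicted):
    """A PAS counts as correct only if all of its arguments match the gold PAS."""
    correct = sum(1 for k, args in predicted.items() if gold.get(k) == args)
    return prf(correct, len(predicted), len(gold))


# Toy data: PAS 1 has one boundary error on ARG1, PAS 2 is fully correct.
gold = {1: {("ARG0", 0, 3), ("ARG1", 5, 9)}, 2: {("ARG1", 0, 1)}}
predicted = {1: {("ARG0", 0, 3), ("ARG1", 5, 8)}, 2: {("ARG1", 0, 1)}}

arg_scores = argument_wide(gold, predicted)
pas_scores = pas_wide(gold, predicted)
```

One wrong boundary costs a third of the argument-wide score here but half of the PAS-wide score, which is why the PAS-wide metric is the stricter of the two.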

t-test
In order to evaluate our performance under unbiased circumstances, we apply a two-sample paired t-test, which is defined as follows. The null hypothesis, which states that there is no difference between the two configurations A and B, is given as
$$H_0: \mu_A = \mu_B$$
where $\mu_A$ is the true mean F-score of configuration A and $\mu_B$ is the true mean F-score of configuration B, while the alternative hypothesis is
$$H_1: \mu_A > \mu_B$$
A two-sample paired t-test is applied since we assume the samples are independent. As the number of samples is large and the samples' standard deviations are known, the following two-sample t-test can be administered:
$$t = \frac{\bar{x}_A - \bar{x}_B}{\sqrt{\frac{S_A^2}{n_A} + \frac{S_B^2}{n_B}}}$$
where $\bar{x}$, $S$, and $n$ denote the sample mean, the sample standard deviation, and the number of samples of the corresponding configuration. If the resulting t-score is equal to or less than 1.67 with a degree of freedom of 29 and a statistical significance level of 95%, the null hypothesis is accepted; otherwise it is rejected.
To obtain the average F-scores and their deviations required for the t-test, we randomly sampled thirty training sets ($g_1, \ldots, g_{30}$) and thirty test sets ($d_1, \ldots, d_{30}$) from the 445 abstracts. Each training set and test set contains 356 and 89 abstracts, respectively. We trained the model on $g_i$ and tested it on $d_i$. Afterwards, we summed the scores for all thirty test sets and calculated the averages for performance comparison.
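A t-score of the kind compared against the 1.67 threshold can be computed from two lists of per-run F-scores. The sketch below assumes the unpooled two-sample statistic, that is, the difference of the means divided by the square root of the summed per-sample variances over sample sizes; that choice, the function name, and the made-up scores are assumptions for illustration, not values from the paper.

```python
import math
import statistics


def two_sample_t(a, b):
    """t = (mean_a - mean_b) / sqrt(var_a / n_a + var_b / n_b), with sample variances."""
    standard_error = math.sqrt(statistics.variance(a) / len(a)
                               + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


# Made-up F-scores for two configurations; the paper uses thirty runs per configuration.
scores_a = [2.0, 4.0, 6.0]
scores_b = [1.0, 2.0, 3.0]
t_score = two_sample_t(scores_a, scores_b)

# Decision rule described in the text: reject the null hypothesis if t exceeds 1.67.
reject_null = t_score > 1.67
```

With these toy scores the statistic stays below the threshold, so the null hypothesis of no difference would be accepted.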

Configuration settings
We construct three configurations of our system for comparison. The BIOSMILE configuration uses the local formulae (L), which is equivalent to our previous work. The CBIOSMILE configuration uses both the local (L) and collective formulae (C). In the RCBIOSMILE configuration, we first apply our proposed resource-saving preprocessing (RP) step. Then, the local (L), collective (C), and candidate identification formulae (CI) are all used. The settings of these three configurations are shown in Table 5. Table 5 shows the argument-wide performance of all configurations of our system on the CoNLL evaluation metrics, which measure whether each argument is independently correct. Table 6 shows configuration performance on PAS-wide evaluation metrics, in which a PAS is regarded as successfully extracted only if all its member arguments are correctly extracted. We use '*' to indicate that the configuration shows a statistically significant improvement over BIOSMILE. CBIOSMILE is BIOSMILE with integrated collective learning. Table 5 shows that, across all arguments, CBIOSMILE outperforms BIOSMILE by 1.02% on average in terms of F-score. RCBIOSMILE boosts the improvement to 1.18%.

Extraction performance
In PAS-wide evaluation (Table 6), CBIOSMILE shows an improvement of 4.60% (F-score) over BIOSMILE, and an even larger advantage can be observed on core arguments (6.74%). RCBIOSMILE enlarges the improvement to 4.78% (F-score) over BIOSMILE, and an even larger advantage can be observed on core arguments (7.15%).

Resource savings
We first examine the effects of the tree pruning filter. On average, rules 1, 2, 3, and 4 remove 10%, 36%, 5%, and 13% of all nodes, respectively. Applying all four tree pruning rules filters out 43% of all nodes. Table 7 displays time and memory costs. We compare each configuration's training time per iteration and test time per instance. Compared to CBIOSMILE, which has similar performance (Table 6), RCBIOSMILE requires 92% less memory and 57% less training time, which are dramatic savings. We believe that RCBIOSMILE could be further improved by adding new SRL patterns written manually by biological experts.

Discussion
Advantages of (R)CBIOSMILE over BIOSMILE
CBIOSMILE and RCBIOSMILE excel at correcting two error types: (1) duplicate arguments and (2) overlapping arguments. In an example of the first type, BIOSMILE, working node by node, labels both "cDNA clones" and "multiple criteria" as ARG0: the former because it appears before "which", and the latter because it is the nearest noun phrase to the verb predicate "encode". (R)CBIOSMILE avoids this error because the tree collective formulae limit the maximum number of any core argument type to one. Therefore, it labels only the node with the highest likelihood of being ARG0.
Similarly, in an example of the second type, BIOSMILE incorrectly labels the two overlapping nodes "Isolation of a rel-related human cDNA" and "a rel-related human cDNA" as ARG0. (R)CBIOSMILE does not make such errors because path collective formulae assert that only one node in a path of a parse tree can be an argument.

Advantages of RCBIOSMILE over CBIOSMILE
According to the results of individual arguments for all argument types (shown in Table 8), we can see that RCBIOSMILE significantly outperforms CBIOSMILE in ARGM-ADV, ARGM-TMP, and R-ARG0. This may be because RCBIOSMILE employs several candidate identifiers to enhance the likelihood of true arguments being correctly labeled. Take the following sentence for example: [Although lymphokine genes are coordinately regulated upon antigen stimulation ARGM-ADV ], [they ARG1 ] are [regulated Verb Predicate ] [by the mechanisms common to all as well as those which are unique to each gene ARG0 ] .
The parse-tree-based candidate identifier employed by RCBIOSMILE recognizes the node whose span is "Although lymphokine genes are coordinately regulated upon antigen stimulation" as a candidate of ARGM-ADV or ARGM-TMP, and RCBIOSMILE adds the predicates "candidate_role(p, i, ARGM-ADV)" and "candidate_role (p, i, ARGM-TMP)" into the inference system. Therefore, this node can be successfully labeled as ARGM-ADV. Because CBIOSMILE does not use the candidate identifier, it has difficulty labeling such long-span nodes with the correct argument types.

Disadvantages of (R)CBIOSMILE
RCBIOSMILE and CBIOSMILE, which use collective learning, outperform BIOSMILE in most argument types. Surprisingly, they perform worse than BIOSMILE in ARGM-DIS and ARGM-ADV. We believe that this may be because ARGM-DIS can be easily recognized by position information (ARGM-DIS usually appears at the beginning of a sentence), and duplication and overlapping errors (which collective approaches excel at handling) seldom occur in ARGM-ADV.

Pruning errors of RCBIOSMILE
We observed that RCBIOSMILE cannot recognize some arguments that can be recognized by BIOSMILE and CBIOSMILE. After analysis, we found that such errors occur when a true argument node is removed during resource-saving preprocessing. In one example, since the syntactic path from the node whose span corresponds to "the thymus" does not link to an ARG0 node and a verb-predicate node, this node is pruned. Therefore, RCBIOSMILE predicts a nearer noun phrase, "the most potent site", as ARG0.

Related work
Biomedical semantic role labeling corpus
PASBio [26] is the first PAS standard used in the biomedical field, but it does not provide an SRL corpus. GREC [27] is an information extraction corpus focusing on gene regulation events. However, GREC does not support Treebank-format SRL annotations [28]. BioProp [24] is the only corpus that provides SRL annotations and annotates semantic role labels on syntactic trees. BioProp selects the 30 most frequent or significant verbs found in the biomedical literature and defines the standard of the biomedical PAS. Furthermore, in accordance with the style of PropBank [29], which annotates PASs on the Penn Treebank (PTB) [28], BioProp annotates its PASs on the GENIA TreeBank (GTB) beta version [30]. GTB includes a collection of 500 MEDLINE abstracts selected from the search results for the keywords human, blood cells, and transcription factors, and contains a TreeBank that follows the style of the Penn Treebank.

Biomedical semantic role labeling system
Most semantic role labeling systems follow the pipeline method, which includes predicate identification, argument identification and argument classification. However, in recent years, several studies have shown that the collective learning method can outperform the traditional pipeline method. Riedel et al. [11] use Markov logic to collectively learn these stages of SRL. However, to the best of our knowledge, there is no existing SRL system using MLN in the biomedical field. Dahlmeier et al. [31] use domain adaptation approaches to improve SRL in the biomedical field. Bethard et al. [32] consider SRL as a token-by-token labeling problem and focus on SRL for transport proteins. BIOSMILE is a biomedical SRL system that focuses on 30 frequently appearing or important verbs in the biomedical literature; it is trained on BioProp and is based on the maximum entropy (ME) model.

Conclusions
Currently, a major problem of BioSRL is that most systems label every node in a full parse tree independently; however, some nodes always exhibit dependency. In general SRL, collective approaches based on the Markov logic network (MLN) model have been successful in dealing with this problem. In this paper, we explore the collective approach to BioSRL by building an MLN-based system.
Despite the convenience of modeling dependencies and the high accuracy of MLN, we have observed that it requires more memory and longer training times on a large corpus. This is an obstacle to applying MLN to BioSRL, which requires a large amount of training data to cover the wide variety of specialized biomedical subdomains. To reduce resource usage, we designed a pattern-based method to prune parse-tree nodes that may not have semantic roles. This method is applied to the parse trees in BioProp. To minimize the efforts of domain experts in manual pattern compilation, we developed an automatic pattern generation approach. The pruned annotated parse trees are used to train a resource-saving MLN-based system, which is referred to as resource-saving collective BIOSMILE (RCBIOSMILE).
Our experimental results show that our proposed CBIOSMILE system outperforms BIOSMILE, which is the top BioSRL system. Furthermore, our proposed RCBIOSMILE maintains the same level of accuracy as CBIOSMILE using 92% less memory and 57% less training time. This greatly improved efficiency makes RCBIOSMILE potentially suitable for training on much larger BioSRL corpora over more biomedical domains. Compared to real-world biomedical corpora, BioProp is relatively small, containing only 445 MEDLINE abstracts and 30 event triggers. It is not large enough for practical applications, such as pathway construction. We consider it of primary importance to pursue SRL training on large corpora in the future.

Table residue (argument definition): ARGM-NEG: Negation; this tag is used for elements such as "not", "n't", "never", "no longer" and other markers.