Biomedical named entity recognition using improved green anaconda-assisted Bi-GRU-based hierarchical ResNet model
BMC Bioinformatics volume 26, Article number: 34 (2025)
Abstract
Background
Biomedical text mining is a technique that extracts essential information from scientific articles using named entity recognition (NER). Traditional NER methods rely on dictionaries, rules, or curated corpora, which may not always be accessible. To overcome these challenges, deep learning (DL) methods have emerged. However, DL-based NER methods may struggle to identify long-distance relationships within text and require large annotated datasets.
Results
This research proposes a novel model to address these challenges in natural language processing: the Improved Green Anaconda-assisted Bi-GRU-based Hierarchical ResNet BNER model (IGa-BiHR BNERM). The IGa-BiHR BNERM has shown promising results in accurately identifying named entities. The MACCROBAT dataset was obtained from Kaggle and underwent several pre-processing steps, including Stop Word Filtering, WordNet processing, Removal of non-alphanumeric characters, stemming, Segmentation, and Tokenization, which standardize the data and improve its quality. The pre-processed text was fed into a feature extraction model, the Robustly Optimized BERT-Whole Word Masking (ROBERT-WWM) model, which provides word embeddings with semantic information. The BNER process then utilized the IGa-BiHR BNERM.
Conclusion
To improve the training phase of the IGa-BiHR BNERM, the Improved Green Anaconda Optimization technique was used to select optimal weight coefficients for training the model parameters. When tested on the MACCROBAT dataset, the model outperformed previous models with an accuracy of 99.11%. It effectively and accurately identifies biomedical names within text, significantly advancing this field.
Introduction
The increasing volume of health and biomedical text data, driven by electronic health records, biomedical publications, and online health information, has led to a surge in standard terminologies and knowledge bases (Govindarajan et al. [1]). While data collection is no longer a bottleneck, the challenge lies in effectively utilizing these resources and building scalable models to process vast amounts of text (Sung et al. [2]). As much of the data is unstructured and narrative, the quality of essential NLP tools significantly influences higher-level tasks like information retrieval, extraction, and discovery (Kaswan et al. [3]). To tackle this, researchers focus on enhancing NLP techniques and developing advanced models capable of handling large-scale text processing in the biomedical domain [4].
Biomedical Named Entity Recognition (BNER) is a crucial task in processing biomedical text data, aiming to precisely identify and classify entities such as genes, proteins, illnesses, and medications (Ahmad et al. [5]). Many different tactics are used in this field, but they can be broadly divided into rule-based, dictionary-based, and machine-learning approaches (Li et al. [6]). These strategies have strengths and limitations, making them suitable for different scenarios and applications (Huang et al. [7]). Rule-based techniques identify entities in text data by applying specified rules and patterns. These approaches can be practical, mainly when dealing with simple entities and well-defined patterns, but they may struggle with the complexity and ambiguity of biomedical texts, requiring careful crafting of rules and offering potentially limited generalization to new or unseen data (Chen et al. [8]). Dictionary-based approaches rely on curated dictionaries containing terms related to biomedical entities. These dictionaries offer high precision in identifying known entities but are limited by their coverage, often failing to recognize newly emerging terms not present in the dictionary. Despite this limitation, dictionary-based approaches can be helpful in specific domains with well-established terminologies, providing a reliable and efficient method for entity recognition (Ramachandran et al. [9]). Machine learning approaches for biomedical NER involve training models on annotated corpora, enabling them to learn patterns and associations between words and entity categories automatically (Perera et al. [10]). These models can effectively handle complexity and ambiguity, making them adaptable to new entities and domains. However, they require large annotated corpora and careful tuning, which can be time-consuming and expensive (Cariello et al. [11]).
Recently, deep learning (DL) approaches have gained substantial momentum in the biomedical NER community due to their significant advantages over traditional methods (Wei et al. [12]). These techniques reduce costs, time, and the need for manual feature engineering through end-to-end training and automatic feature extraction (Pavlova et al. [13]). DL models, especially recurrent neural networks (RNNs) (Hao et al. [14]) and their variants, are well-suited for handling sequential data and can effectively capture long-term dependencies between entities in biomedical texts (Cui et al. [15]). Additionally, architectures such as convolutional neural networks (CNNs) (Deng et al. [16]) and transformer-based models like BERT (Cao et al. [17]) have shown remarkable success in capturing contextual information and semantic representations from textual data. Because they use self-attention to capture global dependencies and relationships between words in a sentence, these models perform exceptionally well in tasks like NER, where comprehending context is critical for precise entity recognition (Li et al. [18]). However, these models have limitations. For instance, they may struggle with long sequences due to the quadratic complexity of self-attention (Chen et al. [19]). Additionally, they might not effectively capture specific linguistic nuances, such as negation or coreference, which are prevalent in biomedical text (Jeon et al. [20]). Hence, developing a novel model is imperative to advance the field, particularly for BNER systems.
Motivation and problem statement
Despite recent progress in BNER using DL approaches, challenges remain, such as the limitations of existing models in handling long sequences and capturing linguistic nuances. These limitations can reduce the accuracy and reliability of BNER systems, impacting downstream applications. To tackle these difficulties, a BNER model that integrates a hierarchical ResNet model with a Bi-GRU-based architecture and an Improved Green Anaconda algorithm is proposed to optimize performance. This model can significantly improve BNER and downstream applications in biomedical text analysis. The main contributions of this work are as follows:
-
A novel method is introduced to improve biomedical named entity recognition using an Improved Green Anaconda-assisted hierarchical ResNet model with Bi-GRU, which accurately recognizes biomedical names.
-
To enhance the quality of data, a pre-processing stage is performed, which includes Stop Word Filtering (SWF), WordNet (WNet) processing, Removal of non-alphanumeric characters (RnonAC), stemming, Segmentation, and Tokenization.
-
To extract features from the pre-processed data, the Robustly Optimized BERT-Whole Word Masking (ROBERT-WWM) model is utilized.
-
To detect biomedical entities efficiently, the Improved Green Anaconda-assisted Bi-GRU-based Hierarchical ResNet BNER model (IGa-BiHR BNERM) is used.
-
Improved Green Anaconda Optimization (IGAO) is used to tune the hyperparameters of the detection method.
The remainder of the paper is arranged as follows: Sect. "Related works" surveys several strategies. Sect. "Proposed methodology" presents the specifics and procedure of the proposed methodology along with figures. Sect. "Results and discussion" discusses the result analysis of the proposed technique against various related techniques. Finally, Sect. "Conclusion" describes the overall conclusion of the proposed method.
Related works
A survey of related techniques for BNER using DL is presented below.
A clinical named entity recognition model using deep neural networks and pre-trained word embeddings was created by Dash et al. [21]. The model uses a Bidirectional Long Short-Term Memory (Bi-LSTM) network with a Conditional Random Field (CRF) to obtain an F1-score of 88.34%, doing away with the requirement for heuristic rules or post-processing. Nevertheless, the model works only in specific languages and subjects. Fabregat et al. [22] created a word-level neural network architecture that concatenates word embeddings with one-word vector representations and integrates Bi-LSTM with CRF for clinical named entity recognition. With an F1-score of 88.34%, the technique eliminates the need for heuristic rules or post-processing. However, the model's significant time consumption is a notable drawback. Tian et al. [23] created a hybrid method that incorporates syntactic information into biomedical named entity identification by combining a CNN with a conditional random field (CRF). On the BioCreative II GM dataset, the technique outperforms multiple state-of-the-art models with an F1-score of 87.12%.
Nevertheless, the method is not suitable for all kinds of biomedical texts. Asghari et al. [24] introduced BINER, a practical and reliable biomedical named entity recognition model that combines three components: an embedding layer, a Bi-LSTM, and a CRF. The model can be trained on low-end GPU machines and is intended to be computationally efficient. For datasets requiring sophisticated contextual awareness, BINER performs worse than larger models, such as BioBERT, but requires fewer computing resources. TaughtNet is a knowledge distillation-based framework for multi-task biomedical named entity identification, created by Moscato et al. [25]. Even with sparse annotations for several entity types, TaughtNet improves a single multi-task student model by utilizing the expertise of single-task teachers. The approach performs better than robust state-of-the-art baselines in precision, recall, and F1-score for identifying mentions of illnesses, drugs, and genes. However, one significant drawback of the model is its enormous complexity.
Guan et al. [26] improved biomedical named entity recognition (BioNER) with a pre-trained BioBERT-based model termed E-BioBER, introducing a novel word-pair categorization technique and a straightforward attention mechanism. The authors evaluated their method on five BioNER benchmark datasets, achieving state-of-the-art performance with F1-scores of 92.55%, 85.45%, 87.53%, 94.16%, and 90.55%. However, the model's complex training process is a limitation. Multi-task learning and fine-tuning are combined in a hierarchical shared transfer learning approach created by Chai et al. [27] for BioNER, allowing for multi-level information fusion between top data characteristics and underlying entity features. The method uses a conditional random field (CRF) decoder with XLNet as the encoder to achieve better generality and stability. However, the limitation of this study is its high time consumption.
Sun et al. [28] addressed the drawbacks of Clinical Named Entity Recognition (CNER), including a shortage of labeled data and the high expense of manual annotation, by proposing a weak supervision and clustering-based sample selection method. The proposed method outperformed multiple state-of-the-art algorithms on a newly produced dataset with an F1 score of 0.83. However, the limitation of this method is that its performance may decrease if the clustering algorithm fails to group similar samples accurately. Jeon et al. [29] created an Edge Weight Updating Neural Network (EWUNN) for named entity normalization to capture the relationships between entities and their corresponding normalized forms. This neural network updates the edge weights of a graph. On the CoNLL-2003 and Wiki80 datasets, the model obtained F1 scores of 92.1% and 95.5%, respectively. The drawback of this model is that a substantial amount of annotated data is needed for the model to be trained.
Košprdić et al. [30] developed a transformer model for biomedical named entity recognition (BioNER) in zero-shot and few-shot settings, aiming to enhance performance when labeled data is difficult to obtain or unavailable. The model achieved state-of-the-art performance when tested on various biomedical NER datasets. However, this study's drawback is that it only employed a small dataset. Alamro et al. [31] developed a multi-feature model called BioBBC that combines various features for detecting biomedical entities in text using a neural network architecture with a Bi-LSTM layer and a CRF layer. Using the BioCreative VI Chemical Disease Relation (BC6CDR) dataset, the model demonstrated enhanced precision (85.31%) and recall (89.13%). However, a limitation of this method is that it may incorrectly recognize some biomedical entities.
Tian et al. [32] developed a graph-based approach for NER, where the input sentence is represented as a graph and syntactic dependencies are used to construct it. The authors first used a graph convolutional network (GCN) to learn the node representations in the graph for NER. The strategy exploited the syntactic relationships between words in a sentence to enhance NER performance. The CDR dataset was used to evaluate the model, and the authors reported accuracy, precision, and F1-score. However, a limitation of this model is that it may not always be accurate. Table 1 describes the related work, proposed method, aim, limitation, and performance.
After reviewing several related techniques, several drawbacks were found that reduced their performance. These drawbacks include low performance, high time and computational complexity, and limited dataset size. The accuracy of these techniques also depends on factors such as data quality, individual differences, and the environment. To overcome these limitations, a new approach called IGa-BiHR BNERM is proposed for the BNER model. This model attempts to enhance biomedical named entity recognition performance by addressing the drawbacks of conventional techniques.
Proposed methodology
In biomedical text mining, NER is an indispensable tool for extracting vital information from scientific articles. Traditionally, NER methods rely heavily on extensive dictionaries, specific rules, or meticulously curated corpora, which may not always be accessible or feasible. DL approaches have emerged as a solution to these challenges but have limitations of their own: DL-based NER methods can fail to grasp long-range dependencies in text and demand sizable annotated datasets. In response, a novel approach has been introduced: a biomedical NER method utilizing an improved Green Anaconda-assisted Bi-GRU-based hierarchical ResNet model. This model aims to mitigate the shortcomings of existing DL methods by effectively capturing contextual information while preserving the structural dependencies within the text. Figure 1 shows the process flow of the proposed BNER technique.
Initially, the BNER text dataset was obtained from Kaggle. The collected dataset then underwent several pre-processing steps, including Stop Word Filtering (SWF), WordNet (WNet) processing, Removal of non-alphanumeric characters (RnonAC), stemming, Segmentation, and Tokenization, to ensure the data's standardization and quality. The pre-processed text was then fed into a feature extraction model, the Robustly Optimized BERT-Whole Word Masking (ROBERT-WWM) model, which is pre-trained to provide word embeddings with semantic information. Subsequently, the BNER process utilized the Improved Green Anaconda-assisted Bi-GRU-based Hierarchical ResNet BNER model (IGa-BiHR BNERM). The model's novelty lies in combining the Bi-GRU and ResNet architectures in a single model, enhancing sequential understanding and hierarchical feature extraction. Although the Bi-GRU has been applied to sequence processing before, the innovation here is combining it with a ResNet structure that uses skip connections to preserve significant information across multiple layers. This hybrid architecture captures both short- and long-term dependencies with little information loss, and the hierarchical ResNet preserves important semantic information that can be lost in deep layers, a property not often found in models that use an ordinary Bi-GRU or CNN alone. To enhance the training phase of the IGa-BiHR BNERM, the Improved Green Anaconda Optimization (IGAO) technique was employed to select optimal weight coefficients for training the model parameters. Ultimately, the IGa-BiHR BNERM model accurately identifies biomedical labeled predictions.
Pre-processing
The pre-processing phase is crucial in NLP and text mining since it improves the raw data quality. The pre-processed data is more accurate and efficient for analysis and modeling. At this point, various methods are used to convert the unstructured data into a format that can be readily examined. In the proposed technique, the following six pre-processing techniques are used:
SWF [33]
Stop words are common words that contain little important information and can be removed from the text without losing critical meaning. Examples of stop words include "the," "a," "an," "in," and "of." Stop word filtering is identifying and removing these stop words from the text.
WNet processing [34]
WordNet is an English lexical database that groups English words into sets of synonyms called synsets, records the semantic relationships between these synsets, and offers a brief description for each one. WordNet processing entails utilizing this database to identify significant connections between words and to increase the text's vocabulary by locating synonyms for the terms used.
RnonAC [35]
Non-alphanumeric characters are characters that are neither letters nor numbers, including punctuation marks, special characters, and white spaces. Eliminating non-alphanumeric characters contributes to text simplification and facilitates analysis.
Stemming [36]
Stemming is the act of reducing words to their fundamental form. As an illustration, the words "running," "runner," and "ran" all reduce to the root "run." Stemming can aid in reducing text dimensionality, thereby facilitating analysis.
Segmentation [37]
Segmentation is breaking text into smaller pieces, such as sentences or phrases. Sentence segmentation, the most common kind in NLP, is the process of breaking a paragraph into separate sentences. This is a crucial step because it enables each sentence to be examined independently and useful information to be extracted from it.
Tokenization [38]
Tokenization involves breaking a string of characters or a whole sentence into its components. The most common tokens are discrete pieces of text, such as words or phrases, although tokens can also be characters. Tokenization is an integral part of processing and analyzing natural text in NLP. Splitting text into smaller portions helps examine and digest text more efficiently.
The suggested method uses these pre-processing procedures to enhance data quality and suitability for analysis and modeling.
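As a minimal sketch of how these six steps could be chained, assuming NLTK as the toolkit (the function below and its behavior are illustrative, not the authors' exact pipeline):

```python
import re
import nltk
from nltk.corpus import stopwords, wordnet
from nltk.stem import PorterStemmer
from nltk.tokenize import sent_tokenize, word_tokenize

nltk.download("punkt", quiet=True)
nltk.download("stopwords", quiet=True)
nltk.download("wordnet", quiet=True)

STOP_WORDS = set(stopwords.words("english"))
STEMMER = PorterStemmer()

def preprocess(document: str) -> list:
    """Apply segmentation, RnonAC, tokenization, SWF, WNet lookup, and stemming."""
    processed_sentences = []
    for sentence in sent_tokenize(document):                  # segmentation
        sentence = re.sub(r"[^A-Za-z0-9\s]", " ", sentence)   # RnonAC
        tokens = word_tokenize(sentence)                      # tokenization
        tokens = [t for t in tokens if t.lower() not in STOP_WORDS]  # SWF
        # WNet processing: normalize to a canonical synonym where one exists
        expanded = []
        for tok in tokens:
            synsets = wordnet.synsets(tok)
            expanded.append(tok if not synsets else synsets[0].lemmas()[0].name())
        processed_sentences.append([STEMMER.stem(t) for t in expanded])  # stemming
    return processed_sentences

print(preprocess("The patient presented with palpitations and an enlarged heart."))
```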
Feature extraction using ROBERT-WWM model
The study uses the ROBERT-WWM [39] model to extract features from pre-processed data for Biomedical Named Entity Recognition (BNER).
A pre-trained language model called BERT is the foundation of the ROBERT-WWM model. Using a bidirectional transformer, BERT encodes both the left and right context of a word, effectively fusing their information. BERT is trained on two objectives: next sentence prediction (NSP) and masked language modeling (MLM). The MLM procedure picks 15% of the words in biomedical sentences for replacement: an 80% chance of being replaced with a mask token, a 10% chance of being replaced with a random word, and a 10% chance of being left unchanged. ROBERT, in contrast, employs dynamic masking, randomly selecting a given percentage of the biomedical text sample for replacement in every iteration cycle. As a result, the model demonstrates enhanced proficiency in comprehending phrase patterns, improving its ability to accurately recognize biomedical text entities associated with diverse biomedical terminology.
The WWM suffix indicates that the whole-word masking approach is used during pre-training, masking at word granularity so that semantic representations at the word level can be obtained. The ROBERT-WWM model combines the benefits of the BERT and ROBERT models. When BERT performs MLM on tokenized data, it masks only single sub-word tokens of the biomedical text and therefore cannot acquire word-level semantic information. ROBERT-WWM adopts whole-word masking of the biomedical text: the tokenized data is first divided into segments, with sub-word tokens that belong to the same word being masked and predicted collectively. Finally, word-level characteristics are generated as dynamic word vectors, which is better suited to the BNER task.
During the training phase, the model parameters are modified depending on the data provided by ROBERT-WWM to enable the model to better grasp the semantic features of biomedical text data. ROBERT-WWM consists of 12 transformer layers that employ a multi-head attention mechanism, keeping the distance between any two words in the input tokenized text constant. The model structure is shown in Fig. 2.
The biomedical text semantic information acquired by ROBERT-WWM during the pre-training phase is contained in the vector \(E = \left\{ {E_{[CLS]} ,E_{1} ,E_{2} ,....E_{n} ,E_{[SEP]} } \right\}\), corresponding to the input token sequence \(Tok = \left\{ {Tok_{[CLS]} ,Tok_{1} ,Tok_{2} ,......Tok_{n} ,Tok_{[SEP]} } \right\}\); each input token in the biomedical text MACCROBAT dataset is encoded by ROBERT-WWM. This information is used to extract features for the BNER task.
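A hedged sketch of this feature-extraction step using the Hugging Face transformers API; "roberta-base" is a stand-in checkpoint, since the exact ROBERT-WWM weights used by the authors are not specified here:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# "roberta-base" stands in for the paper's ROBERT-WWM checkpoint.
tokenizer = AutoTokenizer.from_pretrained("roberta-base")
encoder = AutoModel.from_pretrained("roberta-base")

sentence = "Ebstein's anomaly of the tricuspid valve was confirmed."
# The tokenizer adds boundary tokens analogous to [CLS]/[SEP]
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    outputs = encoder(**inputs)

# One contextual embedding vector E_i per input token Tok_i: (batch, seq_len, 768)
token_embeddings = outputs.last_hidden_state
print(token_embeddings.shape)
```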
BNER using IGa-BiHR BNERM
The extracted features are fed into the IGa-BiHR BNERM, and the model accurately identifies biomedical labeled predictions. This model integrates various components, including Hierarchical Residual Learning, attention mechanisms, and a Bi-GRU. Hyperparameters are fine-tuned using the Improved Green Anaconda optimization algorithm to optimize the performance of the IGa-BiHR BNERM. Figure 3 visually represents the IGa-BiHR BNERM model and its architecture.
DL models suffer from the vanishing gradient problem during training. This issue causes a decline in accuracy even though these models are good at learning and expressing features: accuracy reaches a saturation point and then rapidly degrades as network depth increases. Overfitting is not the root cause, as increasing the number of layers raises the training error itself. To solve this issue, residual learning [40] adds identity mappings to the network's backbone path. This shortcut propagates the underlying error, effectively solving the notorious vanishing gradient problem when training deep residual networks. Residual learning requires no supplementary parameters, so it neither introduces new parameters nor increases computational complexity compared to the original network. A solitary residual unit is depicted in Fig. 3; the deep residual network stacks numerous such units. Each residual unit comprises convolutional, batch normalization (BN), and rectified linear unit (ReLU) layers with an identity mapping across them. The residual unit's fundamental configuration is expressed as follows.
Here, \(p_{x}\) and \(p_{x + 1}\) denote the input and output of the unit, and \(R\) denotes the residual function. Batch normalization after each convolutional layer improves training efficiency, and the rectified linear unit layer introduces nonlinearity. Residual networks employ skip connections to build deep architectures while mitigating gradient vanishing. ResNet's residual connections are one of its primary features, allowing the model to avoid the vanishing gradient problem in very deep networks. This is essential when working with large, complex datasets like MACCROBAT, because the model can learn well even with many stacked layers. ResNet can also readily adjust to different input sizes and data complexity because of its excellent scalability. Its adaptability in managing deep and broad networks allows it to capture intricate linkages and structural dependencies in the data, especially given the MACCROBAT dataset's potential size. Additionally, by enhancing gradient flow during backpropagation, skip connections facilitate the training of deeper models. When dealing with large biomedical datasets such as MACCROBAT, this helps models trained over many epochs obtain superior accuracy while preserving crucial information during training.
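A minimal PyTorch sketch of one residual unit as described (Conv, BN, and ReLU layers with an identity shortcut, \(p_{x+1} = R(p_{x}) + p_{x}\)); layer sizes are illustrative:

```python
import torch
import torch.nn as nn

class ResidualUnit(nn.Module):
    """Conv -> BN -> ReLU twice, with an identity shortcut across the unit."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv1d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm1d(channels),
            nn.ReLU(inplace=True),
            nn.Conv1d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm1d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, p):
        # The skip connection lets the gradient bypass the convolutional body
        return self.relu(self.body(p) + p)

x = torch.randn(8, 64, 128)            # (batch, channels, sequence length)
print(ResidualUnit(64)(x).shape)       # torch.Size([8, 64, 128])
```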
To perform BNER accurately, it is essential to extract features at different scales. Most current convolutional neural networks (CNNs) add layers to improve multiscale representation. NER tasks are commonly accomplished with NLP frameworks that include language models, token classification algorithms, and word embeddings. Because ResNet is an image-based model, it does not fit easily into current NLP pipelines; it must be heavily customized and hybridized with other models to be used for biomedical NER, which increases complexity. Hierarchical Residual Learning [41] enhances the representation of multiscale data by incorporating numerous receptive fields at a finer level of detail. The approach partitions the input feature maps into smaller subsets and applies a convolution operator to each subgroup. Because each subgroup of feature maps has its own receptive field, integrating the feature maps enlarges the network's receptive fields and represents multiscale features more efficiently. Alternative convolutional networks gather multiscale features through stacked convolutional layers but have fixed receptive fields. Hierarchical residual learning adds a new dimension, scale, to the previously established dimensions of depth, width, and cardinality. In IGa-BiHR BNERM, the scale dimension is the number of feature groups in a hierarchical residual unit. Figure 3 shows the hierarchical residual unit with three scales; the terms "split operation" and "concatenation operation" denote the separation and combination of feature maps, respectively.
Given input and output hierarchical features \(p\) and \(q\), the hierarchical residual unit splits the input feature map \(p\) into \(c\) subsets, denoted \(p_{x}\) with \(x \in \left\{ {1,2,...,c} \right\}\). Each subset \(p_{x}\) has the same spatial size as the input \(p\) but only \(1/c\) of the channels. Except for \(p_{1}\), each subset \(p_{x}\) has a corresponding convolution operator denoted by \(H_{x} (\,)\). The output \(q_{x - 1}\) of the previous operator is added to the feature subset \(p_{x}\) and then fed into \(H_{x} (\,)\) to obtain the hierarchical feature \(q_{x}\). In general, the process can be written as follows:
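The formula itself is omitted above; reconstructed from the definitions just given (and the standard Res2Net-style hierarchical-residual formulation), it would read:

$$q_{x} = \left\{ \begin{array}{ll} p_{x}, & x = 1 \\ H_{x}\left( p_{x} \right), & x = 2 \\ H_{x}\left( p_{x} + q_{x - 1} \right), & 2 < x \le c \end{array} \right.$$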
In the hierarchical residual structure, each convolution operation \(H_{x} (\,)\) can receive information from all subsets \(p_{y}\) with \(y \le x\). Thus, each feature split \(p_{x}\) has a larger receptive field than the previous \(p_{y}\). The many feature maps are then integrated by a concatenation operation after the unit to generate a unified output. This split-and-concatenate strategy helps the unit process features more efficiently, and with larger scale factors the unit can learn features with a broader range of receptive field sizes. After every convolutional layer, the IGa-BiHR BNERM model uses rectified linear unit activation functions and batch normalization to improve the training of the hierarchical residual unit. The hierarchical residual unit incorporates residual-like connections to effectively capture global and local information at a detailed level.
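A compact PyTorch sketch of this split-transform-concatenate unit (a Res2Net-style reading of the description; channel counts and kernel sizes are assumptions):

```python
import torch
import torch.nn as nn

class HierarchicalResidualUnit(nn.Module):
    """Split channels into c groups, chain convolutions across groups, concatenate."""
    def __init__(self, channels: int, scale: int = 4):
        super().__init__()
        assert channels % scale == 0
        self.scale = scale
        width = channels // scale
        # One conv operator H_x per subset except the first (identity pass-through)
        self.convs = nn.ModuleList(
            nn.Conv1d(width, width, kernel_size=3, padding=1) for _ in range(scale - 1)
        )

    def forward(self, p):
        subsets = torch.chunk(p, self.scale, dim=1)   # split operation
        outputs = [subsets[0]]                        # q_1 = p_1
        prev = None
        for x in range(1, self.scale):
            inp = subsets[x] if prev is None else subsets[x] + prev  # add q_{x-1}
            prev = self.convs[x - 1](inp)             # q_x = H_x(...)
            outputs.append(prev)
        return torch.cat(outputs, dim=1)              # concatenation operation

print(HierarchicalResidualUnit(64)(torch.randn(2, 64, 50)).shape)
```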
The IGa-BiHR BNERM is an advanced method for BNER. It uses attention mechanisms to focus on the features useful for classification and ignore irrelevant information. This is better than a single attention mechanism because it considers both the channel and the spatial features for BNER. The channel attention module and the spatial attention module work together to transform the channel and spatial properties at different scales. A comprehensive explanation of these two attention modules is provided below.
Channel attention module
The channel attention [42] map \(P \in T^{i \times i}\) is calculated from the initial input \(F \in T^{i \times j \times k}\), where \(j \times k\) denotes the spatial size and \(i\) the number of channels. First, \(F\) is reshaped and transposed to \(F^{Z} \in T^{i \times m}\) (with \(m = j \times k\)), and a matrix multiplication is performed between \(F\) and \(F^{Z}\). The outcome is fed into a softmax layer to obtain the attention map \(P\).
Here, \(p_{xy}\) symbolizes the influence of the \(x^{th}\) channel on the \(y^{th}\) channel. A matrix multiplication is then performed between \(P^{Z}\) and \(F\), and the output is reshaped to \(T^{i \times j \times k}\). Ultimately, the channel attention map \(M \in T^{i \times j \times k}\) is obtained by weighting the result with a scale parameter \(\alpha\) and executing an element-wise sum with the input \(F\).
In this context, the parameter \(\alpha\) is initially assigned a value of 0 and is optimized during training. The channel attention map \(M\) integrates all of the initial channels into a weighted representation, selectively enhancing informative channels while suppressing less useful ones. Hence, this channel attention module enhances the ability to differentiate between channel properties.
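A hedged PyTorch sketch of the channel attention computation described above (tensor shapes follow the \(i \times j \times k\) convention with a leading batch axis; the module is illustrative, not the authors' exact code):

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Channel attention: P = softmax(F F^Z), M = alpha * (P^Z F) + F."""
    def __init__(self):
        super().__init__()
        self.alpha = nn.Parameter(torch.zeros(1))  # starts at 0, learned in training

    def forward(self, f):                          # f: (batch, i, j, k)
        b, c, h, w = f.shape
        flat = f.view(b, c, h * w)                 # reshape to (batch, i, m)
        attn = torch.softmax(torch.bmm(flat, flat.transpose(1, 2)), dim=-1)  # (b, i, i)
        out = torch.bmm(attn.transpose(1, 2), flat).view(b, c, h, w)
        return self.alpha * out + f                # element-wise sum with the input

print(ChannelAttention()(torch.randn(2, 16, 8, 8)).shape)
```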
Spatial attention module
The initial input \(F \in T^{i \times j \times k}\) of the spatial attention [43] module is passed through two distinct convolution layers to produce two new feature maps, \(D\) and \(E\), where \(\left\{ {D,E} \right\} \in T^{i \times j \times k}\). These two feature maps are reshaped to \(T^{i \times m}\), where \(m\) stands for the number of spatial pixels. A matrix multiplication between \(D^{Z}\) and \(E\) is then fed into a softmax layer, which creates the spatial attention map \(C \in T^{m \times m}\).
After passing the initial input feature \(F\) through a convolution layer, a new feature map \(W \in T^{i \times j \times k}\) is also created and reshaped to \(T^{i \times m}\). \(W\) and \(C^{Z}\) are then multiplied as matrices, and the output is reshaped to \(T^{i \times j \times k}\). Ultimately, the output is weighted by a scale parameter \(\beta\), and an element-wise sum with the initial input \(F\) yields the spatial attention map \(M \in T^{i \times j \times k}\) as follows.
In this case, the parameter \(\beta\) starts at 0 and is gradually optimized during training. Each point of the spatial attention feature map \(M\) is a weighted combination of all the original pixels, giving it a global view that selectively highlights informative positions and enhances feature discriminability in the spatial domain. Together, these two attention mechanisms extract diverse features for an efficient BNER system.
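A matching sketch of the spatial attention module under the same assumptions (1 × 1 convolutions are assumed for producing \(D\), \(E\), and \(W\)):

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Spatial attention: C = softmax(D^Z E), M = beta * (W C^Z) + F."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv_d = nn.Conv2d(channels, channels, kernel_size=1)
        self.conv_e = nn.Conv2d(channels, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(channels, channels, kernel_size=1)
        self.beta = nn.Parameter(torch.zeros(1))   # starts at 0, optimized in training

    def forward(self, f):                          # f: (batch, i, j, k)
        b, c, h, w = f.shape
        d = self.conv_d(f).view(b, c, h * w)       # (b, i, m)
        e = self.conv_e(f).view(b, c, h * w)
        attn = torch.softmax(torch.bmm(d.transpose(1, 2), e), dim=-1)  # (b, m, m)
        v = self.conv_w(f).view(b, c, h * w)
        out = torch.bmm(v, attn.transpose(1, 2)).view(b, c, h, w)
        return self.beta * out + f                 # element-wise sum with the input

print(SpatialAttention(16)(torch.randn(2, 16, 8, 8)).shape)
```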
Ultimately, the model incorporates a Bi-GRU [44] layer, a type of RNN capable of capturing temporal patterns. By leveraging the Bi-GRU, the model can learn complex temporal relationships between tokens, enabling it to better understand the sequential structure of biomedical text and ultimately make more accurate predictions. The Bi-GRU model is given by the following equations.
Given an input \(A_{i}\) and the preceding state \(R_{i - 1}\) at position \(i\), \(R_{i}\) can be calculated as follows.
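The gate equations are not reproduced above; assuming the standard GRU formulation written in the paper's symbols (\(o_{i}\) the reset gate, \(U_{i}\) the update gate), they would read:

$$o_{i} = \sigma \left( Y_{o} A_{i} + U_{o} R_{i - 1} \right)$$
$$U_{i} = \sigma \left( Y_{u} A_{i} + U_{u} R_{i - 1} \right)$$
$$\tilde{R}_{i} = \tanh \left( Y_{t} A_{i} + U \left( o_{i} \oplus R_{i - 1} \right) \right)$$
$$R_{i} = \left( 1 - U_{i} \right) \oplus R_{i - 1} + U_{i} \oplus \tilde{R}_{i}$$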
Here, \(R_{i}\), \(o_{i}\), and \(U_{i} \in R^{t}\) are the \(t\)-dimensional hidden state, reset gate, and update gate, respectively; \(Y_{o} ,Y_{u} ,Y_{t} \in R^{de \times d}\) and \(U_{o} ,U_{u} ,U \in R^{d \times d}\) are the parameters of the GRU; \(\sigma\) is the sigmoid function, and \(\oplus\) indicates the element-wise product. For the \(i^{th}\) word in the sequence, the hidden states \(\vec{R}_{i}\) and \(\overleftarrow{R}_{i}\) are encoded by the forward and backward GRUs to represent the context before and after \(A_{i}\), respectively. The concatenation \(R_{i} = [\vec{R}_{i} ;\overleftarrow{R}_{i} ]\) is the output of the Bi-GRU layer at position \(i\).
Initially, the input sequence is processed by the Hierarchical residual learning layers, a type of neural network architecture used to create a hierarchical structure of data, combined with the attention mechanism and Bi-GRU to generate a powerful model for BNER. The hierarchical residual learning captures the hierarchical relation between features where multiple levels of abstraction are needed. In the BNER system, sentences are viewed as a hierarchy of words or phrases. The output of the residual layers is passed to the attention layers. The attention mechanism focuses on a specific part of the input sequence to capture long-range dependencies between features. The Bi-GRU layer processes sequential data in both directions to capture both past and future context information. The output layer generates the final output, such as a predicted word in the input sequence.
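To make this data flow concrete, here is an illustrative end-to-end skeleton in PyTorch that wires a residual stage and a Bi-GRU to per-token tag logits; the attention modules sketched above would sit between the two stages, and the dimensions and tag count are assumptions, not values from the paper:

```python
import torch
import torch.nn as nn

class ResidualStage(nn.Module):
    """One Conv-BN-ReLU residual stage (stands in for the hierarchical ResNet stack)."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm1d(channels)

    def forward(self, p):
        return torch.relu(self.bn(self.conv(p)) + p)   # identity skip connection

class IGaBiHRSketch(nn.Module):
    """Illustrative data flow: residual stage -> Bi-GRU -> per-token tag logits."""
    def __init__(self, emb_dim=768, channels=64, hidden=128, num_tags=42):
        super().__init__()
        self.project = nn.Conv1d(emb_dim, channels, kernel_size=1)
        self.residual = ResidualStage(channels)
        self.bigru = nn.GRU(channels, hidden, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden, num_tags)

    def forward(self, embeddings):                      # (batch, seq_len, emb_dim)
        h = self.project(embeddings.transpose(1, 2))    # to (batch, channels, seq_len)
        h = self.residual(h).transpose(1, 2)            # back to (batch, seq_len, channels)
        h, _ = self.bigru(h)                            # forward + backward context
        return self.classifier(h)                       # entity logits per token

logits = IGaBiHRSketch()(torch.randn(2, 30, 768))
print(logits.shape)                                     # torch.Size([2, 30, 42])
```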
The contribution of individual components (Hierarchical Residual Learning, Bi-GRU, Attention, and IGAO) on model performance is analyzed by performing recognition tasks using individual components separately. Hierarchical residual learning is used in neural network models to enhance training. The hierarchical layers learn from residual layers and map input into output through identity shortcuts, thus avoiding the vanishing gradient problems and enhancing stability. The attention mechanism improves the performance of the Bi-GRU model by capturing complex relationships between features. Attention mechanisms provide insights into the model’s decision-making process, making it easier to understand how it arrives at its predictions. Bi-GRU identifies named entities that spread across multiple words. Bi-GRU is robust to noise and variations in the input data, thus suitable for real-world biomedical named entity recognition tasks. IGAOs help identify optimal parameters such as learning rate, batch size, and hidden units in the residual and Bi-GRU layers through exploration and exploitation capabilities.
The Improved Green Anaconda [45] Optimization Algorithm is used for tuning the hyperparameters of the BNER system as follows. Green anacondas represent the population members in the proposed population-based metaheuristic algorithm.
-
Initialization
Each green anaconda represents a possible mathematical solution to the problem at hand, and its position in the search space determines the values of the decision variables. This indicates that a matrix can be used to model each green anaconda, as demonstrated by Eq. (11).
$$G = \left[ \begin{array}{c} G_{1} \\ \vdots \\ G_{i} \\ \vdots \\ G_{q} \end{array} \right] = \left[ \begin{array}{ccccc} g_{1,1} & \cdots & g_{1,t} & \cdots & g_{1,p} \\ \vdots & \ddots & \vdots & & \vdots \\ g_{i,1} & \cdots & g_{i,t} & \cdots & g_{i,p} \\ \vdots & & \vdots & \ddots & \vdots \\ g_{q,1} & \cdots & g_{q,t} & \cdots & g_{q,p} \end{array} \right]_{q \times p}$$(11)
Equation (12) randomly creates each green anaconda's starting position in the search space at the start of the algorithm's implementation.
$$g_{i,t} = mn_{t} + c_{i,t} \cdot (xn_{t} - mn_{t} ),\quad i = 1,2,....,q,\quad t = 1,2,...,p,$$(12)
Here, \(G\) represents the IGAO population matrix, \(G_{i}\) stands for the \(i^{th}\) green anaconda, \(g_{i,t}\) signifies the \(t^{th}\) dimension of the search space, \(q\) denotes the number of green anacondas, \(p\) indicates the number of decision variables, \(c_{i,t}\) is a random number in the interval [0, 1], and \(mn_{t}\) and \(xn_{t}\) represent the lower and upper bounds of the \(t^{th}\) decision variable, respectively. The objective function can be evaluated using the values of the decision variables indicated by each green anaconda. After substituting these values into the objective function, the set of results can be represented as a vector, as in Eq. (13).
$$K = \left[ \begin{array}{c} K_{1} \\ \vdots \\ K_{i} \\ \vdots \\ K_{q} \end{array} \right] = \left[ \begin{array}{c} K(G_{1} ) \\ \vdots \\ K(G_{i} ) \\ \vdots \\ K(G_{q} ) \end{array} \right]_{q \times 1}$$(13)
where \(K\) denotes the vector of objective function values and \(K_{i}\) the objective function value for the \(i^{th}\) green anaconda. Finding the best member requires comparing the calculated objective function values. The best member is updated in each iteration of IGAO according to changes in the objective function values and the positions of the green anacondas.
-
Exploration
The position of green anacondas during the first phase of IGAO is updated using the strategy male anacondas use during the mating season to locate and approach females. By allowing significant movements in position and thereby avoiding locally optimal zones, this strategy provides IGAO's exploratory capability in global search. The mathematical model assumes that population members with better objective function values represent candidate females for each green anaconda. The candidate females for every green anaconda are determined using Eq. (14).
$$CFL^{i} = \left\{ {G_{e_{i}} :K_{e_{i}} < K_{i}\ \text{and}\ e_{i} \ne i} \right\},\quad i = 1,2,.......,q,\quad e_{i} \in \left\{ {1,2,......,q} \right\}$$(14)
In Eq. (14), \(CFL^{i}\) is the set of candidate female positions for the \(i^{th}\) green anaconda, and \(e_{i}\) is the row number (in the IGAO population matrix) of a member whose objective function value is better than that of the \(i^{th}\) green anaconda. The objective function values are used to model the effect of pheromone concentration on the movement of green anacondas: the better a member's objective function value, the higher the probability that a green anaconda chooses it. Equation (15) calculates the probability function of pheromone concentration used for this selection.
$$FV_{j}^{i} = \frac{{CFF_{j}^{i} - CFF_{\max }^{i} }}{{\sum\nolimits_{x = 1}^{x_{i}} {\left( CFF_{x}^{i} - CFF_{\max }^{i} \right)} }},\quad i = 1,2,......,q,\quad j = 1,2,....,x_{i}$$(15)
Here, \(FV_{j}^{i}\) indicates the probability that the \(j^{th}\) candidate female attracts the \(i^{th}\) green anaconda through pheromones, \(CFF^{i}\) is the vector of objective function values of the candidate females of the \(i^{th}\) green anaconda, \(CFF_{j}^{i}\) denotes its \(j^{th}\) value, \(CFF_{\max }^{i}\) denotes its maximum value, and \(x_{i}\) represents the number of candidate females for the \(i^{th}\) green anaconda.
In the IGAO design, the green anaconda is expected to choose one of the candidate females at random and approach it. Equation (16) calculates the candidate females' cumulative probability function, simulating this selection. Equation (17) determines the selected female for the green anaconda by comparing the cumulative probability function with a random number drawn uniformly from the interval [0, 1].
$$V_{j}^{i} = FV_{j}^{i} + V_{j - 1}^{i} ,\quad i = 1,2,....,q,\quad j = 1,2,.....,x_{i} ,\quad V_{0}^{i} = 0$$(16)$$SF^{i} = CFL_{j}^{i} :V_{j - 1}^{i} < c_{i,j} < V_{j}^{i}$$(17)
Here, \(V_{j}^{i}\) represents the cumulative probability function of the \(j^{th}\) candidate female for the \(i^{th}\) green anaconda, \(SF^{i}\) stands for the chosen female for the \(i^{th}\) green anaconda, and \(c_{i,j}\) is a random number in the range [0, 1]. After choosing a female, the green anaconda approaches it, and Eq. (18) determines a new candidate location in the search space. Equation (19) updates the green anaconda's location to the new position if the objective function value improves there; otherwise, it stays in its former location.
$$g_{i,t}^{z1} = g_{i,t} + c_{i,t} \left( {SF_{t}^{i} - I_{i,t} \cdot g_{i,t} } \right),\quad i = 1,2,......,q,\quad t = 1,2,........,p,$$(18)$$G_{i} = \left\{ {\begin{array}{*{20}c} {G_{i}^{z1} ,} & {h_{i}^{z1} < h_{i} } \\ {G_{i} ,} & {else} \\ \end{array} } \right.$$(19)
Here, \(G_{i}^{z1}\) denotes the new location of the \(i^{th}\) green anaconda in the first stage, \(g_{i,t}^{z1}\) denotes its \(t^{th}\) dimension, \(h_{i}^{z1}\) denotes the corresponding objective function value, \(c_{i,t}\) denotes a random number with uniform distribution in the range [0, 1], \(SF_{t}^{i}\) denotes the \(t^{th}\) dimension of the selected female for the \(i^{th}\) green anaconda, \(I_{i,t}\) denotes a random value from the set {1, 2}, \(q\) denotes the number of green anacondas, and \(p\) denotes the number of decision variables.
-
Exploitation
The second stage of IGAO updates the population's position according to the green anaconda's hunting tactics. A new position near each green anaconda is generated at random using Eq. (20), mimicking the attack and suffocation of prey; Eq. (21) updates the position if the objective function value improves there. This approach causes the green anacondas to move only slightly, which reflects IGAO's ability in local search and exploitation.
$$g_{i,t}^{z2} = g_{i,t} + (1 - 2c_{i,t} ) \cdot \frac{{xn_{t} - mn_{t} }}{t},\quad i = 1,2,......,q,\quad t = 1,2,........,T,$$(20)$$G_{i} = \left\{ {\begin{array}{*{20}c} {G_{i}^{z2} ,} & {h_{i}^{z2} < h_{i} } \\ {G_{i} ,} & {else} \\ \end{array} } \right.$$(21)
Here, \(G_{i}^{z2}\) denotes the new location of the \(i^{th}\) green anaconda in the second stage, \(g_{i,t}^{z2}\) denotes its \(t^{th}\) dimension, \(h_{i}^{z2}\) denotes the corresponding objective function value, \(t\) denotes the iteration counter of the algorithm, and \(T\) denotes the maximum number of algorithm iterations.
-
Improvement
In Eq. (12), a random number is replaced by an iterative chaotic map. The updated equation can be expressed in the following manner:
$$A^{n} = \left\{ \begin{array}{ll} 1, & \text{if } n = s(m),\ m \in [1,x],\ s = \mathrm{ranpermutation}(h),\ h \in \left[ 2,\ \left\lceil {\sin \left( {\frac{c\pi }{{C_{k} }}} \right)(d - 2)} \right\rceil + 1 \right] \\ 0, & \text{else} \end{array} \right.$$(22)
Here, \(c = 0.5\). By adhering to these rules, the IGAO algorithm effectively guides the optimization process, enabling the IGa-BiHR BNERM model to refine its parameters and recognize biomedical names. Compared to brute-force methods, IGAO examines the search space in a more targeted and effective manner, eliminating the need to assess every potential combination of hyperparameters. Moreover, IGAO's capacity to balance exploitation (improving known good regions) and exploration (exploring new areas of the space) is a significant strength. This is crucial for hyperparameter tuning, since excessive exploration wastes effort on unpromising regions and excessive exploitation can cause premature convergence. Compared to conventional optimization techniques, IGAO is less prone to becoming trapped in local optima. This is especially important when tuning hyperparameters, as the loss function may have several local minima and be quite irregular. The iterative chaotic map is used to improve the performance of the base GAO. Chaotic maps can accelerate the convergence rate through dynamic adjustments of the balance between exploration (global search) and exploitation (local search): chaos aids broad exploration early in the optimization process and narrows the search more effectively later on. Moreover, chaotic maps give the search process dynamic adaptation; because the chaotic behavior adjusts to the current stage of the search, IGAO can dynamically switch between exploration and exploitation as the optimization advances. Algorithm 1 presents the pseudocode of the proposed IGAO algorithm, and Fig. 4 displays the IGa-BiHR BNERM model used to recognize biomedical named entities.
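For orientation, a simplified NumPy sketch of the IGAO loop in minimization form: the pheromone-probability selection of Eqs. (15)-(17) is reduced to a uniform choice among better members, and the chaotic-map initialization of Eq. (22) is omitted, so this is a sketch of the algorithm's shape rather than a faithful implementation:

```python
import numpy as np

def igao(objective, p, q=30, T=100, lb=-1.0, ub=1.0, seed=0):
    """Simplified IGAO loop: random init, mate-seeking exploration, shrinking exploitation."""
    rng = np.random.default_rng(seed)
    G = lb + rng.random((q, p)) * (ub - lb)            # Eq. (12)-style initialization
    K = np.array([objective(g) for g in G])
    for t in range(1, T + 1):
        for i in range(q):
            # Exploration (Eqs. 14-19): move toward a randomly chosen better member
            better = np.flatnonzero(K < K[i])
            if better.size > 0:
                sf = G[rng.choice(better)]
                I = rng.integers(1, 3, size=p)         # random values from {1, 2}
                cand = np.clip(G[i] + rng.random(p) * (sf - I * G[i]), lb, ub)
                k = objective(cand)
                if k < K[i]:
                    G[i], K[i] = cand, k
            # Exploitation (Eqs. 20-21): local move with radius shrinking in t
            cand = np.clip(G[i] + (1 - 2 * rng.random(p)) * (ub - lb) / t, lb, ub)
            k = objective(cand)
            if k < K[i]:
                G[i], K[i] = cand, k
    best = int(np.argmin(K))
    return G[best], K[best]

# Toy usage: minimize the sphere function in 5 dimensions
w, loss = igao(lambda g: float(np.sum(g ** 2)), p=5)
print(w, loss)
```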
IGAO is a new metaheuristic optimization algorithm suitable for fine-tuning complex models like the Bi-GRU-based Hierarchical ResNet model. IGAO balances the exploration and exploitation phases, making it suitable for complex optimization tasks, and it avoids getting stuck in local optima, yielding good solutions quickly. The algorithm is robust to noise and thus handles optimization problems even with noisy data. IGAO generally has fewer parameters and is therefore easier to use. It is more advantageous than other optimization techniques such as the genetic algorithm (GA), particle swarm optimization (PSO), and differential evolution (DE): IGAO exhibits better exploration capability and avoids local optima better than GA, is more robust to noise and handles more complex optimization landscapes than PSO, and is more efficient and converges faster than DE. Thus, Improved Green Anaconda Optimization is a promising choice for fine-tuning Bi-GRU-based Hierarchical ResNet models due to its global optimization capability, efficiency, robustness, and flexibility. The IGa-BiHR BNERM model for biomedical named entity recognition is proposed as a highly accurate approach: using sophisticated methods and algorithms, it recognizes and categorizes different kinds of named entities in biomedical texts. This model could improve the efficiency and precision of biomedical information extraction, benefiting the medical research community and the healthcare industry. The IGAO algorithm's computational complexity depends on parameters such as the population size, the cost of the evaluation function, and the number of generations the algorithm runs. Several factors impact model performance when fine-tuning a Bi-GRU-based Hierarchical ResNet model with IGAO. Increasing the population size yields better results but affects the diversity of the search space, computational cost, and convergence speed. The probability of crossover between individuals impacts the exploration and exploitation capabilities of the algorithm, and increasing the number of generations yields better near-optimal results at a higher computational cost. Optimizing a model with a large number of parameters is computationally expensive; reducing the dimensionality of the problem can improve computational efficiency. Overall, the computational complexity of IGAO affects the performance of the Bi-GRU-based Hierarchical ResNet model through training time and solution quality.
Results and discussion
This section presents experimental analyses of the proposed and existing models, evaluating their performance on the MACCROBAT dataset for BNER. A variety of metrics are employed, including F1-score, root mean square error (RMSE), receiver operating characteristic curve (ROC), accuracy, mean squared error (MSE), precision, recall, mean absolute error (MAE), specificity, sensitivity, positive predictive value (PPV), negative predictive rate (NPR), false positive rate (FPR), and false negative rate (FNR). The accuracy of the proposed model is then compared with established models such as CNN-WE-CRF, Bi-GRU-WE-CRF, LSTM-WE-CRF, Bi-LSTM-CRF, and ResNet-CRF. The following section provides a comprehensive overview and comparison of these metrics across the models. The model can extract context from past and future tokens because of the Bi-GRU's ability to analyze input sequences forward and backward, which is essential for comprehending the meaning of entities in relation to adjacent words. Moreover, the Bi-GRU may be extended by stacking layers, enabling deeper networks to recognize complicated entity connections more accurately by capturing more nuanced patterns in the data. Error metrics such as MSE and MAE are used to calculate the error of the classification model, and the proposed model is also compared on the various evaluation metrics in the results section. The hyperparameter values used in this experiment are presented in Table 2.
Dataset description
The model proposed in this study uses MACCROBAT, a specialized dataset tailored for BNER tasks. The dataset can be downloaded from the following link: https://figshare.com/articles/dataset/MACCROBAT2018/9764942. This ZIP archive includes two hundred documents: two hundred annotations in brat standoff format and two hundred plain-text source documents with one sentence per line. For example, "15939911.txt" refers to the PubMed article by Marcu and Donohue titled "A young man with palpitations and Ebstein's anomaly of the tricuspid valve." The text data comes from full-text publications in PubMed Central but has been edited to keep only the contents of clinical case reports. Annotations were meticulously hand-crafted to guarantee precision.
Performance metrics
Several performance measures have been chosen to evaluate the proposed BNER system. Here are the calculations for these metrics:
Accuracy
The effectiveness of a classification model is often measured by its accuracy, which signifies how well the model performs in correctly classifying instances. High accuracy values are expected to demonstrate the model's efficacy for classification tasks.
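The formula is omitted above; the standard definition, in the notation defined just below, is:

$$Accuracy = \frac{tp + tn}{tp + tn + fp + fn}$$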
Here, \(tp\) denotes the true positives, \(tn\) the true negatives, \(fn\) the false negatives, and \(fp\) the false positives.
Precision
Precision quantifies how many of the instances the classifier labels positive are true positives (\(tp\)), indicating the accuracy of positive classifications. The formulation for precision is as follows:
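The omitted formula is the standard one:

$$Precision = \frac{tp}{tp + fp}$$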
Recall
Recall evaluates how often the model correctly identifies true positives among the actual positive samples. The formulation for recall is as follows:
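The omitted formula is the standard one:

$$Recall = \frac{tp}{tp + fn}$$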
F-measure
The F-measure is the harmonic mean of recall and precision computed from the classification truth values. A larger F-measure is desirable because it reflects classification accuracy. The computation of the F-measure can be expressed as follows:
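The standard harmonic-mean formulation is:

$$F_{1} = \frac{2 \cdot Precision \cdot Recall}{Precision + Recall}$$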
RMSE
RMSE is an error measure that indicates the model's overall misclassification level; its value should be as low as possible. It is formulated as follows:
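The standard definition, in the notation defined just below, is:

$$RMSE = \sqrt{\frac{1}{n}\sum_{i = 1}^{n}\left( y_{i} - \hat{y}_{i} \right)^{2}}$$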
Here, \(y_{i}\) is the prediction, \(\hat{y}_{i}\) is the true value, and \(n\) is the overall number of data points.
MAE
MAE helps turn learning problems into optimization problems and serves as a straightforward, quantifiable way to gauge errors in regression problems. The formulation is as follows:
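The standard definition is:

$$MAE = \frac{1}{n}\sum_{i = 1}^{n}\left| y_{i} - \hat{y}_{i} \right|$$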
MSE
When applied to a regression problem, MSE measures the average squared deviation of the actual values from the predicted values, quantitatively assessing the proximity between predicted and actual values. The formulation is as follows:
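The standard definition is:

$$MSE = \frac{1}{n}\sum_{i = 1}^{n}\left( y_{i} - \hat{y}_{i} \right)^{2}$$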
MCC
MCC evaluates the difference between expected and actual values, similar to the chi-square statistic for a 2 × 2 contingency table. The formula is expressed as follows:
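The standard definition is:

$$MCC = \frac{tp \cdot tn - fp \cdot fn}{\sqrt{(tp + fp)(tp + fn)(tn + fp)(tn + fn)}}$$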
Specificity
Specificity, sometimes called the true negative rate, is a performance indicator used to assess a classification model's accuracy in identifying negative cases. It is computed with this formula:
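The standard definition is:

$$Specificity = \frac{tn}{tn + fp}$$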
NPV
The negative predictive value is the probability that an instance predicted negative is truly negative.
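The standard definition is:

$$NPV = \frac{tn}{tn + fn}$$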
PPV
The positive predictive value is the probability that an instance predicted positive is truly positive.
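The standard definition is:

$$PPV = \frac{tp}{tp + fp}$$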
FNR
The false negative rate is the proportion of actual positive values that are predicted to be negative.
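The standard definition is:

$$FNR = \frac{fn}{fn + tp}$$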
FPR
The false positive rate is the proportion of actual negative values that are predicted to be positive.
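The standard definition is:

$$FPR = \frac{fp}{fp + tn}$$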
Performance analysis
The study determined the performance advance of the suggested Biomedical Named Entity Recognition (BNER) system by comparing it to existing BNER systems using the evaluation measures above. The subsequent section presents a comprehensive comparative analysis of the proposed model's performance relative to existing models.
The proposed technique yields a significant performance boost in the BNER stage, resulting in more accurate identification of biomedical names. This improvement was quantified using the MACCROBAT dataset (Fig. 5).
The proposed model is compared to the existing BNER models, and the proposed technique achieves 99.11% accuracy, 98.32% precision, 98.25% recall, 98.05% F1-score, and 98.29% specificity. The proposed method achieves efficient performance due to the utilization of the Bi-GRU to handle long-term dependencies. Compared to the existing techniques, the proposed technique can handle a large amount of data for BNER. The limitations of existing models, such as low accuracy, higher false positive rates, and lower true positive rates, are resolved in the proposed model by using the residual network model for BNER recognition. Figure 6 compares the error performance between the proposed and existing models.
The proposed technique's RMSE, MSE, and MAE values reach 0.223, 0.05, and 0.03, respectively. Compared to other related methods, the proposed model achieves a low error rate due to the hyperparameter tuning model and the reduction of overfitting. This further solidifies the proposed model's superiority in handling biomedical named entity recognition tasks. Figure 7 (a) and (b) show the PPV and NPR of the proposed and existing BNER algorithms.
Figure 7 (a) & (b) illustrate the evaluation of the proposed and existing BNER models using the PPV and NPR metrics. The proposed model exhibits impressive PPV and NPR scores of 99.45% and 99.01%, respectively, demonstrating its superior ability to recognize biomedical names accurately. The proposed model achieves a higher PPV than other models by reducing false positives and improving the context around entities through the integration of advanced feature extraction and sequence modeling approaches. A higher NPR is obtained by the proposed technique due to its improved recognition of non-entity patterns and reduced false negatives. Figure 8(a) & (b) show the FNR and FPR of the proposed and existing BNER algorithms.
Figure 8(a) & (b) depict the evaluation of the proposed and existing BNER models using the FPR and FNR metrics. The proposed model demonstrates minimal FPR and FNR scores, significantly lower than 0.32%, highlighting its exceptional ability to recognize biomedical entities accurately. The proposed model can differentiate between true biomedical entities and noise thanks to the combination of the Bi-GRU, ResNet, and ROBERT-WWM embeddings, leading to a lower FPR. Using its deep hierarchical structure and the effective weight selection of IGAO, the proposed model reduces the FNR by improving its capacity to capture and recognize entities. A comparison of the time complexity of the proposed and existing BNER models is shown in Fig. 9.
The proposed model requires a lower execution time of 100 s than the existing models. It minimizes execution time by addressing inefficiencies at each stage, and it improves on previous models in speed by combining the computing power of the Bi-GRU, the efficiency of the hierarchical ResNet, and the optimized feature extraction of RoBERTa-WWM. Compared to models with slower optimization techniques, the IGAO technique significantly reduces the total execution time by accelerating convergence during training. Figure 10 compares the AUC curves of the proposed and existing methods.
The ROC curve plots the true positive rate (TPR) against the false positive rate (FPR) at every classification threshold, and the AUC summarizes it as a single value. The proposed classifier attains a higher AUC of 0.995 than the other classifiers considered for comparison; its higher TPR and lower FPR at each threshold confirm its efficient performance relative to the other techniques. The training and testing accuracy and loss curves of the proposed technique are depicted in Fig. 11(a) and (b).
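The AUC computation just described corresponds to the standard scikit-learn routines sketched below; the score vector is a hypothetical stand-in for the model's per-token entity probabilities.

```python
from sklearn.metrics import roc_auc_score, roc_curve

# Hypothetical gold labels and predicted entity probabilities
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_score = [0.10, 0.40, 0.85, 0.70, 0.90, 0.20, 0.65, 0.30]

fpr, tpr, thresholds = roc_curve(y_true, y_score)  # TPR and FPR per threshold
print("AUC:", roc_auc_score(y_true, y_score))      # area under the ROC curve
```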
The proposed technique is trained and tested on an 80:20 split: 80% of the data is used for training and the remaining 20% for testing. Over 300 epochs, the model attains training and testing accuracies of 0.92 and 0.9672, respectively, as the iterations proceed. Figure 11(b) illustrates the training and testing loss for the existing and proposed models; the corresponding training and testing losses reach 0.89 and 0.156 after 300 epochs. As a result, the proposed strategy outperforms current models in accuracy, since it undergoes multiple training cycles that steadily decrease the loss. Table 3 presents an overall comparison of the proposed and existing BNER models' performance.
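Such an 80:20 holdout split can be produced with a short scikit-learn sketch; the placeholder corpus and variable names below are assumptions for illustration, not the paper's code.

```python
from sklearn.model_selection import train_test_split

# Placeholder corpus standing in for the MACCROBAT sentences and tags
sentences = [f"sample sentence {i}" for i in range(100)]
labels = [i % 2 for i in range(100)]

# 80% training / 20% testing, with a fixed seed for reproducibility
X_train, X_test, y_train, y_test = train_test_split(
    sentences, labels, test_size=0.20, random_state=42)

# The model would then be trained on (X_train, y_train) for 300 epochs
# and evaluated on the held-out (X_test, y_test).
```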
Table 3 reports the proposed and existing models' performance metrics, encompassing accuracy, precision, recall, F1-score, specificity, RMSE, MSE, MAE, PPV, NPV, FPR, and FNR. The proposed model consistently outperforms the existing models on all of these metrics, which implies that it effectively addresses the shortcomings identified in the current models and delivers enhanced BNER performance. The AUROC and AUPR of the proposed technique reach 0.995 and 98.65%, respectively, higher than those of the other existing models.
Ablation study
The performance of the proposed technique is analyzed based on the contribution of each module, as described in Table 4. Module 1 applies the word embedding technique together with the proposed BNER model but skips pre-processing (i.e., it runs the proposed technique without the pre-processing stage). Module 2 runs the proposed technique without hyperparameter tuning (i.e., with pre-processing, word embedding, and the BNER model, but without optimization). Module 3 runs the full proposed model.
Comparative analysis
The convergence of the proposed technique is compared against various optimization models in terms of accuracy and loss, as described in Table 5. The proposed IGAO algorithm is compared with other optimization techniques, such as the genetic algorithm (GA), coati optimization algorithm (COA), whale optimization algorithm (WOA), and particle swarm optimization (PSO).
The performance of the proposed BNER technique is also analyzed against various techniques on different datasets, as described in Table 6. The NCBI disease corpus [46] is a collection of 793 PubMed abstracts with extensive mention and concept annotations, intended as a resource for biomedical natural language processing research. The BC5CDR corpus [47] comprises 1500 PubMed articles with 4409 annotated chemicals, 5818 diseases, and 3116 chemical-disease interactions. The biomedical relation extraction dataset (BioRED) [48] was likewise analyzed using the existing and proposed techniques.
The performance of the proposed technique is compared with various existing models on these three datasets in Table 6; the proposed method achieves higher accuracy on all three than the existing models.
Figure 12 shows the accuracy of the individual components of the proposed model to demonstrate their contributions. Using the Bi-GRU model alone for BNER achieved an accuracy of only 90%; adding an attention mechanism to the Bi-GRU achieved 93%; combining residual learning with the Bi-GRU attained 95%; and the full proposed method achieved the highest accuracy of 99.11%.
Discussion
The discussion emphasizes the superiority of the proposed technique over existing related work presented in Table 7. The proposed model demonstrates enhanced performance due to several factors, including streamlined hyperparameter tuning, optimized losses, and reduced time consumption. These efficiencies are essential in creating a more effective and accurate BNER system.
Moreover, the proposed model excels compared to its counterparts, as shown in the comparative analysis in Table 4. Its ability to manage complex biomedical named entities, employ robust feature extraction techniques, and utilize deep learning algorithms contributes significantly to its success, enabling it to recognize biomedical entities accurately even in challenging contexts. The proposed Bi-GRU-based Hierarchical ResNet handles large datasets effectively thanks to its efficient architecture: the computational load is parallelized across multiple layers, which improves scalability for very large datasets. The model can be adapted to different types of biomedical text data by incorporating domain-specific features or modifying the architecture, and pre-trained Bi-GRU-based Hierarchical ResNet models can be fine-tuned on new datasets, reducing the need for extensive training. Techniques such as data augmentation increase the size and diversity of the training data, improving the model's generalization ability. The Bi-GRU-based Hierarchical ResNet is therefore a scalable and adaptable architecture for diverse biomedical text data; its ability to capture hierarchical relationships and long-range dependencies makes it well suited to tasks such as named entity recognition in the biomedical domain.
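A minimal PyTorch sketch of such a residual Bi-GRU block is given below. It is an illustrative reconstruction under stated assumptions (the layer sizes, stacking depth, and class names are hypothetical), not the authors' implementation of IGa-BiHR BNERM.

```python
import torch
import torch.nn as nn

class ResidualBiGRUBlock(nn.Module):
    """One Bi-GRU layer whose output is added back to its input (skip connection)."""
    def __init__(self, hidden_dim: int):
        super().__init__()
        # Bidirectional GRU: each direction uses hidden_dim // 2 units,
        # so the concatenated output matches the input width.
        self.bigru = nn.GRU(hidden_dim, hidden_dim // 2,
                            batch_first=True, bidirectional=True)
        self.norm = nn.LayerNorm(hidden_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.bigru(x)      # (batch, seq_len, hidden_dim)
        return self.norm(x + out)   # ResNet-style residual addition

class HierarchicalBiGRUTagger(nn.Module):
    """Stack of residual Bi-GRU blocks over pre-computed word embeddings."""
    def __init__(self, embed_dim: int, num_tags: int, depth: int = 3):
        super().__init__()
        self.blocks = nn.Sequential(
            *[ResidualBiGRUBlock(embed_dim) for _ in range(depth)])
        self.classifier = nn.Linear(embed_dim, num_tags)

    def forward(self, embeddings: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.blocks(embeddings))

# Toy usage: a batch of 2 sentences, 16 tokens each, 768-dim embeddings
model = HierarchicalBiGRUTagger(embed_dim=768, num_tags=9)
logits = model(torch.randn(2, 16, 768))   # (2, 16, 9) per-token tag scores
```

The residual additions let gradients flow past each recurrent layer, which is what makes stacking several Bi-GRU layers hierarchically practical.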
Computational complexity analysis
To extend the examination of the results, the study additionally analyses the computational complexity of the proposed IGa-BiHR BNERM model and compares it with other current approaches to demonstrate its efficacy. The computational complexity of the proposed deep learning model is expressed in Big O notation, the standard way of characterizing asymptotic cost. The computational complexity of the proposed IGa-BiHR BNERM model and the other current models is displayed in Table 8.
The computational complexity analysis shows that the proposed model yielded lower complexity than alternative methods. Thus, it demonstrates that the suggested methodology is more reliable for BNER on the MACCROBAT dataset.
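Although Table 8 is not reproduced here, the kind of accounting behind such a Big O comparison can be illustrated as follows, assuming sequence length $T$, embedding size $k$, GRU hidden size $d$, and $L$ stacked layers (an illustrative derivation, not the paper's reported figures):

```latex
\begin{align*}
  \text{one Bi-GRU layer:}\quad & O\!\left(T \cdot d\,(d + k)\right)
      && \text{(three gate products per direction and timestep)} \\
  \text{residual (skip) addition:}\quad & O(T \cdot d) \\
  \text{stack of } L \text{ layers:}\quad & O\!\left(L \cdot T \cdot d\,(d + k)\right)
\end{align*}
```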
Friedman test
The Friedman test is run when the dependent variable is measured on an ordinal scale, to determine whether any variations occur across the classes. The Friedman chi-square test reports several quantities, including the degrees of freedom (DOF), the Friedman chi-square statistic (Q), and the uncorrected chi-squared test p-value (p). The results of the Friedman test analysis are displayed in Table 9.
Table 9 Analysis of Friedman test
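A minimal sketch of this test using SciPy is shown below; the per-fold scores of the three compared models are toy numbers, not the study's measurements.

```python
from scipy.stats import friedmanchisquare

# Hypothetical per-fold accuracies of three models on the same folds
model_a = [0.91, 0.92, 0.90, 0.93, 0.92]
model_b = [0.88, 0.90, 0.87, 0.89, 0.90]
model_c = [0.99, 0.98, 0.99, 0.97, 0.99]

statistic, p_value = friedmanchisquare(model_a, model_b, model_c)
print(f"Q = {statistic:.3f}, p = {p_value:.4f}")  # Friedman Q and its p-value
```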
Wilcoxon test
The Wilcoxon test, also presented as the rank-sum or signed-rank test, is a nonparametric statistical test that compares two related groups and determines whether one or more paired sets differ statistically. The proposed study assesses the Wilcoxon test mean, p-value, Z value, standard deviation, and smaller rank total (W). The analysis of the Wilcoxon test is shown in Table 10.
Table 10 Analysis of Wilcoxon test
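Likewise, a paired Wilcoxon signed-rank comparison can be run with SciPy as sketched below; the score vectors are hypothetical placeholders for paired per-fold results.

```python
from scipy.stats import wilcoxon

# Hypothetical paired per-fold accuracies of the proposed and a baseline model
proposed = [0.991, 0.989, 0.992, 0.990, 0.988]
baseline = [0.952, 0.948, 0.955, 0.950, 0.947]

statistic, p_value = wilcoxon(proposed, baseline)  # W statistic and p-value
print(f"W = {statistic:.3f}, p = {p_value:.4f}")
```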
Conclusion
The paper proposes an innovative and efficient model for BNER called IGa-BiHR BNERM. The proposed model accurately identifies labeled biomedical entities, making it a valuable tool in biomedical research. The IGa-BiHR BNERM model comprises several steps, including pre-processing, segmentation, feature extraction, and classification, which together ensure that biomedical entities in text data are appropriately identified and classified. Using the MACCROBAT dataset, the proposed model's performance is assessed and contrasted with current approaches. The model demonstrates impressive results, with a training loss of 0.46, a testing loss of 0.39, a training accuracy of 0.9567, and a testing accuracy of 0.9423. On the MACCROBAT dataset the model further achieves 99.11% accuracy, 98.32% precision, 98.25% recall, 98.05% F1-score, 98.15% specificity, and 98.29% sensitivity, with an RMSE of 0.223, an MSE of 0.05, and an MAE of 0.03. Bi-GRU-based Hierarchical ResNet models for biomedical NER nevertheless carry certain limitations: computational expense grows with larger datasets; biomedical datasets are limited in size, which affects model performance; and although the Bi-GRU captures mid-range dependencies, capturing very long-range dependencies remains difficult. Future research should therefore focus on discovering more efficient architectures that reduce computational complexity while maintaining performance, developing data augmentation techniques to enlarge and diversify biomedical text data, and exploring alternative architectures, such as transformer models, for capturing long-range dependencies. By addressing these limitations, future work can continue to improve the performance and applicability of Bi-GRU-based Hierarchical ResNet models for biomedical named entity recognition. The proposed strategy identifies biomedical names more accurately than the current techniques, and the outcomes confirm its usefulness; future research also entails examining further datasets to verify its effectiveness. In conclusion, the IGa-BiHR BNERM model is an innovative and effective tool for biomedical named entity recognition with immense potential in biomedical research. In the future, more recent datasets will be considered to improve the proposed model, more advanced models for pre-processing and segmentation will be explored to increase performance further, and more robust classification methods will be incorporated.
Availability of data and materials
The datasets used and analyzed during the current study are available at https://www.kaggle.com/datasets/okolojeremiah/maccrobat.
References
Govindarajan S, et al. (2023) RETRACTED: an optimization based feature extraction and machine learning techniques for named entity identification
Sung M, et al. BERN2: an advanced neural biomedical named entity recognition and normalization tool. Bioinformatics. 2022;38(20):4837–9.
Kaswan KS, et al. AI-based natural language processing for the generation of meaningful information electronic health record (EHR) data. In: Advanced AI techniques and applications in bioinformatics. CRC Press; 2021. p. 41–86.
Wang DQ, et al. Accelerating the integration of ChatGPT and other large-scale AI models into biomedical research and healthcare. MedComm–Future Med. 2023;2(2):43.
Ahmad PN, Shah AM, Lee K. A review on electronic health record text-mining for biomedical name entity recognition in healthcare domain. Healthcare. 2023;11(9):1268.
Li S. Using enhanced knowledge graph embedding and graph summarisation for question answering over knowledge graphs. Diss: Murdoch University; 2023.
Huang MS, et al. Biomedical named entity recognition and linking datasets: survey and our recent development. Brief Bioinform. 2020;21(6):2219–38. https://doi.org/10.1093/bib/bbaa054.
Chen X, et al. Improving the named entity recognition of Chinese electronic medical records by combining domain dictionary and rules. Int J Environ Res Pub Health. 2020;17(8):2687. https://doi.org/10.3390/ijerph17082687.
Ramachandran R, Arutchelvan K. Named entity recognition on bio-medical literature documents using hybrid based approach. J Ambient Int Human Comput. 2021. https://doi.org/10.1007/s12652-021-03078-z.
Perera N, Dehmer M, Emmert-Streib F. Named entity recognition and relation detection for biomedical information extraction. Front Cell Developm Biology. 2020;8:673.
Cariello MC, Lenci A, Mitkov R. A comparison between named entity recognition models in the biomedical domain. In: Proceedings of the Translation and Interpreting Technology Online Conference; 2021. p. 76–84.
Wei J, et al. Research on named entity recognition of adverse drug reactions based on NLP and deep learning. Front Pharmacol. 2023. https://doi.org/10.3389/fphar.2023.1121796.
Pavlova V, Makhlouf M. BIOptimus: pre-training an optimal biomedical language model with curriculum learning for named entity recognition. In: The 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks; 2023. p. 337–49.
Hao Z, et al. Biomedical named entity recognition based on reinforcement learning label correction. In: Fourth International Conference on Computer Vision and Data Mining (ICCVDM 2023); 2024. Vol. 13063, p. 254–60. SPIE.
Cui X, et al. RoBGP: a chinese nested biomedical named entity recognition model based on RoBERTa and global pointer. Comput Mater Continua. 2024;78(3):3603.
Deng Q, et al. CLSTM-SNP: convolutional neural network to enhance spiking neural P systems for named entity recognition based on long short-term memory network. Neural Process Lett. 2024. https://doi.org/10.1007/s11063-024-11576-2.
Cao Lulu, et al. Online biomedical named entities recognition by data and knowledge-driven model. Artif Intell Med. 2024;150:102813. https://doi.org/10.1016/j.artmed.2024.102813.
Li Q (2023) Resource Description Framework (RDF) Modeling of Named Entity Co-occurrences Derived from Biomedical Literature in the PubChemRDF. In: SWAT4HCLS
Chen S, et al. Biomedical entity normalization using encoder regularization and dynamic ranking mechanism. In: Liu Fei, Duan Nan, Qingting Xu, Hong Yu, editors., et al., Natural Language Processing and Chinese Computing: 12th National CCF Conference, NLPCC 2023, Foshan, China, October 12–15, 2023, Proceedings, Part I. Cham: Springer Nature Switzerland; 2023. p. 498–510. https://doi.org/10.1007/978-3-031-44693-1_39.
Tang Z, Wan B, Yang Li. Word-character graph convolution network for chinese named entity recognition. IEEE/ACM Transact Audio, Speech, Language Process. 2020;28:1520–32.
Dash Adyasha, et al. A clinical named entity recognition model using pretrained word embedding and deep neural networks. Decision Anal J. 2024;10:100426. https://doi.org/10.1016/j.dajour.2024.100426.
Fabregat H, et al. Negation-based transfer learning for improving biomedical Named Entity Recognition and Relation Extraction. J Biomed Inform. 2023;138:104279. https://doi.org/10.1016/j.jbi.2022.104279.
Bhattacharya M, et al. Improving biomedical named entity recognition through transfer learning and asymmetric tri-training. Proced Comput Sci. 2023;218:2723–33. https://doi.org/10.1016/j.procs.2023.01.244.
Asghari M, Sierra-Sosa D, Elmaghraby AS. BINER: A low-cost biomedical named entity recognition. Inf Sci. 2022;602:184–200.
Moscato V, et al. Taughtnet: Learning multi-task biomedical named entity recognition from single-task teachers. IEEE J Biomed Health Inform. 2023;27(5):2512–23. https://doi.org/10.1109/JBHI.2023.3244044.
Guan Z, Zhou X. A prefix and attention map discrimination fusion guided attention for biomedical named entity recognition. BMC Bioinform. 2023;24(1):42.
Chai Z, et al. Hierarchical shared transfer learning for biomedical named entity recognition. BMC Bioinform. 2022. https://doi.org/10.1186/s12859-021-04551-4.
Sun Wei, et al. Weak Supervision and Clustering-Based Sample Selection for Clinical Named Entity Recognition. In: De Francisci G, Morales CP, Ruchansky N, Kourtellis N, Baralis E, Bonchi F, editors., et al., Machine Learning and Knowledge Discovery in Databases: Applied Data Science and Demo Track: European Conference, ECML PKDD 2023, Turin, Italy, September 18–22, 2023, Proceedings, Part VI. Cham: Springer Nature Switzerland; 2023. p. 444–59. https://doi.org/10.1007/978-3-031-43427-3_27.
Jeon SH, Sungzoon C. Edge weight updating neural network for named entity normalization. Neural Process Lett. 2023;55(5):5597–618. https://doi.org/10.1007/s11063-022-11102-2.
Košprdić M, et al. From zero to hero: harnessing transformers for biomedical named entity recognition in zero-and few-shot contexts. Artificial Intell Med. 2024;1(156):102970.
Alamro H, et al. BioBBC: a multi-feature model that enhances the detection of biomedical entities. Sci Rep. 2024;14(1):7697.
Tian Y, et al. Improving biomedical named entity recognition with syntactic information. BMC Bioinform. 2020;21:1–17.
Guirguis M. Named entity recognition from biomedical text; 2023.
Nassimi S. Entity linking for the biomedical domain. Master's thesis, Gottfried Wilhelm Leibniz Universität Hannover; 2023.
Tsui BY, et al. Creating a scalable deep learning based named entity recognition model for biomedical textual data by repurposing BioSample free-text annotations. Preprint at bioRxiv; 2018. biorxiv.org/content/biorxiv/early/2018/09/12/414136.full.pdf.
Atkinson J, Bull V. A multi-strategy approach to biological named entity recognition. Expert Syst Appl. 2012;39(17):12968–74.
Gridach M. Character-level neural network for biomedical named entity recognition. J Biomed Inform. 2017;70:85–91.
Alonso Casero, Á (2021) Named entity recognition and normalization in biomedical literature: a practical case in SARS-CoV-2 literature. Diss. ETSI_Informatica
Chen P, et al. Named entity recognition of Chinese electronic medical records based on a hybrid neural network and medical MC-BERT. BMC Med Inform Decision Making. 2022;22(1):315.
Patel K, et al. RJB-Net: residual deep learning with joint bilateral denoising network for remote sensing image fusion. In: 2023 3rd Asian Conference on Innovation in Technology (ASIANCON); 2023. IEEE.
Cheng Y, et al. A novel hierarchical structural pruning-multiscale feature fusion residual network for intelligent fault diagnosis. Mech Mach Theory. 2023;184:105292.
Yuan Min, et al. MCAFNet: a multiscale channel attention fusion network for semantic segmentation of remote sensing images. Remote Sens. 2023;15(2):361.
Xu Q, et al. Multiscale convolutional neural network based on channel space attention for gearbox compound fault diagnosis. Sensors. 2023;23(8):3827.
Dai Z, Wang X, Ni P, Li Y, Li G, Bai X. Named entity recognition using BERT BiLSTM CRF for Chinese electronic health records. In: 2019 12th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI); 2019. IEEE.
Dehghani M, Trojovský P, Malik OP. Green anaconda optimization: a new bio-inspired metaheuristic algorithm for solving optimization problems. Biomimetics. 2023;8(1):121.
Doğan RI, Leaman R, Lu Z. NCBI disease corpus: a resource for disease name recognition and concept normalization. J Biomed Inform. 2014;47:1–10.
Li J, Sun Y, Johnson RJ, Sciaky D, Wei CH, Leaman R, Davis AP, Mattingly CJ, Wiegers TC, Lu Z. BioCreative V CDR task corpus: a resource for chemical disease relation extraction. Database. 2016;2016:baw068. https://doi.org/10.1093/database/baw068.
Luo L, Lai PT, Wei CH, Arighi CN, Lu Z. BioRED: a rich biomedical relation extraction dataset. Brief Bioinform. 2022. https://doi.org/10.1093/bib/bbac282.
Acknowledgements
The authors are grateful to the anonymous reviewers for their valuable comments.
Funding
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Author information
Authors and Affiliations
Contributions
All authors contributed equally to this work. RCB., RKD, and YC gathered the data, performed the analysis, and wrote the manuscript. PS and USR contributed to conceptualizing the explanation of the method used and reviewing and editing the manuscript. All authors have read and approved the final manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Bhushan, R.C., Donthi, R.K., Chilukuri, Y. et al. Biomedical named entity recognition using improved green anaconda-assisted Bi-GRU-based hierarchical ResNet model. BMC Bioinformatics 26, 34 (2025). https://doi.org/10.1186/s12859-024-06008-w