In this section, we analyzed the results from three different perspectives: knowledge structure of bioinformatics, evolution of knowledge structure, and evolution of computing-related topics.
Knowledge structure of bioinformatics
As shown in Fig. 1b, the popularity of topic 1(mathematical models, theoretical biology) and topic 2(DNA replication, transcription and expression), which are the initial main topics, have gradually decreased since 1980; meanwhile, topic 4(molecular biology) and topic 5(Molecular dynamics) remain relatively stable with limited change. Topic 6(genomics and proteomics) and Topic 8(data-related) have developed rapidly after 2000 and became the main topics of bioinformatics. Topic 3(system biology) and Topic 9(proteomics) were developed around 2000; although they did not receive too much attention, they have maintained a small but stable growth since 2000. The mode of Topic 7(HGP: The Human Genome Project) is somewhat special. This topic has been growing steadily since it appeared in 1980, reaching a peak in about 1995, but began to decrease greatly after 2000. This phenomenon can be well explained by combining the start and end time of the human genome project (1990-2003).
Although themeriver can describe the changes in research hotspots, it cannot sketch the relationship between topics and thus cannot accomplish our goal 2. Thus, we need a science to map the whole knowledge structure of bioinformatics. By analyzing Fig. 3, we obtain the following conclusions about the relationship between bioinformatics’ topics. (1) Topics 2, 4 and 6 are bridges connecting other topics. And they all deal with DNA indicating that DNA research is the ’skeleton’ of bioinformatics. (2) Among the topics distributed around the structure diagram, Topic 5 is related to Topic 2 and 4, while Topic 7 is mainly related to Topic 2 and 6. Topic 1 is relatively independent from other topics. (3) The three topics 3, 8 and 9 related to computing methods are all located at the top of the structural diagram. Topic 8 is at the center, which means that data are the core of computing-related topics of bioinformatics. (4) The closest non-computational topic to Topic 8 is Topic 6; hence, topic discussed in Topic 6 may use more methods of data science than other topics.
Evolution of knowledge structure
With the continuous development of bioinformatics, the relationship between topics is also changing. From 1960 to 2019, the change in the number of papers on different topics is shown in Fig. 1a. We selected several time periods when the number of papers varied greatly.
At the beginning of the development of bioinformatics, the relationship between the main topics 1, 2, 4 and 5 was relatively weak. However, from 1980 to 2000, the 2,4,5 topics began to merge to a certain extent. We selected the fusion part interactively and examined at the high-frequency MESH phrases discussed in the merged papers. The top six phrases are: escherichia coli, amino acids, binding site, plasma membrane, binding sites, and signal transduction, which are all related to biological macromolecules (Fig. 4).
From 2000 to 2005, the number of substructures of bioinformatics increased, but the upper and lower sides are relatively independent. Bridge topic 6(genomics and proteomics) has not become the main hot topic at that time, and has not produced more contact with Topic 3 and 8, but produced more fusion with topic 2, indicating that omics research began with DNA. The period from 2005 to 2010 is a period of vigorous development of bioinformatics. Topic 6, as a bridge topic and a hot topic, began to connect various parts (Fig. 5).
After 2010, the hotspot began to move upward and the distance between Topic 6 and Topics 8,9 was shortened in this period. This finding indicates that omics research increasingly relied more on the support of computational methods (Fig. 6).
Evolution of computing-related topics
Understanding the evolution of topics related to computation is crucial to understand the whole bioinformatics structure due to the special interdisciplinary nature of bioinformatics. We tried to answer two questions through visual analysis: (1) What is the proportion of computing-related topics in bioinformatics? Is this ratio stable? (2) What changes have taken place in the biological topics and computational methods involved in the evolution of these topics?
Goal 1 can be achieved by simply applying themeriver. We will mainly introduce the solution to Goal 2.
As shown in Fig. 7, themeriver shows the change of the proportion of four computing-related topics and the word co-occurrence network elaborates the change of topics in each period. By default, the word co-occurrence network of all topics is displayed. Click the category label to hide other categories and explore the selected topic in depth. MESH phrases with high frequency will be displayed by default. Moving the mouse over the circle will display its corresponding MESH phrase. The terms related to computation are represented by stroked circles and marked with orange.
The proportion of papers that related to computation evidently changes. The proportion was close to 0.3 in the early stage of the development of bioinformatics, began to decline in the early 1980s, and only accounted for about 0.1 of the total papers in the early 1990s. The proportion then began to rise again in the middle and late 1990s, surpassing the initial proportion and occupying nearly half of the bioinformatics topics after 2005. During such changes, an obvious pattern is observed: although the proportion of computing-related topics is only a little higher than the original, the internal structure of those topics has undergone earth-shaking changes.
Through further analysis of the co-occurrence word network, we conclude the following. (1) Prior to 1980, Topic 1 occupied an absolute dominant position. The computational methods involved were mainly mathematical models, and computer simulation appeared after 1970. The main biological topics discussed during this period include: active transport, cell, etc. (2) The period of 1990-2000 was a turning point for major changes in the internal structure of computing-related topics. The overall proportion began to rise, but the popularity of Topic 1 remained the same; and Topic 8 gradually replaced Topic 1 to occupy the dominant position. No new computational method was found in Topic 8, but the discussion on sequence increased significantly. (3)The period of 2000 - 2010 is an era of data and genes. The popularity of topics 3 and 8 continues to rise, but the content of the discussion has undergone great changes. The computational topics are all centered on data, data analysis, and the popularity of gene expression is the highest in biological topics while discussions on sequences began to decrease. Topic 9 has also seen a significant increase, and the computational methods involved are mainly related to data; however, biological topics focus on magnetic resonance and images. (4) In 2010 - 2019, the overall structure has not changed, and the proportion of each topic has remained relatively stable, however, the contents they discussed are different from those before. During this period, more computing methods are involved, such as machine learning, data mining, text mining, cloud computing and so on. Thus the interaction between computing methods and bioinformatics becomes increasingly closer and closer.