Exploring the dynamics and interplay of human papillomavirus and cervical tumorigenesis by integrating biological data into a mathematical model

Background: Cervical cancer is the fourth most common tumor in women worldwide and mostly results from persistent infection with high-risk human papillomavirus (HR-HPV).
Results: The main findings are as follows: (i) A total of 16.64% of the individuals were positive for HR-HPV infection, with 13.04% having a single HR-HPV type and 3.60% having multiple HR-HPV types. (ii) Cluster analysis showed that the infection rate trends across the precancerous stages were very similar for HPV31 and HPV33 in all infections and for HPV33 and HPV35 in single infections. (iii) The proportion of multiple HR-HPV infections increased as the disease developed.
Conclusions: The HR-HPV prevalence in outpatients was 16.64%, and the predominant HR-HPV types in the study were HPV52, HPV58 and HPV16. HR-HPV subtypes with common biological properties had similar infection rate trends in precancerous stages. In particular, as precancerous disease progressed, the defense against HPV infection weakened and the potential for further HPV infection increased, which resulted in an increase in multiple HPV infections.


Background
Cervical cancer is the fourth most common tumor in women worldwide [1] and mostly results from human papillomavirus (HPV) infection. A current report [2] estimates that cancers related to HPV infection account for 4.5% of all new cancers worldwide and that cervical cancer makes up 83% of these HPV-related cancers, posing a great threat to women's health, especially in developing countries. Thirteen HPV genotypes, denoted high-risk HPV (HR-HPV), are essential factors for cervical tumorigenesis [2], so the dynamics of the HPV genotypes described here reflect the relationship between an individual HPV genotype and the development of cervical cancer, i.e., the precancerous stages. Because carcinogenesis from persistent HR-HPV infection takes approximately 20 years [3], cervical cancer is the only malignant tumor that can be prevented and treated early through HPV-type screening, which plays a significant role in improving the prognosis of patients [3][4][5]. As China has become one of the countries with a high incidence of cervical cancer and HPV infection is widespread in females [6], it is very important to investigate HPV-type infections in the Chinese population.
Generally, both the liquid-based cytology test (LCT) and the ThinPrep cytology test (TCT) are used to screen for cervical cancer, but they do not effectively detect the specific HR-HPV genotypes involved in an infection. Note that LCT and TCT are commercial names commonly used to refer to these cytology tests, which come from different manufacturers. In contrast, HPV genotyping readily identifies the genotypes and, once HPV infection occurs, distinguishes single infection (defined in Table 1) from multiple infections (defined in Table 1). Because cytology tests and HPV genotyping each have limitations when used alone, the two are usually combined, which achieves significantly higher sensitivity and lower false-negative rates [7][8][9][10][11]. For example, Catteau et al. [12] calculated the prevalence rates of 13 HR-HPV-type infections in different precancerous stages among Belgian women, and Ying et al. [13] mainly employed the prevalence rates of different HR-HPV types in all precancerous stages to describe the distribution of the major infectious types in Beijing, China, after collecting the related data by both LCT/TCT and HR-HPV genotyping. Nevertheless, as reported by these studies [12,13], the prevalence of each HR-HPV type differs within the same precancerous stage, and for a given HR-HPV type the prevalence is not consistent across precancerous stages [13]. Therefore, data pooled across all precancerous stages cannot accurately describe the relationship between HR-HPV types and precancerous stages.
Furthermore, it is still unclear whether multiple HR-HPV infections carry more risk than a single infection [13][14][15]. For example, Chaturvedi et al. [14] investigated the coinfection patterns of 25 HPV genotypes and computed the odds ratios for each genotype with the 24 other genotypes. The results showed that the disease risk of multiple infections is close to the total estimated risk of the individual infections. However, both Ying et al. [13] and Dickson et al. [15] indicated that women with multiple infections have a significantly higher risk of cervical disease than women with single infections. Because previous studies collected data with different HPV types, patient ages and other related factors [13][14][15], they reached inconsistent conclusions about the risk of cervical lesions caused by multiple infections [16,17]. Moreover, most previous studies [13][14][15][18] employed cohort analysis for cervical cancer without considering the proportion of single and multiple infections in different precancerous stages for different HPV types, or the impact of HR-HPV genotypes and precancerous stages on the infections.
To address these shortcomings, we make three contributions: (1) we collected clinical data for 13 HR-HPV types in 4 precancerous stages by integrating TCT with HR-HPV genotype detection; (2) we performed cluster analysis for the 13 genotypes in the 4 precancerous stages; (3) we investigated the proportion of single/multiple infections at the 4 precancerous stages for each HR-HPV genotype, and explored the impact of HR-HPV genotypes and precancerous stages on the infections by Poisson regression [19].
A total of 16,693 patients were studied from July 2016 to July 2017 in the outpatient department of the General Hospital of the People's Liberation Army. We first statistically analyzed infection data for 13 HR-HPV types in 4 precancerous stages. The results showed that the overall prevalence rate of the 13 HR-HPV types (16.64%) was lower than that previously reported, but HPV52, HPV58 and HPV16 still have the greatest impact on the health of women in China. Next, our k-means [20] cluster analysis indicated that biological homology results in similar infection rate trends in precancerous stages. We then found that the proportion of multiple HR-HPV infections increased as the disease developed, and demonstrated that only the precancerous stages were statistically significant [19] when the impact of both HR-HPV genotypes and precancerous stages on infection results was considered. Finally, we discuss the limitations of the study and future work.

The dataset for all infection numbers of the 13 HR-HPV types in the four precancerous stages (see Additional file 1)

Single_Infection_Set
The dataset for single infection numbers of the 13 HR-HPV types in the four precancerous stages (see Additional file 1)

Multiple_Infection_Set
The dataset for multiple infections numbers of the 13 HR-HPV types in the four precancerous stages (see Additional file 1)

Pathological examination
Cervical cells were examined by TCT, and the cytopathological findings were diagnosed by senior physicians according to the Bethesda System of cervical cytology [21]. The precancerous stages are classified as follows [21]: (1)

Detection of HPV genotypes
Thirteen HR-HPV genotypes (HPV16, HPV18, HPV31, HPV33, HPV35, HPV39, HPV45, HPV51, HPV52, HPV56, HPV58, HPV59 and HPV68) were detected with the real-time polymerase chain reaction kit for high-risk HPV genotypes from Shanghai ZJ biotechnology Company (http://www.liferiver.com.cn/productinfor/p15_62.html). All steps strictly followed the instructions of the kit. A sample was considered positive if the HPV-DNA viral load was greater than or equal to 10^4 copies/ml, and negative otherwise.
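The positivity rule above, together with the single/multiple distinction used throughout the study, can be sketched in a few lines of Python. This is only an illustration: the function names and the dictionary input format are assumptions made here, not part of the kit or of the authors' pipeline, and "single" versus "multiple" is taken to mean exactly one versus more than one positive HR-HPV type.

```python
# Hypothetical helper names; only the 13 genotypes and the threshold come from the text.
HR_HPV_TYPES = (
    "HPV16", "HPV18", "HPV31", "HPV33", "HPV35", "HPV39", "HPV45",
    "HPV51", "HPV52", "HPV56", "HPV58", "HPV59", "HPV68",
)
POSITIVE_THRESHOLD = 1e4  # copies/ml; a load at or above this value counts as positive


def positive_types(viral_loads):
    """Return the HR-HPV types whose viral load reaches the threshold.

    viral_loads maps a genotype name to its HPV-DNA load in copies/ml;
    genotypes missing from the mapping are treated as not detected.
    """
    return [t for t in HR_HPV_TYPES if viral_loads.get(t, 0.0) >= POSITIVE_THRESHOLD]


def infection_status(viral_loads):
    """Classify a sample as 'negative', 'single' or 'multiple' infection."""
    n_positive = len(positive_types(viral_loads))
    if n_positive == 0:
        return "negative"
    return "single" if n_positive == 1 else "multiple"


if __name__ == "__main__":
    # Made-up sample, not patient data.
    sample = {"HPV52": 2.0e5, "HPV58": 1.0e4, "HPV16": 3.0e3}
    print(positive_types(sample), infection_status(sample))
```

The comparison is written with `>=` so that a load of exactly 10^4 copies/ml counts as positive, matching the "greater than or equal to" wording.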

Research subjects
The data in this study are from 16,693 patients, all of whom underwent biopsies in the outpatient department of the General Hospital of the People's Liberation Army from July 2016 to July 2017. The cervical

Fig. 1 Workflow of the study

Figure 1 and Table 1 describe the workflow of the study and the nomenclature, respectively. The workflow consists of data preprocessing and data analysis steps. The data preprocessing step processes the raw datasets for all infections (defined in Table 1), single infections and multiple infections of the 13 HR-HPV types in the four precancerous stages, and uses a pie chart (left panel of the data preprocessing component in Fig. 1) to present the classical statistical analysis results. Data analysis (Fig. 1) comprises cluster and regression analysis. Although the 13 HR-HPV genotypes are biologically independent, some of them may have common biological properties, resulting in similar phenotypes (i.e., similar numbers of infected people). Therefore, we use cluster analysis (left panel of data analysis in Fig. 1) to investigate the similarity of infection for the 13 HR-HPV types in precancerous stages. Cluster analysis [22] comprises hierarchical and nonhierarchical clustering. Since the aim of the study is to investigate which HR-HPV types have similar infections in the precancerous stages, we consider the classical K-means [20] suitable for this study. Here, K-means uses the Euclidean distance (Eq. 1) [23] to measure the distance between two observations:

$d_{ij} = \sqrt{\sum_{k} (x_{ik} - x_{jk})^{2}}$ (1)

where $d_{ij}$ represents the distance between the observations of the ith and jth HR-HPV genotypes, and $x_{ik}$ and $x_{jk}$ represent the numbers of infections with the ith and jth HR-HPV genotypes, respectively, in the kth precancerous stage.
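As an illustration of the clustering step, the following is a minimal pure-Python sketch of K-means with the Euclidean distance. It is not the authors' implementation: the four profiles are invented (one value per precancerous stage), the labels type_A to type_D are placeholders rather than real genotypes, and the initial centroids are fixed by index so that the result is reproducible.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, init_indices, max_iter=100):
    """Plain K-means; the initial centroids are the points at init_indices."""
    centroids = [list(points[i]) for i in init_indices]
    k = len(centroids)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each profile joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: euclidean(p, centroids[c])) for p in points
        ]
        if new_labels == labels:  # assignments stable, so the algorithm has converged
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical infection-rate profiles over Normal, ASC-US, LSIL and HSIL.
profiles = {
    "type_A": [0.50, 0.30, 0.15, 0.05],
    "type_B": [0.52, 0.28, 0.14, 0.06],
    "type_C": [0.10, 0.20, 0.30, 0.40],
    "type_D": [0.12, 0.18, 0.32, 0.38],
}
names = list(profiles)
labels, centroids = kmeans([profiles[n] for n in names], init_indices=(0, 2))

if __name__ == "__main__":
    for name, label in zip(names, labels):
        print(name, "-> cluster", label)
```

Genotypes that land in the same cluster have similar profiles across the four stages, which is the sense in which the study speaks of similar infection trends. In practice the number of clusters and the initialisation would need to be chosen and checked, for example by repeating the run with several random starts.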
Currently, Poisson regression is widely used for clinical data analysis [24]. For instance, Rochon et al. [25] used Poisson regression analysis to study the number of rejection reactions in patients after transplantation within a certain time, and Vonesh et al. [26] analyzed the potential risk factors related to the number of peritoneal bacterial infections. Here, we used Poisson regression (Eq. 2) to investigate the impact of HR-HPV genotypes and precancerous stages on infection [23].
$\log_e(\lambda) = \beta_0 + \beta_1 X_1 + \beta_2 X_2$ (2)

Here, we set the infection number ($\lambda$) as the outcome variable and $\log_e(\lambda)$ as the link function in R [27]. $X_1$ and $X_2$ represent the HR-HPV genotypes and the precancerous stages, respectively, as the predictor variables. $\beta_0$ is the intercept, and $\beta_1$ and $\beta_2$ are the coefficients of the predictor variables.
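To make the log-linear model of Eq. 2 concrete, the sketch below fits a Poisson regression by Newton-Raphson in pure Python. It is an illustration under stated assumptions, not the authors' R code: the function names are invented, the data are noise-free synthetic expected counts generated from chosen coefficients (so the fit should recover them), and both predictors are coded numerically (X1 as a 0/1 indicator, X2 as a stage code 0 to 3) purely for simplicity.

```python
import math


def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_poisson(X, y, max_iter=50, tol=1e-10):
    """Maximum-likelihood fit of log(lambda) = b0 + b1*x1 + ... by Newton-Raphson."""
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    p = len(rows[0])
    # Start from a constant model: intercept = log(mean count), slopes = 0.
    beta = [math.log(sum(y) / len(y))] + [0.0] * (p - 1)
    for _ in range(max_iter):
        mu = [math.exp(sum(b * v for b, v in zip(beta, r))) for r in rows]
        # Score vector X^T (y - mu) and information matrix X^T diag(mu) X.
        score = [sum(r[j] * (yi - mi) for r, yi, mi in zip(rows, y, mu)) for j in range(p)]
        info = [
            [sum(r[j] * r[k] * mi for r, mi in zip(rows, mu)) for k in range(p)]
            for j in range(p)
        ]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Synthetic design: x1 in {0, 1}, x2 (stage code) in {0, 1, 2, 3}.
true_beta = (3.0, 0.2, -0.5)
X = [(x1, x2) for x1 in (0, 1) for x2 in (0, 1, 2, 3)]
y = [math.exp(true_beta[0] + true_beta[1] * x1 + true_beta[2] * x2) for x1, x2 in X]
beta_hat = fit_poisson(X, y)

if __name__ == "__main__":
    print([round(b, 4) for b in beta_hat])
```

In R, a model of this form is typically fitted with `glm(..., family = poisson)`, which also reports the standard errors and p-values needed to judge significance; the text does not state which call the authors used. With real data, genotype would normally enter as a categorical factor rather than a single numeric code.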

Results
Comparison of the prevalence rates
Figure 2 shows the comparison of the prevalence rates of the 13 HR-HPV types in all infections, single infection and multiple infections in the previous study [13] and the current study. Figure 3 shows the dynamics of infections for each HR-HPV type at different precancerous stages.

Cluster analysis of all infections and single infections
For all infections, Fig. 4 demonstrates that the infection trends of HPV31 and HPV33 are similar in the four precancerous stages. Regarding single infections, Fig. 5 shows that not only HPV39 and HPV51 but also HPV33 and HPV35 have similar infection trends in the 4 precancerous stages.
The impact of HR-HPV genotypes and precancerous stages on infection
Figure 6 describes the proportion of single and multiple infections for each HR-HPV genotype in different precancerous stages. Figure 6a shows that, in the Normal stage, the proportion of multiple infections is lower than that of single infections for 12 HR-HPV genotypes, the exception being HPV68. Figure 6b shows that, in the ASC-US stage, the proportion of multiple infections is lower than that of single infections for 6 HR-HPV genotypes (HPV52, HPV58, HPV45, HPV18, HPV16 and HPV33). Figure 6c shows that, in the LSIL stage, the proportion of multiple infections is lower than that of single infections for 4 HR-HPV genotypes (HPV35, HPV56, HPV52 and HPV45). Figure 6d shows that, in the HSIL stage, the proportion of multiple infections is lower than that of single infections for 4 genotypes (HPV35, HPV18, HPV45 and HPV33). Next, the Poisson regression analysis demonstrates that only the precancerous stages are statistically significant, while the HR-HPV genotypes are not (Table 2).
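The single/multiple comparison reported for Fig. 6 reduces to a simple proportion per genotype and stage. The sketch below shows that arithmetic for one hypothetical genotype; the counts are invented for illustration and are not taken from the study, and it assumes that the total for a genotype in a stage is the sum of its single and multiple infections.

```python
STAGES = ("Normal", "ASC-US", "LSIL", "HSIL")


def multiple_proportion(single, multiple):
    """Share of multiple infections among all infections; None if there are no infections."""
    total = single + multiple
    if total == 0:
        return None
    return multiple / total


def proportions_by_stage(counts):
    """Map each stage to its multiple-infection proportion.

    counts maps a stage name to a (single, multiple) pair of infection counts.
    """
    return {stage: multiple_proportion(*counts[stage]) for stage in STAGES}


# Hypothetical (single, multiple) counts for one genotype.
example_counts = {
    "Normal": (80, 20),
    "ASC-US": (30, 30),
    "LSIL": (12, 18),
    "HSIL": (4, 8),
}
example_proportions = proportions_by_stage(example_counts)

if __name__ == "__main__":
    for stage in STAGES:
        print(stage, example_proportions[stage])
```

A proportion below 0.5 corresponds to the cases in which multiple infections are less frequent than single infections for that genotype and stage.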

Discussion
Since Fig. 2 shows that the current total prevalence rate of the 13 HR-HPV types is lower than that in the previous study, we consider that the prevalence of both single and multiple infections has been decreasing in China in recent years. Additionally, since Fig. 3 shows that the prevalence rates of these HR-HPV types decrease as the severity of cervical lesions increases, we consider that most patients infected with HR-HPV types are in the early lesion stage (in particular, the squamous epithelial cells are still in the Normal stage). Thus, there is considerable room to reduce the prevalence of HR-HPV types in China, and more attention should be paid to promoting cervical screening and HPV vaccine research.
Furthermore, both previous studies [3,13] and Fig. 3 indicate that the three HR-HPV types posing the greatest threat in China are HPV52, HPV58 and HPV16. Figure 3 also demonstrates that neither the proportions of the 13 HR-HPV types within the same precancerous stage nor the infection rates of the same HR-HPV type across stages are similar, which implies that the infection of different HR-HPV types in different pathological stages is not consistent. Therefore, it is better to describe HR-HPV infections by precancerous stage and by HR-HPV type than to use the overall prevalence rate of HR-HPV types for each precancerous stage [12].
Moreover, the blue part in Fig. 3 indicates that the prevalence rates of HPV16, HPV58, HPV52 and HPV18 are greater than those of the other types in the HSIL stage, the stage most readily transformed into cervical cancer.
For cluster analysis, Fig. 4 demonstrates that the infection trends of HPV31 and HPV33 in the four precancerous stages are very similar in all infections. For single infections, Fig. 5 shows that the infection trends of HPV33 and HPV35 in the four precancerous stages are very similar. Villiers et al. [28] previously reported that high-risk subtypes such as HPV31, HPV33, HPV35, HPV52, HPV16 and HPV58 belong to species 9 of the genus Alphapapillomavirus; our results therefore suggest that HR-HPV subtypes with common biological properties could have similar infection rate trends in precancerous stages. As described previously, Normal, ASC-US, LSIL and HSIL represent successive stages of disease development. Figure 6 implies that as precancerous disease progresses, the defense against HPV infection weakens and the potential for further HPV infection increases, which results in an increase in multiple HPV infections. In addition, only the precancerous stages are statistically significant when the impact of HR-HPV genotypes and precancerous stages on infection is assessed by regression analysis (Table 2).
Note. * Pr < 0.05