Genetic variation is a major driving force in the evolution of organisms. In individuals, specific genetic mutations such as single nucleotide polymorphisms (SNPs) can be deleterious and cause disease. The Human Genome Project has yielded massive amounts of data on human SNPs, and this information can be used to further investigate human diseases. It is estimated that the human genome contains 10 million SNP sites. As a major repository of human SNPs, the NCBI dbSNP database contains ~25 million human entries as of build 130. The annotation of SNPs is attracting a great deal of attention. Non-synonymous SNPs (nsSNPs), also referred to as single amino acid polymorphisms (SAPs), are SNPs that cause amino acid substitutions, and these are believed to be directly related to disease. Thus far, only a small proportion of SAPs have been associated with disease. To date, ~20,000 nsSNPs are available with explicit annotation in the Swiss-Prot database [3, 4]. Therefore, it is desirable to develop effective methods for identifying disease-related amino acid substitutions.
Several computational models have been developed for this purpose. Evolutionary information is commonly considered to be the most important feature for such a prediction task. One of the earliest predictors, SIFT, was developed by Ng and Henikoff on the basis of sequence homology [6, 7]. The PANTHER database was designed around family Hidden Markov Models (HMMs) to estimate the likelihood that a substitution affects protein function. PolyPhen [9–11] showed that the selection pressure against deleterious SNPs depended on the molecular function of the proteins. Sequence and structural attributes were also incorporated in many studies. Satisfactory results were obtained by Ferrer-Costa et al. using mutation matrices, amino acid properties, and sequence potentials. By using attributes derived from other tools, an automated computational pipeline was constructed to annotate disease-associated nsSNPs. Many other models have been developed based on this combination strategy [14–21]. Saunders and Baker evaluated the contributions of several structural features and evolutionary information in predicting deleterious mutations. Wang and Moult undertook a detailed investigation of SNPs in which they studied the effects of the mutations on molecular function. Recently, Mort et al., Li et al., and Carter et al. functionally profiled human amino acid substitutions. They found a significant difference between deleterious and polymorphic variants in terms of both structural and functional disruption. Yue et al. [27–29] performed comprehensive studies on the impact of single amino acid substitutions on protein structure and stability. In these studies, stability change was also regarded as an important factor that contributed to dysfunction. Detailed studies were carried out by Reumers et al. and Bromberg et al., in which the extent of the functional effect of a mutation was correlated with its effect on protein stability.
Wang et al. and Yue et al. showed that the functional impact of a mutation is closely related to its protein structural context. Recently, Alexander et al. showed how the fold and function of a protein are altered by mutations. They observed a conformational switch between two different folds triggered by a single amino acid substitution, which directly demonstrated the dependence of protein structure and function on amino acid interactions. The challenge, especially when annotations on the functional role of a residue are lacking, is therefore how to incorporate such residue interaction information when detecting disease-associated mutations. To address this, we used a complex network to represent protein structure.
Owing to their potential for systematic analysis, complex networks have been widely used in proteomics. A protein structure can also be represented as a network (which we call a protein structure network, PSN) in which the vertices are the residues and the edges are their interactions. This representation provides novel insight into protein folding mechanisms, stability, and function. Greene et al. and Bagler et al. described the small-world and even scale-free properties of such networks, which were independent of the protein structural class. Vendruscolo et al. and Dokholyan et al. determined that a limited set of hub vertices with large connectivity plays a key role in protein folding [35–37]. In another study, hubs were defined as residues with more than four links, and these brought together different secondary structure elements that contributed to both protein folding and stability. All these studies suggest that PSNs facilitate the systematic analysis of residue interactions both locally and globally. PSNs also have the advantage of capturing the role of a residue in protein structure and function.
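As an illustration of the PSN representation and the hub definition quoted above, the following minimal sketch builds a residue network from coordinates and lists residues with more than four links. It is not the construction used in any of the cited studies: the rule that two residues are linked when their representative atoms lie within a distance cutoff, the cutoff value, the function names, and the toy coordinates are all assumptions made for illustration.

```python
import math
from itertools import combinations


def build_contact_network(coords, cutoff):
    """Return adjacency sets for a residue network.

    Assumption: residues i and j are linked when the Euclidean distance
    between their representative points is at most `cutoff`.
    """
    adjacency = {i: set() for i in range(len(coords))}
    for i, j in combinations(range(len(coords)), 2):
        if math.dist(coords[i], coords[j]) <= cutoff:
            adjacency[i].add(j)
            adjacency[j].add(i)
    return adjacency


def find_hubs(adjacency, min_links=5):
    """Return residues with more than four links (i.e. at least five)."""
    return sorted(v for v, nbrs in adjacency.items() if len(nbrs) >= min_links)


# Hypothetical toy coordinates: one central residue, five residues placed
# 5 units away from it along the axes, and one distant residue.
toy_coords = [
    (0.0, 0.0, 0.0),   # residue 0, central
    (5.0, 0.0, 0.0),
    (-5.0, 0.0, 0.0),
    (0.0, 5.0, 0.0),
    (0.0, -5.0, 0.0),
    (0.0, 0.0, 5.0),
    (30.0, 0.0, 0.0),  # residue 6, far from all others
]

network = build_contact_network(toy_coords, cutoff=6.0)
hubs = find_hubs(network)
print(hubs)
```

In this toy example only the central residue contacts five others, so it is the only hub. In practice the edge definition (atom choice, cutoff, or interaction type) would need to follow the study being reproduced.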
Using this information, Cheng et al. developed a solely structure-based approach named Bongo to predict disease-associated SAPs and obtained a satisfactory positive predictive value. Their study emphasized that the functional essentiality of a site is closely correlated with its role in maintaining protein structure, and showed that PSNs should be capable of detecting polymorphic mutations. However, their method performed poorly in detecting disease-associated mutations, which was believed to be due to the inability of Bongo to identify the functional roles of a residue. In this study, we demonstrated that PSNs can also perform well in predicting disease-associated mutations.
We carried out a comprehensive analysis of the network properties of mutations by using a dataset compiled from Swiss-Prot. We sought to determine how disease-associated variants differ from polymorphic variants in terms of network topological features. Four well-established network topological features (degree, clustering coefficient, betweenness, and closeness) were calculated from protein structure networks and used to predict disease-associated SAPs. The neighborhood of the mutation was also investigated. These features offer a quantitative description of residue interactions. We compared their performance with that of conservation features. Finally, a model was constructed to predict disease-associated SAPs by combining network topological features, conservation features, and the properties of residues neighboring a mutation (environmental features), as well as several features reported in previous studies. The satisfactory performance suggested that studying residue interactions can help to distinguish disease-associated SAPs from polymorphic SAPs.
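To make the four topological features concrete, the sketch below computes degree, clustering coefficient, betweenness, and closeness for each vertex of a small unweighted, undirected graph using only the Python standard library. The textbook definitions are used here (unnormalized betweenness via Brandes' algorithm, and closeness as the number of reachable vertices divided by the sum of shortest-path distances); the normalization used in the study itself may differ, and the toy graph and function names are hypothetical.

```python
from collections import deque


def degree(adj, v):
    """Number of direct neighbors of v."""
    return len(adj[v])


def clustering_coefficient(adj, v):
    """Fraction of pairs of neighbors of v that are themselves linked."""
    nbrs = list(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(
        1 for a in range(k) for b in range(a + 1, k) if nbrs[b] in adj[nbrs[a]]
    )
    return 2.0 * links / (k * (k - 1))


def closeness(adj, v):
    """Reachable vertices divided by the sum of shortest-path distances from v."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total > 0 else 0.0


def betweenness(adj):
    """Unnormalized betweenness (Brandes' algorithm, unweighted, undirected)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            stack.append(u)
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair is counted from both endpoints, so halve the totals.
    return {v: b / 2.0 for v, b in score.items()}


# Hypothetical toy network: a triangle (0, 1, 2) with a tail 2-3-4.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

bet = betweenness(adj)
for v in sorted(adj):
    print(v, degree(adj, v), clustering_coefficient(adj, v), bet[v], closeness(adj, v))
```

In this toy graph, vertex 2 joins the triangle to the tail and therefore has the highest betweenness, which is the kind of structurally central position these features are meant to quantify for a residue.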