SNP and gene networks construction and analysis from classification of copy number variations data
© Liu et al; licensee BioMed Central Ltd. 2011
Published: 27 July 2011
Detection of genomic DNA copy number variations (CNVs) can provide a complete and more comprehensive view of human disease. It is interesting to identify and represent relevant CNVs from a genome-wide data due to high data volume and the complexity of interactions.
In this paper, we incorporate the DNA copy number variation data derived from SNP arrays into a computational shrunken model and formalize the detection of copy number variations as a case-control classification problem. More than 80% accuracy can be obtained using our classification model and by shrinkage, the number of relevant CNVs to disease can be determined. In order to understand relevant CNVs, we study their corresponding SNPs in the genome and a statistical software PLINK is employed to compute the pair-wise SNP-SNP interactions, and identify SNP networks based on their P-values. Our selected SNP networks are statistically significant compared with random SNP networks and play a role in the biological process. For the unique genes that those SNPs are located in, a gene-gene similarity value is computed using GOSemSim and gene pairs that have similarity values being greater than a threshold are selected to construct gene networks. A gene enrichment analysis show that our gene networks are functionally important.
Experimental results demonstrate that our selected SNP and gene networks based on the selected CNVs contain some functional relationships directly or indirectly to disease study.
Two datasets are given to demonstrate the effectiveness of the introduced method. Some statistical and biological analysis show that this shrunken classification model is effective in identifying CNVs from genome-wide data and our proposed framework has a potential to become a useful analysis tool for SNP data sets.
Copy number variation (CNV) is defined as a genomic segment range from one kilobase to several megabases in size, in which copy number differences have been observed by comparison of two or more reference genomes [1, 2]. Human beings ordinarily have two copies of each autosomal region, one per chromosome. CNVs can be caused by genomic rearrangements such as inversions, deletions and duplications.
The fact that DNA copy number variation is a widespread and common phenomenon among human beings was first discovered [3, 4] following the completion of the human genome project. The high variability of the copy number throughout the human genome have been found by investigating on 270 HapMap individuals [5, 6]. Various studies have been developed for genome-wide CNV analysis to suggest their significant role in understanding human genetics. Redon et al.  published the first comprehensive and global map of CNVs in a genome-wide scale for making the beginning of large-scale copy number analysis. Comparative genomic hybridization (CGH)  and array-based comparative genomic hybridization  can detect gains and losses of genomic segments. Based on CGH, Wang et al.  proposed a new algorithm “Cluster Along Chromosomes” (CLAC), which builds hierarchical clustering-style trees along each chromosome and selects the interesting clusters by controlling False Discovery Rate (FDR) at a certain level. Bayesian model is a popular method for CNV detection. Pique-Regi et al.  exploited the use of piecewise constant (PWC) vectors to represent genome copy number and sparse Bayesian learning (SBL) to detect CNA breakpoints. Wu et al.  implemented Bayesian segmentation approach to carry out segmentation and assigning copy number status simultaneously. Recently, Chen et al.  used the mean and variance change point model (MVCM) to detect CNVs or breakpoints with an approximate p-value for statistical testing. In addition, Oldridge et al.  quantitatively evaluated several processing and segmentation strategies when short-sequence oligonucleotide arrays are applied and provide guidelines to optimize performance based on study-specific objectives.
Besides the above mentioned CNV detection methods, high-density single nucleotide polymorphism (SNP) arrays are becoming more and more popular in CNV detection. The main reason is that the inheritance pattern and linkage disequilibrium of CNVs are similar to those of the SNPs [6, 7]. Zhao et al.  measured the locus-specific hybridization intensity by hybridizing genomic representations of both DNA to SNP arrays to detect some novel CNVs in cancer samples. Huang et al.  proposed an algorithm that used whole genome sampling analysis (WGSA) by jointly measuring perfect match intensity and discrimination ratios to identify copy number changes. An QuantiSNP algorithm  described in 2007 used Objective Bayes Hidden-Markov Model (OB-HMM) by incorporating log R ratio and B allele frequency to set certain hyperparameters as priors, and using a novel re-sampling framework to calibrate the model to a fixed false positive error rate for CNVs detection. Wang et al.  designed a tool named PennCNV, a hidden Markov model (HMM) based approach, for kilobase-resolution detection of CNVs by combining multiple sources of information together, not only the total signal intensity and allelic intensity ratio of each SNP marker, but also the neighboring distances, the allele frequency and the pedigree information.
Recently, disease classification using copy number variation data has been demonstrated by several groups [19–22]. The motivation of this kind of method is that appropriate machine learning models for disease classification using copy number variation data will be effective not only for clinical treatment, but also for identification of disease susceptibility loci. Generally, copy numbers at probe loci are used directly as features. In a CNV data set, the association between a disease and a set of relevant CNVs are investigated. Patients and normals are often categorized in groups according to their copy number changes. Thousands of CNVs in different regions of chromosomes are used to describe characteristics of patient/normal samples.
When many CNVs are used to detect the association between a disease and multiple marker genotypes, we expect in a typical data set that contains the CNV data of several thousands of CNVs in different individuals, it is common to find only several numbers of copy number positions having genetic patterns that are highly specific to each group of individuals. The CNVs are called the relevant CNVs, as opposed to the irrelevant CNVs that do not help much in identifying the group (i.e., individuals of the same type). Due to the large number of CNVs being irrelevant to each group, two individuals in the same group could have low similarity when measured by a simple similarity function that consider the characteristics of all CNVs. The groups may thus be undetectable by classification algorithms. Here, we are interested in the development of high-dimensional numerical classification algorithm that can identify group of individuals and their relevant CNVs, i.e., detect the association between a disease and multiple marker variations. In this paper, we address this problem by applying a shrunken dissimilarity measure to copy number variation values derived from genome-wide SNP genotyping data. Performance was measured via cross-validation classification accuracy. By shrinkage, the number of relevant CNVs can be determined. In order to understand relevant CNVs, we study their corresponding SNPs in the genome and a statistical software PLINK is employed to compute the pair-wise SNP-SNP interactions, and identify SNP network based on their P-values. Some statistical analysis are done to illustrate the significance of our selected SNP networks. For the unique genes that those SNPs are located in, a gene-gene similarity value is computed using GOSemSim and gene pairs that has a similarity value being greater than a threshold are selected to construct gene networks. A gene enrichment analysis are done to show that some molecular functions and biological process are significantly associated with our gene networks. The whole framework indicates that our classification model is efficient in high dimensional CNV detection and SNP and gene networks constructed afterwards have relationships directly or indirectly to disease study.
The outline of this paper is given as follows. In Section 2, we propose the LRR calculation for signal intensities, the shrunken dissimilarity measure to analyze CNV data classification and the logistic model to calculate SNP interactions. In Section 3, we present experimental results on two real CNV data sets derived from genome-wide SNP arrays, and illustrate their corresponding SNP and gene networks for statistical and biological analysis. Finally, we give concluding remarks in Section 4.
There are many available tools that can be used to do genotyping call and Log R Ratio (LRR) calculation, such as Birdsuite , CNAG , dChip [25, 26] and GLAD . We calculated LRR values using PennCNV , which is a public available software for copy number variation detection from SNP genotyping arrays. For each SNP in a genome-wide data, the raw signal intensity values for its two alleles A and B are measured and then subject to a normalization procedure. The normalized signal intensity values should reveal three clusters (AA, AB, BB) representing three distinct genotypes. A genotype calling is a genotype assignment based on these three canonical clusters.
The normalization procedure produces X and Y values for each SNP, representing the experiment-wide normalized signal intensity on alleles A and B, respectively. One additional measurement will then be calculated for each SNP, where R = X + Y refers to the total signal intensity. As for a normalized measure of total signal intensity, the LRR value for each SNP is then calculated as
LRR = log2(R observed /R expected ) (1)
where R expected is computed from linear interpolation of canonical genotype clusters . LRR refers to the logarithm (in base 2) of the total observed normalized intensity R observed of the two alleles A and B relative to the expected R expected . For normal diploid genotypes, LRR should fluctuate randomly around zero.
Shrunken centroid method
It is well-known that DNA microarray data, which can simultaneously measure the expression level of thousands of different genes, have been successfully used to identify genetic heterogeneity of disease. However, mircoarray data typically has a large number of genes (features) and relatively few samples (observations), meaning that conventional machine learning methods may fail when applied to such data. The Prediction Analysis for Microarrays (PAM), has recently been reported as a potential powerful tool for microarrays analysis, which is based on the technique of nearest shrunken centroid [29, 30]. Nearest shrunken centroid is the “de-noised” version of simple nearest centroid classification. The main idea of nearest centroid classification is to compare test sample to each class centroid. The class whose centroid the test sample is closest to is the predicted class for test sample. The centroid is the average value for each feature (or attribute) in each class, normalized by the pooled within-class standard deviation. The nearest shrinkage centroid classification shrinks each centroid toward the overall centroid for all classes by a certain amount, which is called a “threshold”. After the overall centroid is set to zero, this shrinkage is to move the centroid to zero by the threshold, i.e., if the value of centroid in magnitude is larger than the threshold, it is subtracted by the threshold; if it is less than the threshold, it is set to zero, so the corresponding feature disappears, and it isn’t used for the following classification. By doing so, nearest shrunken centroid can find out the minimal subset of genes which succinctly characterize each class. Shrunken centroid method is traditionally used to deal with microarray data. In this paper, we adopt this method for copy number variation values derived from SNP genotyping arrays. The normalized signal intensity Log R Ratio (LRR) value in each probe loci calculated from existing software, which we will give a detailed description in the next subsection, will be considered as a unique feature.
π k is the prior probability of class k.
Logistic model to calculate SNP interactions
As our experimental data sets are both disease-trait samples, it is feasible to test epistasis using PLINK to  detect SNP-SNP interactions, which is a free, open-source whole genome association analysis toolset, designed to perform a range of basic, large-scale analysis in a computationally efficient manner. All pairwise combinations of input SNPs can be tested using a logistic regression model, which is based on allele dosage for each SNP, A and B, and fits the model in the form of (5)
Y~b0 + b1.A + b2.B + b3.AB + e (5)
The test for interaction is based on the coefficient of b3, therefore only considers allelic by allelic epistasis. We focus on symmetrical cases in our study, that means only unique pairs are analysed, for example, if SNP1*SNP2 is performed, SNP2*SNP1 will not be calculated again. The χ2 statistics is applied and the odds ratio for interaction is interpreted in the standard manner: a value of 1.0 indicates no effect.
Results and Discussion
WTCCC Type 1 Diabetes study
Data set and preprocessing
The core study of the Wellcome Trust Case Control Consortium (WTCCC) comprised an analysis of genetic signals from each of seven common human diseases (type 1 diabetes, type 2 diabetes, coronary artery disease, hypertension, bipolar disorder, rheumatoid arthritis and Croh disease) . Genotyping for type 1 diabetes (T1D) study was conducted by Affymetrix using the (“commercial”) Affymetrix 500K chip in 3504 samples with 2000 cases and 1504 controls.
The quantile normalized signal intensity data, which were generated from the Affymetrix intensity (‘CEL’) files and used as input to the Chiamo genotype calling program, was downloaded from WTCCC webpage and appropriately prepared according to the requirements of input file formats of PennCNV. A biological literature searching indicates that in chromosome 6, there are some well-known T1D related genes. Therefore, we chose chromosome 6 as an example to demonstrate our method. There are 31470 unique SNP loci in this chromosome, 1 markers have complete genotyping failure with confidence threshold of 0. 01, 1042 markers do not have at least two types of genotypes, 2460 markers have abnormal patterns, so there are 27967 SNP markers have been analyzed to construct canonical clustering positions.
Distribution of LRR values for selected SNPs of T1D study.
SNP network construction and analysis
Characteristics of SNP networks in different thresholds of T1D study.
NO. of networks
NO. of SNPs
NO. of SNP pairs
1(3), 2(4), 3(9), 4(20)
1(2), 2(3), 3(8), 4(21)
1(2), 2(2), 3(2), 4(2), 5(5)
1(1), 2(1), 3(1), 4(1), 5(4)
1(2), 2(2), 3(5)
1(1), 2(1), 3(4)
Significance of selected SNP networks in different thresholds of T1D study.
NO. of networks
1.2672 × 10–6
1.2765 × 10–7
3.1578 × 10–8
2.0741 × 10–8
5.6191 × 10–8
2.9939 × 10–8
3.5062 × 10–9
6.5545 × 10–9
3.7898 × 10–9
2.0716 × 10–8
1.9979 × 10–8
1.0436 × 10–7
5.8348 × 10–8
5.1991 × 10–9
2.1721 × 10–9
Gene network construction and analysis
Our method can select 63 SNPs, 25 of them are located in gene coding areas, and these 25 SNPs belong to 24 unique genes, which are all shown in Table 4. After checking with NCBI, we found that some of the genes that our selected SNPs located in are directly or indirectly related to diabetes. For example, GMDS refers to GDP-mannose 4,6-dehydratase, which catalyzes the conversion of GDP-mannose to GDP-4-keto-6-deoxymannose, and it has been verified that mannose-binding lectin is a predictor of Type 1 Diabetes . MTHFD1L is an enzyme involved in THF synthesis in mitochondria, and mutations of mitochondria strongly associate with diabetes . And also for LAMA2, whose full name is laminin, alpha 2, is a major component of the basement membrane, the abnormal level of it has been reported to have significant relationship to the presence of diabetes .
Detailed gene descriptions of SNPs selected in chromosome 6 of T1D study.
runt-related transcription factor 2
nudix (nucleoside diphosphate linked moiety X)-type motif 3
G protein-coupled receptor 116
leucine rich repeat containing 16
butyrophilin, subfamily 3, member A1
chromosome 6 open reading frame 209
similar to PPP1R14B protein
pleckstrin homology domain containing, family G (with RhoGef domain) member 1
chromosome 6 open reading frame 210
hypothetical protein FLJ34503
dynein, axonemal, heavy polypeptide 8
sorting nexin 9
leucine rich repeat and fibronectin type III domain containing 2
erythrocyte membrane protein band 4.1-like 2
FYN oncogene related to SRC, FGR, YES
laminin, alpha 2 (merosin, congenital muscular dystrophy)
NADPH oxidase 3
laminin, alpha 2 (merosin, congenital muscular dystrophy)
unc-5 homolog C (C. elegans)-like
methylenetetrahydrofolate dehydrogenase (NADP+ dependent) 1-like an enzyme involved in THF synthesis in mitochondria
ribosomal protein S6 kinase, 90kDa, polypeptide 2
Gene networks formed for different threshold values of T1D study.
Number of gene pairs
Number of gene networks
We found some interesting relationships from SNP and gene networks. For example, for SNPs rs9321142 and rs9371491, which are interacted in the same SNP network under P-value of 0.1 in Figure 2, their corresponding genes are LAMA2 and MTHFD1L, but located in different gene networks in Figure 4, which means that maybe we can merge these two gene networks together, (shown with red dash line in Figure 4), and furthermore, can be consistent with existing biological functions of these two genes to diabetes. Another example, rs6912853 is interacted with rs2734990 under a very significant P-value of 9. 43 x 10 – 5, but rs2734990 is located in intergenic area and do not have a record in Gene Ontology until now, maybe we can make use of rs6912853’s gene information, BTN3A1, to further analyze the inner functions of rs2734990 and extend GO afterward.
Gene Ontology terms significant associated with selected genes (P < 0.001) of T1D study.
superoxide-generating NADPH oxidase activity
response to gravity
folic acid and derivative biosynthetic process
tetrahydrofolate metabolic process
NADPH oxidase complex
oxidoreductase activity, acting on NADH or NADPH, with oxygen as acceptor
Data set and preprocessing
Hirschsprung (HSCR, MIM 142623), also known as aganglionic megacolon, is a congenital intestinal disease. Patients suffer from different extent of aganglionosis due to the absence of ganglion cells in the gastrointestinal tract. Trisomy 21 (Down syndrome) is the most recurrent structural variations found in HSCR, contributing to 2-10% of the cases.
This data set consists of 330 controls and 121 cases of Chinese ethnicity. In this analysis, we excluded the centromeric and telomeric regions using 500kb threshold because these regions tend to have bias in CNV calling. We transformed the probe intensities into LRR values from this SNP data set (n=3369) using PennCNV with the same parameter setting and procedure as mentioned above.
Distribution of LRR values for selected SNPs of Hirschsprung study.
As there are only 6 SNPs selected, the number is too small to construct any networks. So for this dataset, we did not do any analysis in SNP level.
Gene network construction and analysis
Detailed descriptions of SNPs selected in chromosome 21 of Hirschsprung study.
Down syndrome cell adhesion molecule
Mutations in this gene cause enterokinase deficiency, a malabsorption disorder characterized by diarrhea and failure to thrive
T-cell lymphoma invasion and metastasis 1
Gene Ontology terms significant associated with selected genes (P < 0.0001) of Hirschsprung study.
positive regulation of axonogenesis
positive regulation of cell projection organization
ephrin receptor signaling pathway
regulation of cell projection organization
regulation of axon extension involved in axon guidance
ephrin receptor binding
In this paper, we use the method of nearest shrunken centroid for gene expression data, and apply it to tackle copy number variation data determined from genome-wide SNP arrays. The method can be implemented on a personal computer very efficiently. The relevant SNPs are selected for disease data. Experimental results are reported to show the effectiveness of our method. In particular, we find some SNPs that contain in some genes which are relevant to a particular disease. Based on the SNP and gene networks, we can find out some unknown relationships between their corresponding genes, which can be considered as an extension of existing GO knowledge. The existing Gene Ontology enrichment analysis tool also suggests that our selected genes are associated to some molecular function and biological process. In the future, we will study the following problems. Detailed biological analysis of CNVs determined from other genome-wide SNP data sets will be studied. Statistical and association study of selected SNPs can be carried out.
This work was financially supported by Research Grant Council  and Hong Kong Baptist University FRGs.
The authors would like to extend gratitude to the support by research grants, including HKU 765407M and HKU 775907M from the Hong Kong Research Grants Council, AOSPINE (AOSBRC-07-02) and AoE/M-04/04 from the University Grants Committee of Hong Kong. We are also grateful to Prof. Kathryn S.E. Cheah, Prof. Pak C. Sham, Prof. Paul K.H. Tam, Dr Maria-Merc, Garcia-Barcel, Dr YQ Song, Dr Danny Chan, Dr Kenneth Cheung and Dr Clara S. Tang for their assistance and kindness in data sharing.
This article has been published as part of BMC Bioinformatics Volume 12 Supplement 5, 2011: Selected articles from the IEEE International Conference on Bioinformatics and Biomedicine 2010. The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/12?issue=S5.
- Feuk L, Carson AR, Scherer SW: Structural variation in the human genome. Nature Reviews Genetics 2006, 7: 85–97.View ArticlePubMedGoogle Scholar
- Freeman JL, Perry GH, Feuk L, Redon R, McCarroll SA, Altshuler DM, Aburatani H, Jones KW, Tyler-Smith C, Hurles ME, Carter NP, Scherer SW, Lee C: Copy number variation: New insights in genome diversity. Genome Research 2006, 16: 949–961. 10.1101/gr.3677206View ArticlePubMedGoogle Scholar
- Sebat J, Lakshmi B, Troge J, Alexander J, Young J, Lundin P, Maner S, Massa H, Walker M, Chi M, Navin N, Lucito R, Healy J, Hicks J, Ye K, Reiner A, Gilliam TC, Trask B, Patterson N, Zetterberg A, Wigler M: Large-Scale Copy Number Polymorphism in the Human Genome. Science 2004, 305: 525–528. 10.1126/science.1098918View ArticlePubMedGoogle Scholar
- Iafrate AJ, Feuk L, Riveral MN, Listewnik ML, Donahoe PK, Qi Y, Scherer SW, Lee C: Detection of large-scale variation in the human genome. Nature Genetics 2004, 36: 949–951. 10.1038/ng1416View ArticlePubMedGoogle Scholar
- Conrad DF, Andrews TD, Carter NP, Hurles ME, Pritchard JK: A high-resolution survey of deletion polymorphism in the human genome. Nature Genetics 2006, 38: 75–81. 10.1038/ng1697View ArticlePubMedGoogle Scholar
- McCarroll SA, Hadnott TN, Perry GH, Sabeti PC, Zody MC, Barrett JC, Dallaire S, Gabriel SB, Lee C, Daly MJ, Altshuler DM: Common deletion polymorphisms in the human genome. Nature Genetics 2006, 38: 86–92. 10.1038/ng1696View ArticlePubMedGoogle Scholar
- Redon R, Ishikawa S, Fitch KR, Feuk L, Perry GH, Andrews TD, Fiegler H, Shapero MH, Carson AR, Chen W, Cho EK, Dallaire S, Freeman JL, Gonzalez JR, Gratacos M, Huang J, Kalaitzopoulos D, Komura D, MacDonald JR, Marshall CR, Mei R, Montgomery L, Nishimura K, Okamura K, Shen F, Somerville MJ, Tchinda J, Valsesia A, Woodwark C, Yang F, Zhang J, Zerjal T, Zhang J, Armengol L, Conrad DF, Estivill X, Tyler-Smith C, Carter NP, Aburatani H, Lee C, Jones KW, Scherer SW, Hurles ME: Global variation in copy number in the human genome. Nature 2006, 444: 444–454. 10.1038/nature05329PubMed CentralView ArticlePubMedGoogle Scholar
- Kallioniemi A, Kallioniemi OP, Sudar D, Rutovitz D, Gray JW, Waldman F, Pinkel D: Comparative genomic hybridization for molecular cytogenetic analysis of solid tumors. Science 1992, 258: 818–821. 10.1126/science.1359641View ArticlePubMedGoogle Scholar
- Pinkel D, Segraves R, Sudar D, Clark S, Poole I, Kowbel D, Collins C, Kuo WL, Chen C, Zhai Y, Dairkee SH, Ljung B, Gray JW, Albertson DG: High resolution analysis of DNA copy number variation using comparative genomic hybridization to microarrays. Nature Genetics 1998, 20: 207–211. 10.1038/2524View ArticlePubMedGoogle Scholar
- Wang P, Kim Y, Pollack J, Narasimhan B, Tibshirani R: A method for calling gains and losses in array CGH data. Biostatistics 2005, 6: 45–58. 10.1093/biostatistics/kxh017View ArticlePubMedGoogle Scholar
- Pique-Regi R, Monso-Varona J, Ortega A, Seeger RC, Triche TJ, Asgharzadeh S: Sparse representation and Bayesian detection of genome copy number alterations from microarray data. Bioinformatics 2008, 24(3):309–318. 10.1093/bioinformatics/btm601PubMed CentralView ArticlePubMedGoogle Scholar
- Wu LY, Chipman HA, Bull SB, Briollais L, Wang K: A Bayesian segmentation approach to ascertain copy number variations at the population level. Bioinformatics 2009, 25(13):1669–1679. 10.1093/bioinformatics/btp270View ArticlePubMedGoogle Scholar
- Chen J, Wang YP: A Statistical Change Point Model Approach for the Detection of DNA Copy Number Variations in Array CGH Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2009, 6: 529–541.PubMed CentralView ArticlePubMedGoogle Scholar
- Oldridge DA, Banerjee S, Setlur SR, Sboner A, Demichelis F: Optimizing copy number variation analysis using genome-wide short sequence oligonucleotide arrays. Nucleic Acids Research 2010, 38: 3275–3286. 10.1093/nar/gkq073PubMed CentralView ArticlePubMedGoogle Scholar
- Zhao X, Li C, Paez JG, Chin K, Janne PA, Chen T, Girard L, Minna J, Christiani D, Leo C, Gray JW, Sellers WR, Meyerson M: An Integrated View of Copy Number and Allelic Alterations in the Cancer Genome Using Single Nucleotide Polymorphism Arrays. Cancer Research 2004, 64: 3060–3071. 10.1158/0008-5472.CAN-03-3308View ArticlePubMedGoogle Scholar
- Huang J, Wei W, Zhang J, Liu G, Bignell GR, Stratton MR, Futreal PA, Wooster R, Jones KW, Shapero MH: Whole genome DNA copy number changes identified by high density oligonucleotide arrays. Hum Genomics 2004, 1: 287–299.PubMed CentralView ArticlePubMedGoogle Scholar
- Colella S, Yau C, Taylor JM, Mirza G, Butler H, Clouston P, Bassett AS, Seller A, Holmes CC, Ragoussis J: QuantiSNP: an Objective Bayes Hidden-Markov Model to detect and accurately map copy number variation using SNP genotyping data. Nucleic Acids Research 2007, 35: 2013–2025. 10.1093/nar/gkm076PubMed CentralView ArticlePubMedGoogle Scholar
- Wang K, Li M, Hadley D, Liu R, Glessner J, Grant SFA, Hakonarson H, Bucan M: Penn CNV: An integrated hidden Markov model designed for high-resolution copy number variation detection in whole-genome SNP genotyping data. Genome Research 2007, 17: 1665–1674. 10.1101/gr.6861907PubMed CentralView ArticlePubMedGoogle Scholar
- Bastian BC, Olshen AB, LeBoit PE, Pinkel D: Classifying Melanocytic Tumors Based on DNA Copy Number Changes. American Journal of Pathology 2003, 163: 1765–1770. 10.1016/S0002-9440(10)63536-5PubMed CentralView ArticlePubMedGoogle Scholar
- O’Hagan RC, Brennan CW, Strahs A, Zhang X, Kannan K, Donovan M, Cauwels C, Sharpless NE, Wong WH, Chin L: Array Comparative Genome Hybridization for Tumor Classification and Gene Discovery in Mouse Models of Malignant Melanoma. Cancer Res 2003, 63: 5352–5356.PubMedGoogle Scholar
- Baumbusch LO, Aaroe J, Johansen FE, Hicks J, Sun H, Bruhn L, Gunderson K, Naume B, Kristensen VN, Liestol K, Borresen-Dale AL, Lingjaerde OC: Comparison of the Agilent, ROMA/NimbleGen and Illumina platforms for classification of copy number alterations in human breast tumors. BMC Genomics 2008, 9: 379. 10.1186/1471-2164-9-379PubMed CentralView ArticlePubMedGoogle Scholar
- Myllykangas S, Tikka J, Bohling T, Knuutila S, Hollmen J: Classification of human cancers based on DNA copy number amplification modeling. BMC Medical Genomics 2008, 1: 15. 10.1186/1755-8794-1-15PubMed CentralView ArticlePubMedGoogle Scholar
- Korn JM, Kuruvilla FG, McCarroll SA, Wysoker A, Nemesh J, Cawley S, Hubbell E, Veitch J, Collins PJ, Darvishi K, Lee C, Nizzari MM, Gabriel SB, Purcell S, Daly MJ, Altshuler D: Integrated genotype calling and association analysis of SNPs, common copy number polymorphisms and rare CNVs. Nature Genetics 2008, 40: 1253–1260. 10.1038/ng.237PubMed CentralView ArticlePubMedGoogle Scholar
- Nannya Y, Sanada M, Nakazaki K, Hosoya N, Wang L, Hangaishi A, Kurokawa M, Chiba S, Bailey DK, Kennedy GC, Ogawa S: A Robust Algorithm for Copy Number Detection Using High-Density Oligonucleotide Single Nucleotide Polymorphism Genotyping Arrays. Cancer Res 2005, 65: 6071–6079. 10.1158/0008-5472.CAN-05-0465View ArticlePubMedGoogle Scholar
- Li C, Wong WH: DNA-Chip Analyzer (dChip). In The analysis of gene expression data: methods and software. Edited by: Parmigiani G, Garrett ES, Irizarry R, Zeger SL. New York: Springer; 2003:120–141.View ArticleGoogle Scholar
- Lin M, Wei LJ, Sellers WR, Lieberfarb M, Wong WH, Li C: dChipSNP: Significance Curve and Clustering of SNP-Array-Based Loss-of-Heterozygosity Data. Bioinformatics 2004, 20: 1233–1240. 10.1093/bioinformatics/bth069View ArticlePubMedGoogle Scholar
- Hupe P, Stransky N, Thiery J, Radvanyi F, Barillot E: Analysis of array CGH data: from signal ratio to gain and loss of DNA regions. Bioinformatics 2004, 20: 3413–3422. 10.1093/bioinformatics/bth418View ArticlePubMedGoogle Scholar
- Peiffer DA, Le JM, Steemers FJ, Chang W, Jenniges T, Garcia F, Haden K, Li J, Shaw CA, Belmont J, Cheung SW, Shen RM, Barker DL, Gunderson KL: High-resolution genomic profiling of chromosomal aberrations using Infinium whole-genome genotyping. Genome Res 2006, 16(9):1136–1148. 10.1101/gr.5402306PubMed CentralView ArticlePubMedGoogle Scholar
- Bair E, Tibshirani R: Machine learning method applied to DNA microarray data can improve the diagnosis of cancer. SIGKDD Explorations 2003, 5: 48–55. 10.1145/980972.980980View ArticleGoogle Scholar
- Tibshirani R, Hastie T, Narasimhan B, Chu G: Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proc Natl Acad Sci U S A. 2002, 99(10):6567–6572. 10.1073/pnas.082099299PubMed CentralView ArticlePubMedGoogle Scholar
- Burton P: Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 2007, 447: 661–678. 10.1038/nature05911View ArticleGoogle Scholar
- Hovind P, Hansen TK, Tarnow L, Thiel S, Steffensen R, Flyvbjerg A, Parving H: Mannose-Binding Lectin as a Predictor of Microalbuminuria in Type 1 Diabetes. Diabetes 2005, 54: 1523–1527. 10.2337/diabetes.54.5.1523View ArticlePubMedGoogle Scholar
- Maassen A, ‘t Hart LM, van Essen E, Heine RJ, Nijpels G, Tafrechi RSJ, Raap AK, Janssen GMC, Lemkes HH: Mitochondrial Diabetes: Molecular Mechanisms and Clinical Presentation. Diabetes 2004, 53: S103-S109. 10.2337/diabetes.53.2007.S103View ArticlePubMedGoogle Scholar
- Tarsio JF, Reger LA, Furcht LT: Molecular mechanisms in basement membrane complications of diabetes. Alterations in heparin, laminin, and type IV collagen association. Diabetes 1988, 37: 532–539. 10.2337/diabetes.37.5.532View ArticlePubMedGoogle Scholar
- Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G: Gene ontology: tool for the unification of biology. Nat Genet 2000, 25: 25–29. 10.1038/75556PubMed CentralView ArticlePubMedGoogle Scholar
- Schlicker A, Domingues FS, Rahnenfuhrer J, Lengauer T: A new measure for functional similarity of gene products based on Gene Ontology. BMC Bioinformatics 2006, 7: 302.1–302.16.View ArticleGoogle Scholar
- Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T: Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 2003, 13: 2498–2504. 10.1101/gr.1239303PubMed CentralView ArticlePubMedGoogle Scholar
- Zheng Q, Wang XJ: GOEAST: a web-based software toolkit for Gene Ontology enrichment analysis. Nucleic Acids Res 2008, 36(Web Server issue):W358–363.PubMed CentralView ArticlePubMedGoogle Scholar
- Korbel JO, Tirosh-Wagner T, Urban AE, Chen XN, Kasowski M, Dai L, Grubert F, Erdman C, Gao MC, Lange K, Sobel EM, Barlow GM, Aylsworth AS, Carpenter NJ, Clark RD, Cohen MY, Doran E, Falik-Zaccai T, Lewin SO, Lott IT, McGillivray BC, Moeschler JB, Pettenati MJ, Pueschel SM, Rao KW, Shaffer LG, Shohat M, Riper AJV, Warburton D, Weissman S, Gerstein MB, Snyder M, Korenberg JR: The genetic architecture of Down syndrome phenotypes revealed by high-resolution analysis of human segmental trisomies. PNAS 2009, 106: 12031–12036. 10.1073/pnas.0813248106PubMed CentralView ArticlePubMedGoogle Scholar
- Yamakawa K, Huot YK, Haendelt MA, Hubert R, Chen XN, Lyons GE, Korenberg JR: DSCAM: a novel member of the immunoglobulin superfamily maps in a Down syndrome region and is involved in the development of the nervous system. Human Molecular Genetics 1998, 7: 227–237. 10.1093/hmg/7.2.227View ArticlePubMedGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.