Microarray platforms based on single nucleotide polymorphisms (SNP arrays) are powerful tools for research on genomic structural variation because they allow the integration of genotype and copy-number information. Researchers can simultaneously identify both copy number and copy-neutral changes, using the log2-ratio (LRR) intensity signal and the allele genotyping of the probes [1–6]. While SNP arrays have been used effectively in the study of copy number variation (CNV) and single nucleotide polymorphism (SNP) genotyping, only recently have they been used to identify the mosaic occurrence of copy number and copy-neutral genomic abnormalities. Genetic mosaicism is defined as the presence of two or more cell populations with different genotypes in a single individual who developed from a single fertilized egg. Such genetic abnormalities may result from a mutation during development that is propagated to only a subset of the adult cells. Somatic mosaicism for chromosomal rearrangements has recently been described on the basis of comparative analysis of differentiated human tissues from adult individuals and divergence between identical twins. Moreover, it is well known that some mosaic abnormalities are involved in multiple developmental and tissue-specific disorders [10–16]. Despite all this evidence, the frequency and extent of chromosomal mosaicism in the normal adult population were estimated only recently for the first time, and its real contribution to intra- and inter-individual genome variation is yet to be determined. To that end, specialized algorithms and data analysis tools for calling the mosaic occurrence of structural variation are badly needed.
Mosaic events can be captured by analysing SNP array data, especially by assessing clusters of heterozygous probes whose B allele frequency (BAF) and LRR values differ from those expected for regular heterozygous deletions, duplications or loss of heterozygosity events (see Additional File 1, Figure S1 for examples of different types of mosaic rearrangements). Two recent studies have demonstrated that structural variants occurring in mosaicism are more frequent than expected, and thus they may play a relevant role in human diversity and disease susceptibility [7, 17]. While both studies used Illumina SNP array data, only Rodríguez-Santiago et al. (2010) used tools for discovering occurrences in a systematic way. Their approach may, however, underestimate mosaic prevalence in two challenging situations: 1) small rearrangements and 2) rearrangements affecting a low percentage of cells. In addition, the algorithm used was computationally demanding (2 weeks to analyse about 2,000 individuals genotyped with Illumina HumanHap 1M), which is a technical drawback when analysing high-density arrays from thousands of individuals.
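To make concrete why mosaic events produce BAF and LRR values that differ from those of regular (non-mosaic) events, the following is a minimal sketch of the idealised signal expected at heterozygous (AB) probes when a fraction of cells carries a rearrangement. It assumes a simple mixture of normal diploid cells and abnormal cells, with signal proportional to allele copy number; the function name `expected_baf_lrr` and its parameters are hypothetical and not part of any published tool, and real array signals are noisy and may not follow these values exactly.

```python
import math


def expected_baf_lrr(event, cell_fraction):
    """Idealised (lower BAF band, upper BAF band, LRR) at heterozygous probes.

    Assumption: a mixture of normal diploid cells and a fraction
    `cell_fraction` of cells carrying `event`, with array signal
    proportional to allele copy number.
    """
    p = cell_fraction
    if not 0.0 <= p <= 1.0:
        raise ValueError("cell_fraction must lie in [0, 1]")

    if event == "deletion":
        # Abnormal cells lose one allele: average total copies = 2 - p.
        total = 2.0 - p
        lower = (1.0 - p) / total
    elif event == "duplication":
        # Abnormal cells gain one allele: average total copies = 2 + p.
        total = 2.0 + p
        lower = 1.0 / total
    elif event == "upd":
        # Copy-neutral: one allele is replaced by the other in abnormal cells.
        total = 2.0
        lower = (1.0 - p) / 2.0
    else:
        raise ValueError("event must be 'deletion', 'duplication' or 'upd'")

    # The two heterozygous bands are symmetric around 0.5.
    upper = 1.0 - lower
    lrr = math.log2(total / 2.0)
    return lower, upper, lrr


if __name__ == "__main__":
    for event in ("deletion", "duplication", "upd"):
        for fraction in (0.2, 0.5, 1.0):
            low, high, lrr = expected_baf_lrr(event, fraction)
            print(f"{event:12s} p={fraction:.1f} "
                  f"BAF=({low:.3f}, {high:.3f}) LRR={lrr:+.3f}")
```

Under this model, the heterozygous BAF bands move away from 0.5 gradually as the fraction of abnormal cells grows, while the LRR shift stays small at low fractions and is zero for copy-neutral events. This is consistent with the difficulty, noted above, of detecting rearrangements that affect a low percentage of cells.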
Mosaicism detection can be cast as a segmentation problem. In fact, BAFsegmentation is a software package developed for the identification of mosaicism in cancer cells, based on the circular binary segmentation (CBS) algorithm. The disadvantage of this procedure is that it lacks a method to clearly control the false discovery rate (FDR).
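As an illustration of what casting the problem as segmentation means, the toy sketch below searches for the single interval of consecutive heterozygous probes whose BAF deviation from 0.5 is best explained by a level different from the rest of the chromosome. It is an exhaustive least-squares search written only for illustration; it is neither CBS nor GADA, it includes no FDR control, and the function name `best_segment` and the example data are hypothetical.

```python
def best_segment(values, min_len=3):
    """Find the interval [start, end) that best splits `values` into two levels.

    Toy exhaustive search (quadratic in the number of probes): the interval
    and the remaining background are each fitted with their own mean, and
    the interval minimising the total squared error is returned as
    (start, end, squared_error).
    """
    n = len(values)
    # Prefix sums of values and squared values give O(1) segment statistics.
    sum1, sum2 = [0.0], [0.0]
    for v in values:
        sum1.append(sum1[-1] + v)
        sum2.append(sum2[-1] + v * v)

    def squared_error(s1, s2, count):
        # Sum of squared deviations from the mean of `count` values.
        return s2 - s1 * s1 / count

    best = None
    for start in range(0, n - min_len + 1):
        for end in range(start + min_len, n + 1):
            inside = end - start
            outside = n - inside
            if outside < min_len:
                continue
            in1 = sum1[end] - sum1[start]
            in2 = sum2[end] - sum2[start]
            err = (squared_error(in1, in2, inside)
                   + squared_error(sum1[n] - in1, sum2[n] - in2, outside))
            if best is None or err < best[2]:
                best = (start, end, err)
    return best


# Hypothetical BAF values at heterozygous probes along one chromosome:
# probes 10 to 14 show the split bands expected for a mosaic event.
baf = [0.5] * 10 + [0.35, 0.65, 0.34, 0.66, 0.35] + [0.5] * 10
# Fold the two bands onto one signal: distance from the expected 0.5.
deviation = [abs(b - 0.5) for b in baf]

if __name__ == "__main__":
    start, end, _ = best_segment(deviation)
    print(f"candidate mosaic segment: probes {start} to {end - 1}")
```

Dedicated segmentation algorithms solve the same kind of piecewise-constant fitting problem for many segments at once and at far lower computational cost than this quadratic search.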
As an alternative segmentation method, genome alteration detection analysis (GADA) can also identify allelic imbalances by using the BAF values provided by SNP arrays. The BAF is the fraction of the total signal due to a specific allele, and it is the appropriate measure for studying the allelic imbalances that underlie mosaicism for genomic rearrangements. Compared to circular binary segmentation (CBS), GADA has similar accuracy and is several orders of magnitude faster. Recently, GADA segmentation was applied to CNV calling by using LRR values from SNP array data in very large data sets with high efficiency and accuracy [6, 22]. To overcome the specific difficulties in identifying mosaic events from SNP arrays, we have developed the Mosaic Alteration Detection (MAD) method, which includes both statistical (including FDR control) and bioinformatic tools to specifically analyse BAF values from SNP array data. The software presented here improves SNP array data analysis by allowing the capture of mosaic copy number changes (deletions, duplications, aneuploidies) and copy-neutral changes (uniparental disomy, UPD, namely the occurrence of two copies of a particular chromosome from the same parent), as well as regions of homozygosity due to identity-by-descent. The method was used to analyse Illumina HumanHap 1M SNP array data obtained from control, bladder cancer and HapMap individuals. We also compared its performance (sensitivity and specificity) with that of BAFsegmentation under several challenging simulated scenarios, such as small altered regions, a low percentage of mosaic cells and poor array quality. Finally, our tool was applied to SNP array data previously used for detecting mosaic abnormalities [17, 18]. While all published findings were successfully identified, additional mosaic events were detected by MAD and experimentally validated afterwards.
Our results suggest that improved methods can accurately capture mosaic chromosomal rearrangements using SNP array data and that the prevalence of such rearrangements is higher than initially thought.