SimArray: a user-friendly and user-configurable microarray design tool
© Auburn et al; licensee BioMed Central Ltd. 2006
Received: 11 October 2005
Accepted: 01 March 2006
Published: 01 March 2006
Microarrays were first developed to assess gene expression but are now also used to map protein-binding sites and to assess allelic variation between individuals. Regardless of the intended application, efficient production and appropriate array design are key determinants of experimental success. Inefficient production can make larger-scale studies prohibitively expensive, whereas poor array design makes normalisation and data analysis problematic.
We have developed a user-friendly tool, SimArray, which generates a randomised spot layout, computes a maximum meta-grid area, and estimates the print time, in response to user-specified design decisions. Selected parameters include the number of probes to be printed, the microtitre plate format, the printing pin configuration, and the achievable spot density. SimArray is compatible with all current robotic spotters that employ 96-, 384- or 1536-well microtitre plates, and can be configured to reflect most production environments. Print time and maximum meta-grid area estimates facilitate evaluation of each array design for its suitability. Randomisation of the spot layout facilitates correction of systematic biases by normalisation.
SimArray is intended to help both established researchers and those new to the microarray field to develop microarray designs with randomised spot layouts that are compatible with their specific production environment. SimArray is an open-source program and is available from http://www.flychip.org.uk/SimArray/.
The full utility of the spotted microarray format is clearly reflected in the range of its applications. Transcriptome arrays, containing cDNA, gDNA, or oligonucleotide probes, are used to measure differential gene expression [1–5]. Whole-genome arrays, typically composed of tiled gDNA or oligonucleotides, have been used to identify in vivo sites of protein-DNA interactions [7, 8] or allelic variation [9, 10]. Whilst these applications dominate, other formats, for example antibody arrays, facilitate analysis of protein and small-molecule analytes [11, 12]. Thus, spotted microarrays enable high-throughput, cost-effective, and large-scale analysis of molecular interactions.
Robotic spotters deposit probes as an ordered array by repetition of a simple multi-step procedure [13–15]. First, the print tool is positioned over the first batch of probes to be printed, arranged in 96-, 384- or 1536-well microtitre plates. Second, the spotting pins are filled by capillary action with probe material. This step is often called a source visit. Third, the probe material is deposited on chemically-modified glass slides [14, 16]. Finally, the pins are cleaned, to prevent cross-contamination between subsequent spot depositions, before re-filling and printing the next batch of probes. The diversity of instrumentation, spotting pins, and reagents available means that procedures may be refined for optimal throughput, spot density, morphology, and consistency. Whilst this facilitates production of high-quality arrays, it can also lead to significant differences between facilities with regard to the instrumentation, protocols, and reagents employed.
Robotic spotters are supplied with sophisticated software to convert operator inputs to the precise list of instructions needed by the arrayer, e.g., how often each source visit is to be printed, and at which spot location [13–15]. Most spotters, however, are not supplied with adequate array design tools. Operators are instead left to develop suitable spot layouts in an ad hoc fashion. This often leads to sub-optimal designs with spots positioned according to print order, thus juxtaposing replicates, when a non-sequential or randomised spot layout can help to control for confounding spatial effects [17, 18]. Examples include biases caused by inconsistent probe concentrations in the microtitre plates [18, 19] and local variations in hybridisation or washing efficiency [20–23] (see also Additional file 1). Whilst random noise can be overcome with simple replication and averaging, systematic biases must be specifically addressed by randomisation and normalisation [24–27].
There is therefore a need for a microarray design tool that can generate randomised source visits and, in the case of some instruments, permit the use of variable numbers of replicates, since current spot density constraints and the need for genome-wide coverage mean that replication is often limited to the normalisation controls. Exogenous or 'spike' controls, i.e., probes that are complementary to targets not present in the genome of interest, can be employed for this purpose [28, 29]. Print time and maximum meta-grid area estimates enable users to evaluate the suitability of the array design.
We have addressed the current lack of such microarray design tools by developing SimArray, a user-friendly and user-configurable program that generates a randomised spot layout, computes the maximum meta-grid area, and estimates printing time, in response to user-defined design decisions. The user enters these parameters by running SimArray twice. The first run produces the source visit list that can be edited to include variable numbers of replicates, or for specific source visits to be omitted when plates are partially filled. The second run processes this source list to create the spot layout, maximum meta-grid area and estimated print time. User-configurable files mean that SimArray can be adapted to most production environments.
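The two-run workflow described above can be pictured with a minimal Python sketch. SimArray itself is written in Perl, and this is not its code. The sketch assumes that each source visit fills every pin from one well, so that the number of source visits is the probe count divided by the pin count, rounded up. The function names, the `replicates` and `omit` arguments, and the example numbers are all hypothetical.

```python
import math
import random


def source_visit_list(n_probes, n_pins):
    """First run: number the source visits 1..N.

    Assumption: every pin is filled from one well per visit, so
    N = ceil(n_probes / n_pins).
    """
    n_visits = math.ceil(n_probes / n_pins)
    return list(range(1, n_visits + 1))


def expand_replicates(visits, replicates=None, omit=()):
    """Manual edit between runs: repeat some visits, drop others."""
    replicates = replicates or {}
    expanded = []
    for visit in visits:
        if visit in omit:
            continue  # e.g. a visit that would only sample empty wells
        expanded.extend([visit] * replicates.get(visit, 1))
    return expanded


def randomise(spots, seed=None):
    """Second run: shuffle so that spot order no longer follows print order."""
    rng = random.Random(seed)
    shuffled = list(spots)
    rng.shuffle(shuffled)
    return shuffled


# Hypothetical example: one 384-well plate printed with a 48-pin tool.
visits = source_visit_list(n_probes=384, n_pins=48)
# Print visit 8 (say, the spike controls) four times and skip visit 7.
spots = expand_replicates(visits, replicates={8: 4}, omit={7})
layout = randomise(spots, seed=1)
```

Passing a fixed `seed` makes a layout reproducible, which is convenient when the same design has to be regenerated later.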
SimArray was developed in Perl versions 5.6.1 and 5.8.3, under both Windows and UNIX operating systems. SimArray can be run under Windows (after installing Perl; for example, ActivePerl) and UNIX.
Before the first run
Download the three configuration files and an 'index.sa' file from the SimArray web site. Configuration files that describe instrument-specific pin configurations and achievable spot densities have already been created. If a suitable file is not available, an existing one should be downloaded and edited. An example file for the user to record their specific print cycle times is also available for editing. The 'index.sa' file should then be updated, to record the locations and names of the configuration files. The configuration files will then only need to be re-edited if the printing environment is altered.
Probe number: enter the number of microarray probes (or wells) to be printed.
Plate format: select an appropriate plate format.
Tools available: select an appropriate pin configuration.
Source visits: the source visit list is generated for editing.
Required spot density: SimArray counts the number of spots to be printed.
Pin type: select the spotting pin to be used.
Evaluate pin selection: SimArray evaluates whether the selected pin is compatible with the required spot density.
Compute spots_x and spots_y: SimArray computes and displays the sub-grid dimensions that fall between the target spot number and a user-specified upper limit, for the user to select.
Compute print time: select an appropriate print set-up; SimArray then calculates the estimated print time.
Summary report: SimArray generates a report containing a summary of the user's responses, the randomised spot layout, an estimated print time, and the maximum meta-grid area.
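The text does not state how the print time is derived from the recorded print cycle times, so the following Python sketch shows only one plausible model and should not be read as SimArray's own calculation. It assumes that each source visit costs one pin fill, one deposition per slide, and one wash. All parameter names and timings are hypothetical.

```python
def estimate_print_time(n_source_visits, n_slides, fill_s, per_slide_s, wash_s):
    """Estimated print time in seconds under a simple per-visit cycle model.

    Assumption: one cycle = fill + one deposition per slide + wash.
    Replicate spots would add further depositions to each cycle.
    """
    cycle_s = fill_s + n_slides * per_slide_s + wash_s
    return n_source_visits * cycle_s


def format_hms(total_s):
    """Format a whole number of seconds as H:MM:SS."""
    hours, remainder = divmod(total_s, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


# Hypothetical run: 320 source visits onto 100 slides.
total_s = estimate_print_time(
    n_source_visits=320, n_slides=100, fill_s=5, per_slide_s=1, wash_s=20
)
print(format_hms(total_s))
```

Under this model the print time scales linearly with the number of source visits, so comparing pin configurations reduces to comparing how many visits each one needs.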
After the second run
The randomised source visit map can be uploaded directly to instruments that accept comma-, tab-, or space-separated source files, or entered manually. Microarrays can then be manufactured, with spots no longer positioned according to print order. Standard robotic spotter data tracking software can be used to record which probe is present at each spot location.
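As an illustration of the three delimiter styles mentioned above, the following Python sketch writes a small source visit map as delimited text. The row-by-row grid layout and the `format_layout` helper are assumptions made for illustration; the exact file layout that a given instrument expects should be taken from its documentation.

```python
import csv
import io

# Delimiters for the three file styles named in the text.
DELIMITERS = {"csv": ",", "tsv": "\t", "space": " "}


def format_layout(grid, style="csv"):
    """Serialise a sub-grid of source visit numbers, one row per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=DELIMITERS[style], lineterminator="\n")
    writer.writerows(grid)
    return buffer.getvalue()


# Hypothetical 2 x 3 sub-grid; 0 marks a blank (unprinted) location.
grid = [[4, 1, 0], [2, 5, 3]]
as_csv = format_layout(grid, "csv")
as_tsv = format_layout(grid, "tsv")
as_space = format_layout(grid, "space")
```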
Results and Discussion
Maximum meta-grid area is calculated by simply multiplying the number of pins in each axis by the pin tool's pre-defined pin pitch (Fig. 1). Consequently, SimArray does not take spot pitch into consideration and can over-estimate meta-grid areas, especially for low-density arrays that are printed with reduced spot pitches. Since high-density arrays limit the scope for reducing spot pitch, we believe this is a reasonable approach because SimArray will be of most use when designing higher-density arrays. Additionally, most operators print microarrays with the spot pitch set to the near-maximum distance permissible to reduce the probability that neighbouring spots printed by the same pin will be merged together. Prediction of the maximum meta-grid area will at least allow users to decide whether it is possible for them to hybridise the array, e.g., when the hybridisation area is constrained by automated hybridisation stations.
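The calculation described above is simple enough to state in a few lines of Python. The function names, the 4 x 12 pin tool, the 4.5 mm pin pitch, and the hybridisation limits below are illustrative assumptions, not values taken from SimArray's configuration files.

```python
def max_meta_grid_area(pins_x, pins_y, pin_pitch_mm):
    """Return (width_mm, height_mm, area_mm2) of the largest possible meta-grid.

    Each axis is the number of pins multiplied by the pin pitch; spot pitch
    is deliberately ignored, so this is an upper bound.
    """
    width_mm = pins_x * pin_pitch_mm
    height_mm = pins_y * pin_pitch_mm
    return width_mm, height_mm, width_mm * height_mm


def fits(width_mm, height_mm, limit_width_mm, limit_height_mm):
    """Check the meta-grid against a constrained hybridisation area."""
    return width_mm <= limit_width_mm and height_mm <= limit_height_mm


# Hypothetical tool: 4 x 12 pins at a 4.5 mm pitch.
width, height, area = max_meta_grid_area(pins_x=4, pins_y=12, pin_pitch_mm=4.5)
# Hypothetical hybridisation station limit of 20 mm x 60 mm.
ok = fits(width, height, limit_width_mm=20.0, limit_height_mm=60.0)
```

Because the estimate is an upper bound, a design that passes this check should also fit once a reduced spot pitch is taken into account.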
Sub-grid dimensions, i.e., the number of spots in the x- and y-axes, that are compatible with the target spot number per sub-grid are then calculated, and users select an option from a list of compatible choices. To limit the length of this list, SimArray displays only sub-grid dimensions whose spot count is equal to or greater than the target spot number per sub-grid and less than a user-specified upper limit. The upper limit is the target spot number per sub-grid, plus the user-specified 'spot number margin'. SimArray prevents users from selecting grid dimensions that are incompatible with the spotting pins' maximum achievable spot density. If, however, the selected sub-grid dimensions permit more than the required number of spots to be printed, additional spot locations are flagged as blanks by assigning them a source visit number of zero, i.e., not printed. SimArray will fail at this stage if there are no viable sub-grid dimensions between the minimum and maximum target. We therefore recommend using a 'spot number margin' of at least ten.
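The selection rule above can be sketched in Python as follows. This is an illustration, not SimArray's code. It assumes that the achievable spot density can be expressed as a maximum number of spots per axis (`max_per_axis`, a hypothetical parameter) and that the upper limit is exclusive, as the wording "less than" suggests.

```python
def candidate_dimensions(target, margin, max_per_axis):
    """List (spots_x, spots_y) pairs with target <= spots_x * spots_y < target + margin.

    max_per_axis stands in for the pin's achievable spot density (assumption).
    """
    upper = target + margin
    options = []
    for spots_x in range(1, max_per_axis + 1):
        for spots_y in range(1, max_per_axis + 1):
            if target <= spots_x * spots_y < upper:
                options.append((spots_x, spots_y))
    return options


def pad_with_blanks(source_visits, spots_x, spots_y):
    """Fill unused spot locations with source visit 0, i.e. not printed."""
    capacity = spots_x * spots_y
    if len(source_visits) > capacity:
        raise ValueError("sub-grid is too small for the requested spots")
    return list(source_visits) + [0] * (capacity - len(source_visits))


# 100 spots per sub-grid, margin of 10, at most 12 spots per axis.
options = candidate_dimensions(target=100, margin=10, max_per_axis=12)
# Choosing 9 x 12 leaves eight locations to be flagged as blanks.
padded = pad_with_blanks(list(range(1, 101)), 9, 12)
# A narrow margin can leave no option at all: 97 is prime and 98 = 7 x 14.
none_found = candidate_dimensions(target=97, margin=2, max_per_axis=12)
```

The last call shows why a generous margin helps: with a margin of two, no sub-grid of at most 12 spots per axis holds 97 or 98 spots, so the list of choices is empty.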
The user-configurable files described above and in Figures 2, 3 and 4 maximise the utility of this tool because they enable a range of production environments to be explicitly modelled. All SimArray configuration files contain a header, which includes a key to the file's contents to help with this task. If required, additional comments can also be added because all lines marked with a hash at the start are automatically ignored when SimArray reads these files. However, column meaning and order are fixed and must be preserved. We aim to develop an on-line library of configuration files at the SimArray web site.
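The two rules stated above, that hash-prefixed lines are ignored and that column order is fixed, can be illustrated with a short Python reader. The tab delimiter, the column names, and the sample values are hypothetical; the real column definitions are given in the header of each SimArray configuration file.

```python
def read_config(text, columns):
    """Parse configuration text into a list of dicts keyed by column name.

    Lines starting with '#' and blank lines are skipped. Columns are
    positional, so every data line must have exactly len(columns) fields.
    """
    rows = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        fields = stripped.split("\t")  # assumption: tab-separated columns
        if len(fields) != len(columns):
            raise ValueError(f"line {line_no}: expected {len(columns)} columns")
        rows.append(dict(zip(columns, fields)))
    return rows


# Hypothetical pin description file.
SAMPLE = (
    "# key: pin_type<TAB>max_spots_x<TAB>max_spots_y\n"
    "# values below are invented for illustration\n"
    "pinA\t12\t12\n"
    "\n"
    "pinB\t20\t20\n"
)
pins = read_config(SAMPLE, columns=("pin_type", "max_spots_x", "max_spots_y"))
```

Rejecting lines with the wrong number of fields catches the most likely editing mistake, a dropped or added column, before it can silently shift values into the wrong position.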
A fully worked simple example
For this worked example, we compare the performance of a MicroGrid II (Genomic Solutions) and a Qarray2 (Genetix) instrument, printing a 15 K probe library. The configuration files were edited to reflect the specific set of printing conditions for each robotic spotter. The example library consists of two probe types: transcript-specific probes and exogenous controls, along with some empty wells (Figure 6). The design requirement is to print single copies of the transcript-specific probes, and for every pin to print quadruplicate spots for the exogenous control probes, randomising the distribution of elements on the array, whilst omitting the empty wells. The probes were arranged in the spotting plates according to these design criteria (Figure 6).
SimArray also permits further simulations, allowing an evaluation of alternative pin configurations, replicate numbers, slide numbers and instrument configurations, subject to the availability of appropriately annotated configuration files. This further ensures that microarrays of an ideal design can be generated, whilst permitting each to be evaluated for its compatibility with the local production environment. Printing with different pin configurations, however, requires the spotting library itself to be redesigned, as spotting pins enter adjacent wells of the microtitre plate and the probes must be arranged accordingly (Figs 1 and 6). This suggests that spot layouts should be defined before the spotting probes are transferred to microtitre plates for printing.
We have developed a user-friendly microarray design tool, SimArray, which generates a randomised spot layout, computes a maximum meta-grid area, and estimates the print time, in response to user-specified design decisions. SimArray is of general utility for all users of robotic spotters and can be configured to suit individual production environments.
Availability and requirements
Project name: SimArray
Project home page: http://www.flychip.org.uk/SimArray/
Operating system: Windows and UNIX
Programming language: Perl version 5.6.1 (and more recent versions)
Any restrictions to use by non-academics: none
List of abbreviations used
cDNA: Complementary DNA
gDNA: Genomic DNA
Meta-grid: The sub-grids (and spots) that are printed by one print tool
Sub-grid: The patch of spots that is printed by a single pin
The authors would like to thank Gos Micklem, David Kreil, and the four anonymous referees for constructive criticism of the manuscript, and François Guillier for help with the web site. This work was supported by research grants from the BBSRC.
- Schena M, Shalon D, Davis RW, Brown PO: Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 1995, 270: 467–470.
- Schena M, Shalon D, Heller R, Chai A, Brown PO, Davis RW: Parallel human genome analysis: microarray-based expression monitoring of 1000 genes. Proc Natl Acad Sci U S A 1996, 93: 10614–10619. 10.1073/pnas.93.20.10614
- Hayward RE, Derisi JL, Alfadhli S, Kaslow DC, Brown PO, Rathod PK: Shotgun DNA microarrays and stage-specific gene expression in Plasmodium falciparum malaria. Mol Microbiol 2000, 35: 6–14. 10.1046/j.1365-2958.2000.01730.x
- Zaigler A, Schuster SC, Soppa J: Construction and usage of a onefold-coverage shotgun DNA microarray to characterize the metabolism of the archaeon Haloferax volcanii. Mol Microbiol 2003, 48: 1089–1105. 10.1046/j.1365-2958.2003.03497.x
- Relogio A, Schwager C, Richter A, Ansorge W, Valcarcel J: Optimization of oligonucleotide-based DNA microarrays. Nucleic Acids Res 2002, 30: e51. 10.1093/nar/30.11.e51
- Mockler TC, Ecker JR: Applications of DNA tiling arrays for whole-genome analysis. Genomics 2005, 85: 1–15. 10.1016/j.ygeno.2004.10.005
- Ren B, Robert F, Wyrick JJ, Aparicio O, Jennings EG, Simon I, Zeitlinger J, Schreiber J, Hannett N, Kanin E, Volkert TL, Wilson CJ, Bell SP, Young RA: Genome-wide location and function of DNA binding proteins. Science 2000, 290: 2306–2309. 10.1126/science.290.5500.2306
- Mukherjee S, Berger MF, Jona G, Wang XS, Muzzey D, Snyder M, Young RA, Bulyk ML: Rapid analysis of the DNA-binding specificities of transcription factors with DNA microarrays. Nat Genet 2004, 36: 1331–1339. 10.1038/ng1473
- Pinkel D, Segraves R, Sudar D, Clark S, Poole I, Kowbel D, Collins C, Kuo WL, Chen C, Zhai Y, Dairkee SH, Ljung BM, Gray JW, Albertson DG: High resolution analysis of DNA copy number variation using comparative genomic hybridization to microarrays. Nat Genet 1998, 20: 207–211. 10.1038/2524
- Pinkel D, Albertson DG: Comparative genomic hybridization. Annu Rev Genomics Hum Genet 2005, 6: 331–354. 10.1146/annurev.genom.6.080604.162140
- Lueking A, Cahill DJ, Mullner S: Protein biochips: A new and versatile platform technology for molecular medicine. Drug Discov Today 2005, 10: 789–794. 10.1016/S1359-6446(05)03449-5
- Chiosis G, Brodsky JL: Small molecule microarrays: from proteins to mammalian cells - are we there yet? Trends Biotechnol 2005, 23: 271–274. 10.1016/j.tibtech.2005.03.011
- Cheung VG, Morley M, Aguilar F, Massimi A, Kucherlapati R, Childs G: Making and reading microarrays. Nat Genet 1999, 21: 15–19. 10.1038/4439
- Hegde P, Qi R, Abernathy K, Gay C, Dharap S, Gaspard R, Hughes JE, Snesrud E, Lee N, Quackenbush J: A concise guide to cDNA microarray analysis. Biotechniques 2000, 29: 548–50, 552–4, 556 passim.
- Affara NA: Resource and hardware options for microarray-based experimentation. Brief Funct Genomic Proteomic 2003, 2: 7–20. 10.1093/bfgp/2.1.7
- Auburn RP, Kreil DP, Meadows LA, Fischer B, Matilla SS, Russell S: Robotic spotting of cDNA and oligonucleotide microarrays. Trends Biotechnol 2005, 23: 374–379. 10.1016/j.tibtech.2005.04.002
- Wernisch L, Kendall SL, Soneji S, Wietzorrek A, Parish T, Hinds J, Butcher PD, Stoker NG: Analysis of whole-genome microarray replicates using mixed models. Bioinformatics 2003, 19: 53–61. 10.1093/bioinformatics/19.1.53
- Qian J, Kluger Y, Yu H, Gerstein M: Identification and correction of spurious spatial correlations in microarray data. Biotechniques 2003, 35: 42–4, 46, 48.
- Spruill SE, Lu J, Hardy S, Weir B: Assessing sources of variability in microarray gene expression data. Biotechniques 2002, 33: 916–20, 922–3.
- Yang YH, Buckley MJ, Dudoit S, Speed TP: Comparison of methods for image analysis on cDNA microarray data. J Comp Graph Stat 2002, 11: 108–136. 10.1198/106186002317375640
- Yang YH, Dudoit S, Luu P, Lin DM, Peng V, Ngai J, Speed TP: Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Res 2002, 30: e15. 10.1093/nar/30.4.e15
- Kreil DP, Russell RR: There is no silver bullet--a guide to low-level data transforms and normalisation methods for microarray data. Brief Bioinform 2005, 6: 86–97. 10.1093/bib/6.1.86
- Futschik ME, Crompton T: OLIN: optimized normalization, visualization and quality testing of two-channel microarray data. Bioinformatics 2005, 21: 1724–1726. 10.1093/bioinformatics/bti199
- Kerr MK, Churchill GA: Statistical design and the analysis of gene expression microarray data. Genet Res 2001, 77: 123–128. 10.1017/S0016672301005055
- Churchill GA: Fundamentals of experimental design for cDNA microarrays. Nat Genet 2002, 32 Suppl: 490–495. 10.1038/ng1031
- Brodsky L, Leontovich A, Shtutman M, Feinstein E: Identification and handling of artifactual gene expression profiles emerging in microarray hybridization experiments. Nucleic Acids Res 2004, 32: e46. 10.1093/nar/gnh043
- Le Meur N, Lamirault G, Bihouee A, Steenman M, Bedrine-Ferran H, Teusan R, Ramstein G, Leger JJ: A dynamic, web-accessible resource to process raw microarray scan data into consolidated gene expression values: importance of replication. Nucleic Acids Res 2004, 32: 5349–5358. 10.1093/nar/gkh870
- van Bakel H, Holstege FC: In control: systematic assessment of microarray performance. EMBO Rep 2004, 5: 964–969. 10.1038/sj.embor.7400253
- Baker SC, Bauer SR, Beyer RP, Brenton JD, Bromley B, Burrill J, Causton HC, Conley MP, Elespuru R, Fero M, Foy C, Fuscoe J, Gao X, Gerhold DL, Gilles P, Goodsaid F, Guo X, Hackett J, Hockett RD, Ikonomi P, Irizarry RA, Kawasaki ES, Kaysser-Kranich T, Kerr K, Kiser G, Koch WH, Lee KY, Liu C, Liu ZL, Lucas A, Manohar CF, Miyada G, Modrusan Z, Parkes H, Puri RK, Reid L, Ryder TB, Salit M, Samaha RR, Scherf U, Sendera TJ, Setterquist RA, Shi L, Shippy R, Soriano JV, Wagar EA, Warrington JA, Williams M, Wilmer F, Wilson M, Wolber PK, Wu X, Zadro R: The External RNA Controls Consortium: a progress report. Nat Methods 2005, 2: 731–734. 10.1038/nmeth1005-731
- ActivePerl [http://www.activestate.com/ActivePerl/].
- SimArray [http://www.flychip.org.uk/SimArray].
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.