- Open Access
PreP+07: improvements of a user friendly tool to preprocess and analyse microarray data
© Martin-Requena et al; licensee BioMed Central Ltd. 2009
- Received: 29 August 2008
- Accepted: 12 January 2009
- Published: 12 January 2009
Nowadays, microarray gene expression analysis is a widely used technology that scientists handle but whose final interpretation usually requires the participation of a specialist. The need for this participation is due to the requirement of some background in statistics that most users lack or have a very vague notion of. Moreover, programming skills could also be essential to analyse these data. An interactive, easy to use application seems therefore necessary to help researchers to extract full information from data and analyse them in a simple, powerful and confident way.
PreP+07 is a standalone Windows XP application that presents a friendly interface for spot filtration, inter- and intra-slide normalization, duplicate resolution, dye-swapping, error removal and statistical analyses. Additionally, it contains two procedures implemented only in PreP+07 (double scan and Supervised Lowess) and a complete set of graphical representations (MA plot, RG plot, QQ plot, PP plot, PN plot), and it can deal with many data formats, such as tabulated text, GenePix GPR and ArrayPRO. PreP+07 performance has been compared with the equivalent functions in Bioconductor using a tomato chip with 13056 spots. The numbers of differentially expressed genes obtained from the p-values of PreP+07 and of the Bioconductor Limma package were statistically identical when the data set was only normalized; however, slight variability was observed when the data were both normalized and scaled.
The PreP+07 implementation provides a high degree of freedom in selecting and organizing a small set of widely used data processing protocols, and can handle many data formats. Its reliability has been proven, so a laboratory researcher can carry out statistical pre-processing of his/her microarray results and obtain a list of differentially expressed genes using PreP+07 without any programming skills. All of this supports the scientists who have been using PreP releases since the first version in 2003.
- Lowess Normalization
- Affymetrix Chip
- Empty Spot
- Error Removal
Large scale gene expression monitoring technology is changing our view of the biological processes, including their dynamics. Hence, microarrays have emerged as the primary tool for studying the expression patterns of thousands of genes from a single experiment. As this technology matures, the ability to generate a large volume of data is accelerating; it is now perfectly normal to use tens or even hundreds of microarrays in a single study.
Microarray data are rich and complex, but experimental biases, as well as variations introduced along the various steps in measuring gene expression levels, tend to make them unusable as is. Therefore, data pre-processing is highly recommended for reproducibility, reliability, compatibility and standardization of microarray analysis and results, even if it does not seem so necessary with Affymetrix chips. Microarray data pre-processing, mainly normalization, is used to remove biases within each array by local regression. Many normalization methods assume that the majority of genes are not differentially regulated, or that the number of up-regulated genes roughly equals the number of down-regulated genes. Although these assumptions are not applicable to every case, they do not seem to cause a serious effect on most microarray experiments. Alternatively, several methods have been proposed to normalize microarrays that do not fulfil the previous assumptions [4, 5] for both Affymetrix GeneChip and two-colour array data. In any case, pre-processed data are usually more reliable for identifying biologically meaningful patterns, since statistical tests to discover differentially expressed genes tend to depend on the experimental design.
An increasing number of academic and commercial solutions have been developed to tackle the pre-processing, each one with particular strengths and weaknesses. The most widely used and comprehensive packages currently belong to the open source software environment: Bioconductor for R, TM4 and GEPAS.
Bioconductor is a collection of extensible open source libraries for R, whose main focus is to deliver a high quality infrastructure and end user tools for expression analysis. Object-oriented programming with well-defined classes is the basis for overcoming data complexity, and a command line interface is the preferred way to access libraries. This makes it very powerful, but its use requires skills in statistics and programming. Data objects generated by R microarray processing packages can be saved as flat text that the user can read, but converting that text back into the original object for further analysis is not always trivial. TM4 is a series of Java based tools that provide users with a well designed, easy to use interface. It consists of four major applications, as well as a MySQL database for maintaining experimental results, that are mainly focused on two-colour microarrays. In spite of its graphical interface, its use is not always intuitive and it also requires statistical skills to pipeline the pre-processing algorithms; it is also computationally inefficient for intensive calculations. Unfortunately, only MeV is currently kept up to date. GEPAS is a popular web based tool that allows the use of R packages without any programming skills. However, like most web based applications, it faces technological problems: limited interactivity, poor suitability for uploading and downloading huge amounts of data, data privacy problems, etc. Hence, even if GEPAS deals with Affymetrix and two-colour experiments, its implementation presents some limitations.
Laboratory scientists are often challenged by large quantities of data produced by their microarray experiments, the statistics underlying the analysis of their own data, and the usability of applications that contain such statistical treatments. Pre-processing microarray data requires some background in statistics that most users lack or have a very vague notion of. This gap even includes the knowledge of which statistical approaches to use and the correct order in which statistical calculations have to be performed. In such a context, PreP has been in the front line of public software for two-colour microarray analysis, since it helps users without statistical training to manage and analyze data from their microarray experiments effectively. It provides (a) an integrated gallery of techniques to deal with the many sources of measurement errors, including two new algorithms not available in any other tool; (b) an interactive user friendly interface for the visualization of data in an appropriate representation; (c) a standalone application for data privacy; and (d) highly customizable statistical tools to build up a simple error removal pipeline procedure. Although PreP was designed as a tool for the analysis of two-colour chips, Affymetrix chips can also be pre-processed after a slight initial preparation of the data, which consists of assigning one treatment to the Cy3 channel and another treatment to the Cy5 channel; the data can then be processed and M and A values can be calculated. Automatic procedures will soon be added to PreP+07 to perform this data preparation. The improvements described here have turned PreP+07 into a user friendly environment that meets microarray pre-processing requirements for users who are not skilled in statistics or programming, but who know how to design a microarray experiment correctly.
Pre-processing methods available in PreP+07
Background correction and filtering
To enable comparison between arrays and experiments, data must be normalized and then replicates need to be resolved before differential expression analysis. Data treatment starts with background subtraction; this can be performed by PreP+07 or obtained from the microarray reading system. When data are supposed to be of high quality, subtraction can be enough; in any other case, background correction may need more elaborate adjustments that are not available in PreP+07. Alternatively, PreP+07 has the option to start the normalization without background subtraction. PreP+07 also provides a data filtering tool to remove, for example, low quality spots, taking into account several criteria, such as foreground and background intensity, spot shape, saturation, etc.
Typically, normalization is the first transformation applied to expression data. It aims to adjust the individual hybridization intensities to balance them appropriately so that meaningful biological comparisons can be made. There are many approaches to normalizing expression levels, but the locally weighted linear regression (Lowess) normalization [12, 13] has become the standard since it takes into account systematic biases and intensity specific artefacts that may appear in the data. PreP+07 implements both full parametric global and print-tip Lowess normalization procedures. Since normalized slides might not be comparable, a scaling procedure is also provided for inter-slide normalization. As a rule of thumb, scaling should not be performed unless box plots indicate that the means of the slides are significantly different. However, some of the proposed methods are not supported by a model. These methods are called non-parametric and they offer, when properly used, a flexible approach to normalization.
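The M and A values used throughout, and one common way of scaling slides, can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the scaling rule (rescaling each slide to the geometric mean of the median absolute deviations) is one widely used choice and not necessarily the one PreP+07 applies, and the Lowess fit itself is omitted.

```python
import math
from statistics import median

def ma_values(red, green):
    """Return (M, A) for paired, background-corrected channel intensities.

    M = log2(R/G) is the log ratio, A = 0.5*log2(R*G) the mean log intensity.
    """
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    return m, a

def mad(values):
    """Median absolute deviation around the median."""
    med = median(values)
    return median([abs(v - med) for v in values])

def scale_slides(slides_m):
    """Rescale the M values of each slide so that all slides share one MAD.

    Assumption: the common target is the geometric mean of the slide MADs.
    """
    mads = [mad(m) for m in slides_m]
    target = math.exp(sum(math.log(d) for d in mads) / len(mads))
    return [[v * target / d for v in m] for m, d in zip(slides_m, mads)]

# Example: two slides with the same shape but different spread.
scaled = scale_slides([[-1.0, 0.0, 1.0], [-4.0, 0.0, 4.0]])
```

After scaling, both example slides have the same spread, which is what the box plots mentioned above are meant to check.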
Replication deals with merging the data from several repetitions of the same experiment and from repeated spots in a single slide. Usually, errors cause data to differ from one repetition to another, but more knowledge about them becomes available as the number of replications grows. This information about error effects is collected by statistical procedures. PreP+07 can deal with biological and technical replicates by averaging (low replicate number) or by median calculations (when there are more than 16 values for each spot). However, current proposals recommend using a noise (or error) model [14, 15] and then extracting estimators, quality filters, thresholds, etc. from it, to be taken into account in solving replications.
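The average-or-median rule described above can be illustrated with a short sketch. The function names and the input layout (pairs of spot ID and log ratio) are assumptions made for the example; only the threshold of 16 values comes from the text.

```python
from collections import defaultdict
from statistics import mean, median

def resolve(values, median_above=16):
    """Merge replicate values: mean for few replicates, median for many."""
    return median(values) if len(values) > median_above else mean(values)

def merge_replicates(spots, median_above=16):
    """Group (spot_id, log_ratio) pairs by ID and resolve each group."""
    groups = defaultdict(list)
    for spot_id, value in spots:
        groups[spot_id].append(value)
    return {sid: resolve(vals, median_above) for sid, vals in groups.items()}

merged = merge_replicates([("g1", 1.0), ("g1", 3.0), ("g2", 5.0)])
```

The median becomes attractive for large groups because a single aberrant replicate no longer shifts the merged value.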
Double scan is an advanced correction method that improves data quality and is implemented only in PreP+07. Devices used for measuring intensities are neither perfect nor without limitations. Saturation and quantization, which compromise the high and low spot intensity reads respectively, appear in the scanned images and are hard to remove. The double scan method combines two readings: a low intensity acquisition to avoid saturated spots and a high intensity second reading to avoid quantization, providing as a result a data set without saturation or quantization, so all slide spots become informative.
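The idea of combining two readings can be sketched with a simple linear regression between the scans. This is a simplified stand-in, not the published double scan algorithm: the function names, the 16-bit saturation value of 65535 and the `floor` parameter are assumptions made for illustration.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope*x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def combine_scans(low, high, saturation=65535, floor=0):
    """Merge a low-gain and a high-gain scan of the same spots.

    Spots that are unsaturated in the high-gain scan and above `floor` in
    the low-gain scan calibrate the relation between both scans; saturated
    high-gain readings are then replaced by values predicted from the
    low-gain scan.
    """
    usable = [(l, h) for l, h in zip(low, high) if h < saturation and l > floor]
    slope, intercept = fit_line([l for l, _ in usable], [h for _, h in usable])
    return [h if h < saturation else slope * l + intercept
            for l, h in zip(low, high)]

combined = combine_scans([10, 20, 30, 40, 100], [100, 200, 300, 400, 65535])
```

In the example the last spot is saturated in the high-gain scan, so its value is reconstructed from the low-gain reading on the scale of the high-gain scan.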
Array based comparative genome hybridization (aCGH) is applied frequently to study the genomic content of closely related microorganisms, microbial taxonomy and species determination, as well as the presence of microbial pathogenicity factors. With aCGH a difference in signal arises, not only because of the absence or presence of genomic DNA, but also due to differences in sequence identity. This problem plays a particularly important role in bacterial aCGH experiments, since prokaryotes generally show lower genomic conservation than eukaryotes. The Supervised Lowess (SL) normalization method uses for normalization only the genes that are conserved in both hybridized samples (LHGs: likely homologous genes). In a first step, the SL method computes the initial log ratios R_i (i = 1...N) of all genes and performs Lowess normalization over the LHG subset, generating a set of corrected ratios Rc_i (i = 1...n, n < N) and correction factors for the subset of conserved genes used: α_i = Rc_i - R_i. Subsequently, the Lowess correction factors belonging to the subset of conserved genes (α_i) are extrapolated to determine the correction factors β_j (j = n+1...N) for the remaining genes. The correction factors are then used to adjust the log ratios of the remaining genes. The spot set used for SL can be selected by hand or using the filtering capabilities of PreP+07.
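The two steps of SL (estimate the correction on conserved genes, then extend it to the rest) can be sketched as below. This is a minimal illustration and not the published implementation: a nearest-neighbour local mean over the intensity axis stands in for the Lowess fit, and all names and parameters are hypothetical.

```python
def local_bias(a_ref, m_ref, a0, k):
    """Mean log ratio of the k reference spots closest in intensity to a0.

    Stand-in for the Lowess curve evaluated at intensity a0.
    """
    nearest = sorted(range(len(a_ref)), key=lambda i: abs(a_ref[i] - a0))[:k]
    return sum(m_ref[i] for i in nearest) / len(nearest)

def supervised_normalize(a, m, conserved, k=3):
    """Correct all log ratios with a trend fitted on conserved genes only.

    a, m      : intensity and log ratio of every spot
    conserved : indices of the likely homologous genes (LHGs)
    The correction factor of each spot is minus the local bias, which
    matches alpha_i = Rc_i - R_i for conserved spots and is extended to
    the remaining spots through their intensity.
    """
    a_ref = [a[i] for i in conserved]
    m_ref = [m[i] for i in conserved]
    return [mi - local_bias(a_ref, m_ref, ai, k) for ai, mi in zip(a, m)]

a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
m = [0.5, 0.5, 0.5, 0.5, 3.5, -2.5]
corrected = supervised_normalize(a, m, conserved=[0, 1, 2, 3], k=2)
```

In the example the conserved spots carry a constant bias of 0.5; removing it centres them on zero while the two divergent spots keep their large log ratios.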
Finally, differential analysis serves to identify outlier spots (differentially expressed ones) whose outlying behaviour is not due to experimental error but to biological expression. Differential expression based on a fixed fold change cut-off has been identified as insufficient. Therefore, methods involving calculation of the mean and standard deviation [16, 20] of the spot distribution of log2(ratio) values, and also defining a global fold change difference and confidence, equivalent to a z-test, have been included for a preliminary analysis.
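A global z-test of this kind can be sketched in a few lines. The function names and the significance level are assumptions for the example; the sketch uses the overall mean and standard deviation of the log ratios of a single slide.

```python
from statistics import NormalDist, mean, stdev

def z_scores(log_ratios):
    """Standardize log2 ratios with the global mean and standard deviation."""
    mu, sd = mean(log_ratios), stdev(log_ratios)
    return [(x - mu) / sd for x in log_ratios]

def two_sided_p(z):
    """Two-sided p-value of a z-score under the standard normal."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

def differential_spots(log_ratios, alpha=0.05):
    """Indices of spots whose log ratio deviates significantly from the bulk."""
    return [i for i, z in enumerate(z_scores(log_ratios))
            if two_sided_p(z) < alpha]

ratios = [-0.1, 0.1] * 10 + [3.0]
hits = differential_spots(ratios)
```

Unlike a fixed fold change cut-off, the threshold here adapts to the spread of each slide, although a strong outlier also inflates the global standard deviation it is compared against.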
A typical protocol
Methods available in PreP+07 vs PreP 2003 version
Lowess per block
Scaling – Standard Deviation/Median Absolute Deviation – Intraslide/InterSlide
Stat Test – Local/Global – Ztest Ttest
Coherent Slide View
Quality Slide View
MA Quality Graph
MA per blocks
Normality graphs (QQ/PN/PP)
Density Graph per Block
Save expression Matrix
Automatic load of genepix, imagene files
Loading formats automatically
Delete last step
Delete all steps except last
Toolbar redesigned, related buttons consecutive
Slide Alias when you load it
Apply the same structure with a checkbox button to all loaded slides
Tooltip activation button
Special attention has been paid to improving input-output functionality in PreP+07. Input and output files in PreP+07 are tab-delimited text files, which can be readily imported, for instance into Microsoft Excel. In addition, PreP+07 manages its own data format (engene compatible), and compatibility with GenePix (*.gpr), ArrayPRO and text-tabulated output files is also provided. In addition to loading the data, the meaning of each column must be specified (column functionality setting). Manual or file configuration can be used for this purpose, including the description of the slide structure (sectors, print-tip groups or grids); *.CEL files from the Affymetrix platform are accepted as well.
PreP+07 also supplies a broad range of alternative output formats, from a simple text tabulated table to gene expression matrices that include statistical characterization. An important and useful feature is the ability to store intermediate results as a PreP project that can later be recovered for further processing.
Supervised Lowess (SL) can be advantageously used when data follow a non-normal distribution due to differences in gene sequence identity, as demonstrated previously, suggesting that it is appropriate for any microbial aCGH comparison. In any case, SL computes a normalizing estimate using a subset of genes (sharing strong sequence similarity) and then uses this estimate to remove the error in the rest of the genes. This procedure has been successfully applied to spiked-in dual dye DNA microarray data.
Visualization tools available in PreP+07
A synthetic reproduction of the scanned image from the available data.
Comparison with the scanned image, identifying single spots, splitting the slide in blocks and manual testing.
Slide view of coherent spots
A synthetic reproduction of the scanned image only for coherent data.
Evaluation of the quality of the slide and poorly scanned zones (negative or null values are not shown).
Slide view with quality
Uses the blue channel for displaying the quality of the measure.
Combined with algorithms that provide a quality value for each spot.
AM and RG Graphs
(AM) Logarithmic plot of ratio versus intensity; or (RG) logarithmic plot of red versus green channel.
AM displays the dependence of the ratio on the intensity (ratio correction and filtering); in the RG case the two colour channels are emphasized separately.
Box graph of each block of the slide.
Classical statistical graph for detecting outliers and comparing the distribution of diverse data sets (useful tool for detecting contrast variations inter- or intra-slide).
Density Graph and Density Graph per block
This graph estimates the density of ratios (per block).
Preliminary test on the distribution of the ratios. The expected density graph is a normal distribution (per block, helps detecting spatial errors).
A scatter plot showing the intensity values of one scan acquisition versus the same values of another scan acquisition.
This is a first step for comparing two slides. The data should be near the diagonal if the slides are good replicates of each other.
Dispersion, Deviation and Correlation of Replicates
The intensity values of the individual spots versus the mean of all the spots from the same replication group.
Quality estimation of the replication. For the dispersion graph, the data points should be along the diagonal, and the more noise, the more blurred they will be. If the deviation is high, the quality is lower.
Normality of Replications
Applies the inverse of the normal distribution function to the distribution function of each replication group.
One typical assumption is that the noise is normally distributed. This graph will test that hypothesis. If the data points lie along the diagonal, the noise is normal.
Probability Normal Plots (PP/QQ/PN)
Plots to compare expected normal distribution values against observed values
QQ compares z-scores, PP compares p-values and PN compares p-values versus log ratios.
Replicated data can be visualized, and their quality estimated, by Dispersion, Deviation and Correlation diagrams that expose these statistical values and their dependence on the average of the replicated spot intensities. For the dispersion graph, the data points should be located along the diagonal, and the more noise, the more blurred they will appear; in other words, widely scattered spots suggest low data quality.
To assess the normality of the replicates, the QQ plot compares quantiles of the expected normal distribution with quantiles of the observed data distribution (similarly, the PP plot shows p-values and the PN plot draws p-values versus log ratios). These plots are drawn for every step in the project stack.
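The points of a QQ plot can be computed as in the following sketch. The function name and the plotting positions (i + 0.5)/n are assumptions chosen for the example; other conventions for the expected quantiles exist.

```python
from statistics import NormalDist, mean, stdev

def qq_points(values):
    """Pair expected standard normal quantiles with observed z-scores.

    Returns a list of (expected, observed) tuples sorted by quantile; for
    normally distributed data the points lie close to the diagonal.
    """
    n = len(values)
    mu, sd = mean(values), stdev(values)
    observed = sorted((v - mu) / sd for v in values)
    std_normal = NormalDist()
    expected = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(expected, observed))

points = qq_points([1.0, 2.0, 3.0, 4.0, 5.0])
```

Systematic departures from the diagonal at the extremes of such a plot indicate heavier or lighter tails than the normal assumption allows.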
PreP+07 implements a local deviation procedure to obtain a preliminary set of differentially expressed genes, with three different estimators: (a) windowed local deviation, which takes a fraction of spots near the spot whose deviation is to be found and uses those local spots for the estimation; (b) Lowess absolute deviation, which uses a Lowess curve, given a fraction and a number of steps, for absolute deviation fitting; and (c) Lowess standard deviation, similar to (b) but for standard deviation fitting [12, 13]. Negative ratios can be managed as symmetric, forcing the deviation to be the same for positive and negative values, or as asymmetric, to allow different deviations for positive and negative values.
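Estimator (a) can be illustrated with a short sketch. It assumes that "near" means adjacent in intensity (A value) and uses the population standard deviation inside each window; both choices, and the function name, are assumptions rather than a description of the PreP+07 code.

```python
from statistics import pstdev

def windowed_local_deviation(a, m, fraction=0.2):
    """Standard deviation of M inside a window of neighbouring spots.

    For every spot, the window holds the given fraction of all spots,
    taken from those adjacent to it after sorting by intensity A.
    """
    n = len(a)
    k = min(n, max(2, round(fraction * n)))          # window size in spots
    order = sorted(range(n), key=lambda i: a[i])     # spot indices sorted by A
    rank = {idx: pos for pos, idx in enumerate(order)}
    result = [0.0] * n
    for idx in range(n):
        # Centre the window on the spot, clamped to the ends of the range.
        start = min(max(0, rank[idx] - k // 2), n - k)
        window = order[start:start + k]
        result[idx] = pstdev([m[j] for j in window])
    return result

a = list(range(10))
m = [1, -1, 1, -1, 1, -1, 10, -10, 10, -10]
local_sd = windowed_local_deviation(a, m, fraction=0.4)
```

In the example the noise grows with intensity, so the local deviation is small for the dim spots and large for the bright ones, which a single global deviation would not capture.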
PreP+07 is implemented in Visual C++ for the MS-Windows XP OS. It is designed in an object oriented way for robustness and scalability. The code is intended to ease the use of the application. An important goal was to make the user interface friendly. This is achieved through extensive visual information, the use of the operating system's GUI libraries and a high degree of interactivity. The installation of PreP+07 is extremely simple: it only requires downloading the software from the Web site and launching it. A comprehensive user-friendly manual is also available, giving more details about the methods used, and a guided tour allows a step by step discovery of the software.
A PreP+07 project
Conceptually speaking, a PreP+07 project is a collection of states. Each state is the result of applying a given process to the previous state. The different PreP+07 states are stored in a stack, meaning that states are pushed onto the stack and only the last state can be removed (popped) from the top of the stack. The last state is the current state, that is to say, the state to which the procedures are applied (the rest of the states make up the "history"). Each state is self-contained, so it holds all the necessary information to produce a new state (this allows a trial-and-error approach to obtain the best results).
Each state consists of a collection of slides. The slides represent and contain the information obtained by the scan of a given DNA chip. The slide has an associated name and, when necessary, a set of pre-computed values to be used in a new step. In general, the slide name summarizes the experimental conditions. Finally, a slide is a collection of points (spots). Each spot has a set of values that correspond to light intensities, position in the chip, labels, etc. (see Figure 1).
The first state is produced by a special step named the "load step". In this step the slide files are loaded and identified to translate the original data tags into tags that PreP+07 understands. The options available for the "load step" are particular to this stage (and different from those of the subsequent "normal" steps). Some of the procedures implemented in PreP+07 can be applied in any context (such as normalization, adjusting and ratio scaling) while others require specific conditions (e.g. gene replication).
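The project model described above (a stack of self-contained states, starting with the load step) can be sketched as follows. The class and method names are hypothetical, and the rule that the load state is never popped is an assumption made for the example, not a documented PreP+07 behaviour.

```python
import copy

class Project:
    """A stack of self-contained states; the top is the current state."""

    def __init__(self, loaded_slides):
        # The first state is produced by the special "load step".
        self._stack = [("load", copy.deepcopy(loaded_slides))]

    @property
    def current(self):
        """The state to which the next procedure is applied."""
        return self._stack[-1][1]

    @property
    def history(self):
        """Names of all steps applied so far, oldest first."""
        return [name for name, _ in self._stack]

    def apply(self, name, step):
        """Run `step` on a copy of the current state and push the result."""
        new_state = step(copy.deepcopy(self.current))
        self._stack.append((name, new_state))
        return new_state

    def undo(self):
        """Pop the last state (assumption: the load state always remains)."""
        if len(self._stack) > 1:
            self._stack.pop()
        return self.current

project = Project({"A1": [1.0, 2.0, 4.0]})
project.apply("double", lambda s: {k: [2 * v for v in vals] for k, vals in s.items()})
project.undo()  # back to the loaded data
```

Because every step works on a copy, earlier states are never modified, which is what makes the trial-and-error use of the stack safe.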
Since Bioconductor packages are considered a standard in microarray analysis, PreP+07 results were compared with theirs. The comparison rationale has been to obtain normalized log-ratios by applying R and PreP+07 procedures, and then to use these log-ratios to perform a two-class t-test and detect the differentially expressed genes in both datasets using the MultiExperiment Viewer (MeV) program from the TM4 suite.
A complete set of experimental data obtained in the framework of the ESPSOL Spanish project with Solanum lycopersicum has been used to obtain a set of differentially expressed genes, following the typical protocol described previously. The set was composed of 6 tomato microarrays hybridized to samples representing two different conditions, with three biological replicates for each one (called A1, A2, A3 and B1, B2, B3). To keep the data confidential, random gene IDs were assigned to the tomato sequences. The experimental design includes a dye-swap, and images were obtained with the GenePix technology. These chips are organized in 4 × 12 blocks (row major) and each block contains 16 rows and 17 columns (13056 spots in total), including 896 empty spots as well as intra-slide replicates for some tomato ESTs and negative controls, identified by the same ID. In particular, 140 spots contain 14 different negative controls (belonging to different species) and 174 spots contain replicates for 16 ESTs. Thus, 12020 spots in the chip correspond to tomato sequences [see Additional File 1]. All scan acquisitions were performed at normal intensity (PMT GAIN = 730V × 610V) with a minimal number of saturated signals (less than 0.55% in all cases).
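The spot counts quoted above can be checked with a few lines of arithmetic. The variable names are illustrative, and the reading that the 174 replicate EST spots are counted among the tomato spots is an assumption that is consistent with the totals.

```python
# Slide layout: 4 x 12 blocks, each with 16 rows and 17 columns.
blocks = 4 * 12
spots_per_block = 16 * 17
total_spots = blocks * spots_per_block           # 13056

# Spots that do not carry tomato sequences.
empty_spots = 896
negative_control_spots = 140

# Assumption: replicated EST spots remain part of the tomato count.
tomato_spots = total_spots - empty_spots - negative_control_spots

print(total_spots, tomato_spots)
```

The result reproduces both figures given in the text, 13056 spots per slide and 12020 spots with tomato sequences.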
Differentially expressed genes obtained with the FL protocol using PreP+07, contrasting their rank position against the Limma ranking (columns: P+07 p-value, P+07 rank, R p-value, R rank, p-value difference).
Detailed information of spots 10247 and 2213.
An additional experiment with a non-proprietary dataset has been performed using a public dataset from GEO (accession GPL7275). Samples belong to NK cells of C57BL/6 mice either mock-infected or infected with P. chabaudi, with ID codes from GSM319497 to GSM319502 (3 samples per condition) (see a complete description at http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE12727). The aim is to identify the genes that are differentially expressed upon infection with P. chabaudi. The protocol used for the analysis was the same, but empty or low quality spots were not removed. This data set also shows significant agreement between R and PreP results [see Additional File 1].
Pre-processing is a necessary step when preparing gene expression data for analysis since raw data carry instrumental and operator errors. Moreover, these biases are not constant across experiments, rendering the data inconsistent. Furthermore, the preprocessing methods should keep real differential values (over- or under-expressed genes) still identifiable, and this must be achieved by using outlier detection and robust statistical methods.
PreP+07 is an attempt to reduce the barriers between scientists who hybridize microarrays and statisticians who analyze microarray results in depth. In other words, PreP+07 enables scientists to prepare their data and conduct a basic analysis of differential expression that is ready for closer and more specialised inspection. Hence, PreP+07 has been designed as a standalone interactive graphical suite that integrates widely used pre-processing methods for gene expression data and aims to minimize sources of systematic and random variation in the acquired data, other than those directly related to differential expression. PreP+07 includes a variety of analytical tools for reducing dependencies on intensity and, when available, allowing the resolution of replicated data sets. In some cases these can be applied in any context (such as normalization, adjusting and ratio scaling). In other cases, some specific conditions have to be met (e.g. gene replication). Once the error has been minimized, PreP+07 allows the individual control and target signals and their ratio to be extracted; since most of the techniques available in PreP+07 are based on robust statistical procedures, outliers and differentiated values are preserved.
Statistical microarray analyses (e.g. Limma/Bioconductor) require a collection of biological and technical replicates in order to obtain information about which genes are differentially expressed. In addition to this, PreP+07 also provides the opportunity of analysing differentially expressed genes slide-by-slide by means of a t-test or z-score statistics. Slide-by-slide analysis can be very helpful for researchers unskilled in statistical methods who want to obtain an overview of their results. These advantages are strengthened by the interactive interface of PreP+07, which allows the identification of values and quality of every spot on the slide in each plot. The available plot set enables data visualization using different criteria to assess data reliability.
Among the multiple advantages of using PreP+07, the most remarkable characteristics are (a) the visualization tools are completely interactive, with optional tooltips for each coloured spot in the graph that display complete information, so that outlier spots can be identified and inspected visually (including tracking information about the number of coherent/incoherent or filtered spots); (b) new and unique methods such as Supervised Lowess and double scan regression; (c) intuitive and powerful replication resolution that allows users to combine inter- and intra-slide replicates; (d) results comparable with those of the most widely used related software, allowing non-bioinformaticians to carry out the same pre-processing procedures using a graphical and intuitive interface, ensuring data privacy and high quality images; and (e) data results and inputs are interchangeable between programs (i.e. R output can be loaded into PreP to perform different analyses and vice versa).
The learning curve in PreP+07 can be expected to be smoother than the learning curve in R Bioconductor. With PreP+07 the biologist does not need prior knowledge of scripting, and simple steps such as loading the data and applying filtering or Lowess can be done intuitively the first time the user runs the program.
PreP+07 is intended for preliminary microarray analysis by users unskilled in statistical microarray treatments or without scripting skills. This is why it is an integrated application that contains only well-known and widely used methods (not all available or applicable methods), such as print-tip Lowess, Lowess or scaling. The idea is not to open a wide range of opportunities, but to offer a small collection of reliable workflows with the necessary options to reach normalised data and even a set of differentially expressed genes.
PreP+07 has been exhaustively tested in various research projects, such as aCGH with spiked-in dual dye, Express Fingerprints, Gene expression pattern and protein profile in pigs infected by circovirus, and the ESP-SOL Project.
▪ Project name: PreP+07.
▪ Project home page: http://www.bitlab-es.com/prep
▪ Operating system(s): Windows XP.
▪ Programming language: Visual C++.
▪ Other requirements: none.
▪ License: free software.
▪ Any restriction to use by non-academics: none.
This work has been partially financed by the ESPSOL project conducted by the National Institute for Bioinformatics http://www.inab.org, a platform of Genoma España and the EU project "Advancing Clinico Genomic Trials on Cancer" (EU-contract no.026996).
The authors would like to make a special mention to Antonio Granell, Asuncion Fernandez and Sophie Mirabel from the ESP-SOL project for their contribution in the laboratory work. The authors acknowledge the initial work of Sacha v. Hijum and Jorge Garcia de la Nava in the first versions of PreP.
- 1. Do Jin, Choi Dong-Kug: Normalization of microarray data: single-labeled and dual-labeled arrays. Mol Cells 2006, 22: 254–61.
- 2. Barbacioru Catalin C, Wang Yulei, Canales Roger D, Sun Yongming A, Keys David N, Chan Frances, Poulter Karen A, Samaha Raymond R: Effect of various normalization methods on Applied Biosystems expression array system data. BMC Bioinformatics 2006, 7: 533. doi:10.1186/1471-2105-7-533
- 3. Klebanov Lev, Yakovlev Andrei: How high is the level of technical noise in microarray data? Biology Direct 2007, 2: 9. doi:10.1186/1745-6150-2-9
- 4. Zhao Y, Li M-C, Simon R: An adaptive method for cDNA microarray normalization. BMC Bioinformatics 2005, 6: 28. doi:10.1186/1471-2105-6-28
- 5. Oshlack Alicia, Emslie Dianne, Corcoran Lynn M, Smyth Gordon K: Normalization of boutique two-color microarrays with a high proportion of differentially expressed probes. Genome Biology 2007, 8(1):R2. doi:10.1186/gb-2007-8-1-r2
- 6. Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JY, Zhang J: Bioconductor: open software development for computational biology and bioinformatics. Genome Biology 2004, 5(10):R80. doi:10.1186/gb-2004-5-10-r80
- 7. Saeed AI, Sharov V, White J, Li J, Liang W, Bhagabati N, Braisted J, Klapa M, Currier T, Thiagarajan M, Sturn A, Snuffin M, Rezantsev A, Popov D, Ryltsov A, Kostukovich E, Borisovsky I, Liu Z, Vinsavich A, Trush V, Quackenbush J: TM4: a free, open source system for microarray data management and analysis. Biotechniques 2003, 34(2):374–8.
- 8. Tárraga J, Medina I, Carbonell J, Huerta-Cepas J, Minguez P, Alloza E, Al-Shahrour F, Vegas-Azcárate S, Goetz S, Escobar P, Garcia-Garcia F, Conesa A, Montaner D, Dopazo J: GEPAS, a web based tool for microarray data analysis and interpretation. Nucleic Acids Res 2008, (36 Web Server):W308–14. doi:10.1093/nar/gkn303
- 9. Chu VT, Gottardo R, Raftery AE, Bumgarner RE, Yeung KY: MeV+R: using MeV as a graphical user interface for Bioconductor applications in microarray analysis. Genome Biology 2008, 9: R118. doi:10.1186/gb-2008-9-7-r118
- 10. Garcia de la Nava Jorge, van Hijum Sacha, Trelles Oswaldo: PreP: gene expression data pre-processing. Bioinformatics 2003, 19(17):2328–2329. doi:10.1093/bioinformatics/btg318
- 11. Quackenbush J: Microarray data normalization and transformation. Nat Genet 2002, 32: 496–501. doi:10.1038/ng1032
- 12. Dudoit S, Yang YH, Luu P, Speed TP: Normalization for cDNA microarray data. In Proceedings of SPIE. Edited by: Bittner YML. 2001.
- 13. Yang YH, Dudoit S, Luu P, Lin DM, Peng V, Ngai J, Speed TP: Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Research 2002, 30(4):e15. doi:10.1093/nar/30.4.e15
- 14. Ideker T, Thorsson V, Siegel AF, Hood LE: Testing for Differentially Expressed Genes by Maximum Likelihood Analysis of Microarray Data. Journal of Computational Biology 2000, 7(6):805–817. doi:10.1089/10665270050514945
- 15. Rocke DM, Durbin B: A model for measurement error for gene expression arrays. Journal of Computational Biology 2001, 8(6):557–69. doi:10.1089/106652701753307485
- 16. Tseng GC, Oh M, Rohlin L, Liao JC, Wong WH: Issues in cDNA microarray analysis: quality filtering, channel normalization, models of variation and assessment of gene effects. Nucleic Acids Research 2001, 29(12):2549–2557. doi:10.1093/nar/29.12.2549
- 17. Sabatti C, Karsten SL, Geschwind DH: Thresholding Rules for Recovering a Sparse Signal from Microarray Experiments. Mathematical Biosciences 2002, 176: 17–34. doi:10.1016/S0025-5564(01)00102-X
- 18. Garcia de la Nava Jorge, van Hijum Sacha, Trelles Oswaldo: Saturation and quantization reduction in microarray experiments using two scans at different sensitivities. Statistical application in genetics and molecular biology 2004, 3(1):article 11.
- 19. van Hijum S, Baerends R, Zomer A, Karsens H, Martin-Requena V, Trelles O, Kok J, Kuipers O: Supervised Lowess normalization of comparative genome hybridization data – application to lactococcal strain comparisons. BMC Bioinformatics 2008, 9: 93. doi:10.1186/1471-2105-9-93
- 20. Lee MT, Kuo FC, Whitmore GA, Sklar J: Importance of replication in microarray gene expression studies: Statistical methods and evidence from repetitive cDNA hybridisations. PNAS 2000, 97(18):9834–9839. doi:10.1073/pnas.97.18.9834
- 21. Chen Y, Dougherty ER, Bittner ML: Ratio Based decisions and the quantitative analysis of cDNA microarray images. Journal of Biomedical Optics 1997, 2(4):364–374. doi:10.1117/12.281504
- 22. Garcia de la Nava J, Franco-Santaella D, Cuenca J, Carazo JM, Trelles O, Pascual-Montano A: Engene: The processing and exploratory analysis of gene expression data. Bioinformatics 2003, 19(5):657–8. doi:10.1093/bioinformatics/btg028
- 23. ESP-SOL Project [http://www.bitlab-es.com/espsol]
- 24. Express Fingerprints [http://cordis.europa.eu/data/PROJ_FP5/ACTIONeqDndSESSIONeq112482005919ndDOCeq1132ndTBLeqEN_PROJ.htm]
- 25. Gene expression pattern and protein profile in pigs infected by circovirus [http://www.uco.es/investiga/grupos/mgm/proyectos.html]
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.