Workflows for microarray data processing in the Kepler environment

Stropp, Thomas; McPhillips, Timothy; Ludäscher, Bertram; Bieda, Mark

doi:10.1186/1471-2105-13-102

BMC Bioinformatics

Table 1 Microarray Workflow Listing


Workflow file name	Goal
*GFF file workflows*
	*Descriptive Statistics and File Information Group*
DisplayRegion.xml	Create a graphical display of the value field of a GFF file (like output provided by NimbleGen SignalMap)
GeneralHist.xml	Create a histogram of a given column of a text file. Useful for microarray GFF files.
gffFreqPoly_python.xml	Make several frequency polygons superimposed on one another for comparison.
gffFullDescription.xml	Display information about the GFF file specified.
gffQuickLook.xml	Display the first few lines of a GFF file.
gffStats_gffread_simple.xml	Calculate the min, max, mean, median, number of lines, and various percentiles of a specified field. (Python version)
gffStats_Rbased_simple.xml	Calculate the min, max, mean, median, number of lines, and various percentiles of a specified field. (R version)
ProbeSpacings.xml	Make a histogram of the probe spacings of a GFF file.
	*File Modification Group*
AddComments.xml	Add comments to the beginning of a GFF file.
gffMakeTiny.xml	Greatly reduce the size of a GFF file so that loading and processing are much faster. Reduces file size by replacing the second, third, and last fields of the file with placeholders. Assumes that these fields are the same in all lines.
gffModThirdField.xml	Modify the third field of a GFF file.
	*File Processing Group (Sorting, Smoothing, Normalization, Subtraction, Splitting)*
gffSmooth.xml	Median smooth (length 3) the 6th column of GFF files.
gffSort.xml	Sort a GFF file by chromosome and start position (that is, by field 1, then by field 4).
QuantNorm.xml	Quantile normalize the 6th field (ratio field) of a series of GFF files.
gffQN_SM3_TINY.xml	Quantile Normalize, Smooth, and Tiny-ize a set of GFF files. See gffMakeTiny.xml for explanation of Tiny-ize.
gffSubtract.xml	Subtract one GFF file from another GFF file (result based on subtraction of values in field 6).
gffSplit.xml	Split a GFF file containing the strings ‘tiled region’, ‘transcription_start_site’, and ‘primary_transcript’ into 3 separate files.
	*Binding Site Detection*
RunDetection.xml	Calculates runs of ratios (6th field) that are greater than or equal to the specified percentile of that column. Can be used for binding site detection for ChIP-chip as in [26].
RunDetection_with_annotation.xml	RunDetection workflow with added annotation of the resulting binding sites (e.g. nearest gene) using the R/Bioconductor ChIPpeakAnno package.
*Affymetrix Analysis*
AMDA.xml	Perform Affymetrix gene expression microarray analysis.
AMDA_limmafinal.xml	Variant of the AMDA workflow that uses the limma package [28] to determine differentially expressed genes.
*PCR Primer Design*
PrimerDesign.xml	Pick sets of primers for a user-supplied chromosome range. Uses the UCSC Genome Browser for output.
*General Utilities*
Regex_R.xml	Simple example of finding a substring within a string using regular expressions in R.
kepler_cut.xml	Clone of the UNIX ‘cut’ command.
kepler_paste.xml	Clone of the UNIX ‘paste’ command.
kepler_sort.xml	Clone of the UNIX ‘sort’ command.

These workflows are further described in Additional file 2: Table S1. Each workflow is displayed in Additional file 1: Figures S1-S26.
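
As an illustration only, three of the processing steps listed in Table 1 (length-3 median smoothing, quantile normalization, and percentile-based run detection on the ratio values of field 6) might be sketched in plain Python as follows. This is not the code shipped with the workflows. The function names, the choice to leave the two endpoints unsmoothed, the nearest-rank percentile rule, the tie handling in the normalization, and the `min_length` parameter are all assumptions made for the sketch.

```python
import math
import statistics


def median_smooth3(values):
    """Running median with window length 3.

    Assumption: the first and last values are copied unchanged,
    because they have no complete 3-value window.
    """
    if len(values) < 3:
        return list(values)
    smoothed = [values[0]]
    for i in range(1, len(values) - 1):
        smoothed.append(statistics.median(values[i - 1:i + 2]))
    smoothed.append(values[-1])
    return smoothed


def quantile_normalize(columns):
    """Quantile normalize several equal-length lists of values.

    Each value is replaced by the mean, across all lists, of the values
    that share its rank. Assumption: ties keep their original order.
    """
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all columns must have the same length")
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [sum(sc[r] for sc in sorted_cols) / len(columns)
                  for r in range(n)]
    normalized = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


def percentile_threshold(values, pct):
    """Nearest-rank percentile (an assumed definition), 0 < pct <= 100."""
    ordered = sorted(values)
    k = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[k - 1]


def detect_runs(values, threshold, min_length=2):
    """Return inclusive (start, end) index pairs of consecutive values
    that are >= threshold. min_length is a hypothetical parameter."""
    runs = []
    start = None
    for i, v in enumerate(values):
        if v >= threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_length:
                runs.append((start, i - 1))
            start = None
    if start is not None and len(values) - start >= min_length:
        runs.append((start, len(values) - 1))
    return runs
```

In a pipeline resembling gffQN_SM3_TINY.xml followed by RunDetection.xml, the ratio columns of several files would first pass through `quantile_normalize`, each column would then be smoothed with `median_smooth3`, and `detect_runs` would be called with the value returned by `percentile_threshold` for the chosen percentile.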


ISSN: 1471-2105
