
Table 1 Microarray Workflow Listing

From: Workflows for microarray data processing in the Kepler environment

Workflow file name

Goal

GFF file workflows

 

Descriptive Statistics and File Information Group

DisplayRegion.xml

Create a graphical display of the value field of a GFF file (similar to the output of NimbleGen SignalMap).

GeneralHist.xml

Create a histogram of a given column of a text file. Useful for microarray GFF files.

gffFreqPoly_python.xml

Make several frequency polygons superimposed on one another for comparison.

gffFullDescription.xml

Display information about the specified GFF file.

gffQuickLook.xml

Display the first few lines of a GFF file.

gffStats_gffread_simple.xml

Calculate the min, max, mean, median, number of lines, and various percentiles of a specified field (Python version).

gffStats_Rbased_simple.xml

Calculate the min, max, mean, median, number of lines, and various percentiles of a specified field (R version).

ProbeSpacings.xml

Make a histogram of the probe spacings of a GFF file.

 

File Modification Group

AddComments.xml

Add comments to the beginning of a GFF file.

gffMakeTiny.xml

Greatly reduce the size of a GFF file so that loading and processing are much faster. The file is shrunk by replacing its second, third, and last fields with placeholders, which assumes that these fields are identical in all lines.

gffModThirdField.xml

Modify the third field of a GFF file.

 

File Processing Group (Sorting, Smoothing, Normalization, Subtraction, Splitting)

gffSmooth.xml

Apply a running median smoother (window length 3) to the 6th column of GFF files.

gffSort.xml

Sort a GFF file by chromosome and then by start position (i.e., by field 1, then by field 4).

QuantNorm.xml

Quantile-normalize the 6th field (ratio field) of a series of GFF files.

gffQN_SM3_TINY.xml

Quantile-normalize, smooth, and tiny-ize a set of GFF files. See gffMakeTiny.xml for an explanation of tiny-izing.

gffSubtract.xml

Subtract one GFF file from another (the result is obtained by subtracting the values in field 6).

gffSplit.xml

Split a GFF file containing the strings ‘tiled region’, ‘transcription_start_site’, and ‘primary_transcript’ into three separate files.

 

Binding Site Detection

RunDetection.xml

Find runs of ratios (6th field) that are greater than or equal to the specified percentile of that column. Can be used for ChIP-chip binding site detection, as in [26].

RunDetection_with_annotation.xml

RunDetection workflow with added annotation of the resulting binding sites (e.g., nearest gene) using the R/Bioconductor ChIPpeakAnno package.

Affymetrix Analysis

AMDA.xml

Perform Affymetrix gene expression microarray analysis.

AMDA_limmafinal.xml

Variant of the AMDA workflow that uses the limma package [28] to determine differentially expressed genes.

PCR Primer Design

PrimerDesign.xml

Pick sets of primers for a chromosome range supplied by the user. Uses the UCSC Genome Browser for output.

General Utilities

Regex_R.xml

Simple example of finding a substring within a string using regular expressions in the R framework.

kepler_cut.xml

Clone of the UNIX ‘cut’ command.

kepler_paste.xml

Clone of the UNIX ‘paste’ command.

kepler_sort.xml

Clone of the UNIX ‘sort’ command.

These workflows are further described in Additional file 2: Table S1. Each workflow is displayed in Additional file 1: Figures S1-S26.
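The gffStats workflows summarize one field of a GFF file. The sketch below is a rough Python illustration of that computation, not the code inside gffStats_gffread_simple.xml or gffStats_Rbased_simple.xml. It reads tab-separated GFF lines, skips lines starting with `#`, and summarizes a 1-based field. Reporting the 5th and 95th percentiles, and using the "inclusive" interpolation method of `statistics.quantiles`, are assumptions. The workflows may report other percentiles or define them differently.

```python
import statistics


def field_stats(lines, field=6):
    """Summary statistics of a 1-based GFF field (default: field 6, the value/ratio field).

    Blank lines and lines starting with '#' are skipped. The percentile choice
    and interpolation method are assumptions, not taken from the workflows."""
    values = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        values.append(float(line.rstrip("\n").split("\t")[field - 1]))
    # With n=100, quantiles() returns the 1st..99th percentile cut points.
    pct = statistics.quantiles(values, n=100, method="inclusive")
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "p5": pct[4],
        "p95": pct[94],
    }
```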
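ProbeSpacings.xml builds a histogram of the distances between neighboring probes. The following is a minimal sketch under two assumptions. First, "spacing" is taken to be the difference between consecutive start positions (field 4) on the same sequence. Second, the input is already sorted by field 1 and then field 4. The function names are hypothetical, and a `collections.Counter` stands in for a plotted histogram.

```python
from collections import Counter


def probe_spacings(rows):
    """rows: (seqname, start) pairs sorted by seqname, then start.

    Returns start-to-start gaps between consecutive probes on the same sequence
    (the start-to-start definition is an assumption)."""
    spacings = []
    for (seq_a, start_a), (seq_b, start_b) in zip(rows, rows[1:]):
        if seq_a == seq_b:  # do not measure a gap across a sequence boundary
            spacings.append(start_b - start_a)
    return spacings


def spacing_histogram(rows):
    """Count how often each spacing occurs (a text stand-in for a histogram)."""
    return Counter(probe_spacings(rows))
```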
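The tiny-ize operation replaces the second, third, and last fields of every line with placeholders, which is only lossless if those fields are identical across lines. This sketch does not reproduce the Kepler workflow. It uses a hypothetical placeholder value of `.`. Adding an explicit check that raises `ValueError` when the fields differ is a design choice of this sketch. The original values are returned so they could be restored later.

```python
def make_tiny(lines, placeholder="."):
    """Replace fields 2, 3 and the last field (9 in GFF) with a placeholder.

    Returns (tiny_lines, original_values). Raises ValueError if the replaced
    fields are not identical in all lines (an added safety check)."""
    reference = None
    out = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        kept = (fields[1], fields[2], fields[-1])
        if reference is None:
            reference = kept
        elif kept != reference:
            raise ValueError("fields 2, 3 and 9 are not identical in all lines")
        fields[1] = fields[2] = fields[-1] = placeholder
        out.append("\t".join(fields))
    return out, reference
```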
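gffSmooth.xml applies a length-3 median smoother to the 6th column. The sketch below illustrates the idea and is not the workflow's own code. It assumes that comment lines have already been removed and that the lines are sorted. It also assumes that smoothing runs over the whole file rather than separately per chromosome, and that the two end values, which have only one neighbor, are left unchanged. A running median of length 3 replaces each value with the middle of itself and its two neighbors, so a single outlying probe is suppressed.

```python
import statistics


def median_smooth3(values):
    """Running median over a window of length 3.

    The first and last values are copied unchanged (an assumed end rule)."""
    if len(values) < 3:
        return list(values)
    smoothed = [values[0]]
    for i in range(1, len(values) - 1):
        smoothed.append(statistics.median(values[i - 1:i + 2]))
    smoothed.append(values[-1])
    return smoothed


def smooth_gff_column6(lines):
    """Replace field 6 of each GFF data line with its length-3 running median."""
    rows = [line.rstrip("\n").split("\t") for line in lines]
    scores = [float(row[5]) for row in rows]
    for row, value in zip(rows, median_smooth3(scores)):
        row[5] = str(value)
    return ["\t".join(row) for row in rows]
```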
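gffSort.xml orders lines by field 1 and then field 4. The sketch below assumes that field 1 is compared as text and field 4 as an integer, roughly what `sort -t$'\t' -k1,1 -k4,4n` does in a shell. Text comparison places `chr10` before `chr2`. The workflow may order sequence names differently.

```python
def gff_sort(lines):
    """Sort GFF data lines by field 1 (sequence name, as text), then field 4 (start, as integer)."""
    def key(line):
        fields = line.rstrip("\n").split("\t")
        return (fields[0], int(fields[3]))
    return sorted(lines, key=key)
```

Comparing starts as integers matters: as text, "100" would sort before "20".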
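QuantNorm.xml quantile-normalizes field 6 across several GFF files. Below is a sketch of the textbook rank-mean form of quantile normalization, offered as an illustration rather than the workflow's implementation. Each file's values are sorted, and the mean of the k-th smallest value across files is computed for every rank k. Each value is then replaced by the mean for its rank. All files are assumed to contain the same number of probes. Ties are broken by position in this sketch, whereas other implementations may average tied values.

```python
def quantile_normalize(columns):
    """Quantile-normalize equal-length lists of values (one list per GFF file)."""
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all files must contain the same number of values")
    # Mean of the k-th smallest value across all files, for each rank k.
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [sum(col[k] for col in sorted_cols) / len(columns) for k in range(n)]
    normalized = []
    for col in columns:
        # Indices in ascending order of value; sorted() is stable, so ties keep file order.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = rank_means[rank]
        normalized.append(out)
    return normalized
```

After normalization every file has exactly the same distribution of values (the list of rank means), which is the goal of the method.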
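gffSubtract.xml subtracts one GFF file from another using the values in field 6. The sketch below assumes that probes are matched on (field 1, field 4, field 5), that is, sequence, start, and end. The table does not say how the workflow pairs lines. A probe present in the first file but missing from the second raises `KeyError`.

```python
def gff_subtract(lines_a, lines_b):
    """Return lines of A with field 6 replaced by A.field6 - B.field6.

    Probes are matched on (seqname, start, end); this matching key is an assumption."""
    def key(fields):
        return (fields[0], fields[3], fields[4])

    b_scores = {}
    for line in lines_b:
        fields = line.rstrip("\n").split("\t")
        b_scores[key(fields)] = float(fields[5])

    out = []
    for line in lines_a:
        fields = line.rstrip("\n").split("\t")
        fields[5] = str(float(fields[5]) - b_scores[key(fields)])
        out.append("\t".join(fields))
    return out
```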
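RunDetection.xml finds runs of ratios (field 6) at or above a chosen percentile of that column. The sketch below illustrates the idea and is not the workflow's code. The threshold uses the "inclusive" method of `statistics.quantiles`, which is an assumption. A run is taken to be a maximal stretch of consecutive probes on the same sequence whose ratio meets the threshold. The `min_run` parameter, the minimum number of probes a run must contain, is hypothetical and not taken from the workflow.

```python
import statistics


def detect_runs(rows, percentile=95, min_run=3):
    """rows: (seqname, start, end, ratio) tuples sorted by seqname, then start.

    Returns (seqname, run_start, run_end, n_probes) for each maximal run of
    consecutive probes on one sequence with ratio >= the given percentile
    (1..99) of all ratios. min_run is a hypothetical parameter."""
    ratios = [row[3] for row in rows]
    threshold = statistics.quantiles(ratios, n=100, method="inclusive")[percentile - 1]
    runs, current = [], []

    def close_run():
        # Record the run in progress if it is long enough.
        if len(current) >= min_run:
            runs.append((current[0][0], current[0][1], current[-1][2], len(current)))

    for row in rows:
        # A run continues only while the ratio stays above threshold on the same sequence.
        if row[3] >= threshold and (not current or current[-1][0] == row[0]):
            current.append(row)
        else:
            close_run()
            current = [row] if row[3] >= threshold else []
    close_run()
    return runs
```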