TableButler – a Windows based tool for processing large data tables generated with high-throughput methods
© Schwager et al; licensee BioMed Central Ltd. 2009
Received: 14 October 2008
Accepted: 29 July 2009
Published: 29 July 2009
High-throughput "omics"-based data analysis plays an emerging role in life sciences and molecular diagnostics. This emphasizes the urgent need for user-friendly Windows-based software interfaces that can process the diversity of large tab-delimited raw data files generated by these methods. Depending on the study, dozens to hundreds of these data tables are generated. Before the actual statistical or cluster analysis, these data tables have to be combined and merged into expression matrices (e.g., in the case of gene expression analysis). Gene annotations as well as information concerning the samples analyzed may be appended, renewed or extended. Often additional data values must be computed or certain features must be filtered out.
In order to perform these tasks, we have developed a Microsoft Windows-based application, "TableButler", which allows biologists or clinicians without a substantial bioinformatics background to perform the many data processing tasks required to analyze large-scale data.
TableButler is a monolithic Windows application. It is implemented to handle, join and preprocess large tab-delimited ASCII data files. The intuitive user interface enables scientists (e.g. biologists, clinicians or others) to set up workflows for their specific problems by simple drag-and-drop-like operations.
For more details about TableButler, visit http://www.OncoExpress.org/software/tablebutler.
DNA filter arrays and microarrays are widely used in functional genomics research. Complete genomes can be spotted on such arrays. After hybridization and image analysis, large data tables are generated. Each hybridization generates and saves tens of thousands (genome-wide expression arrays) to hundreds of thousands (genome-wide filter arrays or CGH microarrays) of data lines for all measured gene features. Data may be saved as structured XML documents, mostly using the well-defined and standardized MAGE-ML object model and definitions. This requires the subsequent use of programs that can import XML documents (e.g. commercial solutions like Rosetta Resolver or open source tools like Bioconductor, which is based on R). Alternatively, most programs can generate generic tab-delimited text files, which can easily be imported into nearly any spreadsheet program, statistics program or database. Depending on the study type, dozens to hundreds of these data tables are generated. Before the actual statistical or cluster analysis, these data tables have to be combined and merged into expression matrices, and gene annotations or sample information may be appended, renewed or extended. Often additional data values are to be computed or certain features must be filtered out.
One way to perform such tasks is to use commercially available microarray databases with integrated handling and analysis tools (e.g. Rosetta Resolver, Agilent). Large institutes have developed customized solutions (e.g. SMD, Stanford). Alternatively, open source solutions (e.g. BASE, J-Express or TM4) may be set up. However, all such solutions require considerable computer expertise both for installation and set-up and for system maintenance.
Some of the tasks mentioned above may also be solved with standard spreadsheet programs from office packages (e.g. OpenOffice). Unfortunately, both the commercial and the freeware solutions have severe limitations. Data files may not exceed 65000 rows and/or 255 columns, and these programs may produce bizarre results when incorrect national settings for number or time formats are used.
Moreover, one can implement such tools "de novo" (using e.g. Perl, C or R), which again requires expert knowledge from bioinformaticians. In fact, this approach requires an installation of the respective development environments and, even more critically, detailed background knowledge and experience in the development and optimization of algorithms as well as in the implementation of such tasks.
In contrast, the solution presented here, TableButler, is a standalone application (less than 1 megabyte) which can perform most of the operations commonly used prior to statistical or cluster analysis of microarray data. At present, TableButler works exclusively with tab-delimited data files, avoiding the need to keep track of file format changes in proprietary spreadsheet formats or of the varying XML dialects used to wrap the information. The rich MS Windows user interface allows users without bioinformatics training to set up operations conveniently. By default, all derived data files are generated with new file names, thus preventing data loss due to erroneous actions.
Parameters of interactively set-up filters and operations may be saved and recalled later for similar operations. This ensures consistent pre-processing of data tables across projects and users.
Not all rows (features) from a hybridization file are required or suited for subsequent statistical or cluster analysis. Spotting controls or spike-in genes for quality tracking of the wet-lab processing steps (RNA extraction, amplification, labeling, etc.) do not contribute any biological information to the study. Low-quality genes can increase the signal noise in the statistical tests. Row filtering can be used to remove such rows from the data. Rows may be filtered on the text or numerical content of a single data column. Several filters (e.g. remove all genes containing "control" in the gene's description and all genes with quality flag <> "Pass") may be combined in a single run.
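To illustrate what such a combined row filter does, the following is a minimal Python sketch and not TableButler's own implementation. The column names `GeneID`, `Description`, `Flag` and `Signal`, the sample rows and the function names are assumptions made for this example only.

```python
import csv
import io


def filter_rows(lines, keep):
    """Return the header plus every data row for which keep(row) is true."""
    reader = csv.DictReader(lines, delimiter="\t")
    names = reader.fieldnames
    kept = [[row[name] for name in names] for row in reader if keep(row)]
    return [names] + kept


def keep_feature(row):
    # Hypothetical column names: drop control spots and rows not flagged "Pass".
    is_control = "control" in row["Description"].lower()
    return not is_control and row["Flag"] == "Pass"


# Small made-up table standing in for one hybridization file.
SAMPLE = (
    "GeneID\tDescription\tFlag\tSignal\n"
    "G1\tkinase\tPass\t812.5\n"
    "G2\tspotting control\tPass\t40.1\n"
    "G3\ttranscription factor\tFail\t95.0\n"
    "G4\tphosphatase\tPass\t301.7\n"
)

filtered = filter_rows(io.StringIO(SAMPLE), keep_feature)
for row in filtered:
    print("\t".join(row))
```

Both conditions are evaluated in a single pass over the file, which mirrors the idea of combining several filters in one run.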
Often additional data values or data transformations are useful or required before further analysis. TableButler offers a variety of simple arithmetic, textual and statistical functions that are applied to the data values in each gene row:
Simple arithmetic (e.g. add/subtract constants to data columns, log2, log10 -> log2 transform, change sign, invert numbers, column sums, differences and ratios)
Basic statistics (min, max, arithmetic/geometric mean, variance, standard deviation, t-test, ANOVA)
Spot coordinate transformation (Sub grid, Row, Column -> Metarow, Metacolumn, Row, Column and inverse)
Basic normalization (mean/median centring/normalisation)
Data imputation for missing values (constant, row average, hot deck, most similar)
Replica averaging of replicated genes (using gene ids/names as replica indicator)
Text functions (find and replace, split text, split complicated text using regular expressions, ...)
Date to number conversions
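A few of the listed functions can be sketched in Python as follows. This is an illustration under stated assumptions, not a description of how TableButler computes them: missing values are represented as `None`, non-positive values are treated as missing in the log transform, and all function names are hypothetical.

```python
import math
import statistics


def log2_column(values):
    """Log2-transform a column; missing or non-positive values stay missing."""
    return [math.log2(v) if v is not None and v > 0 else None for v in values]


def median_centre(values):
    """Subtract the median of the present values from every present value."""
    med = statistics.median(v for v in values if v is not None)
    return [v - med if v is not None else None for v in values]


def impute_row_average(row):
    """Replace missing values in one gene row by the mean of the present ones."""
    present = [v for v in row if v is not None]
    if not present:
        return list(row)  # nothing to average, leave the row unchanged
    mean = statistics.mean(present)
    return [mean if v is None else v for v in row]


def average_replicas(pairs):
    """Average values of replicated genes, using the gene id as replica key."""
    groups = {}
    for gene_id, value in pairs:
        groups.setdefault(gene_id, []).append(value)
    return {gene_id: statistics.mean(vals) for gene_id, vals in groups.items()}


signals = [8.0, 2.0, None, 32.0]
logged = log2_column(signals)        # [3.0, 1.0, None, 5.0]
centred = median_centre(logged)      # [0.0, -2.0, None, 2.0]
```

Keeping each transformation as a separate small function reflects the way the operations are listed above: each one adds or rewrites a column and can be chained with the others.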
Building a matrix
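As noted earlier, the per-hybridization tables have to be combined into a single expression matrix before analysis. A minimal Python sketch of this step is given here for illustration. The column names `GeneID` and `Signal`, the sample names and the choice to leave cells empty for genes absent from a file are assumptions of this example, not documented TableButler behaviour.

```python
import csv
import io


def build_matrix(tables, key="GeneID", value="Signal"):
    """Combine several tab-delimited tables into one gene-by-sample matrix.

    tables maps a sample name to an iterable of lines (e.g. an open file).
    Genes missing from a table get an empty cell in that sample's column.
    """
    samples = list(tables)
    cells = {}  # gene id -> {sample name: value}
    for sample, lines in tables.items():
        for row in csv.DictReader(lines, delimiter="\t"):
            cells.setdefault(row[key], {})[sample] = row[value]
    header = [key] + samples
    body = [[gene] + [by_sample.get(s, "") for s in samples]
            for gene, by_sample in cells.items()]
    return [header] + body


# Two made-up hybridization tables with partly overlapping genes.
tables = {
    "hyb1": io.StringIO("GeneID\tSignal\nG1\t10\nG2\t20\n"),
    "hyb2": io.StringIO("GeneID\tSignal\nG2\t25\nG3\t30\n"),
}
matrix = build_matrix(tables)
for row in matrix:
    print("\t".join(row))
```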
Splice data tables
This group contains various functions to cut and combine data tables:
Remove a given number of rows/columns from data files
Append files (row-wise or column-wise)
Remove rows with replicated values in key columns (e.g. remove duplicated gene rows)
Logically combine data files using a key column (Venn-like analysis: get data rows from multiple files containing the same genes in key columns using the logical operators AND, OR, NOT, XOR).
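The Venn-like combination on a key column corresponds to set operations on the gene keys of each file. The sketch below shows this in Python. How TableButler defines NOT and XOR for more than two files is not stated here, so the sketch assumes that NOT means "in the first file but in none of the others" and that XOR is applied pairwise from left to right. The function name is hypothetical.

```python
def combine_keys(key_sets, op):
    """Combine gene-key sets from several files with AND, OR, XOR or NOT."""
    first, *rest = key_sets
    result = set(first)
    for keys in rest:
        if op == "AND":
            result &= keys      # genes present in every file
        elif op == "OR":
            result |= keys      # genes present in at least one file
        elif op == "XOR":
            result ^= keys      # assumption: pairwise symmetric difference
        elif op == "NOT":
            result -= keys      # assumption: first file minus all others
        else:
            raise ValueError(f"unknown operator: {op}")
    return result


# Gene keys taken from the key column of two made-up files.
file_a = {"G1", "G2", "G3"}
file_b = {"G2", "G3", "G4"}

shared = combine_keys([file_a, file_b], "AND")
only_in_a = combine_keys([file_a, file_b], "NOT")
```

The resulting key set can then be used to select the matching data rows from each input file.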
TableButler provides several standard graphs to visually inspect data:
Scatter plots, R/I-plots, quantile plots, Line graphs, Histograms, Box plots, Heat maps
In most cases, multiple operations (filtering, computations) may be combined. Some operations (e.g. t-tests) add multiple new columns to the data files; it is recommended to run such operations separately. Parameter sets for operations may be saved and recalled later, allowing standardized processing of homologous data sets.
Furthermore, multiple filters may be combined in scripts to realize complicated data workflows. An internal script editor allows scripts to be composed and offers the allowed script commands in nested pop-up menus. Scripts can be prototyped interactively, saving customized parameters for the individual operations. Scripts may be loaded and executed manually or may be run automatically when TableButler is started with command line parameters.
TableButler may even be run as a server: a user-defined folder is watched, and any TableButler scripts dropped into this folder are automatically loaded and executed. The script folder or the referenced data folder may be located on shared network resources.
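The watched-folder idea can be illustrated with a simple polling loop in Python. This is a generic sketch of the pattern and not TableButler's mechanism: the `*.txt` file pattern, the polling interval, the function names and the `run_script` callback are all assumptions, and the demo only records file names instead of executing anything.

```python
import tempfile
import time
from pathlib import Path


def find_new_scripts(folder, seen, pattern="*.txt"):
    """Return script files in folder not seen before and mark them as seen."""
    new = sorted(p for p in Path(folder).glob(pattern) if p.name not in seen)
    seen.update(p.name for p in new)
    return new


def watch(folder, run_script, polls, interval=1.0, pattern="*.txt"):
    """Poll the folder a fixed number of times and run each new script once."""
    seen = set()
    for _ in range(polls):
        for script in find_new_scripts(folder, seen, pattern):
            run_script(script)
        time.sleep(interval)


# Demo: drop one hypothetical script into a temporary folder and poll twice.
with tempfile.TemporaryDirectory() as folder:
    (Path(folder) / "job1.txt").write_text("hypothetical script content\n")
    executed = []
    watch(folder, lambda p: executed.append(p.name), polls=2, interval=0.0)

print(executed)
```

Tracking the names that were already handled ensures that a script is executed only once, even though the folder is scanned repeatedly.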
Results and discussion
TableButler is a native Win32 application implemented with Borland's Delphi 5 and runs on Win32 operating systems (e.g. Win98, NT, 2000, XP, Vista). It does not require any additional supporting programs or libraries. TableButler can be copied to any computer with basic user privileges.
TableButler was applied in several collaborative research projects for preprocessing of gene expression data from large-format filter arrays (140000 and 76000 features on filter macro-arrays), custom-spotted cDNA microarrays (56000 features [13–19]) and commercial Affymetrix arrays (44000 features).
For more details about TableButler's functionality and usage, visit the web page: http://www.OncoExpress.org/software/tablebutler.
TableButler is a monolithic Windows application. It is implemented to handle, join and preprocess batches of large tab-delimited ASCII data files. The intuitive user interface enables scientists (e.g. biologists, clinicians or others) to set up workflows for their specific problems by simple drag-and-drop-like operations. Special knowledge of scripting languages (Perl, VBS, Java, SQL ...) is not required. TableButler can be executed without installation, even from a memory stick. It does not require any supporting libraries or tools.
TableButler may be applied to any kind of tab-delimited data table file: DNA expression data, microRNA data, protein data, etc., and even lists of telephone numbers or mp3 songs.
The TableButler application was implemented in the course of research projects supported by: Deutsche Krebshilfe (Grant # 106997), the DFG National Priority Research Program "The Tumor-Vessel Interface" SPP1190 (Grant AB-388), and the Tumorzentrum Heidelberg-Mannheim.
- Spellman PT, Miller M, Stewart J, Troup C, Sarkans U, Chervitz S, Bernhart D, Sherlock G, Ball C, Lepage M, Swiatek M, Marks WL, Goncalves J, Markel S, Iordan D, Shojatalab M, Pizarro A, White J, Hubley R, Deutsch E, Senger M, Aronow BJ, Robinson A, Bassett D, Stoeckert CJ Jr, Brazma A: Design and implementation of microarray gene expression markup language (MAGE-ML). Genome Biol 2002, 3(9).
- Rosetta Resolver. Rosetta Biosoftware, 401 Terry Avenue N, Seattle, WA 98109, USA.
- Gentleman RC, Carey VJ, Bates DJ, Bolstad BM, Dettling M, et al.: Bioconductor: Open software development for computational biology and bioinformatics. Genome Biology 2004, 5: R80. 10.1186/gb-2004-5-10-r80
- The R Project for Statistical Computing [http://www.r-project.org/]
- Sherlock G, Hernandez-Boussard T, Kasarskis A, Binkley G, Matese JC, Dwight SS, Kaloper M, Weng S, Jin H, Ball CA, Eisen MB, Spellman PT, Brown PO, Botstein D, Cherry JM: The Stanford Microarray Database. Nucleic Acids Res 2001, 29(1):152–5. 10.1093/nar/29.1.152
- Saal LH, Troein C, Vallon-Christersson J, Gruvberger S, Borg A, Peterson C: BioArray Software Environment (BASE): a platform for comprehensive management and analysis of microarray data. Genome Biol 2002, 3(8).
- Dysvik B, Jonassen I: J-Express: Exploring Gene Expression Data using Java. Bioinformatics 2001, 17: 369–370. 10.1093/bioinformatics/17.4.369
- Saeed AI, Sharov V, White J, Li J, Liang W, Bhagabati N, Braisted J, Klapa M, Currier T, Thiagarajan M, Sturn A, Snuffin M, Rezantsev A, Popov D, Ryltsov A, Kostukovich E, Borisovsky I, Liu Z, Vinsavich A, Trush V, Quackenbush J: TM4: a free, open-source system for microarray data management and analysis. Biotechniques 2003, 34(2):374–8.
- OpenOffice.org – The free and open productivity suite [http://www.openoffice.org/index.html]
- The Perl Directory – perl.org [http://www.perl.org/]
- GCC, the GNU Compiler Collection – GNU Project – Free Software Foundation (FSF) [http://gcc.gnu.org/]
- Glas A, Floore A, Delahaye L, Witteveen A, Pover R, Bakx N, Lahti-Domenici J, Bruinsma T, Warmoes T, Bernards R, Wessels L, Van't Veer L: Converting a breast cancer microarray signature into a high-throughput diagnostic test. BMC Genomics 2006, 7: 278. 10.1186/1471-2164-7-278
- Abdollahi A, Hahnfeldt P, Maercker C, Grone HJ, Debus J, Ansorge W, Folkman J, Hlatky L, Huber PE: Endostatin's antiangiogenic signaling network. Mol Cell 2004, 13: 649–663. 10.1016/S1097-2765(04)00102-9
- Wagner W, Wein F, Seckinger A, Frankhauser M, Wirkner U, Krause U, Blake J, Schwager C, Eckstein V, Ansorge W, Ho AD: Comparative characteristics of mesenchymal stem cells from human bone marrow, adipose tissue, and umbilical cord blood. Exp Hematol 2005, 33(11):1402–16. 10.1016/j.exphem.2005.07.003
- Wagner W, Laufs S, Blake J, Schwager C, Wu X, Zeller JW, Ho AD, Fruehauf S: Retroviral integration sites correlate with expressed genes in hematopoietic stem cells. Stem Cells 2005, 23(8):1050–8. 10.1634/stemcells.2005-0006
- Wagner W, Saffrich R, Wirkner U, Eckstein V, Blake J, Ansorge A, Schwager C, Wein F, Miesala K, Ansorge W, Ho AD: Hematopoietic progenitor cells and cellular microenvironment: behavioral and molecular changes upon interaction. Stem Cells 2005, 23(8):1180–91. 10.1634/stemcells.2004-0361
- Wagner W, Ansorge A, Wirkner U, Eckstein V, Schwager C, Blake J, Miesala K, Selig J, Saffrich R, Ansorge W, Ho AD: Molecular evidence for stem cell function of the slow-dividing fraction among human hematopoietic progenitor cells by genome-wide analysis. Blood 2004, 104(3):675–86. 10.1182/blood-2003-10-3423
- Almstrup K, Hoei-Hansen CE, Nielsen JE, Wirkner U, Ansorge W, Skakkebaek NE, Rajpert-De Meyts E, Leffers H: Genome-wide gene expression profiling of testicular carcinoma in situ progression into overt tumours. Br J Cancer 2005, 92(10):1934–41. 10.1038/sj.bjc.6602560
- Almstrup K, Hoei-Hansen CE, Wirkner U, Blake J, Schwager C, Ansorge W, Nielsen JE, Skakkebaek NE, Rajpert-De Meyts E, Leffers H: Embryonic stem cell-like features of testicular carcinoma in situ revealed by genome-wide gene expression profiling. Cancer Res 2004, 64(14):4736–43. 10.1158/0008-5472.CAN-04-0679
- Domhan S, Muschal S, Schwager C, Morath C, Wirkner U, Ansorge W, Maercker C, Zeier M, Huber PE, Abdollahi A: Molecular mechanisms of the antiangiogenic and antitumor effects of mycophenolic acid. Mol Cancer Ther 2008, 7(6):1656–68. 10.1158/1535-7163.MCT-08-0193
- Abdollahi A, Schwager C, Kleeff J, Esposito I, Domhan S, Peschke P, Hauser K, Hahnfeldt P, Hlatky L, Debus J, Peters JM, Friess H, Folkman J, Huber PE: Transcriptional network governing the angiogenic switch in human pancreatic cancer. Proc Natl Acad Sci USA 2007, 104(31):12890–5. 10.1073/pnas.0705505104
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.