TmaDB: a repository for tissue microarray data
© Sharma-Oates et al; licensee BioMed Central Ltd. 2005
Received: 19 April 2005
Accepted: 01 September 2005
Published: 01 September 2005
Tissue microarray (TMA) technology has been developed to facilitate large, genome-scale molecular pathology studies. This technique provides a high-throughput method for analyzing a large cohort of clinical specimens in a single experiment, thereby permitting the parallel analysis of molecular alterations (at the DNA, RNA, or protein level) in thousands of tissue specimens. Because a vast quantity of data can be generated in a single TMA experiment, a systematic approach is required for the storage and analysis of such data.
To analyse TMA output, a relational database (known as TmaDB) has been developed to collate all aspects of information relating to TMAs. These data include the TMA construction protocol, the experimental protocol and the results from the various immunocytological and histochemical staining experiments, including the scanned images for each of the TMA cores. Furthermore, the database contains pathological information associated with each of the specimens on the TMA slide, the location in the laboratory of the various TMAs and of the individual specimen blocks (from which cores were taken), and their current status, that is, whether they can be sectioned into further slides or are exhausted. TmaDB has been designed to incorporate and extend many of the published common data elements and the XML format for TMA experiments, and is therefore compatible with the TMA data exchange specifications developed by the Association for Pathology Informatics community. Finally, the design of the database is flexible, so that TMA experiments from several types of cancer can be stored in a single database, which incorporates the national minimum data set required for pathology reports supported by the Royal College of Pathologists (UK).
TmaDB will provide a comprehensive repository for TMA data such that a large number of results from the numerous immunostaining experiments can be efficiently compared for each of the TMA cores. This will allow a systematic, large-scale comparison of tumour samples to facilitate the identification of gene products of clinical importance, such as therapeutic or prognostic markers. In addition, this work will contribute to the establishment of a standard for reporting TMA data analogous to MIAME in the description of microarray data.
On a single TMA block, up to 1000 specimens can be arrayed, representing a number of different pathologies and tissue types, both normal and tumour. The recipient block is then sectioned into a number of slides (from 50 to 400). These slides can then be screened at the DNA, RNA and protein level by fluorescence in situ hybridisation (FISH), in situ hybridisation (ISH), and immunohistochemistry (IHC), respectively.
There is a large quantity of data associated with a TMA experiment, including images, quantified experimental results, patient demographics and pathology reports for each specimen. The vast quantity of data generated by TMA experiments and the different types of information associated with each experiment have led to the need for a central data repository. Thus far only three papers have been published on approaches for the archiving and analysis of TMA data, and all three have limitations [5–7]. The relational database management system of Manley et al. [5] associates pathology data with immunohistochemical staining results but is restricted to one type of cancer, prostate cancer, and cannot be adapted to include pathology data from other types of cancer. Liu et al. [6], on the other hand, focus on the archiving and analysis of TMA staining results but offer no means of relating them to the pathology information. Finally, the TAD database developed by Coombes et al. [7] is a tool that focuses solely on enabling pathologists to score cores on TMA slides generated from a single TMA block. It is therefore too restrictive as a database for archiving TMA data.
Currently there are no central databases that facilitate the storage and analysis of TMA data and can also accommodate pathological information from different types of cancer. This has led to the design and implementation of TmaDB, a relational database that archives all aspects of TMA data in a single database that can be accessed from remote locations both to update and to retrieve data. The database design is flexible, so that pathological information from different types of cancer can be incorporated into one central database. The design is also compliant with recently recommended data exchange specifications [4, 8].
Construction and contents
The "block" table is the key table in the database, to which all other tables are related (directly or indirectly). This table contains the specimen identifier and other data relating to a particular specimen. The "path_report" table is another important table; it stores generic information about a patient's medical history regardless of the specific cancer diagnosis. The information specific to a particular type of cancer is stored in the disease-specific tables. This disease-specific information is taken from the national minimum data set required for the reporting of cancer pathological investigations, a standard supported by the Royal College of Pathologists. The "tma" table stores information about the design and construction of the TMA block (also referred to as the recipient block), which can then be sectioned onto several slides (Figure 1). Each of these slides is stained with a specific antibody or a histochemical stain. The staining protocol, the slide image and other details of the experiment are stored in the "tma_experiment" table. Each of the cores on the TMA slide has a stain result and a scanned image associated with it, which are stored in the "core_expt_results" table. Other information regarding the quality of the core and the tissue diagnosis is stored in the "core_info" table. The "whole_tissue_section" table stores the staining protocol as well as the results and images of the stain generated from the traditional whole tissue section (if available) for the clinical specimens used in the TMA.
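The relationships between these tables can be illustrated with a minimal sketch. It uses Python's built-in SQLite module as a stand-in for MySQL, and every column name below is an assumption made for illustration; the text names the tables but not their columns. The disease-specific tables and the "whole_tissue_section" table are omitted for brevity.

```python
import sqlite3

# Hypothetical, heavily simplified schema. Table names follow the text;
# all column names are illustrative assumptions, not the TmaDB definitions.
SCHEMA = """
CREATE TABLE block (
    block_id      TEXT PRIMARY KEY,   -- specimen identifier
    location      TEXT,               -- storage location in the laboratory
    in_use        INTEGER             -- 1 if the block can still be sectioned
);
CREATE TABLE path_report (
    report_id     INTEGER PRIMARY KEY,
    block_id      TEXT NOT NULL REFERENCES block(block_id),
    cancer_type   TEXT
);
CREATE TABLE tma (
    tma_id        TEXT PRIMARY KEY,
    design_notes  TEXT
);
CREATE TABLE tma_experiment (
    experiment_id INTEGER PRIMARY KEY,
    tma_id        TEXT NOT NULL REFERENCES tma(tma_id),
    stain         TEXT,
    slide_image   TEXT                -- filename only
);
CREATE TABLE core_info (
    core_id          INTEGER PRIMARY KEY,
    tma_id           TEXT NOT NULL REFERENCES tma(tma_id),
    block_id         TEXT NOT NULL REFERENCES block(block_id),
    quality          TEXT,
    tissue_diagnosis TEXT
);
CREATE TABLE core_expt_results (
    experiment_id INTEGER NOT NULL REFERENCES tma_experiment(experiment_id),
    core_id       INTEGER NOT NULL REFERENCES core_info(core_id),
    stain_result  TEXT,
    core_image    TEXT,               -- filename only
    PRIMARY KEY (experiment_id, core_id)
);
"""


def create_schema(conn):
    """Create the sketch schema with foreign-key checking switched on."""
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)


conn = sqlite3.connect(":memory:")
create_schema(conn)
tables = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
```

In this sketch every table reaches "block" either directly ("path_report", "core_info") or through "core_info" ("core_expt_results"), mirroring the central role the text gives to the "block" table.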
Information on the storage location of each block in the laboratory, and on whether the block is still in use, is stored in the "block" table along with the date on which this information was last updated. This enables a block to be found easily if it is required for future experiments. The information is uploaded from the tab-delimited text file containing details of the TMA construct design. The fields recorded in this file are shown on the Web site, and the file can be downloaded and used as a template for recording data alongside experimental work.
The pathological details of each of the cores, together with additional information such as the location on the original block from which the core was obtained and the quality of the core, are uploaded from a tab-delimited text file and stored in the core location table. The TMA slide immunohistological staining protocol and the results are also recorded in a tab-delimited text file, an example of which is shown as a template on the Web site.
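As a rough illustration of how such a tab-delimited file might be read and checked before loading, the sketch below parses a small TMA design file. The column names (`tma_id`, `row`, `column`, `block_id`) are assumptions for the example; the actual templates are those shown on the Web site.

```python
import csv
import io

# Hypothetical column names; the real template is defined on the TmaDB Web site.
REQUIRED_COLUMNS = ("tma_id", "row", "column", "block_id")


def read_tma_design(handle):
    """Parse a tab-delimited TMA design file into a list of dictionaries."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [name for name in REQUIRED_COLUMNS if name not in header]
    if missing:
        raise ValueError("missing columns: " + ", ".join(missing))
    rows = []
    # Data lines start at line 2 because line 1 holds the header.
    for line_no, record in enumerate(reader, start=2):
        try:
            rows.append(
                {
                    "tma_id": record["tma_id"],
                    "row": int(record["row"]),
                    "column": int(record["column"]),
                    "block_id": record["block_id"],
                }
            )
        except (TypeError, ValueError) as exc:
            raise ValueError(f"line {line_no}: {exc}") from exc
    return rows


example = (
    "tma_id\trow\tcolumn\tblock_id\n"
    "TMA1\t1\t1\tB001\n"
    "TMA1\t1\t2\tB002\n"
)
design = read_tma_design(io.StringIO(example))
```

Rejecting a file with a missing column or a non-numeric grid position at this stage is one way to support the kind of checking a curator would otherwise do by hand.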
Each TMA slide is scanned using software that provides a very high-resolution image of the slide, enabling users to "zoom in" and "zoom out" on the image over the web. TMA slide images are saved on an image server. They can be divided into a grid such that each core image can be saved as a separate image. TmaDB has a link to the image server for each of the cores, and results from the analysis of the images are stored in the "core_expt_result" table (Figure 2). TmaDB stores only the filenames of the images; the images themselves are stored on the image server.
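The division of a slide image into per-core images can be sketched as a simple grid calculation. This is an idealised illustration only: it assumes the cores lie on a regular grid aligned with the image edges, which real scanned slides rarely satisfy exactly, and the function name and arguments are hypothetical.

```python
def core_boxes(width, height, n_rows, n_cols):
    """Return {(row, col): (left, top, right, bottom)} in pixels, 1-based row/col.

    Integer division keeps the cells contiguous: each box starts where the
    previous one ends, and the last box ends exactly at the image edge.
    """
    if n_rows <= 0 or n_cols <= 0:
        raise ValueError("grid dimensions must be positive")
    boxes = {}
    for r in range(n_rows):
        top = r * height // n_rows
        bottom = (r + 1) * height // n_rows
        for c in range(n_cols):
            left = c * width // n_cols
            right = (c + 1) * width // n_cols
            boxes[(r + 1, c + 1)] = (left, top, right, bottom)
    return boxes


# A 1000 x 600 pixel slide image holding 3 rows and 4 columns of cores.
boxes = core_boxes(1000, 600, n_rows=3, n_cols=4)
```

Each box could then be handed to an image library to crop and save the core under the filename recorded in the database.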
Currently there are 18 TMA experiments stored in the database although this number is set to rise. The number of people whose clinical specimens have been analysed on TMA is 1095 (April 2005). Most of this data has been generated as part of clinical trials.
Utility and discussion
The MySQL relational database is interfaced with the World Wide Web, making the database user-friendly as well as enabling access from remote locations for multiple users at any time. The web site also enables users to submit and retrieve data from the database. The homepage provides a brief background on TMA technology and outlines the advantages and disadvantages of the technique; it also provides a menu for easy navigation of the web site. A help page provides detailed explanations for each of the menu links. A brief description is given below:
The "Display all records" link connects to a page listing all the tables in the database. Clicking on a table name fetches all the data contained in that table. The "Database design" link displays the ER diagram (Figure 2). The "MySQL search" link opens an HTML form that enables the user to retrieve data from the database with MySQL "select" statements. For security reasons, the user has permission only to search the database, not to write to it. Badly constructed "select" statements with incorrect syntax are not executed, and the MySQL error message is displayed on screen. The provision of SQL select facilities could nevertheless leave the database vulnerable to badly constructed queries that damage performance, though not data; this has not been a problem to date. The other four links on the menu are for inputting data into the database. The "Submit TMA design" link allows the user to upload a tab-delimited text file, which is saved into a directory on the server. An email is sent automatically to the curator, who checks the file for errors, after which the information is loaded into the database automatically. Data are loaded sequentially, beginning with the TMA design ("Submit TMA design"), then the specific details of each of the cores on the TMA ("Submit core path data"), followed by the experimental protocol and results ("Submit experimental protocol and results") and finally the pathological data ("Submit block path report"). The file formats for each of the files to be uploaded are shown on the Web site. The final link on the menu connects to the image server where all images for all TMAs are archived. All files can also be uploaded in XML format (the schema for this is provided on the Web site). In addition, the output from all queries to the database can be displayed or downloaded as an XML file.
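The behaviour described for the search form (reads succeed, malformed statements return an error message, writes are refused) can be sketched as follows. TmaDB enforces this with MySQL account permissions; the example instead uses SQLite's `query_only` pragma as a stand-in, and the table contents and the `run_search` helper are hypothetical.

```python
import sqlite3


def run_search(conn, sql):
    """Run a user-supplied statement and return (rows, error_message).

    Exactly one of the two values is None, so a caller can either render
    the rows or show the database error message to the user.
    """
    try:
        return conn.execute(sql).fetchall(), None
    except sqlite3.Error as exc:
        return None, str(exc)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE block (block_id TEXT PRIMARY KEY, location TEXT)")
conn.execute("INSERT INTO block VALUES ('B001', 'Freezer 2')")
conn.commit()

# From here on the connection refuses any statement that would change data.
conn.execute("PRAGMA query_only = ON")

rows, error = run_search(conn, "SELECT block_id FROM block")
bad_rows, bad_error = run_search(conn, "SELEC block_id FROM block")
write_rows, write_error = run_search(conn, "DELETE FROM block")
remaining = conn.execute("SELECT COUNT(*) FROM block").fetchone()[0]
```

A guard of this kind protects the data but not performance: an expensive but valid query still runs, which is the residual risk noted in the text.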
The XML file format is compatible with the data exchange specification developed by Berman et al. in that we have adopted their extensibility mechanism, and our instance documents are therefore still valid under their exchange specification. The basic XML structure developed by Berman et al. has been extended with the aim of establishing a community standard similar to MAGE-ML for microarray databases [9].
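To give a sense of how query output might be serialised as XML, the sketch below converts result rows into a small document with Python's standard library. The element names (`tmadb_result`, `record`, and the column names) are placeholders chosen for the example; they are not the tags defined by the TMA data exchange specification or by the TmaDB schema.

```python
import xml.etree.ElementTree as ET


def rows_to_xml(columns, rows, root_tag="tmadb_result", row_tag="record"):
    """Serialise query rows as XML, one child element per column value.

    Column names are used as element names, so they must be valid XML names.
    """
    root = ET.Element(root_tag)
    for row in rows:
        record = ET.SubElement(root, row_tag)
        for name, value in zip(columns, row):
            field = ET.SubElement(record, name)
            # NULL values become empty elements rather than the text "None".
            field.text = "" if value is None else str(value)
    return ET.tostring(root, encoding="unicode")


xml_text = rows_to_xml(
    ["block_id", "stain_result"],
    [("B001", "2+"), ("B002", None)],
)
```

Because the serialiser escapes special characters in values, free-text fields such as pathology comments would not break the resulting document.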
The purpose of a central cancer TMA database is to provide a single repository for laboratory experimentalists to store TMA experiments. Storage of all TMA experiments will enable researchers to make correlations between the intensity of staining of a specific antibody and the stage of cancer, or between different types of cancers. The curation of a database will also encourage researchers to report results in a standard format thereby enabling comparisons between experiments performed by different individuals. An additional advantage of the database is to be able to determine where a specimen block is stored or the person/institution in possession of a specimen block of interest for further experiments. Furthermore it is anticipated that the database will also be used as a learning tool for pathologists.
Further developments of the database include the addition of other information from different experiments (such as cDNA microarray) on the same clinical specimen used in the TMA experiment and allowing users to carry out complex queries (see below).
An example of a complex query to the database:
Find all cases where specimens are stained with antibody "p53" AND type of cancer is "colon" AND pT stage is "3".
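One way such a query could be expressed in SQL is sketched below, again using SQLite in place of MySQL. The table layout is deliberately simplified and the column names (`antibody`, `cancer_type`, `pt_stage`) are assumptions; in the design described above, staining results are linked to blocks through the core and experiment tables rather than directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Simplified, hypothetical tables and rows for illustration only.
    CREATE TABLE block (block_id TEXT PRIMARY KEY);
    CREATE TABLE path_report (
        block_id    TEXT REFERENCES block(block_id),
        cancer_type TEXT,
        pt_stage    TEXT
    );
    CREATE TABLE core_expt_results (
        block_id     TEXT REFERENCES block(block_id),
        antibody     TEXT,
        stain_result TEXT
    );
    INSERT INTO block VALUES ('B001'), ('B002'), ('B003');
    INSERT INTO path_report VALUES
        ('B001', 'colon', '3'), ('B002', 'colon', '2'), ('B003', 'breast', '3');
    INSERT INTO core_expt_results VALUES
        ('B001', 'p53', '2+'), ('B002', 'p53', '1+'),
        ('B003', 'p53', '0'),  ('B001', 'Ki67', '3+');
    """
)

# Placeholders (?) keep the search values separate from the SQL text.
QUERY = """
SELECT b.block_id, r.antibody, r.stain_result
FROM block AS b
JOIN path_report AS p ON p.block_id = b.block_id
JOIN core_expt_results AS r ON r.block_id = b.block_id
WHERE r.antibody = ? AND p.cancer_type = ? AND p.pt_stage = ?
ORDER BY b.block_id
"""
matches = conn.execute(QUERY, ("p53", "colon", "3")).fetchall()
```

The three conditions in the `WHERE` clause correspond one-to-one to the three AND terms of the example query.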
TmaDB provides a central repository for archiving all aspects of TMA data. The relational database includes the vast majority of the published Common Data Elements for a TMA experiment. This will enable efficient data exchange and will contribute to the establishment of a standard for reporting TMA data analogous to MIAME in the description of microarray data [10]. The database design is adaptable, so that pathological data from several different types of cancer can be included in one database.
It is anticipated that, in addition to providing a resource for archiving and querying TMA data, TmaDB will also enable large-scale analyses of TMA data.
Availability and requirements
This work is supported by Yorkshire Cancer Research.
1. Kallioniemi OP, Wagner U, Kononen J, Sauter G: Tissue microarray technology for high-throughput molecular profiling of cancer. Hum Mol Genet 2001, 10: 657–662. 10.1093/hmg/10.7.657
2. Kononen J, Bubendorf L, Kallioniemi A, Barlund M, Schraml P, Leighton S, Torhorst J, Mihatsch MJ, Sauter G, Kallioniemi OP: Tissue microarrays for high-throughput molecular profiling of tumor specimens. Nat Med 1998, 4: 844–847. 10.1038/nm0798-844
3. Henshall S: Tissue microarrays. J Mammary Gland Biol Neoplasia 2003, 3: 347–358. 10.1023/B:JOMG.0000010034.43145.86
4. Berman JJ, Edgerton ME, Friedman BA: The tissue microarray data exchange specification: a community-based, open source tool for sharing tissue microarray data. BMC Med Inform Decis Mak 2003, 3: 5. 10.1186/1472-6947-3-5
5. Manley S, Mucci NR, De Marzo AM, Rubin MA: Relational database structure to manage high-density tissue microarray data and images for pathology studies focusing on clinical outcome: the prostate specialized program of research excellence model. Am J Pathol 2001, 159: 837–843.
6. Liu CL, Prapong W, Natkunam Y, Alizadeh A, Montgomery K, Gilks CB, van de Rijn M: Software tools for high-throughput analysis and archiving of immunohistochemistry staining data obtained with tissue microarrays. Am J Pathol 2002, 161: 1557–1565.
7. Coombes KR, Zhang L, Bueso-Ramos, Brisbay S, Logothetis C, Roth J, Keating MJ, McDonnell TJ: TAD: a web interface and database for tissue microarrays. Appl Bioinformatics 2002, 1: 155–158.
8. Berman JJ, Datta M, Kajdacsy-Balla A, Melamed J, Orenstein J, Dobbin K, Patel A, Dhir R, Becich MJ: The tissue microarray data exchange specification: implementation by the Cooperative Prostate Cancer Tissue Resource. BMC Bioinformatics 2004, 5: 19. 10.1186/1471-2105-5-19
9. Spellman PT, Miller M, Stewart J, Troup C, Sarkans U, Chervitz S, Bernhart D, Sherlock G, Ball C, Lepage M, Swiatek M, Marks WL, Goncalves J, Markel S, Iordan D, Shojatalab M, Pizarro A, White J, Hubley R, Deutsch E, Senger M, Aronow BJ, Robinson A, Bassett D, Stoeckert CJ Jr, Brazma A: Design and implementation of microarray gene expression markup language (MAGE-ML). Genome Biol 2002, 3: RESEARCH0046. 10.1186/gb-2002-3-9-research0046
10. Brazma A, Hingamp P, Quackenbush J, Sherlock G, Spellman P, Stoeckert C, Aach J, Ansorge W, Ball CA, Causton HC, Gaasterland T, Glenisson P, Holstege FC, Kim IF, Markowitz V, Matese JC, Parkinson H, Robinson A, Sarkans U, Schulze-Kremer S, Stewart J, Taylor R, Vilo J, Vingron M: Minimum information about a microarray experiment (MIAME)-toward standards for microarray data. Nat Genet 2001, 29: 365–71. 10.1038/ng1201-365
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.