The tissue microarray data exchange specification: implementation by the Cooperative Prostate Cancer Tissue Resource
© Berman et al 2004
Received: 17 December 2003
Accepted: 27 February 2004
Published: 27 February 2004
Tissue Microarrays (TMAs) have emerged as a powerful tool for examining the distribution of marker molecules in hundreds of different tissues displayed on a single slide. TMAs have been used successfully to validate candidate molecules discovered in gene array experiments. Like gene expression studies, TMA experiments are data intensive, requiring substantial information to interpret, replicate or validate. Recently, an open access Tissue Microarray Data Exchange Specification has been released that allows TMA data to be organized in a self-describing XML document annotated with well-defined common data elements. While this specification provides sufficient information for the reproduction of the experiment by outside research groups, its initial description did not contain instructions or examples of actual implementations, and no implementation studies have been published. The purpose of this paper is to demonstrate how the TMA Data Exchange Specification is implemented in a prostate cancer TMA.
The Cooperative Prostate Cancer Tissue Resource (CPCTR) is funded by the National Cancer Institute to provide researchers with samples of prostate cancer annotated with demographic and clinical data. The CPCTR now offers prostate cancer TMAs and has implemented a TMA database conforming to the new open access Tissue Microarray Data Exchange Specification. The bulk of the TMA database consists of clinical and demographic data elements for 299 patient samples. These data elements were extracted from an Excel database using a transformative Perl script. The Perl script and the TMA database are open access documents distributed with this manuscript.
TMA databases conforming to the Tissue Microarray Data Exchange Specification can be merged with other TMA files, expanded through the addition of data elements, or linked to data contained in external biological databases. This article describes an open access implementation of the TMA Data Exchange Specification and provides detailed guidance to researchers who wish to use the Specification.
Because TMAs are designed to answer questions applicable to pathologic lesions with specific sets of attributes (e.g. stage or grade or diagnostic subtype), preparation of a TMA requires access to large archives of paraffin embedded tissues. Each TMA core tissue must be annotated with clinical, demographic or histopathologic information so that measurements on the TMA core samples can result in clinically useful correlations. To ensure inter-laboratory reproducibility, information describing the preparation of TMA blocks and slides needs to be provided along with the TMA data records.
The Cooperative Prostate Cancer Tissue Resource (CPCTR) is a multi-institutional virtual tissue bank funded by the U.S. National Cancer Institute (NCI) to provide researchers with samples of prostate cancer tissues. The member institutions of the CPCTR are New York University, George Washington University, University of Pittsburgh and Medical College of Wisconsin. The CPCTR began service to the cancer research community on December 6, 2001. The CPCTR has over 5,000 prostate cancer specimens including radical prostatectomy cases (paraffin and fresh-frozen) and paraffinized needle biopsies. The CPCTR represents the largest repository of histologically-characterized and clinically annotated prostate cancer tissue in the USA. All accrued cases undergo pathology review and all clinical data is collected using methodology standardized across the participating institutions. CPCTR resources are available to all researchers, academic and commercial. Further information can be obtained from the CPCTR website.
The CPCTR has constructed a prostate cancer TMA implemented in conformance with the new TMA Data Exchange Specification (hereinafter designated "the Specification"). The Specification was developed through a series of open workshops sponsored by the National Cancer Institute and the Association for Pathology Informatics. Tissue data included in the CPCTR TMA database is de-identified, and assembled in an open access database to permit data sharing, in compliance with current NIH policy on data sharing and in concert with ongoing NIH initiatives to develop new methods for sharing research data [11, 12].
Results and Discussion
An informative header that explained the purpose of the file and provided all the information to understand the file (i.e., its organization).
Information regarding the creation of the file (e.g., creator, date of creation)
Rights of use (e.g. specifying any restrictions on use)
Methodology (e.g. how the data contained in the file was obtained)
Metadata (the data that describes or defines the actual data)
Metadata definitions (clear descriptions and definitions of the metadata)
Uses Uniform Resource Locators (URLs) to link the TMA database with web documents that provide detailed information supplementing the metadata tags. These external URLs are:
A link to the Dublin Core Meta Data Elements used in the header section of the document .
A link to the ISO-11179-compliant listing of Common Data Elements (CDEs) provided in the Specification .
A link to the CPCTR CDEs .
Supports complex TMAs within a single TMA file. In this case, a single TMA file contained four blocks, with cores from a single tissue sample appearing in multiple locations in more than one block.
Protects patient privacy (by deidentifying all data)
Allows data sharing (by permitting free distribution of the XML data document)
Tissue microarrays allow for the high throughput analysis of tissue samples and their association with clinical or outcomes data. Yet these experiments require a large amount of information for the subsequent analysis and evaluation, in particular by interested second parties. The Specification provides an accurate and reproducible method for the transfer of this information as is required for inter-laboratory reproducibility. One of the most important problems with modern data specifications is the daunting technical expertise required for their implementation. The Specification was written to permit maximal flexibility and minimal implementation requirements. This study demonstrates that the Specification can be implemented using a simple Perl script that converts an Excel database into XML-tagged data elements. The resulting large section of core-related XML text can be simply inserted into a conformant document containing header, block and slide information. The resulting TMA database can be validated with a Perl script provided with the Specification document.
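The conversion itself was done with a Perl script distributed with the article, and that script is not reproduced here. As a rough illustration only, the same idea can be sketched in Python, assuming the spreadsheet has been exported as CSV, that every column heading is already a valid XML tag name, and that each row becomes one record. The tag `core_record`, the column names and the sample values are hypothetical and are not CDEs taken from the Specification.

```python
import csv
import io
import xml.etree.ElementTree as ET


def rows_to_core_xml(csv_text):
    """Wrap each CSV row in a record element, one child tag per non-empty cell."""
    core_section = ET.Element("core")
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = ET.SubElement(core_section, "core_record")  # hypothetical tag
        for column, value in row.items():
            if value:  # skip empty cells rather than emit empty tags
                ET.SubElement(record, column).text = value
    return core_section


# Hypothetical export of a two-row spreadsheet.
sample_csv = "sample_id,age_at_diagnosis,gleason_sum\nA1,61,7\nA2,,6\n"
core_section = rows_to_core_xml(sample_csv)
print(ET.tostring(core_section, encoding="unicode"))
```

Skipping empty cells is a design choice made for this sketch; a real conversion might instead need an explicit marker for missing values.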
Human subjects protections
All institutions participating in the CPCTR have Institutional Review Board (IRB) approval for human subjects research. Each CPCTR institution develops its own local protocols to protect the confidentiality and privacy of human subjects and obtains local IRB approval for all CPCTR activities. The IRB assurance numbers for each cooperating institution are: New York University – M1177; Medical College of Wisconsin – M1061; University of Pittsburgh Medical Center – M1256; and George Washington University Medical Center – M1125. Tissue data records from the cooperating institutions are submitted to a central data manager (Information Management Services, Inc., contracted by the NCI) as de-identified records. All institutions assign an arbitrary number to each record before submitting the de-identified record to the central database. This ensures that the central database has no links connecting records to patients. In addition, HIPAA's proscribed set of 18 data elements is omitted from core sample records (the so-called safe harbor approach to HIPAA compliance).
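The two steps in this paragraph, dropping identifying fields and assigning an arbitrary record number, can be sketched as follows. This is an illustration under stated assumptions, not the CPCTR procedure: the field names are hypothetical, and `IDENTIFYING_FIELDS` is a three-item placeholder rather than the HIPAA list of 18 identifiers.

```python
import secrets

# Placeholder subset; NOT the HIPAA safe harbor list of 18 identifiers.
IDENTIFYING_FIELDS = {"patient_name", "medical_record_number", "date_of_birth"}


def deidentify(record):
    """Return a copy without identifying fields, labeled with an arbitrary number."""
    cleaned = {key: value for key, value in record.items()
               if key not in IDENTIFYING_FIELDS}
    # The number is random and no lookup table is kept, so it cannot be
    # traced back to the source record.
    cleaned["record_number"] = secrets.token_hex(8)
    return cleaned


# Hypothetical local record before submission.
local_record = {
    "patient_name": "example name",
    "medical_record_number": "000000",
    "date_of_birth": "1940-01-01",
    "gleason_sum": "7",
}
submitted = deidentify(local_record)
print(sorted(submitted))
```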
Tissue and data collection
The CPCTR maintains a publicly available Manual of Operations that describes its tissue collection procedures and policies.
Pathological characterization of specimens involves review of all cases by a CPCTR pathologist using diagnostic criteria explained in the publicly available CPCTR histologic atlas and manual.
The TMA Data Exchange Specification
The Specification is an open access document that can be used without restriction.
A conforming TMA file has four required sections: 1) Header, containing the specification Dublin Core identifiers, 2) Block, describing the paraffin-embedded array of tissues, 3) Slide, describing the glass slides produced from the Block, and 4) Core, containing all data related to the individual tissue samples contained in the array. The simplest possible structure for a conforming TMA file consists of nothing more than empty tags designating the four required sections [see Figure 2].
Common Data Elements (CDEs) are metadata tags that describe the data elements included in an XML database. To be of value, CDEs must be well-defined, uniquely identified and available for human review or computer access. Eighty CDEs, conforming to the ISO-11179 specification for data elements, constitute the XML tags provided in the Specification. CDE descriptors are publicly available. However, the only CDEs that must appear in any conforming TMA file are the section CDEs (header, block, slide and core), the root CDE (histo) and the tma CDE itself (tma). A set of six simple semantic rules describes the syntax for the data exchange specification.
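A minimal sketch of such a file, built in Python rather than the Perl used by the authors, is shown below. The six element names come from the text; the nesting (histo containing tma, which contains the four sections as siblings) is an assumption made for illustration and should be checked against the Specification.

```python
import xml.etree.ElementTree as ET

SECTION_TAGS = ("header", "block", "slide", "core")


def minimal_tma():
    """Build an empty skeleton: histo > tma > four empty section elements."""
    root = ET.Element("histo")
    tma = ET.SubElement(root, "tma")  # assumed to sit directly under the root
    for tag in SECTION_TAGS:
        ET.SubElement(tma, tag)
    return root


skeleton = minimal_tma()
print(ET.tostring(skeleton, encoding="unicode"))
```

Every other CDE is optional, so a real file grows by adding child elements inside these sections.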
The Specification was designed for maximal flexibility. Flexibility in the first version of an XML specification permits the addition of greater structure in later versions built on tested implementations. A similar approach has been used for the ANSI/HL7 Clinical Document Architecture (CDA), wherein the earliest version (Level One) is intentionally sparse. At this time, there is no DTD (Document Type Definition) or Schema included in the Specification. For those wishing to use a DTD, a Specification-compliant DTD has been prepared by David G. Nohle, Ohio State University Department of Pathology and the Mid-Region AIDS & Cancer Specimen Resource (ACSR).
Constructing the TMA Data file
Filling the four sections (header, block, slide and core)
Assembling the four sections into a TMA file with a proper file declaration, root element and TMA CDE.
Validating that the TMA file conforms to the specification
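The Specification is distributed with its own Perl validation script, which is not reproduced here. As a much weaker stand-in, the sketch below checks only that the required elements named in the text (histo, tma, header, block, slide, core) are present. It does not test the six semantic rules, and searching for sections at any depth under tma is an assumption made so that cores nested inside blocks are accepted.

```python
import xml.etree.ElementTree as ET
from typing import List

REQUIRED_SECTIONS = ("header", "block", "slide", "core")


def missing_required_elements(xml_text: str) -> List[str]:
    """List the problems found; an empty list means all required tags exist."""
    root = ET.fromstring(xml_text)
    problems = []
    if root.tag != "histo":
        problems.append("root element is not histo")
    tma = root.find(".//tma")
    if tma is None:
        problems.append("missing tma")
        return problems
    for tag in REQUIRED_SECTIONS:
        # Accept the section anywhere below tma, for example core inside block.
        if tma.find(f".//{tag}") is None:
            problems.append(f"missing {tag}")
    return problems


nested_file = "<histo><tma><header/><block><core/></block><slide/></tma></histo>"
incomplete_file = "<histo><tma><header/></tma></histo>"
print(missing_required_elements(nested_file))
print(missing_required_elements(incomplete_file))
```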
The header, block and slide sections of the TMA will vary only slightly from project to project within a laboratory. The CPCTR header, block and slide sections were prepared "by hand" using the section-specific CDEs provided in the specification.
The CPCTR prostate cancer TMA consists of 299 core samples distributed across four blocks, each block holding 300 arrayed cores. Each block contains about 150 core samples, and each of those samples occupies two different locations within the block. The core duplicates are staggered in the array, to maximize the chance that a given core will be represented if an area of the slide section is lost in processing. The distribution of one set of core samples in multiple array locations in four blocks yields a complex TMA that cannot be adequately represented by separate descriptions of each block. The Specification permits multi-block TMA files. Within the block CDE are the nested sets of four blocks that compose the complex TMA. Each core CDE is nested within a specific block CDE, and one core may have two associated array locations [see Figure 6].
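The nesting described here can be sketched in Python as follows. Only `block` and `core` are names taken from the text; `block_identifier`, `sample_identifier`, `array_location`, the row:column format and the sample values are hypothetical placeholders chosen for this illustration, and the actual CDEs should be taken from the Specification.

```python
import xml.etree.ElementTree as ET


def add_block(section, block_id):
    """Nest one physical block inside the enclosing block section."""
    block = ET.SubElement(section, "block")
    ET.SubElement(block, "block_identifier").text = block_id  # hypothetical tag
    return block


def add_core(block, sample_id, locations):
    """Nest a core inside its block, with one location element per position."""
    core = ET.SubElement(block, "core")
    ET.SubElement(core, "sample_identifier").text = sample_id  # hypothetical tag
    for row, column in locations:
        ET.SubElement(core, "array_location").text = f"{row}:{column}"
    return core


def locations_of(section, sample_id):
    """Collect (block id, location) pairs for one sample across all blocks."""
    found = []
    for block in section.findall("block"):
        block_id = block.findtext("block_identifier")
        for core in block.findall("core"):
            if core.findtext("sample_identifier") == sample_id:
                for location in core.findall("array_location"):
                    found.append((block_id, location.text))
    return found


block_section = ET.Element("block")
first = add_block(block_section, "1")
second = add_block(block_section, "2")
add_core(first, "sample-001", [(1, 1), (11, 6)])
add_core(second, "sample-001", [(2, 3), (12, 8)])
add_core(second, "sample-002", [(2, 4), (12, 9)])
sample_locations = locations_of(block_section, "sample-001")
print(sample_locations)
```

Keeping each core inside its block means a sample's positions can be recovered per block without a separate lookup table, which is the property the multi-block layout relies on.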
The four sections are concatenated as a single XML database file. The CPCTR database file is provided with this manuscript [see Additional file 2].
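As a sketch of this assembly step, the Python below wraps already-built section elements in the root and tma elements and serializes them with an XML declaration. The placement of the sections directly under tma is an assumption, and the empty section elements stand in for real content.

```python
import io
import xml.etree.ElementTree as ET


def assemble_tma(sections):
    """Wrap section elements in histo > tma and serialize with a declaration."""
    root = ET.Element("histo")
    tma = ET.SubElement(root, "tma")
    tma.extend(sections)
    buffer = io.BytesIO()
    ET.ElementTree(root).write(buffer, encoding="utf-8", xml_declaration=True)
    return buffer.getvalue()


# Empty placeholders; a real file would pass populated section elements.
sections = [ET.Element(tag) for tag in ("header", "block", "slide", "core")]
document = assemble_tma(sections)
print(document.decode("utf-8"))
```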
Validating the TMA Data file
Availability and requirements
The Perl scripts and files for the production of TMA databases that meet the Specification are available with this publication. The example prostate cancer TMA database is available as a supplementary file with this article [see Additional file 1]. The actual tissue microarray slides are available after an application process. Although the CPCTR is a non-profit, government-sponsored resource, a surcharge is attached for glass slides, to help defray a portion of the costs of TMA production. The application process and charges are described at the CPCTR web site. Questions regarding any aspect of the CPCTR can be directed to the CPCTR email query service [firstname.lastname@example.org].
This work was supported by four grants from the National Cancer Institute for the support of the Cooperative Prostate Cancer Tissue Resource: U01 CA86772, U01 CA86743, U01 CA86735, and U01 CA86739. With the exceptions of Jules Berman and Kevin Dobbin, the authors are funding recipients of these grants. Jules Berman and Kevin Dobbin performed this work as part of their regular activities as U.S. government employees. Hang Liu, of the University of Wisconsin, is acknowledged for writing a Perl script that extracted the array locations for core samples.
- Kononen J, Bubendorf L, Kallioniemi A, Barlund M, Schraml P, Leighton S, Torhorst J, Mihatsch MJ, Sauter G, Kallioniemi O-P: Tissue microarrays for high-throughput molecular profiling of tumor specimens. Nat Med 1998, 4: 844–847.
- Mousses S, Bubendorf L, Wagner U, Hostetter G, Kononen J, Cornelison R, Goldberger N, Elkahloun AG, Willi N, Koivisto P, Ferhle W, Raffeld M, Sauter G, Kallioniemi O: Clinical validation of candidate genes associated with prostate cancer progression in the CWR22 model system using tissue microarrays. Cancer Research 2002, 62: 1256–1260.
- Perrone EE, Theoharis C, Mucci NR, Hayasaka S, Taylor JM, Cooney KA, Rubin MA: Tissue microarray assessment of prostate cancer tumor proliferation in African-American and white men. J Natl Cancer Inst 2000, 92: 937–939. 10.1093/jnci/92.11.937
- Halvorsen OJ, Haukaas SA, Akslen LA: Combined Loss of PTEN and p27 expression is associated with tumor cell proliferation by Ki-67 and increased risk of recurrent disease in localized prostate cancer. Clin Cancer Res 2003, 9: 1474–1479.
- Milanes-Yearsley M, Hammond ME, Pajak TF, Cooper JS, Chang C, Griffin T, Nelson D, Laramore G, Pilepich M: Tissue micro-array: a cost and time-effective method for correlative studies by regional and national cancer study groups. Mod Pathol 2002, 15: 1366–1373. 10.1097/01.MP.0000036345.18944.22
- Rubin MA, Dunn R, Strawderman M, Pienta KJ: Tissue microarray sampling strategy for prostate cancer biomarker analysis. Am J Surg Pathol 2002, 26: 312–319. 10.1097/00000478-200203000-00004
- Cooperative prostate cancer tissue resource: Release Date April 29,1999, RFA CA-99–012, National Cancer Institute[http://grants1.nih.gov/grants/guide/rfa-files/RFA-CA-99–012.html]
- Cooperative Prostate Cancer Tissue Resource[http://cpctr.cancer.gov]
- Berman JJ, Edgerton ME, Friedman BA: The tissue microarray data exchange specification: A community-based, open source tool for sharing tissue microarray data. BMC Med Inform Decis Mak 2003., 5:
- Final NIH statement on sharing research data: Release date February 26, Notice NOT-OD-03–032, National Institutes of Health[http://grants1.nih.gov/grants/guide/notice-files/NOT-OD-03–032.html]
- Infrastructure for data sharing and archiving: Release date October 17, RFA Number: RFA-HD-03–032, Department of Health and Human Services[http://grants.nih.gov/grants/guide/rfa-files/RFA-HD-03–032.html]
- Tools for collaborations that involve data sharing: Release date June 4, PA Number: PAR-03–134, Department of Health and Human Services[http://grants.nih.gov/grants/guide/pa-files/PAR-03–134.html]
- Dublin core metadata initiative[http://dublincore.org]
- Association for Pathology Informatics tissue microarray common data elements[http://22.214.171.124/jjb/tma_cde.htm]
- CPCTR data elements for frozen tissue collection user instructions[http://www.pathology.pitt.edu/pdf/cpctr/cpctr-cde-v22.pdf]
- Instructions for CPCTR TMA Block Production[http://www.pathology.pitt.edu/pdf/cpctr/block.htm]
- Preparing CPCTR slides from paraffin TMA blocks[http://www.pathology.pitt.edu/pdf/cpctr/section.htm]
- Department of Health and Human Services. 45 CFR (Code of Federal Regulations), 164.514(6)(2)(i). Standards for Privacy of Individually Identifiable Health Information (final) 2002.
- National Cancer Institute cooperative prostate cancer tissue resource manual of operations[http://www.pathology.pitt.edu/pdf/cpctr/cpctr-moo-110403jo.pdf]
- Cooperative prostate cancer tissue resource histomanual[http://www.pathology.pitt.edu/pdf/cpctr/histomanual25.pdf]
- Solbrig HR: Metadata and the reintegration of clinical information: ISO 11179. MD Comput 2000, 3: 25–28.
- Dolin RH, Alschuler L, Beebe C, Biron PV, Boyer SL, Essin D, Kimber E, Lincoln T, Mattison JE: The HL7 Clinical Document Architecture. J Am Med Inform Assoc 2001, 8: 552–69.
- Mid-region AIDS and cancer specimen resource tissue micro-arrays[http://virtualmicroscope.osu.edu/tma/]
This article is published under license to BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.