In the current study, we describe the evolution of a process that has developed over the past 5 years to use TMAs for biomarker studies. At the center of this process is a relational database tool, Profiler, which allows users to handle all steps of TMA research (Figure 1). The Profiler system was originally developed for prostate cancer biomarker research but has now been expanded to allow for the analysis of any tissue type. This system has been extensively used by investigators from several academic institutions.
The key feature of Profiler's functionality is the ability to access TMA images over the Internet and enter data in a secure manner. During the course of our first TMA experiments, we observed that there was a high probability of annotating the wrong TMA histospot when multiple pathology reviewers were involved, owing to the density of these TMAs, which typically have 500 TMA cores. We therefore developed a system that presents TMA images to the reviewer, so that the reviewer does not have to physically identify the TMA histospot by examining the slide. Manually evaluating a TMA slide under the microscope can be a common source of error, since it is easy to lose track of which samples have been evaluated. Profiler keeps track of which cores have been reviewed, so an entire TMA can be analyzed in a single session or over multiple sessions.
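The session-tracking behavior described above can be illustrated with a minimal sketch. It is not Profiler's implementation: the class `ReviewProgress`, its fields, and the core identifiers are hypothetical, and a real system would persist this state in the database rather than in memory.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewProgress:
    """Hypothetical per-reviewer record of which cores on one TMA have been scored."""

    core_ids: list                               # all histospot ids, in presentation order
    scores: dict = field(default_factory=dict)   # core id -> score entered so far

    def next_core(self):
        """Return the first core without a score, or None when the TMA is complete."""
        for core_id in self.core_ids:
            if core_id not in self.scores:
                return core_id
        return None

    def record(self, core_id, score):
        """Store a score for a core that belongs to this TMA."""
        if core_id not in self.core_ids:
            raise KeyError(f"unknown core: {core_id}")
        self.scores[core_id] = score

    def remaining(self):
        """Number of cores still waiting for review."""
        return len(self.core_ids) - len(self.scores)


# Session 1: the reviewer scores two cores and stops.
progress = ReviewProgress(core_ids=["A1", "A2", "A3", "A4"])
progress.record(progress.next_core(), 2)
progress.record(progress.next_core(), 0)

# Session 2: the same saved state resumes at the first unreviewed core.
resume_at = progress.next_core()
```

Because the software always serves the next unreviewed core, the reviewer never has to locate a histospot on the physical slide or remember where the previous session ended.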
The integration of TMA data with other data sets is critical for the development of biomarkers. In version 3.0 of Profiler, other databases can be loaded securely onto the system. Over the past 5 years, several groups with rich clinical and pathology data have used the system to evaluate molecular biomarkers. Profiler's database keeps different datasets separate from each other, maintaining private space for research groups to develop projects. This has become an important issue with some cooperative groups that need to maintain control over their clinical databases based on pre-existing sharing agreements.
Worldwide, one of the guiding principles for any database that maintains clinical data is compliance with local regulatory agencies to protect patient confidentiality. In the United States, HIPAA regulations guide how clinical databases can be integrated with the data stored in Profiler. We have developed two approaches. In the first approach, both datasets are maintained separately and the clinical group performs the integration of the Profiler TMA data. The clinical group is then typically able to provide a completely integrated dataset without any patient identifiers, which is ready for analysis. The second approach is to integrate the clinical data into the Profiler system as a separate clinical module. We have done this for several groups and have used Oracle security and administrative tools to ensure that the clinical module is available for viewing and modification only by the designated clinical group's researchers. All queries require the group leader's approval, and each time a query is performed on the clinical data, the time, date, and investigator are recorded and sent via email to the group leaders. This monitoring feature is critical to ensure that the clinical database is only used by qualified investigators. The system administrator assigns users different levels of access at different sites using password-protected user profiles.
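The approval and monitoring steps of the second approach can be sketched as follows. This is an illustration only and does not reproduce the Oracle-based mechanism used in Profiler: the class `ClinicalModule`, the `notify` callback standing in for the email step, and the example rows are all assumptions made for this sketch.

```python
from datetime import datetime, timezone


class ClinicalModule:
    """Hypothetical wrapper that gates and audits queries on clinical records."""

    def __init__(self, records, approved_investigators, notify):
        self._records = list(records)
        self._approved = set(approved_investigators)  # approved by the group leader
        self._notify = notify                         # stand-in for the email step
        self.audit_log = []

    def query(self, investigator, predicate):
        """Run a query for an approved investigator and record who ran it and when."""
        if investigator not in self._approved:
            raise PermissionError(f"{investigator} is not approved for this module")
        entry = {
            "investigator": investigator,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        }
        self.audit_log.append(entry)
        self._notify(entry)
        return [row for row in self._records if predicate(row)]


sent = []  # collects notifications instead of sending real email
module = ClinicalModule(
    records=[{"case": 1, "gleason": 7}, {"case": 2, "gleason": 9}],  # made-up rows
    approved_investigators={"reviewer_a"},
    notify=sent.append,
)
high_grade = module.query("reviewer_a", lambda row: row["gleason"] >= 8)
```

Placing the audit entry and the notification inside the same method that returns the data means no query path can bypass the record of time, date, and investigator.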
The tracking of standard operating procedures (SOPs) is critical for the interpretation of experimental results over time. Therefore, the development of a "laboratory book" feature as a new module is critical to the adequate annotation of samples and experiments. We are currently developing the ability to input experiment-related data so that a laboratory technician can annotate the staining protocol of each experiment (timing, dilutions, etc.). In addition, acquisition settings and image analysis procedures will also be stored in this module of Profiler. Validation studies require specific protocols to reproduce all of the experimental conditions; this module should ensure that these data are available.
Few papers have been published so far on TMA data organization and management. Liu et al. presented a system for high-throughput analysis and storage of TMA immunostaining data, using a combination of commercially available software and novel software. Similarly, Shaknovich et al. proposed a way to manipulate TMA data and images, using commercially available software.
Other academic groups are working on the development of systems integrating commercially available software for the acquisition of digital images and the automatic evaluation of markers with custom solutions for data organization and management.
The Johns Hopkins Tissue Microarray Laboratory has also developed a set of software tools and an underlying database structure to manage TMA data. It allows the storage of a wide variety of information related to TMA samples, including patient clinical data, specimens, donor blocks, core, and recipient block information. A dynamic database structure allows users to add custom fields for different organ systems. The client application facilitates automated and manual entry of data related to patients, specimens, tissue blocks, and tissue sub-blocks (individual pathological diagnoses). The system allows users to design their own block arrays. Digital images generated by the Bacus Labs Inc. Slide Scanner (Bacus Laboratories, Lombard, IL) are imported into the database and are available for online visualization and evaluation. Although this system was developed separately from the Profiler system, the two groups initially worked closely as part of the S.P.O.R.E. initiative to develop TMA technology for translational research.
Some other systems have been developed to automatically acquire and evaluate TMA samples, such as TMALab™ (Aperio Technologies) or the Pathfinder™ Morphoscan™. These solutions work well with high-quality TMA slides, usually by superimposing a grid on the panoramic overview of the slide, but they require considerable manual intervention if the TMA samples are not well aligned. These systems are similar to the Bliss and Chromavision systems in that none of them identifies the histospots automatically. The misalignment of TMA histospots is inevitable because of the way the samples are processed. After a 4–5 micron thick section is cut, the histotechnologist floats the sample in a water bath. Even when this is done by the best-trained technologists, there is slight movement of the samples. Therefore, using a grid will never be a practical solution for automating the identification of the samples.
The AQUA system uses an object recognition approach to identify the exact spatial coordinates of each spot, but it does not order the spots or assign them to the proper patient and/or clinical information based on the TMA construction information.
The Bioinformatics Group of ITC in Trento, Italy, has developed an integrated framework for the management of TMA experiment data. This system, called TMABoost, is a patient-centered, web-based system. It integrates a custom-made TMA digital environment for the automatic localization, identification, acquisition, and evaluation of single TMA spots. One unique feature of this system is the ability to recognize the histospots while taking the TMA map into account. A probability is determined as to the likelihood that a specific histospot is being correctly identified based on the physical map of the TMA. This feature is particularly useful because some histospots may be lost during the course of preparing a slide for immunohistochemistry. Ambiguously located histospots can then be excluded or manually resolved. By using the existing map data, this feature reduces the chance that TMA histospots are paired with the incorrect x-y coordinates.
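The idea of scoring each detected histospot against the TMA map can be illustrated with a small sketch. It is not the TMABoost algorithm, whose details are not given here: the nearest-cell rule, the function names, the regular `pitch` spacing, and the 0.8 threshold are all assumptions, and the confidence value is a simple distance heuristic rather than a calibrated probability.

```python
import math


def expected_positions(rows, cols, pitch):
    """Map each (row, col) cell of the TMA map to its nominal (x, y) centre."""
    return {(r, c): (c * pitch, r * pitch) for r in range(rows) for c in range(cols)}


def assign_spots(detected, expected, min_confidence=0.8):
    """Pair detected centroids with map cells; return (assigned, ambiguous).

    The confidence is a heuristic in [0.5, 1.0], not a calibrated probability:
    1.0 means the centroid sits exactly on a cell, 0.5 means it is equidistant
    from its two nearest cells. Requires at least two distinct expected cells.
    """
    assigned, ambiguous, contested = {}, [], set()
    for point in detected:
        ranked = sorted(expected.items(), key=lambda item: math.dist(point, item[1]))
        best_cell, best_xy = ranked[0]
        second_xy = ranked[1][1]
        d1, d2 = math.dist(point, best_xy), math.dist(point, second_xy)
        confidence = d2 / (d1 + d2)
        if confidence < min_confidence or best_cell in contested:
            ambiguous.append(point)
        elif best_cell in assigned:
            # Two centroids claim the same cell: trust neither.
            ambiguous.append(point)
            ambiguous.append(assigned.pop(best_cell)[0])
            contested.add(best_cell)
        else:
            assigned[best_cell] = (point, confidence)
    return assigned, ambiguous


grid = expected_positions(rows=2, cols=2, pitch=10.0)
assigned, ambiguous = assign_spots([(0.5, 0.3), (5.0, 10.0)], grid)
```

In this example the first centroid lies close to cell (0, 0) and is assigned, while the second lies midway between two cells and is returned as ambiguous for exclusion or manual resolution. Map cells that end up with no assigned centroid correspond to histospots that may have been lost during slide preparation.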
Another important feature of TMA-based studies is the growing need to share data and information among different institutions (e.g., multi-center studies, clinical trials, etc.). In such a setting, the implementation of standard data exchange protocols becomes critical, because centralized solutions are not feasible and, up to now, there have been no standard approaches to collecting data at different institutions, or sometimes even within the same institution. As a result of several workshops in the area of TMA bioinformatics, TMA Exchange Specifications have been released that allow TMA data to be organized in a self-describing XML document annotated with well-defined common data elements. TMA data exchange specifications have recently been described by Berman et al. [11, 39]. Adopting these standards should allow for a seamless integration of public TMA databases. The public sharing of TMA data following publication, similar to the standards developed for expression array data exchange, should facilitate biomedical research. As part of a National Biospecimens Pilot Project, the 11 Prostate Cancer S.P.O.R.E. groups will adopt these standards and work with the Cancer Biomedical Informatics Grid (caBIG) program to ensure that these standards remain compliant. Future versions of this system will adopt an ontology system developed in collaboration with the caBIG program. Currently, our vocabulary and definitions are more locally defined and will need to be adjusted as these standards become better defined.
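A self-describing XML document of the kind mentioned above can be sketched with Python's standard `xml.etree.ElementTree` module. The element names used here (`tma`, `core`, `row`, `column`, `tissue`, `stain_score`) and the example values are hypothetical placeholders chosen for illustration; they are not taken from the published TMA data exchange specification, which defines its own common data elements.

```python
import xml.etree.ElementTree as ET


def cores_to_xml(tma_id, cores):
    """Serialise core records as XML whose tags name the data elements they hold."""
    root = ET.Element("tma", attrib={"id": tma_id})
    for core in cores:
        core_el = ET.SubElement(root, "core")
        for name, value in core.items():
            ET.SubElement(core_el, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


def xml_to_cores(document):
    """Rebuild the core records (as string-valued dicts) from the XML text."""
    root = ET.fromstring(document)
    return [{child.tag: child.text for child in core} for core in root.findall("core")]


document = cores_to_xml(
    "TMA-001",
    [{"row": 1, "column": 1, "tissue": "prostate", "stain_score": 2}],
)
cores = xml_to_cores(document)
```

Because every value is wrapped in a tag that names the data element, a receiving institution can read the file without knowing the sender's database schema, provided both sides agree on the meaning of the element names.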
Profiler can be set up at academic institutions that would like to use this system. Although the system is currently written for Oracle, the Profiler application can be deployed on Open Source Software. The Apache Web Server and Tomcat Engine are already Open Source Software, and all front-end code for the Profiler application can be modified to support Open Source components. Several Open Source databases, such as MySQL or PostgreSQL, can be integrated with the Profiler application with small modifications to the JDBC drivers and Java code. We are currently looking for an Open Source image application to support TMA images. The Profiler system is available to all academic institutions under a standard academic licensing agreement. This procedure is intended to make the system widely available and is required by our institutions based on their established intellectual property policies. Any interested investigators should contact the corresponding author to begin the licensing procedure. Groups using a different database system (e.g., MySQL) may need to modify the database scripts and perform additional testing, but in general both the code and the relational database schema are highly portable.