A data review and re-assessment of ovarian cancer serum proteomic profiling

Sorace, James M; Zhan, Min

doi:10.1186/1471-2105-4-24

Proteomic Pattern Diagnostics: Producers and Consumers in the Era of Correlative Science

Emanuel Petricoin, Food and Drug Administration

12 March 2004


Emanuel F. Petricoin1*, David A. Fishman2, Thomas P. Conrads3, Timothy D. Veenstra3, and Lance A. Liotta4*

1 FDA-NCI Clinical Proteomics Program, Office of Cell and Gene Therapies, CBER, FDA, Bethesda, MD 20892
2 National Ovarian Cancer Early Detection Program, Northwestern University, Chicago, IL 60611
3 NCI Biomedical Proteomics Program, Laboratory of Proteomics and Analytical Technologies, SAIC-Frederick, Inc., Frederick, MD 21702
4 FDA-NCI Clinical Proteomics Program, Laboratory of Pathology, Center for Cancer Research, NCI, NIH, Bethesda, MD 20892

*To whom correspondence should be addressed: Emanuel F. Petricoin, Bldg 29A, Room 2D12, 8800 Rockville Pike, Office of Cellular and Gene Therapy, CBER, FDA, Bethesda, MD 20892

Introduction

A recently published article in BMC Bioinformatics, “A data review and re-assessment of ovarian cancer serum proteomic profiling” (1) by Sorace and Zhan, described the analysis of mass spectrometry-derived data streams that we produced and posted in the public domain without restriction (Clinical Proteomics Program: www.ncifdaproteomics.com). The study set “Ovarian dataset 8-7-02”, the sole source of data for the Sorace and Zhan article, was one of a series of files made available on that website. This data set was derived from a surface-enhanced laser desorption ionization (SELDI) time-of-flight (TOF) mass spectrometry (MS) analysis of serum collected from 162 subjects with ovarian cancer and 91 subjects who either had benign pathologies or were high-risk women followed for 5 years with no clinical evidence of disease. All serum was collected prior to clinical workup and surgery. We have previously posted several example discriminating ion patterns and pattern recognition attempts using this same study set (2). We have also posted example patterns, with exact ion classifiers, that achieve highly accurate (100%) discrimination of our August 2003 web-posted high-resolution TOF MS data (acquired on a hybrid quadrupole (Qq) TOF MS fitted with a SELDI source). Our posted results were obtained solely on a separate, blinded testing set.

Sorace and Zhan describe the focus of their paper as the development of “a simple approach: a “benchmark method” to which other methods can be compared” (1). Establishing benchmarking procedures for statistical analysis, the original stated goal of the Sorace and Zhan paper, is a noble, if elusive, goal. These investigators imply that their procedure should be considered the new gold standard. However, it may be premature to establish “best practices” so early in the development of the field. Perhaps a panel of analytical tools will ultimately prove optimal for benchmarking. A plethora of pattern recognition approaches are showing great promise for discovering highly accurate and specific patterns in highly complex mass spectral data (3-6). Two recent studies (6,7) have been published by groups analyzing our low-resolution SELDI-TOF MS data. While one group (7) concluded that the data show evidence of noise bias and that features do not transcend separate data sets, the other concluded that transcendent features could be found (6). In fact, Zhu et al. identified features that not only accurately classified our 4-3-02 data set in a training/testing mode but also predicted, with 100% sensitivity and 100% specificity, the “Ovarian dataset 8-7-02” used as a separate blinded validation set. At this early stage, it would seem wise to use a full complement of approaches to find buried diagnostic information. Such a combined analytical approach is the one we are currently pursuing toward clinical implementation.

Correlative science now exists in a metastable state in which high-dimensional data, produced by gene microarrays, protein arrays, and mass spectrometry, are being produced and “consumed” without a system to ensure that producers and consumers have the full complement of technical expertise and information needed to analyze the data correctly, fairly, and objectively. While we applaud scientific investigations using our publicly available mass spectral data, the publication by Sorace and Zhan (1) highlights the dangerous potential for error propagation when a disconnect is allowed to exist between data producers and data consumers. We believe that this emerging disconnect could have a dramatic negative impact on the fields of both proteomics and genomics.

Noise vs. Bias in Mass Spectral Data: Not Equivalent

In their report (1), which focuses solely on one low-resolution SELDI-TOF MS serum data set, Sorace and Zhan determined several sets of mass spectral features, different from those described on our website, that accurately classify 100% of a randomly selected testing set drawn from a subset of this specific data set. Some of the discriminating features they found were of extremely low m/z, such as 2.79. The authors therefore concluded, “The ability to discriminate between cancer and control based on m/z values of 2.79 and 245.5 reveals the presence of a significant experimental bias not related to disease pathology, that likely involves machine noise and matrix effects” (1). They further state, “This is particularly true of the m/z value at 2.79 which represents a bias of the mass spectrometer itself” (1). The authors then extend this conclusion to question the entire SELDI-TOF MS data set, including many other data sets that they did not in fact analyze. We respectfully challenge these broad conclusions as judgmentally biased and scientifically unfounded, and propose that the authors made assumptions that are incorrect in light of the physical principles and standard calibration methods of mass spectrometry.

Conclusions about the ascribed size of features identified in a TOF mass spectrum can only be rigorously made within the m/z region over which the mass spectrometer is calibrated. To achieve the most accurate calibration, the collection of peptide/protein calibrants should span most of the m/z region of interest. The finite mass accuracy of the experimental measurements is a direct function of the specific mass spectrometer (e.g. mass linearity and electronic stability) and of the accuracy of the peak centroid, which depends on the statistical reproducibility of the signal. In our own laboratories, we have a formal calibration process, written as a standard operating procedure (SOP), that is applied at the beginning and end of every run. The lowest molecular weight calibrant that we use for the Ciphergen PBS machines is arg-vasopressin, with an accurate mass of 1084.2474. Because m/z values outside the calibrated range are obtained by standard linear extrapolation of the calibrant measurements, TOF data collected below the lowest-mass calibrant or above the highest-mass calibrant should not be used for pattern classification, especially in the clinical setting. In fact, a by-product of this extrapolation can be seen in the data itself, where negative m/z artifacts arise from the curve fitting. Hence, a feature found in our data at m/z 2.79 is no more accurate than if it were found at -2.79. The exact molecular weight of this feature is not 1.79 (assuming an [M+H]+ ion, which is also not necessarily true) any more than it is 26.9 or 278.0. Even though a feature can be reproducibly observed, its mass assignment is not accurate, because it falls outside the effective working calibration range.
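
The effect of extrapolating beyond the calibrants can be illustrated with a minimal Python sketch. It assumes an idealized instrument in which m/z = (t - T0)^2 for flight time t in arbitrary units, a quadratic calibration curve fitted by least squares, and a small random timing jitter on each calibrant. Only the arg-vasopressin mass (1084.2474) comes from the text; the other calibrant masses, T0, the jitter level, and the function names are hypothetical, and this is not the calibration routine of any particular instrument software.

```python
import numpy as np

# Idealized, hypothetical instrument: m/z = (t - T0)**2, t in arbitrary units.
T0 = 0.5

# Calibrant m/z values. Only arg-vasopressin (1084.2474) is taken from the text;
# the remaining masses are illustrative placeholders spanning a few kDa.
CALIBRANTS = np.array([1084.2474, 1600.0, 2200.0, 3500.0, 5800.0])


def true_time(mz):
    """Flight time the idealized instrument would record for a given m/z."""
    return np.sqrt(mz) + T0


def fit_quadratic(times, mz):
    """Least-squares fit of m/z = c2*t**2 + c1*t + c0 (assumed calibration form)."""
    return np.polyfit(times, mz, 2)


def calibration_spread(mz_of_interest, n_trials=500, jitter=0.002, seed=0):
    """Relative spread of the predicted m/z over repeated calibrations whose
    calibrant flight times carry small Gaussian jitter."""
    rng = np.random.default_rng(seed)
    t_query = true_time(mz_of_interest)
    preds = np.empty(n_trials)
    for i in range(n_trials):
        t_obs = true_time(CALIBRANTS) + rng.normal(0.0, jitter, CALIBRANTS.size)
        preds[i] = np.polyval(fit_quadratic(t_obs, CALIBRANTS), t_query)
    return preds.std() / mz_of_interest, preds.min(), preds.max()


for mz in (2000.0, 245.5, 2.79):
    rel, lo, hi = calibration_spread(mz)
    print(f"m/z {mz:>7}: relative spread {rel:.2e}, range [{lo:.2f}, {hi:.2f}]")
```

Predictions at m/z 2000, inside the calibrant span, should agree closely across repeated calibrations, whereas the spread at m/z 2.79, far below the lowest calibrant, is expected to be large relative to the value itself. The exact numbers depend on the assumed jitter.
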

Although some investigators have questioned our calibration method in their paper (7), we adhere to strict SOPs under which every TOF MS is calibrated at the beginning of every analysis, and checked for drift at the end of every analysis, using a collection of several peptides that span the m/z region of interest. Had we been contacted, we as producers of the data could have specified the calibration boundaries and described the detailed methodology in our laboratory, as we have done for many other investigators. Sorace and Zhan conclude that bias in the mass spectrometer accounted for the results and, because they believe “noise” is a discriminator, that biology is not being accounted for (1). However, noise and bias are distinct properties of data and should not be equated.
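
A beginning-versus-end drift check of the kind described above could be sketched as follows. Expressing mass error in parts per million (ppm) is a common convention; the 50 ppm tolerance, the two placeholder calibrants, and all observed values are hypothetical and do not reflect the actual SOP.

```python
# Reference calibrant masses. Only arg-vasopressin (1084.2474) is from the text;
# the other two entries are hypothetical placeholders.
REFERENCE_MZ = {
    "arg-vasopressin": 1084.2474,
    "calibrant_B": 2000.0,
    "calibrant_C": 5000.0,
}


def ppm_error(observed, reference):
    """Mass error in parts per million."""
    return (observed - reference) / reference * 1e6


def drift_report(start, end, tol_ppm=50.0):
    """Return calibrants whose mass error changed by more than tol_ppm
    between the start-of-run and end-of-run calibration checks.

    start, end: dicts mapping calibrant name -> observed m/z.
    tol_ppm: hypothetical acceptance threshold.
    """
    flagged = []
    for name, ref in REFERENCE_MZ.items():
        drift = ppm_error(end[name], ref) - ppm_error(start[name], ref)
        if abs(drift) > tol_ppm:
            flagged.append(name)
    return flagged


# Hypothetical observations at the start and end of a run.
start = {"arg-vasopressin": 1084.26, "calibrant_B": 2000.02, "calibrant_C": 5000.05}
end = {"arg-vasopressin": 1084.27, "calibrant_B": 2000.30, "calibrant_C": 5000.06}
print(drift_report(start, end))  # ['calibrant_B']
```

Here calibrant_B shifts by about 140 ppm between checks and is flagged, while the other two shift by roughly 9 ppm and 2 ppm.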

By definition, noise, because of its unpredictable and random nature, would be unable to classify clinical states in blinded samples. Bias, on the other hand, is not random; it arises from a tractable perturbation. The argument is not over whether accurate classification is possible based on real peak information within the mass spectra. Instead, the question is whether the diagnostic mass spectral features are underpinned by the fundamental disease mechanism (i.e. biology) or arise from non-biological perturbations that are exclusive to one biological state. Bias can be introduced at a variety of points, and can certainly confound pattern recognition methods by producing real features that are not related to biology. Sources of bias include differences in sample handling between cases and controls, preparation and application of the LDI matrix, the use of different SELDI ProteinChip lots for cases versus controls, or a bias in the mass spectrometer at the times the cases or controls are separately analyzed. The last of these is, in fact, what Sorace and Zhan seem to propose as a classifying bias (1), but they present no scientific evidence supporting this assertion.
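
The distinction between noise and bias can be illustrated with a small simulation on synthetic data; it is not an analysis of the posted spectra. A hypothetical "noise" feature is drawn independently of the clinical label, while a hypothetical "bias" feature receives a constant offset for every case, standing in for a systematic, non-biological processing difference. A simple threshold classifier is fitted on a training set and scored on a separate "blinded" set; the offset of 4.0 standard deviations and the sample sizes are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)


def simulate(n_per_class, bias_offset):
    """Labels (1 = case, 0 = control) plus two hypothetical features:
    'noise' is random and unrelated to the label;
    'bias' is random plus a constant offset added to every case."""
    y = np.repeat([1, 0], n_per_class)
    noise = rng.normal(0.0, 1.0, y.size)
    bias = rng.normal(0.0, 1.0, y.size) + bias_offset * y
    return y, noise, bias


def fit_threshold(x, y):
    """Threshold at the midpoint of the class means; sign records which side is 'case'."""
    m1, m0 = x[y == 1].mean(), x[y == 0].mean()
    return (m1 + m0) / 2.0, np.sign(m1 - m0)


def accuracy(x, y, threshold, sign):
    pred = (sign * (x - threshold) > 0).astype(int)
    return (pred == y).mean()


y_tr, noise_tr, bias_tr = simulate(200, bias_offset=4.0)   # training set
y_te, noise_te, bias_te = simulate(400, bias_offset=4.0)   # separate "blinded" set

acc_noise = accuracy(noise_te, y_te, *fit_threshold(noise_tr, y_tr))
acc_bias = accuracy(bias_te, y_te, *fit_threshold(bias_tr, y_tr))
print(f"noise feature: {acc_noise:.2f}, biased feature: {acc_bias:.2f}")
```

The noise feature should score near chance (about 50%) on the blinded set, whereas the biased feature can score very highly despite carrying no biology. Blinded accuracy therefore rules out noise, but by itself it cannot distinguish a biological signal from a systematic bias; that requires controls on study design and processing.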

Bias is certainly something we scrutinize closely and attempt to eliminate at every level of our investigations. Sources of bias can exist in the clinical study set itself (e.g. lead-time bias, length bias) and need to be understood. Had the investigators contacted us, we could have elaborated, as previously stated on our website, that the SELDI-TOF MS data were produced by randomly commingling cases and controls. On any given 8-spot ProteinChip array, case and control samples were applied to random spot locations to minimize systematic chip-to-chip variation. Bioprocessor sample application, robotic matrix application, and MS analysis were all performed in a batch process to ensure that every sample, regardless of disease category, was treated identically, thus minimizing any bias due to chemical noise or process effects. All samples were run on the same single MS instrument. We also controlled for bias in sample collection and handling. Identical SOPs were used for case and control sample collection; in fact, the same staff collected, processed, and archived all samples used for these studies. Importantly, the sera from the cases were all obtained prior to diagnosis, surgery, and clinical staging. Artifacts caused by changes in process methods during experimentation can be very problematic, and we minimize them by close monitoring. Discontinuation of the original hydrophobic interaction H4 chip surface, used several years ago for our first report (3), has made it impossible to prospectively test potential process variances for that particular surface. Subsequent analysis and immediate web posting of the same sample sets using alternative chip chemistries indicate that there does not appear to be bias in the samples themselves.
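
Random commingling of cases and controls across 8-spot arrays can be sketched with a plain shuffle. This is an assumed scheme for illustration only; the allocation procedure actually used for the data sets may have differed, and the sample IDs and function name are hypothetical, sized to the 162 cases and 91 controls of the 8-7-02 set.

```python
import random

SPOTS_PER_CHIP = 8  # 8-spot ProteinChip array, as described in the text


def randomized_layout(case_ids, control_ids, seed=None):
    """Shuffle cases and controls together, then fill chips spot by spot,
    so that disease status is not tied to chip or spot position."""
    samples = [(sid, "case") for sid in case_ids] + [(sid, "control") for sid in control_ids]
    rng = random.Random(seed)
    rng.shuffle(samples)
    return [samples[i:i + SPOTS_PER_CHIP] for i in range(0, len(samples), SPOTS_PER_CHIP)]


# Hypothetical sample IDs sized to the 8-7-02 set (162 cases, 91 controls).
cases = [f"case_{i:03d}" for i in range(162)]
controls = [f"ctrl_{i:03d}" for i in range(91)]
chips = randomized_layout(cases, controls, seed=42)
print(len(chips), "chips; first chip:", chips[0])
```

A plain shuffle decouples disease status from chip and spot position on average, but does not guarantee that every chip carries both cases and controls; a stratified or blocked allocation would be needed to enforce that.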

The notion that any low molecular weight ion classification is categorically caused by a systematic bias in the instrumentation or sample processing is unwarranted. As Sorace and Zhan note, investigators analyzing mass spectral data often choose not to analyze portions of the spectra that contain features with very small m/z values, since these regions contain ions that arise from the ionized organic acid matrix, the so-called “chemical noise”, which could confound features that emanate from the clinical sample of interest. The detailed processes underlying desorption and ionization of molecules by LDI are rather poorly understood at the most fundamental level. A number of events are known to contribute to “noisy” mass spectra, including molecular ion fragmentation or metastable ion formation from improper laser or ion source conditions, contaminants such as salts or detergents in the sample that generate a signal or cause adduct formation, and dimer/trimer formation in plume reactions in the plasma just above the LDI target surface. Peaks can also arise from mass spectral peak “ringing”, or periodic noise that can be propagated throughout the spectra. Baggerly et al. (7) concluded that they could achieve perfect classification of our posted data sets based on “noise”. We respectfully believe that what these scientists meant was that, because the discriminatory regions they found were of such low m/z and intensity, and appear where many common matrix-associated peaks are found, they cannot be associated with a bona fide biological process and must instead emanate from a sample or instrumental process bias. We believe that these investigators may be confusing the terms “noise” and “bias”. If features located in any m/z region can accurately classify phenotype in blinded sets of data, these features cannot be noise by definition.

Thus, the real question is not discrimination by noise, but whether these mass spectral features relate to a biological process or are generated by systematic bias. Before discussing bias and the bias reduction methods used in our investigations, it is helpful to reflect on how little we know about the information archive of serum and plasma in the range we study by mass spectrometry in our laboratory (< 15,000 Da). The field of clinical chemistry has not yet established a thorough knowledge base of the compendium of molecular entities that exist in serum or plasma in the low molecular weight range. At this time, there is no complete list of all of the peptides and molecules normally found in human circulation. Only recently has the first attempt been made at understanding this complex compartment of the blood proteome (8). Moreover, it is a common misconception that MALDI signatures of complex body fluids such as serum consist solely of whole proteins. Organic metabolites, lipids (such as lysophosphatidic acid, or LPA), small peptides, and protein fragments are all efficiently analyzed by LDI-TOF MS. It should be self-evident that molecules such as LPA would not be “noise”, but could represent real ion peaks below 2000 m/z with a biological basis for their existence. In fact, we were careful to note in our original Lancet publication (3) that the observed mass spectral features populating the patterns could arise from proteomic, lipid, and metabolic (e.g. the collection of small molecule metabolites) information. Discussions about noise and bias are all very important. However, sweeping conclusions must be couched in the reality that very little information is available about the low molecular weight biological content of blood. The region below 2000 m/z contains all of the above types of noise, with electronic noise spread throughout.

Because we have no working knowledge of the chemicals, metabolites, and small peptides that exist in this region, it is our opinion that any feature that can reproducibly classify important disease states such as early stage ovarian cancer warrants further evaluation. Indeed, in metabolomics (the use of NMR to study disease-associated small molecule metabolite patterns in biological fluids), disease diagnosis is predicated on the ability to observe metabolic dysregulation events associated with cancer cells (9). Our contention is that only extensive validation will enable us to determine which features derive from biologically related metabolic and peptidic signatures in this region. The conclusion that any peak in the sub-2000 m/z region can only represent some non-biologically produced bias is speculative at best, unsupported given how little is known about the information content of this region, and ignores the known existence of biologically relevant molecules in this molecular weight region (e.g. LPA). All conclusions about the importance of a mass spectral feature, irrespective of its perceived mass, should be based solely on blinded validation results.

A further example of incorrect assumptions concerns statements about the median ages of the control and cancer patients in the 8-7-02 study: Sorace and Zhan claimed these values were 47 and 60 years, respectively (1). We are not clear how these numbers were derived, as they were neither stated in nor included with the “Ovarian dataset 8-7-02” study set, nor were the authors given this information by the clinician who provided the samples (D. Fishman). This inaccuracy forms the basis of Sorace and Zhan’s next conclusion, that this age difference could “introduce a bias in the results reported in this study as well as others derived from this dataset” (1). The age ranges of patients in the 8-7-02 study set overlap substantially (cases, 32-78 years; controls, 23-83 years). Furthermore, and most importantly, in all of our study sets many of the women with pre-menopausal stage I cancers are much younger than many of the post-menopausal controls. If age were a driver for classification, the accuracy seen in blinded testing would be hard to explain: all of the pre-menopausal stage I cancers were classified correctly, as were all of the post-menopausal controls.

Source of Diagnostic Information in Mass Spectral Data

What is the source of the low and ultra-low molecular weight information content of the blood that we observe by SELDI-TOF MS? Recent published studies from our laboratory reveal that this low mass archive seen by SELDI-TOF exists in serum in a bound form, most likely bound to albumin and/or a combination of other high-abundance circulating carrier proteins (10,11,12). These high-abundance carrier proteins harvest low-abundance biomarkers and accumulate their information over time. Some investigators have wondered how mass spectrometry can be sensitive enough to detect low-abundance biomarkers, and have repeatedly questioned whether the biomarkers found are merely highly abundant, non-specific molecules (13). Putative identities for some of the MS peaks from different disease biomarker studies have been reported (14,15), with many of these diagnostic peaks representing clipped or modified (e.g. glycosylated) forms of more abundant proteins. A specific-sized fragment or a specific post-translational modification of a high-abundance protein could be both diagnostic and of low abundance. It would be scientifically inaccurate to generalize that these alternate forms of high-abundance proteins are epiphenomena (13), and wrong to assume that these isoforms exist at the same molarity as the intact parent molecule (13). It is quite possible that disease processes generate specific products that are highly disease specific. For example, isoforms of otherwise highly abundant proteins such as beta-tubulin may be specifically produced and become immunogenic, as well as possibly serve as a cancer biomarker (16). The “harvesting” of these small biomarker fragments may prevent their rapid clearance by the kidney (glomerular filtration), and can explain why mass spectrometry may indeed be capable of detecting them. We have now shown that low-abundance biomarker fragments can be amplified at least 200-fold by accumulation on highly abundant carrier proteins such as albumin (10).

Currently, most investigators deplete serum and plasma of these high-abundance carrier proteins in an attempt to see the lower abundance proteins. We hope that this new finding will help prevent the a priori elimination of a vast repository of information when embarking on discovery-based studies.
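
One way to see why carrier binding could raise a fragment's steady-state concentration is a toy one-compartment model. This is a simplifying assumption made here for illustration, not the method used in (10): the fragment is produced at a constant rate, only the unbound fraction is cleared by the kidney, and binding equilibrates instantly. At steady state the total concentration is then production / (k_clear * free_fraction), so the gain relative to an unbound fragment is 1 / free_fraction. The free fraction of 0.005 below is chosen only to reproduce a 200-fold figure; it is not a measured value, and the function names are hypothetical.

```python
def steady_state_total(production, k_clear, free_fraction):
    """Closed-form steady state: production = k_clear * free_fraction * total."""
    return production / (k_clear * free_fraction)


def simulate_total(production, k_clear, free_fraction, dt=0.01, steps=200_000):
    """Forward-Euler integration of d(total)/dt = production - k_clear * free_fraction * total,
    starting from zero; only the unbound fraction is assumed to be cleared."""
    total = 0.0
    for _ in range(steps):
        total += dt * (production - k_clear * free_fraction * total)
    return total


P, K_CLEAR = 1.0, 1.0  # arbitrary units
unbound = steady_state_total(P, K_CLEAR, free_fraction=1.0)   # no carrier binding
bound = steady_state_total(P, K_CLEAR, free_fraction=0.005)   # 99.5% bound (hypothetical)
print(bound / unbound)                     # approximately 200
print(simulate_total(P, K_CLEAR, 0.005))   # converges toward the same value
```

The numerical integration is included only as a check that the closed-form steady state is reached; real binding and clearance kinetics are considerably more complex.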

We commend scientists for endeavoring to guard the accuracy of MS data, and we will continue to post all of our spectral data in the public domain, without restriction. Each web posting by our group represents a “bookmark” of where our laboratory is in the continued development of proteomic pattern-based testing. We would like to point out that both the 4-3-02 and 8-7-02 ovarian cancer data sets were posted by us as “producers” as a service to the community and were not used as the basis of a scientific publication. To prevent the dissemination of inaccuracies and speculative conclusions, we believe that the producers of genomic and proteomic data should be intercalated more fully into the publication process, particularly when the focus of the publication is the analysis of data that the submitting authors have not generated.

From Patterns to Identity: When and Why?

The state of the MS-based profiling field today is analogous to a point in the recent past when diagnostic antigens were identified by their appearance at a specific molecular mass on a Western blot. PSA and CA125, among the most widely used cancer biomarkers today, began at this level. The underlying protein sequences of PSA and CA125 were determined long after they entered clinical use as antigen markers (17). A reproducible peak at an exact m/z value detected by a mass spectrometer can be considered the current equivalent of an antigen band on a gel, but with potentially much higher accuracy and precision. Thus, history tells us that sequence identity is not required for clinical utility, and may not immediately reveal pathogenesis. Dogmatic and politicized statements about the need for sequence identity prior to clinical implementation are not supported by historical context. We believe a much more circumspect tone by all of us is appropriate. Today, clinicians urgently need any useful biomarker, and the dry pipeline of new FDA-approved protein markers over the past decade is worrisome (18). The kallikrein family of proteins, proposed as potential cancer markers (19), illustrates the clinical quandary we face with many biomarkers today. For clinical utility as a biomarker for a disease such as ovarian cancer, a test will not only need to be highly sensitive, but must also have a specificity near 100%. The publication record on kallikreins indicates relatively poor specificity and sensitivity, such that their real clinical utility will be limited.
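
Why specificity must approach 100% follows from Bayes' rule when a disease is rare in the tested population. The sketch below computes the positive predictive value (PPV), the probability that a positive result is a true case. The prevalence of 1 in 2,500 is an illustrative assumption rather than a figure from the text, and sensitivity is set to 100% for simplicity.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Bayes' rule: probability of disease given a positive test."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


PREVALENCE = 1 / 2500  # illustrative assumption, not taken from the text

for spec in (0.95, 0.99, 0.999):
    ppv = positive_predictive_value(1.0, spec, PREVALENCE)
    print(f"specificity {spec:.3f}: PPV = {ppv:.3f}")
```

Under these assumptions, even a perfectly sensitive test with 99% specificity yields a PPV below 4% (fewer than 1 true case per 25 positives), and raising specificity to 99.9% raises the PPV only to roughly 29%.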

We actually know very little about human disease pathogenesis and the unique microenvironment in which biomarkers would be produced. Only recently has the field begun to acquire the necessary tools for molecular discovery and profiling. The archive of diagnostic information is just now being explored. In fact, the decision not to explore promising approaches that offer clinical utility today would be scientifically unjustified, and morally wrong.

Our laboratories are currently sequencing the low-abundance, disease-related protein fragments that are amplified by collection on circulating carrier proteins (10,11,12). This complex set of information comprises the patterns that we detect by SELDI-TOF MS, and explains why multiple discriminatory patterns are found within the same data set. Our first publication described just one of many combinations of ions and discriminatory patterns (3,20), while our recent high-resolution data revealed still more (21,22). Given this biomarker complexity, one would expect that separate classifying patterns could be discovered from identical starting input data. Imagine only ten biomarkers, each measured in one of two states (up or down): mathematically, these ten biomarkers can generate 2^10 = 1,024 patterns. In reality, the number of low mass biomarkers is likely to be considerably greater. The labor-intensive effort to definitively sequence proteins that contribute to the diagnostic pattern should be restricted to those that have been validated extensively using large clinical study sets. Very soon, a high-resolution MS pattern of ions will be replacing a pattern of identified protein fragments. A mass spectrometer will likely remain the analytical tool into the future, since it can rapidly detect diagnostic low molecular weight protein fragments, perhaps more accurately than a traditional immunoassay.
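
The count of possible patterns from ten two-state biomarkers can be checked directly. The sketch enumerates every joint up/down state and, as an added illustration of how distinct classifiers could draw on different markers, counts the 3-marker subsets; the biomarkers themselves are hypothetical.

```python
from itertools import product
from math import comb

N_MARKERS = 10  # ten hypothetical biomarkers, as in the text

# Each marker is either "down" or "up"; enumerate every joint state.
patterns = list(product(("down", "up"), repeat=N_MARKERS))
print(len(patterns))  # 1024, i.e. 2**10

# Distinct subsets of 3 markers that a classifier could be built on.
print(comb(N_MARKERS, 3))  # 120
```

Even this small hypothetical panel offers many distinct marker combinations, which is consistent with different analyses arriving at different discriminatory patterns from the same data.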

Concluding Remarks

To ensure the highest level of scientific rigor and objectivity for proteomic pattern diagnostics, we are currently gathering the retrospective and prospective data required for a non-commercial, research-driven FDA 510(k) submission. The indication will be the detection of ovarian cancer recurrence, and the predicate test will be CA125. The criterion to be met is equivalence between the predicate test and the new test. Data will be generated on at least three separate instruments, and prospective validation will be performed. Our reference laboratory performing the analysis and data generation is CLIA/CAP licensed. All raw data, and the results of the blinded trial, will be posted for public view and analysis.

It is important to note, and disconcerting to us, that while the low-resolution spectral data have been the focus of much attention and swirling controversy, our high-resolution SELDI-TOF MS data, produced on an ABI Qq-TOF instrument, have seemingly been ignored. These data have been posted on our website and publicly available for over 8 months, and represent an example iteration of the current state of the art in our laboratory. They are the basis of an upcoming publication that will highlight our most recent efforts toward clinical implementation (22). It is our opinion that high-resolution mass spectrometry affords the best spectral reproducibility over time, accurate mass tagging, and superior overall clinical accuracy (22). Other laboratories have also reported difficulties in maintaining low-resolution instrument calibration and spectral reproducibility over time (23). We believe that a fair accounting of the state of the science can only be achieved by a thorough analysis of all of the data sets. We question the logic of so thoroughly analyzing unpublished data sets produced on a low-resolution spectrometer over 2 years ago, while not analyzing the high-resolution data produced on a platform that is, in our opinion, much closer to the clinic.

The aforementioned issues highlight the growing need for the editorial process to more thoughtfully consider the possibility that neither the “consumer” author(s) nor the reviewers have all of the necessary information about the posted data sets before a final conclusion is rendered. These are very important times for biomarker discovery. Establishing open communication among scientists is an essential part of the inherent contract we all have as scientists with the patients and families who are suffering daily. We are concerned that unchecked and unsubstantiated conclusions about openly available proteomic and genomic data will drive producing groups away from open access posting. This would be most unfortunate at the present critical time period in biomarker discovery. Finally, we must not forget that clinical applications and clinical efficacy should be the ultimate gauge by which we assess the impact of our biomarker work.

References

1. Sorace JM, Zhan M: A data review and re-assessment of ovarian cancer serum proteomic profiling. BMC Bioinformatics 2003, 4:24.

2. Clinical Proteomics Program Databank website at www.ncifdaproteomics.com, results posted and updated as of 8/1/03.

3. Petricoin EF, Ardekani AM, Hitt BA, Levine PJ, Fusaro VA, Steinberg SM, Mills GB, Simone C, Fishman DA, Kohn EC, Liotta LA: Use of proteomic patterns in serum to identify ovarian cancer. Lancet 2002, 359:572-577.

4. Cazares LH, Adam BL, Ward MD, Nasim S, Schellhammer PF, Semmes OJ, Wright Jr GL: Normal, benign, preneoplastic, and malignant prostate cells have distinct protein expression profiles resolved by surface enhanced laser desorption/ionization mass spectrometry. Clin Cancer Res 2002, 8:2541-2552.

5. Li J, Zhang Z, Rosenzweig J, Wang YY, Chan DW: Proteomics and bioinformatics approaches for identification of serum biomarkers to detect breast cancer. Clin Chem 2002, 48:1296-1304.

6. Zhu W, Wang X, Ma Y, Rao M, Glimm J, Kovach JS: Detection of cancer-specific markers amid massive mass spectral data. Proc Natl Acad Sci U S A 2003, 100:14666-14671.

7. Baggerly KA, Morris JS, Coombes KR: Reproducibility of SELDI-TOF protein patterns in serum: comparing data sets from different experiments. Bioinformatics 2004, published online 29 January 2004.

8. Tirumalai RS, Chan KC, Prieto DA, Issaq HJ, Conrads TP, Veenstra TD: Characterization of the low molecular weight human serum proteome. Mol Cell Proteomics 2003, 2:1096-1103.

9. Lindon JC, Holmes E, Nicholson JK: So what’s the deal with metabonomics? Anal Chem 2003, 75:384A-391A.

10. Mehta AI, Ross S, Lowenthal MS, Fusaro V, Fishman DA, Petricoin EF 3rd, Liotta LA: Biomarker amplification by serum carrier protein binding. Dis Markers 2003-2004, 19:1-10.

11. Liotta LA, Ferrari M, Petricoin E: Clinical proteomics: written in blood. Nature 2003, 425:905.

12. Zhou M, Lucas DA, Chan K, Issaq HJ, Petricoin EF 3rd, Liotta LA, Veenstra TD, Conrads TP: Investigation into the human serum interactome. Electrophoresis 2004, in press.

13. Diamandis EP: Mass spectrometry as a diagnostic and a cancer biomarker discovery tool: opportunities and potential limitations. Mol Cell Proteomics 2004, published online 30 January 2004.

14. Zhang Z, Bast RC Jr, Fung ET, Yu Y, Li J, Rosenzweig J, et al.: A panel of three potential biomarkers discovered from serum proteomic profiling improves the sensitivity of CA125 in the detection of early stage ovarian cancer: a multi-institutional study. Proc AACR 2003, Abstract #5739.

15. Ye B, Cramer DW, Skates SJ, Gygi SP, Pratomo V, Fu L, Hornick NK, Licklider LJ, Schorge JO, Berkowitz RS, Mok SC: Haptoglobin-alpha subunit as potential serum biomarker in ovarian cancer: identification and characterization using proteomic profiling and mass spectrometry. Clin Cancer Res 2003, 9:2904-2911.

16. Prasannan L, Misek DE, Hinderer R, Michon J, Geiger JD, Hanash SM: Identification of beta-tubulin isoforms as tumor antigens in neuroblastoma. Clin Cancer Res 2000, 6:3949-3956.

17. Bast RC Jr, Klug TL, St John E, Jenison E, Niloff JM, Lazarus H, Berkowitz RS, Leavitt T, Griffiths CT, Parker L, Zurawski VR Jr, Knapp RC: A radioimmunoassay using a monoclonal antibody to monitor the course of epithelial ovarian cancer. N Engl J Med 1983, 309:883-887.

18. Anderson NL, Anderson NG: The human plasma proteome: history, character, and diagnostic prospects. Mol Cell Proteomics 2002, 1:845-867.

19. Yousef GM, Diamandis EP: An overview of the kallikrein gene families in humans and other species: emerging candidate tumour markers. Clin Biochem 2003, 36:443-452.

20. Petricoin EF, Mills G, Kohn E, Liotta LA: Proteomic patterns in serum and identification of ovarian cancer [letter]. Lancet 2002, 360:170-171.

21. Conrads TP, Zhou M, Petricoin EF 3rd, Liotta L, Veenstra TD: Cancer diagnosis using proteomic patterns. Expert Rev Mol Diagn 2003, 3:411-420.

22. Conrads TP, Fusaro VA, Ross S, Johann D, Rajapakse V, Hitt BA, Steinberg SM, Kohn EC, Fishman DA, Whiteley G, Barrett JC, Liotta LA, Petricoin EF, Veenstra TD: High-resolution serum proteomic features for ovarian cancer detection. Endocrine-Related Cancer, in press (June 2004).

23. Rogers MA, Clarke P, Noble J, Munro NP, Paul A, Selby PJ, Banks RE: Proteomic profiling of urinary proteins in renal cancer by surface enhanced laser desorption ionization and neural-network analysis: identification of key issues affecting potential clinical utility. Cancer Res 2003, 63:6971-6983.

Competing interests

None. We are the sole producers of the data analyzed by Sorace and Zhan.
