- Open Access
Identification of novel prognostic markers in cervical intraepithelial neoplasia using LDMAS (LOH Data Management and Analysis Software)
© Hamoudi et al; licensee BioMed Central Ltd. 2005
- Received: 21 October 2004
- Accepted: 26 January 2005
- Published: 26 January 2005
Detection of Loss of Heterozygosity (LOH) is one of the most common molecular applications in the study of human diseases, in particular cancer. The technique is commonly used to examine whether a known tumour suppressor gene is inactivated or to map unknown tumour suppressor gene(s). However, with the increasing number of samples analysed using different software, no tool is currently available to integrate and facilitate the extensive and efficient data retrieval and analyses, such as correlation of LOH data with various clinical data sets.
An algorithm to identify prognostic disease markers is devised and implemented as novel software called LDMAS. LDMAS is a software suite designed for data retrieval, management and integrated analysis of the clinico-pathological data and molecular results from independent databases. LDMAS is used in stratification of disease stages according to clinical stage or histological features and correlation of various clinico-pathological features with molecular findings to obtain relevant prognostic markers such as those used in predicting the outcome of cervical intraepithelial neoplasia (CIN). This approach led to the identification of novel prognostic cervical cancer markers and extraction of useful clinical information such as correlation of Human Papilloma Virus (HPV) status with CIN lesions.
A novel software package called LDMAS is implemented and used to extract and identify prognostic disease markers. The software is used to successfully identify 4 novel prognostic markers that can be used to predict the outcome of CIN. LDMAS provides an essential platform for the extraction of useful information from the large amounts of data generated by LOH studies. LDMAS provides three unique and novel features for LOH analysis: (1) automatic extraction of relevant data from patient records and reports, (2) correlation of LOH data with clinico-pathological data and (3) storage of complex data in a flexible format. The first feature automates the creation of a database of clinically relevant information from large volumes of data, the second feature extracts useful biomedical information such as prognostic markers in CIN, and the third feature simplifies the statistical analyses of the data and allows non-statisticians to carry out the analysis. Additionally, LDMAS can be used to extract clinically useful markers from other diseases and can interface to high throughput genotyping analysis software such as GDAS, which is used to generate LOH data from Affymetrix® GeneChip Mapping arrays.
- Human Papilloma Virus
- Cervical Intraepithelial Neoplasia
- Human Papilloma Virus Infection
- High Risk Human Papilloma Virus
- Human Papilloma Virus Status
Detection of LOH is one of the most common molecular applications in the study of human diseases, in particular cancer. It is commonly used to examine whether a known tumour suppressor gene is inactivated or to map unknown tumour suppressor gene(s). Detection of LOH not only helps in understanding the molecular mechanisms underlying the development of cancer, but also provides important information useful for disease diagnosis and prognosis. LOH detection is commonly carried out by the analysis of microsatellite markers using an automated DNA sequencer. With the raw data from the sequencer being stored in one file per lane together with corresponding clinical information and patient follow-up data, each LOH study [1, 2] generates hundreds of files that need to be organised and related in a structured format. However, with the increasing number of samples analysed using different software, no tool is currently available to facilitate the extensive and efficient data retrieval and analyses, such as correlation of LOH data with various clinical data sets. We have developed a novel software package, LOH Data Management and Analysis Software (LDMAS), to satisfy these needs. LDMAS can retrieve LOH data from an automated DNA sequencer platform and clinical data from any patient record system, and correlate different data sets according to the user's choice. Here we present how LDMAS interfaces to Genotyper software (ABI, Foster City, CA), which is used to determine the presence of LOH, and the patient record system SunQuest (San Francisco, CA), facilitating the identification of LOH markers associated with the development of CIN. CIN lesions show variable clinical behaviour despite morphological homogeneity within each subgroup. Clinically, it is vital to distinguish CIN lesions with different behaviour and identify those likely to persist and progress despite treatment.
MRES (Medical Report Extractor Software) which parses patient report files, extracts the information of interest and organises it into a structured format, applicable to LDAS
LDAS (LOH Data Analysis Software) which obtains LOH data from Genotyper (Applied Biosystems, California) or GDAS (Affymetrix, California) software and correlates it to clinical data obtained from MRES
LDMS (LOH Data Management Software) which is used to gather patients' clinico-pathological data and extract significant relationships between the various data sets
Type of input data for LDMAS modules
Date of Birth
Pathological specimen Number
Date of Diagnosis
The user can manually check the data and use it as a template for analysis. Data analysis is carried out using LDAS, which obtains LOH data from Genotyper and correlates it with the clinical data obtained from MRES. LDAS reads data in plain text format and can thus be easily interfaced to any software that generates LOH data, such as Genotyper and GDAS. Finally, all data are entered into LDMS for storage and intelligent mining of data. Database query results can be exported back to LDAS, allowing correlation between LOH and various clinico-pathological parameters such as age, histological grade, treatment modality and response, and HPV status. LDAS can also carry out multivariate analysis to determine the sensitivity and specificity of the markers involved in the LOH study. A more detailed description of all the modules is provided in the LDMAS user's guide [Additional file 1] and illustrated by the CIN application described below.
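As a rough illustration of the data flow described above (report parsing, followed by joining LOH calls to clinical fields), the following Python sketch may help. It is not LDMAS code. The `Label: value` report layout, the tab-delimited LOH export with `specimen`, `marker` and `call` columns, the marker names and all function names are assumptions made for illustration; only the three field labels come from the list of input data.

```python
import re

# Field labels follow the input-data list; the "Label: value" layout is assumed.
FIELD_PATTERNS = {
    "date_of_birth": re.compile(r"^Date of Birth:[ \t]*(.*)$", re.I | re.M),
    "specimen_number": re.compile(r"^Pathological specimen Number:[ \t]*(.*)$", re.I | re.M),
    "date_of_diagnosis": re.compile(r"^Date of Diagnosis:[ \t]*(.*)$", re.I | re.M),
}


def parse_report(text):
    """Extract the fields of interest from one report (hypothetical layout)."""
    record = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        value = match.group(1).strip() if match else ""
        record[name] = value or None  # missing or empty fields become None
    return record


def parse_loh_export(text):
    """Read a tab-delimited LOH export: header row, then specimen, marker, call."""
    calls = {}
    for line in text.strip().splitlines()[1:]:  # skip the header row
        if not line.strip():
            continue
        specimen, marker, call = [cell.strip() for cell in line.split("\t")]
        calls.setdefault(specimen, {})[marker] = call
    return calls


def join_clinical_and_loh(reports, loh_calls):
    """Attach LOH calls to each clinical record, keyed by specimen number."""
    return [
        {**record, "loh": loh_calls.get(record["specimen_number"], {})}
        for record in reports
    ]


# Made-up inputs, for illustration only.
report = """Date of Birth: 1970-01-01
Pathological specimen Number: S-0001
Date of Diagnosis: 2001-05-17
"""
loh_export = "specimen\tmarker\tcall\nS-0001\tmarker_A\tLOH\nS-0001\tmarker_B\tretained\n"

rows = join_clinical_and_loh([parse_report(report)], parse_loh_export(loh_export))
print(rows[0])
```

Keying the join on the specimen number mirrors the idea that each case's LOH results are assigned to its clinical information; a real patient record system would need its own parsing rules.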
Advantages of LDMAS
LDMAS offers several advantages to users. It is user friendly and its architecture is modular allowing versatility of use. It enforces the standardisation of procedures for studies involving large cohorts of individuals. The data is well organized since LDMAS systematically assigns LOH results of each case to its corresponding clinical information. Additionally, LDAS standardizes LOH data analysis implicitly and allows the user to edit the data manually if needed. Microsoft Excel has been chosen to implement LDAS because of its wide use, versatility and convenient statistical analysis features facilitating the implementation of multivariate analysis and correlation testing between LOH and clinico-pathological parameters.
LDMAS application in identification of LOH markers associated with persistence / progression of cervical intraepithelial neoplasia
We divided the CIN cases into two groups: disease free, indicating cases that become CIN-free after treatment, and disease persistence/progression, indicating cases that show progression or persistence of CIN despite treatment. We used LDMAS to retrospectively examine the prognostic value of LOH at 12 microsatellite markers, including 10 from 3p14, 3p22-21, 6p21 and 11q23, which are frequently deleted in cervical cancer [3, 4], in 164 cases of CIN lesions using archival cytological/histological specimens. LOH was further correlated with high risk HPV infection.
Validation of prognostic markers associated with persistence / progression of CIN
To validate this finding, LOH at these four markers was investigated in a further series of 93 cases. Compatible results were obtained from these additional cases.
comparison using χ2 (chi-squared) test of LOH at each of the four microsatellite markers with age, various methods of treatment, different subtypes of HPV infection and between CINs showing disease free or disease persistence/progression.
correlation of LOH data with histological grade of CIN, treatment response and various HPV subtypes.
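The χ2 comparison listed above can be sketched for a single marker and a single two-level clinical variable as follows. This is a minimal illustration in Python rather than the LDAS implementation (which the text says is built in Microsoft Excel). It assumes a 2x2 table of counts, uses Pearson's statistic without continuity correction, and the counts are invented for the example.

```python
import math


def chi_squared_2x2(table):
    """Pearson chi-squared test (no continuity correction) on a 2x2 count table.

    table = [[a, b], [c, d]], e.g. rows = LOH / no LOH and
    columns = disease persistence/progression / disease free.
    Returns (chi2, p_value) for 1 degree of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one case")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom the upper tail of chi-squared is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Made-up counts for one marker, for illustration only.
counts = [[12, 8],   # LOH:    12 persistence/progression, 8 disease free
          [2, 18]]   # no LOH:  2 persistence/progression, 18 disease free
chi2, p = chi_squared_2x2(counts)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

The χ2 approximation becomes unreliable when expected cell counts are small, so an exact test may be preferable for sparse tables.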
Through such complex analysis, we showed that concurrent LOH at two of the four microsatellite markers could identify 47% of CINs that showed disease persistence/progression with 100% specificity. Furthermore, LOH at D3S1300 was found to be significantly associated with HPV16 infection. Part of this data analysis is supplied in the LDMAS guide [see Additional file 1]. More detailed analysis of this study is described in .
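To make the sensitivity and specificity figures concrete, the sketch below evaluates a rule of the form "concurrent LOH at two or more markers predicts persistence/progression". It is an illustrative Python example only: the 0/1 coding of LOH calls, the function name and the toy cases are assumptions, and the toy numbers do not reproduce the study results.

```python
def evaluate_concurrent_loh(cases, min_markers=2):
    """Sensitivity and specificity of the rule 'LOH at >= min_markers markers'.

    cases: list of (loh_calls, progressed) pairs, where loh_calls is a list of
    0/1 values (1 = LOH) across the marker panel and progressed is True for
    disease persistence/progression, False for disease free.
    """
    tp = fp = tn = fn = 0
    for loh_calls, progressed in cases:
        flagged = sum(loh_calls) >= min_markers
        if progressed and flagged:
            tp += 1
        elif progressed:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case in each outcome group")
    sensitivity = tp / (tp + fn)   # progressing cases correctly flagged
    specificity = tn / (tn + fp)   # disease-free cases correctly not flagged
    return sensitivity, specificity


# Made-up cases over a four-marker panel, for illustration only.
toy_cases = [
    ([1, 1, 0, 0], True),
    ([1, 0, 0, 0], True),
    ([0, 1, 1, 1], True),
    ([0, 0, 0, 0], True),
    ([1, 0, 0, 0], False),
    ([0, 0, 0, 0], False),
    ([0, 0, 1, 0], False),
]
sensitivity, specificity = evaluate_concurrent_loh(toy_cases)
print(sensitivity, specificity)
```

A specificity of 100% means no disease-free case meets the rule, so in that sample every flagged case was one that persisted or progressed.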
Algorithm for identifying prognostic disease markers
1. Divide the disease into groups according to the pathological staging
2. Parse patient data from clinical records and use the groups defined in step (1)
3. FOR each microsatellite marker
       Carry out a two-tailed Student's t-test between the disease groups using LOH data
       IF t-test p ≤ 0.05
           Marker is significant in prognosis of the disease
4. Validate the prognostic markers using χ2 (chi-squared) test of LOH with clinico-pathological data and correlation of LOH data with histological grade of CIN, treatment response and various HPV subtypes.
LDMAS has been implemented using the above pseudocode.
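The marker-screening loop of the pseudocode can be sketched in Python as shown below. This is not the LDMAS source, and it rests on several assumptions that the text does not state: LOH calls are coded as 1 (LOH) or 0 (retained), there are two disease groups with at least two cases each, and the t-test is the pooled-variance (equal variance) form. The p-value is obtained by numerically integrating the t density so that the example needs only the standard library. Marker names and data are made up.

```python
import math
from statistics import mean, variance


def t_density(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value, integrating the density over [0, |t|] with Simpson's rule."""
    limit = abs(t)
    if limit == 0:
        return 1.0
    h = limit / steps  # steps must be even for Simpson's rule
    total = t_density(0.0, df) + t_density(limit, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3)


def student_t_test(group_a, group_b):
    """Pooled-variance Student's t-test; returns (t, two-tailed p)."""
    na, nb = len(group_a), len(group_b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / df
    diff = mean(group_a) - mean(group_b)
    if pooled == 0:  # no within-group variation at all
        return (0.0, 1.0) if diff == 0 else (math.copysign(math.inf, diff), 0.0)
    t = diff / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, two_tailed_p(t, df)


def screen_markers(loh_by_marker, alpha=0.05):
    """Return {marker: p} for markers whose LOH differs between the two groups."""
    significant = {}
    for marker, (disease_free, persistent) in loh_by_marker.items():
        _, p = student_t_test(disease_free, persistent)
        if p <= alpha:
            significant[marker] = p
    return significant


# Made-up LOH calls (1 = LOH, 0 = retained), for illustration only.
toy_data = {
    "marker_A": ([0] * 18 + [1] * 2, [0] * 8 + [1] * 12),
    "marker_B": ([0] * 15 + [1] * 5, [0] * 14 + [1] * 6),
}
significant = screen_markers(toy_data)
print(sorted(significant))
```

Markers that pass this screen would then go on to the validation step, for example the χ2 comparison against clinico-pathological variables.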
We have devised an effective algorithm to identify and extract useful markers that can be used to predict the outcome of disease, and used the algorithm to successfully identify 4 novel prognostic markers that can be used to predict the outcome of CIN. The algorithm was implemented in a novel software package called LDMAS, which provides an essential platform for the extraction of useful information from the large amounts of data generated by LOH studies. Furthermore, LDMAS is used to efficiently store, manage and track the data. Its flexible nature allows easy manipulation of data, facilitating complex analysis as demonstrated in the current study. The various modules of LDMAS can be easily adapted and used with other applications such as high throughput LOH and genotyping using SNPs on Affymetrix® GeneChip Mapping arrays and fingerprinting studies. Modules such as MRES can be used independently to parse medical records, facilitating extraction of specific clinical information of interest. Additionally, LDMAS can be used to extract clinically useful markers for other diseases.
The source code and executable files for LDMAS modules, as well as a user manual including examples from real study data, are freely available and can be downloaded from our website at: http://molpath.his.path.cam.ac.uk/bioinformatics/LDMAS.shtml
Additionally, examples of input files are provided on our website for users to test the software and assess its functionality.
- Leenstra S, Oskam NT, Bijleveld EH, Bosch DA, Troost D, Hulsebos TJ: Genetic sub-types of human malignant astrocytoma correlate with survival. Int J Cancer 1998, 79: 159–165. 10.1002/(SICI)1097-0215(19980417)79:2<159::AID-IJC11>3.0.CO;2-5
- Tamura S, Nakamori S, Kuroki T, Sasaki Y, Furukawa H, Ishikawa O, Imaoka S, Nakamura Y: Association of cumulative allelic losses with tumor aggressiveness in hepatocellular carcinoma. Journal of Hepatology 1997, 27: 669–676. 10.1016/S0168-8278(97)80084-0
- Giannoudis A, Herrington CS: Human papillomavirus variants and squamous neoplasia of the cervix. Journal of Pathology 2001, 193: 295–302.
- Lazo PA: The molecular genetics of cervical carcinoma. British Journal of Cancer 1999, 80: 2008–2018. 10.1038/sj.bjc.6690635
- ELhamidi A, Hamoudi RA, Kocjan G, Du MQ: Cervical intraepithelial neoplasia: prognosis by combined LOH analysis of multiple loci. Gynecologic Oncology 2004, 94: 671–679. 10.1016/j.ygyno.2004.06.013
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.