QCScreen can be used to detect and examine predefined analytical features in full-scan LC-HRMS data, with the aim of providing a quick and easily interpretable graphical overview of data quality.
Data files referring to specified sample categories (e.g. blanks, QC samples, QC standards, matrix QCs or experimental samples) originating from one or multiple measurement sequences can be selected and inspected using predefined features. For the features and samples under investigation, QCScreen visualizes the stability and precision of the chromatographic separation step, MS sensitivity, m/z value and mass accuracy (±ppm) over time. The generated graphical illustrations and a coloured quality overview table make it possible to quickly spot putatively problematic data, i.e. features, samples or measurement sequences. QCScreen was implemented in the Python programming language (v. 2.6) and uses the Qt framework for the graphical user interface (GUI), R (version 2.12) and SQLite for data management.
QCScreen loads the centroided raw data in the mzXML format [16]. The GUI of the software tool contains a main area for the selection of measurement sequences, sample type categories and features to be investigated, and provides the option to define parameters, display settings and tolerance limits for the parameters. The observed feature characteristics are then evaluated against predefined or data-based performance criteria. For each feature, QCScreen creates illustrations of the extracted ion chromatogram (EIC), retention time (tR), mass-to-charge ratio (m/z), mass accuracy and feature area across the provided data files. Finally, the evaluation results of all tested features and parameters are saved in tabular form.
The implemented sections are described in the following.
Sequence input and sample type selection
With the software, the user can load LC-HRMS data files, which must be listed in comma-separated value (csv) format (see Fig. 1a). For this, the LC-HRMS raw data first have to be centroided and converted to mzXML format, which can be done with several open-source software tools, e.g. MSConvert [17]. Typically, such a list consists of a measurement sequence with data files for biological samples and for QC samples measured periodically between the biological samples. Based on the file names, QCScreen automatically generates different sample type categories (e.g. blank QC, matrix QC or biological sample) (Fig. 1b). For this, file names must contain a unique separator symbol marking the cutoff point used to group the files for further data processing (see Additional file 1). The program allows the analysis of multiple sample type categories.
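The grouping of data files by file name can be sketched as follows. This is only an illustration in modern Python, not QCScreen's own code: the function name `group_by_sample_type`, the underscore separator and the example file names are assumptions, and the actual naming convention is the one described in Additional file 1.

```python
import os
from collections import defaultdict


def group_by_sample_type(file_names, separator="_"):
    """Group data files by the part of the file name before the separator.

    Hypothetical helper: the category is assumed to be the text in front of
    the first occurrence of the separator symbol.
    """
    groups = defaultdict(list)
    for name in file_names:
        # Drop directory and extension, then cut at the separator.
        stem = os.path.splitext(os.path.basename(name))[0]
        category = stem.split(separator, 1)[0]
        groups[category].append(name)
    return dict(groups)


# Made-up file names for a short measurement sequence.
files = [
    "blankQC_01.mzXML",
    "matrixQC_01.mzXML",
    "sample_01.mzXML",
    "matrixQC_02.mzXML",
]
categories = group_by_sample_type(files)
for category, members in sorted(categories.items()):
    print(category, members)
```

A file name without the separator ends up in a category of its own, which makes naming mistakes easy to spot in such a sketch.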
Target feature definition
The target features whose quality-related parameters are to be checked are managed in a local table. A table view within the GUI allows the user to add, modify or delete target features. It is also possible to load existing target feature lists, which must be provided in a certain tabular format. Selected target features are then considered for further automated analysis by QCScreen. For every feature, at least an expected tR, the ion species and the target m/z value or molecular formula are required (see Fig. 1c).
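When a target feature is given by its molecular formula and ion species rather than by an m/z value, the target m/z follows from the monoisotopic mass. The sketch below shows the arithmetic for two common ion species. The function names, the restricted element table and the supported ion species are assumptions made for illustration and do not describe how QCScreen performs this step.

```python
import re

# Monoisotopic masses (u) of the most abundant isotope; only a few elements
# are included in this illustrative table.
MONOISOTOPIC_MASS = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307401,
    "O": 15.99491462,
    "P": 30.97376163,
    "S": 31.97207100,
}
PROTON_MASS = 1.00727647  # u


def monoisotopic_mass(formula):
    """Sum monoisotopic masses for a simple formula such as 'C6H12O6'."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += MONOISOTOPIC_MASS[element] * (int(count) if count else 1)
    return mass


def target_mz(formula, ion_species="[M+H]+"):
    """Return the m/z of a singly charged protonated or deprotonated ion."""
    mass = monoisotopic_mass(formula)
    if ion_species == "[M+H]+":
        return mass + PROTON_MASS
    if ion_species == "[M-H]-":
        return mass - PROTON_MASS
    raise ValueError("ion species not covered by this sketch: " + ion_species)


# Hypothetical target feature entry with the minimum required information.
feature = {"name": "glucose", "formula": "C6H12O6", "ion": "[M+H]+", "rt_min": 1.2}
print(round(target_mz(feature["formula"], feature["ion"]), 4))
```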
Selection of processing parameters and evaluation criteria
For all QC-related parameters, the software tool offers default settings, which can be adjusted before data processing is started. The GUI offers a menu to adjust the tR (±s) and m/z deviation (±ppm) tolerance windows, which are used for chromatographic peak picking via a wavelet implementation [18].
Additionally, for every target feature the parameters tR, m/z, EIC feature area and the associated tolerance windows for classification into four quality categories have to be specified. For this, either predefined fixed target values, or alternatively, data-based experimental arithmetic mean, relative bias and standard deviation of the respective feature parameter values can be used, around which the tolerance windows are constructed for performance classification.
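The classification into four quality categories can be illustrated with a small sketch. The text does not state how wide the individual tolerance windows are, so the multiples of the standard deviation used here (1, 2 and 3) are purely assumed, as are the function names and the example retention times.

```python
from statistics import mean, stdev


def classify(value, target, tolerances):
    """Return a category from 0 (best) to 3 (worst).

    `tolerances` holds three increasing half-widths around the target value;
    a value outside the widest window falls into the last category.
    """
    deviation = abs(value - target)
    for category, limit in enumerate(tolerances):
        if deviation <= limit:
            return category
    return len(tolerances)


def data_based_windows(values, factors=(1.0, 2.0, 3.0)):
    """Derive the window centre and half-widths from the data themselves.

    The centre is the arithmetic mean; the half-widths are assumed
    multiples of the sample standard deviation.
    """
    centre = mean(values)
    sd = stdev(values)
    return centre, tuple(factor * sd for factor in factors)


# Made-up retention times (s) of one feature in five QC samples.
rt_values = [300.0, 302.0, 298.0, 301.0, 299.0]
centre, tolerances = data_based_windows(rt_values)
print([classify(rt, centre, tolerances) for rt in rt_values])

# Alternative: fixed target value with fixed half-widths (again assumed).
print(classify(306.0, 300.0, (2.0, 5.0, 10.0)))
```

The same `classify` function serves both variants described above, because only the origin of the target value and of the window widths differs.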
Results generated by the software
When the software has finished the calculation, a coloured quality overview and, for every evaluated performance criterion, a graphical illustration of the generated results are created. For each evaluated parameter, i.e. extracted ion chromatograms (EICs), tR, mass accuracy and feature area, the results are plotted against the order of the data files. Moreover, the user can adjust settings such as the scaling of the x-axis, which can follow either the chronological order or the real (acquisition) time of the data files.
For every parameter, QCScreen shows plots of the parameter values per feature for all processed samples. For later reference and data evaluation, a list containing the calculated parameter values together with the respective arithmetic means and standard deviations is available in tabular output format. Additionally, the predefined tolerance windows are plotted, and for feature area, tR and m/z values, box plots are generated to illustrate the precision of these parameters per sample type category.
Coloured quality overview
The coloured evaluation overview offers an easy-to-interpret illustration of the specified performance criteria (i.e. target value ± tolerance limits, Additional file 1). Feature area precision, tR precision and mass accuracy are shown for the evaluated features in the respective data files. Additionally, the results are flagged with four different colours ranging from green (parameter is within expected values) to red (parameter is not within expected values) to provide a visual impression of the overall analytical data quality according to the preselected performance criteria. The purpose of this illustration is to enable immediate identification of problematic features, parameters or samples. The evaluation overview is arranged as a table of features (rows) and samples (columns). For each entry in this table, the quality-related parameter values are calculated. First, the retention time tR (min) found experimentally for the respective feature in the evaluated sample is displayed together with its deviation (min) from the tR specified in the target feature list or from the calculated average tR. Next to it, the m/z of the feature found in the sample and the mass accuracy (±ppm) relative to the predefined standard mass or to the calculated average mass are given. Finally, the feature area found in the sample and its relative bias are displayed. A summary of the average parameter values per feature is given at the right end of the matrix.
Extracted ion chromatogram (EIC), relative isotopolog abundance (RIA) and feature area illustration
The EIC is defined as the signal intensities within a certain m/z (±ppm) window plotted as a function of retention time. For each target feature, an overlay of all EICs, one per data file, is displayed. This illustration facilitates the visual assessment of the peak profile of the inspected feature(s) across all processed LC-HRMS data files. Optionally, the relative isotopolog abundances (RIA) for carbon are calculated from the ratio between the monoisotopic (12C) MS peak and the first isotopolog (13C) MS peak. The graphical illustration showing the accuracy of the experimentally derived RIA and its bias can help to decide whether the data are suitable for sum formula calculation from the experimental RIA values. It should be noted that, depending on the mass analyser in use and the mass of the inspected molecule, the resolving power of a mass spectrometer may not be sufficient to completely resolve the isotopic fine structure of the isotopologs under investigation. For sum formulas containing a high proportion of heteroatoms such as N, O or S, the calculated RIA can therefore be biased. This has to be considered to obtain a reliable prediction of the elemental composition. Please also refer to Additional file 1, page 24 for an example.
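The EIC definition and the carbon RIA can be made concrete with a short sketch. Several details are assumptions rather than statements about QCScreen: the scan representation, summing (instead of, for example, taking the maximum of) the intensities inside the m/z window, the function names, and expressing the RIA as first isotopolog divided by monoisotopic peak. The theoretical value uses the natural 13C abundance of about 1.07 % and ignores contributions from other elements.

```python
C13_ABUNDANCE = 0.0107  # approximate natural abundance of 13C


def extract_eic(scans, target_mz, ppm_tol=5.0):
    """Return (retention time, intensity) pairs for one target m/z.

    `scans` is a list of (retention_time, [(mz, intensity), ...]) tuples
    holding centroided spectra. Intensities within target_mz +/- ppm_tol
    are summed per scan (an assumed convention).
    """
    half_width = target_mz * ppm_tol * 1e-6
    eic = []
    for rt, peaks in scans:
        intensity = sum(i for mz, i in peaks if abs(mz - target_mz) <= half_width)
        eic.append((rt, intensity))
    return eic


def theoretical_ria(n_carbon):
    """Expected ratio of first 13C isotopolog to monoisotopic peak.

    Follows from the binomial distribution: n * p / (1 - p).
    """
    return n_carbon * C13_ABUNDANCE / (1.0 - C13_ABUNDANCE)


def ria_bias_percent(observed_ria, n_carbon):
    """Relative deviation (%) of an observed RIA from the expected value."""
    expected = theoretical_ria(n_carbon)
    return (observed_ria - expected) / expected * 100.0


# Made-up centroided scans: (retention time in s, [(m/z, intensity), ...]).
scans = [
    (10.0, [(181.0706, 100.0), (181.0800, 50.0)]),
    (11.0, [(181.0708, 200.0)]),
    (12.0, [(150.0000, 10.0)]),
]
eic = extract_eic(scans, target_mz=181.0707, ppm_tol=5.0)
print(eic)
print(round(theoretical_ria(6), 4))
```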
In the feature area illustration, the integrated area under the chromatographic peak of the respective feature is plotted against the measurement order. This plot can help detect changes or instability of the chromatographic process, MS detector drifts or feature area offsets between data files within or across multiple measurement sequences.
tR illustration
For every evaluated feature, the determined retention time of the EIC peak is plotted against the different samples to illustrate the chromatographic stability. If specified in the target feature list, different ion species originating from the same metabolite are displayed in parallel within one plot to visually check for their agreement and stability of retention time.
m/z value and mass accuracy illustration
QCScreen generates plots depicting the measured m/z values of a chromatographic peak, together with their arithmetic mean and standard deviation, against the processed data files. With this type of illustration, the mass accuracy (±ppm) of a specified target feature relative to the given standard m/z or to the calculated average m/z is determined and represented. These data can, for example, also be useful for selecting input parameters for other LC-HRMS data processing software (e.g. XCMS [19]) or for a later database search.
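The mass accuracy in ppm is the relative deviation of a measured m/z from a reference m/z, scaled by one million. The following sketch computes it against both reference choices mentioned above; the function name and the example values are hypothetical.

```python
from statistics import mean, stdev


def ppm_error(measured_mz, reference_mz):
    """Signed mass deviation of a measured m/z from a reference, in ppm."""
    return (measured_mz - reference_mz) / reference_mz * 1e6


# Made-up m/z values of one feature measured in four data files.
measured = [181.0705, 181.0709, 181.0707, 181.0710]
standard_mz = 181.0707  # assumed target m/z from the feature list

# Accuracy relative to the predefined standard m/z.
errors_vs_standard = [ppm_error(mz, standard_mz) for mz in measured]

# Accuracy relative to the calculated average m/z.
average_mz = mean(measured)
errors_vs_average = [ppm_error(mz, average_mz) for mz in measured]

print("mean m/z:", round(average_mz, 5), "SD:", round(stdev(measured), 5))
print("ppm vs standard:", [round(e, 2) for e in errors_vs_standard])
print("ppm vs average:", [round(e, 2) for e in errors_vs_average])
```

The spread of such ppm errors gives a data-based starting point when choosing an m/z tolerance for downstream processing.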