Volume 16 Supplement 18
Condensing Raman spectrum for single-cell phenotype analysis
- Shiwei Sun†1,
- Xuetao Wang†2, 3,
- Xin Gao4,
- Lihui Ren2,
- Xiaoquan Su2,
- Dongbo Bu1 and
- Kang Ning2, 3, 5
© Sun et al. 2015
Published: 9 December 2015
In recent years, high-throughput, non-invasive Raman spectrometry has matured into an effective approach for identifying individual cells by species, even in complex, mixed populations. Raman profiling is an appealing optical microscopic method to achieve this. To fully utilize Raman profiling for single-cell analysis, an extensive understanding of Raman spectra is necessary to answer questions such as which filtering methodologies are effective for pre-processing Raman spectra, which strains can be distinguished by Raman spectra, and which features serve best as Raman-based biomarkers for single cells.
In this work, we propose an approach called rDisc that discretizes the original Raman spectrum into only a few (usually fewer than 20) representative peaks (Raman shifts). The approach has advantages in removing noise and condensing the original spectrum. In particular, effective signal processing procedures were designed to eliminate noise, using wavelet transform denoising, baseline correction, and signal normalization. In the discretizing process, representative peaks were selected to significantly decrease the Raman data size. More importantly, the selected peaks are suitable to serve as key biological markers for differentiating species and other cellular features. Additionally, the classification performance of discretized spectra was found to be comparable to that of full spectra having more than 1,000 Raman shifts. Overall, the discretized spectrum needs about 5% of the storage space of a full spectrum, and processing is considerably faster. This makes rDisc clearly superior to other methods for single-cell classification.
Keywords: Raman spectrum; linear discriminant analysis (LDA); k-nearest neighbor (k-NN); discretization
Bacteria, plants, animals, and all other organisms on the planet are derived from or composed of single cells. Genetically identical parent cells can produce cells with different functions due to the intrinsic variation among the individual offspring cells in gene structure, gene expression, and gene regulation [1, 2]. By monitoring microbial single cells in vitro over time and under varying conditions, we could effectively analyze how a population adapts to ever-changing conditions, such as those regarding nutrient supply or stress exposure. Microbiologists are especially interested in techniques centered on single cells because single cells serve as the basic unit of functional microorganisms, yet most microorganisms (>99%) have not yet been cultivated in the laboratory. There is increasing evidence that these uncultured microorganisms play crucial roles in ecosystems and have a profound impact on global warming (through carbon/nitrogen cycles), food security (through maintaining soil health and promoting plant growth), and environmental bioremediation.
In studying microorganisms, therefore, there is great promise of gaining substantial insight into fundamental physiological processes in microorganisms and of accelerating the development of superior strains for industrial biotechnology.
Single-cell technologies, such as the classical fluorescence-activated cell sorting (FACS) analysis and the more recently developed Raman spectra profiling, can detect population diversity by observing distinct phenotypic parameters. Raman spectroscopy is an especially powerful analytical technique and has already been used in several studies on single cells. Raman spectroscopy is based on inelastic scattering of photons following their interaction with vibrating molecules of the sample. During this interaction, photons transfer energy to the molecules (Stokes scattering) or receive energy from them (anti-Stokes scattering) in the form of vibrational energy. Thus the energy change of the scattered photons corresponds to the vibrational energy levels of the sample molecules. A single-cell Raman spectrum usually contains more than 1,000 Raman shifts, which provide rich information about the cell's contents (e.g., nucleic acids, proteins, carbohydrates, and lipids), reflecting cellular genotypes, phenotypes, and physiological states. Therefore, a Raman spectrum could serve as a molecular "fingerprint" of a single cell, enabling the distinction of various cells, including those from bacteria and animals, without prior knowledge of the cells.
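For orientation, the relation between the photon energy change and the Raman shift reported on the horizontal axis of a spectrum can be written as follows. This is the standard textbook definition rather than anything specific to rDisc; the symbols λ0 (excitation wavelength) and λ1 (scattered wavelength) are introduced here for illustration and do not appear in the original text.

```latex
% Raman shift in wavenumbers (cm^{-1}), with wavelengths given in nm
\Delta\tilde{\nu}\;[\mathrm{cm}^{-1}]
  = 10^{7}\left(\frac{1}{\lambda_{0}\,[\mathrm{nm}]} - \frac{1}{\lambda_{1}\,[\mathrm{nm}]}\right)
```

A positive value corresponds to Stokes scattering (the photon loses energy to the molecule) and a negative value to anti-Stokes scattering.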
Materials and methods
Single-cell Raman spectrum datasets
Table. Single-cell Raman datasets used in this work, listing the number of cells per species (e.g., Thermoanaerobacter sp. X514).
rDisc includes two main parts: a Raman quality-control procedure and Raman discretization via representative peaks. They are described, respectively, in the following sections.
Raman signals are weak relative to the other signals captured during acquisition, even though surface-enhancing technology has been adopted to strengthen them [15, 16]. The Raman signals inevitably mix with several other components whose energy is a few orders of magnitude higher, such as intrinsic fluorescence signals and random instrument noise. Therefore, quality-control methods should be adopted when preprocessing Raman spectra.
As shown in Figure 1, quality control follows a three-step approach to obtain a relatively high-quality spectrum: wavelet transform denoising, baseline correction, and normalization.
Discrete wavelet transformation denoising. Due to changes in voltage, current, and other instrument parameters, electromagnetic noise (such as Gaussian noise) and impulse noise are inevitably introduced into the Raman spectrum, lowering its signal-to-noise ratio. In this study we have used the discrete wavelet denoising method, which is suitable for complicated Raman spectra [17, 18].
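The idea behind discrete wavelet denoising can be illustrated with a minimal, standard-library sketch. The text does not state which wavelet, decomposition depth, or threshold rule rDisc uses, so the Haar wavelet, the `levels` parameter, soft thresholding, and the universal threshold below are all assumptions chosen for brevity, not the authors' settings.

```python
import math
import statistics


def haar_dwt(signal):
    """One level of the orthonormal Haar transform (length must be even)."""
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Invert one level of the Haar transform."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / s)
        out.append((a - d) / s)
    return out


def soft_threshold(values, t):
    """Shrink each coefficient toward zero by t; small ones become zero."""
    return [math.copysign(max(abs(v) - t, 0.0), v) for v in values]


def wavelet_denoise(signal, levels=3):
    """Denoise by thresholding Haar detail coefficients (assumed scheme)."""
    n = len(signal)
    if n % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    approx = list(signal)
    details = []  # details[0] holds the finest-scale coefficients
    for _ in range(levels):
        approx, d = haar_dwt(approx)
        details.append(d)
    # Noise level estimated from the finest scale (median absolute value / 0.6745),
    # then the universal threshold sigma * sqrt(2 ln n).
    sigma = statistics.median([abs(v) for v in details[0]]) / 0.6745
    t = sigma * math.sqrt(2.0 * math.log(n))
    for d in reversed(details):  # rebuild from the coarsest level upward
        approx = haar_idwt(approx, soft_threshold(d, t))
    return approx
```

For example, `wavelet_denoise([5.1, 4.9] * 8, levels=2)` flattens the small alternating ripple to a constant 5.0. A real Raman spectrum would normally call for a smoother wavelet family than Haar.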
Baseline correction. The Raman spectrum baseline comes from the intrinsic background signals, which always interfere with and can even submerge the weak Raman signal. The baseline is generally much more intense than the Raman signals and usually appears as a smooth, low-frequency curve. To mitigate this negative effect, it is necessary to correct the baseline of the Raman spectrum [22, 23]. To date, many baseline correction methods have been proposed, including (a) first- or second-order derivatives, (b) single or modified multi-polynomial fitting, and (c) wavelet transformation, which is widely used in the field of signal processing [24, 25]. However, for Raman spectra with complicated background noise, these methods might not work well, possibly requiring user intervention or taking a long time. In this study we have developed an automatic piecewise fitting method for Raman baseline correction.
First, an automatic algorithm detects the trough positions in a Raman spectrum; then each pair of adjacent troughs is fitted with a low-order curve; and finally, all of these fitted curves are connected to form the whole baseline. The result of correcting a Raman spectrum baseline is shown in Figure 1(C). The method also substantially reduces computation time relative to the other methods.
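The three steps just described (detect troughs, fit between adjacent troughs, join the pieces) can be sketched as below. This is an illustration under stated assumptions, not the rDisc implementation: troughs are taken to be local minima within a hypothetical `half_window`, the two end points are always used as anchors, and the "low-order" fit is reduced to a straight line (order 1) between adjacent troughs.

```python
def find_troughs(intensity, half_window=10):
    """Indices of local minima within +/- half_window, plus both end points."""
    n = len(intensity)
    troughs = [0]
    for i in range(1, n - 1):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        if intensity[i] == min(intensity[lo:hi]):
            troughs.append(i)
    if troughs[-1] != n - 1:
        troughs.append(n - 1)
    return troughs


def piecewise_baseline(intensity, half_window=10):
    """Connect adjacent troughs with straight segments (assumed order-1 fit)."""
    troughs = find_troughs(intensity, half_window)
    baseline = [0.0] * len(intensity)
    for left, right in zip(troughs, troughs[1:]):
        y0, y1 = intensity[left], intensity[right]
        for i in range(left, right + 1):
            frac = (i - left) / (right - left)
            baseline[i] = y0 + frac * (y1 - y0)
    return baseline


def correct_baseline(intensity, half_window=10):
    """Subtract the estimated baseline from the spectrum."""
    baseline = piecewise_baseline(intensity, half_window)
    return [y - b for y, b in zip(intensity, baseline)]
```

Because each segment is computed in a single pass over its own points, the cost grows linearly with the number of Raman shifts, which is consistent with the speed advantage claimed above. A higher-order fit per segment would follow curved backgrounds more closely at some extra cost.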
Raman spectrum discretization
Certain applications, such as Raman spectrum classification and searching through a large number of spectra, are inefficient given that each Raman spectrum contains more than 1,000 data points (Raman shifts) (Figure 1(c)). Methods such as principal component analysis (PCA) and linear discriminant analysis (LDA) effectively reduce the dimensionality of Raman spectra. However, because the dimensions produced by PCA and LDA are linear combinations of all dimensions of the original data, no individual dimension has a specific meaning. For single-cell Raman spectra, it is worth noting that some Raman shifts have special biological meaning and might correspond to one or a set of compound structure(s) in the cell. Therefore, selecting representative peaks from all Raman shifts would be more suitable for dimension reduction of Raman spectra (Figure 1(D)).
To accomplish this, we first defined a sliding window with a width of M Raman shifts that traverses the whole Raman spectrum; when the intensity at the center of the window is larger than at all other points in the window, the center is considered a candidate peak. In this way we can remove small peaks such as peak "3" in Figure 3. Based on experience, we set M = 20 in this study. Second, we calculated tan(θ) (Figure 3) for each candidate peak from step 1 to represent the peak sharpness; when tan(θ) was less than a threshold α, the peak (such as peak "1" in Figure 3), being too flat compared to other Raman peaks, was removed.
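The two-step selection can be sketched as follows. The window test follows the description above with M = 20. The exact geometric definition of tan(θ) is given in Figure 3, which is not reproduced here, so the `sharpness` function below is a stand-in assumption (mean drop from the apex to the two window edges, divided by the half-width), and the default `alpha` is an arbitrary illustrative value.

```python
def candidate_peaks(intensity, window=20):
    """Step 1: indices whose intensity exceeds every other point in the window."""
    half = window // 2
    peaks = []
    for c in range(half, len(intensity) - half):
        neighbours = intensity[c - half:c] + intensity[c + 1:c + half + 1]
        if intensity[c] > max(neighbours):
            peaks.append(c)
    return peaks


def sharpness(intensity, center, half):
    """Assumed proxy for tan(theta): apex-to-edge drop per Raman shift."""
    edge_mean = (intensity[center - half] + intensity[center + half]) / 2.0
    return (intensity[center] - edge_mean) / half


def select_peaks(shifts, intensity, window=20, alpha=1.0):
    """Step 2: drop candidates flatter than alpha; return one row per peak."""
    half = window // 2
    rows = []
    for c in candidate_peaks(intensity, window):
        s = sharpness(intensity, c, half)
        if s >= alpha:
            rows.append((shifts[c], intensity[c], s))  # (shift, intensity, sharpness)
    return rows
```

`intensity` is expected to be a plain list. Each returned row carries the peak's shift position, its intensity, and the sharpness value, so a spectrum with more than 1,000 points collapses to a handful of tuples.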
The peak detection results in a discretization method (Figure 3) that can reduce the Raman spectrum size by more than 100 times. Thus the use of a discretized spectrum, with its small amount of data, can significantly reduce the time spent searching for an unknown sample. Additionally, for the selected peaks, or features, from single-cell Raman spectra, three pieces of basic information could be used: peak position, peak intensity (height), or peak area. The Raman spectrum after the discretization process is referred to as a 'discrete Raman spectrum', and we have also defined the rDisc format for these discrete spectra. In rDisc format, each row constitutes a representative peak, with its shift position, intensity, and curvature value (representing the peak's sharpness) provided. Single-cell Raman spectra in this format are used in the following analyses.
Similarity measurement between discrete spectra
The classification analysis was designed to evaluate whether the discretized spectrum could effectively represent the full spectrum. Accordingly, we defined rules for calculating the similarity of two discrete Raman spectra: peak position matching, peak intensity correlation, and a combination of peak position matching and intensity correlation.
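The text names the three rules without giving formulas, so the following is only one plausible reading, with every detail labeled as an assumption: peaks are matched greedily within a hypothetical tolerance `tol` (in Raman shift units), position matching is scored as matched peaks over the union of peaks, intensity correlation is the Pearson correlation over matched pairs, and the combined score is their product.

```python
import math


def match_peaks(a, b, tol=5.0):
    """Greedy one-to-one matching of (shift, intensity) peaks within tol."""
    pairs, used = [], set()
    for shift_a, int_a in a:
        best = None
        for j, (shift_b, _) in enumerate(b):
            if j in used or abs(shift_a - shift_b) > tol:
                continue
            if best is None or abs(shift_a - shift_b) < abs(shift_a - b[best][0]):
                best = j
        if best is not None:
            used.add(best)
            pairs.append((int_a, b[best][1]))
    return pairs


def position_score(a, b, tol=5.0):
    """Assumed rule 1: matched peaks divided by the union of peaks."""
    matched = len(match_peaks(a, b, tol))
    union = len(a) + len(b) - matched
    return matched / union if union else 0.0


def intensity_correlation(a, b, tol=5.0):
    """Assumed rule 2: Pearson correlation of matched peak intensities."""
    pairs = match_peaks(a, b, tol)
    if len(pairs) < 2:
        return 0.0
    xs, ys = zip(*pairs)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def combined_similarity(a, b, tol=5.0):
    """Assumed rule 3: product of the two scores (negative correlation -> 0)."""
    return position_score(a, b, tol) * max(intensity_correlation(a, b, tol), 0.0)
```

Two spectra whose peaks sit at nearly the same shifts with proportional intensities score close to 1 under all three rules, while spectra with no peaks within `tol` of each other score 0.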
Availability of Raman spectrum data analysis method
We have integrated the above quality-control and discretization methods into a data analysis package named rDisc, available at http://www.computationalbioenergy.org/imod.html. All single-cell Raman spectra in raw data format as well as in rDisc format (representing discrete spectra) have been stored or linked on this website.
Results and discussions
Evaluation of classification performance of discretized spectra
We performed experiments to evaluate the single-cell classification performance of discretized Raman spectra compared to that of full spectra. All Raman spectra were processed by the whole quality-control procedure. The spectra of the five species from the QSpec Raman database were randomly divided into a training group and a test group at a ratio of 60%:40%. The separation and analysis were repeated 10 times.
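The evaluation protocol (random 60%:40% split, repeated 10 times) can be sketched with a plain k-NN classifier. The value of k, the Euclidean distance, the fixed-length feature vectors, and the `seed` parameter are assumptions made for this illustration; the study's own classifiers operate on full or discretized spectra and are not reproduced here.

```python
import random
from collections import Counter


def knn_predict(train, query, k=3):
    """Majority label among the k training vectors closest to query."""
    nearest = sorted(
        train,
        key=lambda item: sum((p - q) ** 2 for p, q in zip(item[0], query)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def repeated_holdout(samples, train_fraction=0.6, repeats=10, k=3, seed=0):
    """Mean test accuracy over repeated random train/test splits."""
    rng = random.Random(seed)  # seeded only so the sketch is reproducible
    accuracies = []
    for _ in range(repeats):
        shuffled = samples[:]
        rng.shuffle(shuffled)
        cut = int(round(train_fraction * len(shuffled)))
        train, test = shuffled[:cut], shuffled[cut:]
        correct = sum(knn_predict(train, vec, k) == label for vec, label in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / len(accuracies)
```

Here `samples` is a list of `(feature_vector, species_label)` pairs. Averaging over the repeats reduces the dependence of the reported accuracy on any single random split.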
Comparison of different clustering methods.
With the rapidly increasing number of Raman spectra, processing speed for single-cell Raman spectrum classification could become a critical concern. Because rDisc is based on discretized spectra, it naturally enables efficient processing of single-cell Raman spectra, as the data size is greatly reduced. Here we evaluated the execution efficiency of k-NN based on discretized spectra. We conducted all of these experiments on a computer with an Intel Xeon E3-1225 CPU (4 cores in total, 3.2 GHz), 4 GB DDR3 ECC RAM, and a 500 GB hard drive. On a dataset of 300 Raman spectra randomly selected from QSpec-DB, the classification time per Raman spectrum was 0.009 s for k-NN based on discretized spectra and approximately 0.051 s for k-NN based on full spectra. Such a speed-up, of more than five times, would be especially useful in circumstances involving a large volume of single-cell Raman spectra.
There are several advantages to the discrete spectrum for single-cell Raman spectrum analysis, including speed, accuracy, and biomarker identification. It would therefore be very valuable across a variety of applications. First, its application in single-cell sorting could enable near real-time database comparison and search, owing to the speed gained from peak discretization and the high accuracy of Raman spectrum comparison. Thus, it would be especially suitable for single-cell profiling and large-scale sorting.
Second, it could be used for microbial community analysis based on single cells, in which single cells from hundreds or even thousands of unknown strains need to be analyzed separately. Given a large collection of single-cell Raman spectra, and given its high accuracy for single-cell classification, rDisc could be used for such applications.
Third, the discrete spectrum could be easier to use and interpret for diagnosis based on single cells, as it could accurately identify biomarkers out of millions of single-cell Raman spectra. This would be especially useful in clinical applications such as the diagnosis of early-stage cancer.
In applied settings, a method should classify closely related species based on Raman spectra data. However, the number of species with single-cell Raman spectra is currently small, meaning we do not have enough species to test the extent to which our method can classify closely related species. Given current results, we could say, already, that our method could accurately differentiate single cells based on their Raman spectra. We hope to soon obtain enough data to test it with the advancement of single-cell Raman spectrometry techniques.
Single-cell phenotype analysis, such as Raman spectrum analysis, is growing in popularity in step with the increasing focus on single-cell analysis and applications. However, current single-cell Raman spectrum analysis is still limited by the lack of data models, data quality control, and high-performance data analysis methods.
In this work, we have proposed a single-cell Raman spectrum analysis method, named rDisc, that converts a native Raman spectrum into a discrete spectrum, thus enabling reasonably efficient and accurate analysis of a huge number of single cells. First, the results show that the discrete spectrum could improve the quality of Raman spectra and the classification accuracy. Second, single-cell Raman spectrum discretization would improve the speed of spectrum comparison. Third, rDisc could identify representative peaks that differentiate certain species from others based on discrete spectra, and these peaks could serve as biomarkers for those species. Having enabled all of these advanced features for single-cell Raman spectrum analysis, the rDisc method represents the latest advancement for single-cell phenotype analysis in the era of "big data". Considering the ever-increasing number of single-cell Raman spectra, rDisc could be improved on two fronts: speed and depth of data mining. The speed of the current classification method could be further improved by utilizing recent high-performance computation techniques such as GPU computation, as well as refined spectrum filtration methods for similarity assessment.
Given the huge number of different types of single-cell phenotypes, deep data mining, including self-organizing maps (SOM) and Bayesian networks, could be used to analyze the relationships and redundancies between peaks, as well as to refine the clustering of single cells.
This work is supported in part by the National Basic Research Program of China (973 Program) grant 2012CB316502, National Science Foundation of China grants (31270834, 61103167, 31271410, 61303161, and 61272318), Ministry of Science and Technology's high-tech (863) grants (2012AA02A707, 2014AA21502), Chinese Academy of Sciences' e-Science grant (INFO-115-D01-Z006), and Sino-German Joint Research Project grant (GZ 878) of the Sino-German Center for Research Promotion (SGC). We truly appreciate Ansgar Poetsch from Ruhr University of Bochum for insightful discussion on single-cell Raman spectra classification.
Publication charges for this article have been funded by National Science Foundation of China grant No.31270834.
This article has been published as part of BMC Bioinformatics Volume 16 Supplement 18, 2015: Joint 26th Genome Informatics Workshop and 14th International Conference on Bioinformatics: Bioinformatics. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcbioinformatics/supplements/16/S18.
- Elowitz MB, Levine AJ, Siggia ED, Swain PS: Stochastic gene expression in a single cell. Science. 2002, 297 (5584): 1183-6.
- Kalisky T, Quake SR: Single-cell genomics. Nat Methods. 2011, 8 (4): 311-4.
- Cluzel P, Surette M, Leibler S: An ultrasensitive bacterial motor revealed by monitoring signaling proteins in single cells. Science. 2000, 287 (5458): 1652-5.
- Chivian D, Brodie EL, Alm EJ, Culley DE, Dehal PS, DeSantis TZ, Gihring TM, Lapidus A, Lin LH, Lowry SR, Moser DP, Richardson PM, Southam G, Wanger G, Pratt LM, Andersen GL, Hazen TC, Brockman FJ, Arkin AP, Onstott TC: Environmental genomics reveals a single-species ecosystem deep within earth. Science. 2008, 322 (5899): 275-278.
- Proctor GN: Mathematics of microbial plasmid instability and subsequent differential growth of plasmid-free and plasmid-containing cells, relevant to the analysis of experimental colony number data. Plasmid. 1994, 32 (2): 101-30.
- Loy A: Sulfate reduction in peatlands - does a rare keystone microorganism drive a process that mitigates global warming?. Geochimica Et Cosmochimica Acta. 2010, 74 (12): 633-633.
- Backhed F, Manchester JK, Semenkovich CF, Gordon JI: Mechanisms underlying the resistance to diet-induced obesity in germ-free mice. Proc Natl Acad Sci USA. 2007, 104 (3): 979-84.
- Daniel R: The metagenomics of soil. Nature Reviews Microbiology. 2005, 3 (6): 470-478.
- Kalyuzhnaya MG, Zabinsky R, Bowerman S, Baker DR, Lidstrom ME, Chistoserdova L: Fluorescence in situ hybridization-flow cytometry-cell sorting-based method for separation and enrichment of type I and type II methanotroph populations. Appl Environ Microbiol. 2006, 72 (6): 4293-301.
- Li M, Xu J, Romero-Gonzalez M, Banwart SA, Huang WE: Single cell Raman spectroscopy for cell sorting and imaging. Curr Opin Biotechnol. 2012, 23 (1): 56-63.
- Huang WE, Griffiths RI, Thompson IP, Bailey MJ, Whiteley AS: Raman microscopic analysis of single microbial cells. Analytical Chemistry. 2004, 76 (15): 4452-4458.
- Perlaki CM, Liu Q, Lim M: Raman spectroscopy based techniques in tissue engineering-an overview. Applied Spectroscopy Reviews. 2014, 49 (7): 513-532.
- Boulesteix AL: Ten simple rules for reducing overoptimistic reporting in methodological computational research. PLOS Computational Biology. 2015, 11 (4): 1-6.
- Wang Y, Ji YT, Wharfe ES, Meadows RS, March P, Goodacre R, Xu J, Huang WE: Raman activated cell ejection for isolation of single cells. Analytical Chemistry. 2013, 85 (22): 10697-10701.
- Frontiera RR, Henry AI, Gruenke NL, Van Duyne RP: Surface-enhanced femtosecond stimulated Raman spectroscopy. Journal of Physical Chemistry Letters. 2011, 2 (10): 1199-1203.
- Kandjani AE, Griffin MJ, Ramanathan R, Ippolito SJ, Bhargava SK, Bansal V: A new paradigm for signal processing of Raman spectra using a smoothing free algorithm: Coupling continuous wavelet transform with signal removal method. Journal of Raman Spectroscopy. 2013, 44 (4): 608-621.
- Gaci S: The use of wavelet-based denoising techniques to enhance the first-arrival picking on seismic traces. IEEE Transactions on Geoscience and Remote Sensing. 2014, 52 (8): 4558-4563.
- Lacaux C, Muller-Gueudin A, Ranta R, Tindel S: Convergence and performance of the peeling wavelet denoising algorithm. Metrika. 2014, 77 (4): 509-537.
- Bernuy B, Meurens M, Mignolet E, Turu C, Larondelle Y: Determination by Fourier transform Raman spectroscopy of conjugated linoleic acid in I-2-photoisomerized soybean oil. Journal of Agricultural and Food Chemistry. 2009, 57 (15): 6524-6527.
- Liu Z, Abbas A, Jing BY, Gao X: Wavpeak: picking NMR peaks through wavelet-based smoothing and volume-based filtering. Bioinformatics. 2012, 28 (7): 914-20.
- Mazet V, Carteret C, Brie D, Idier J, Humbert B: Background removal from spectra by designing and minimising a non-quadratic cost function. Chemometrics and Intelligent Laboratory Systems. 2005, 76 (2): 121-133.
- Zhao J, Lui H, McLean DI, Zeng H: Automated autofluorescence background subtraction algorithm for biomedical Raman spectroscopy. Applied Spectroscopy. 2007, 61 (11): 1225-1232.
- Schulze HG, Foist RB, Okuda K, Ivanov A, Turner RFB: A small-window moving average-based fully automated baseline estimation method for Raman spectra. Applied Spectroscopy. 2012, 66 (7): 757-764.
- Qin ZJ, Tao ZH, Liu JX, Wang GW: Baseline correction of Raman spectrum based on piecewise linear fitting. Spectroscopy and Spectral Analysis. 2013, 33 (2): 383-386.
- Yang J, Zhang GC, Ci XQ, Swenson NG, Cao M, Sha LQ, Li J, Baskin CC, Slik JWF, Lin LX: Functional and phylogenetic assembly in a Chinese tropical tree community across size classes, spatial scales and habitats. Functional Ecology. 2014, 28 (2): 520-529.
- Ichimura T, Chiu LD, Fujita K, Kawata S, Watanabe TM, Yanagida T, Fujita H: Visualizing cell state transition using Raman spectroscopy. PLoS ONE. 2014, 9 (1).
- Spiller DG, Wood CD, Rand DA, White MRH: Measurement of single-cell dynamics. Nature. 2010, 465 (7299): 736-745.
- Masyuko R, Driscoll CM, Lanni E, Shrout JD, Sweedler JV, Bohn PW: Correlated mass spectrometric and Raman imaging of chemically communicating microbial communities. Abstracts of Papers of the American Chemical Society. 2013, 246.
- Dochow S, Krafft C, Neugebauer U, Bocklitz T, Henkel T, Mayer G, Albert J, Popp J: Tumour cell identification by means of Raman spectroscopy in combination with optical traps and microfluidic environments. Lab on a Chip. 2011, 11 (8): 1484-1490.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.