An integrated workflow for robust alignment and simplified quantitative analysis of NMR spectrometry data

Vu et al. BMC Bioinformatics 2011, 12:405
http://www.biomedcentral.com/1471-2105/12/405

METHODOLOGY ARTICLE Open Access

Trung N Vu1,4*, Dirk Valkenborg2,5, Koen Smets1, Kim A Verwaest3, Roger Dommisse3, Filip Lemière3, Alain Verschoren1, Bart Goethals1 and Kris Laukens1,4

* Correspondence: 1 Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium. Full list of author information is available at the end of the article.

Abstract

Background: Nuclear magnetic resonance spectroscopy (NMR) is a powerful technique to reveal and compare quantitative metabolic profiles of biological tissues. However, chemical and physical sample variations make the analysis of the data challenging, and typically require the application of a number of preprocessing steps prior to data interpretation. For example, noise reduction, normalization, baseline correction, peak picking, spectrum alignment and statistical analysis are indispensable components in any NMR analysis pipeline.

Results: We introduce a novel suite of informatics tools for the quantitative analysis of NMR metabolomic profile data. The core of the processing cascade is a novel peak alignment algorithm, called hierarchical Cluster-based Peak Alignment (CluPA). The algorithm aligns a target spectrum to the reference spectrum in a top-down fashion by building a hierarchical cluster tree from peak lists of reference and target spectra and then dividing the spectra into smaller segments based on the most distant clusters of the tree. To reduce the computational time to estimate the spectral misalignment, the method makes use of Fast Fourier Transformation (FFT) cross-correlation. Since the method returns a high-quality alignment, we can propose a simple methodology to study the variability of the NMR spectra. For each aligned NMR data point the ratio of the between-group and within-group sum of squares (BW-ratio) is calculated to quantify the difference in variability between and within predefined groups of NMR spectra. This differential analysis is related to the calculation of the F-statistic or a one-way ANOVA, but without distributional assumptions. Statistical inference based on the BW-ratio is achieved by bootstrapping the null distribution from the experimental data.

Conclusions: The workflow performance was evaluated using a previously published dataset. Correlation maps, spectral and grey scale plots show clear improvements in comparison to other methods, and the down-to-earth quantitative analysis works well for the CluPA-aligned spectra. The whole workflow is embedded into a modular and statistically sound framework that is implemented as an R package called “speaq” (“spectrum alignment and quantitation”), which is freely available from http://code.google.com/p/speaq/.

Background

Nuclear magnetic resonance spectroscopy (NMR) is a powerful and widely applied analytical high-throughput technique to reveal and compare the quantitative metabolic profile of a given tissue in relation to various environmental and clinical parameters. A typical NMR spectrum is composed of an x-axis, which indicates the resonance frequencies of the observed molecule, and a y-axis, which denotes the corresponding intensities, i.e., abundance. To analyse experimental NMR datasets, multivariate methods such as principal component analysis (PCA) or univariate techniques like Student's t-test are commonly applied. However, chemical and physical sample variations due to, among others, differences in pH, temperature, ion content and the concentration of metabolites, make the analysis of the data challenging. To address these challenges, several preprocessing steps are commonly applied, including noise reduction, normalization, baseline correction, peak picking and spectrum alignment, prior to statistical analysis.
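The BW-ratio described in the abstract can be illustrated with a small sketch. The Python code below is not taken from speaq (which is an R package). The function names `bw_ratio`, `bw_profile` and `bootstrap_p_values`, the toy data, and the particular resampling scheme (drawing whole spectra with replacement from the pooled data while keeping the group labels fixed) are assumptions made for illustration only; the article's own bootstrap procedure may differ.

```python
import random


def bw_ratio(values, labels):
    """Between-group / within-group sum of squares for one data point."""
    overall_mean = sum(values) / len(values)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    between = 0.0
    within = 0.0
    for members in groups.values():
        group_mean = sum(members) / len(members)
        between += len(members) * (group_mean - overall_mean) ** 2
        within += sum((v - group_mean) ** 2 for v in members)
    if within == 0.0:
        # Degenerate case: no spread inside the groups at all.
        return 0.0 if between == 0.0 else float("inf")
    return between / within


def bw_profile(spectra, labels):
    """BW-ratio at every data point of a set of aligned spectra (rows)."""
    return [bw_ratio(list(column), labels) for column in zip(*spectra)]


def bootstrap_p_values(spectra, labels, n_boot=1000, seed=0):
    """Assumed null model: resample whole spectra with replacement from the
    pooled data, so that any group structure is destroyed on average."""
    rng = random.Random(seed)
    observed = bw_profile(spectra, labels)
    exceed = [0] * len(observed)
    n = len(spectra)
    for _ in range(n_boot):
        resampled = [spectra[rng.randrange(n)] for _ in range(n)]
        null = bw_profile(resampled, labels)
        for j, (null_value, obs_value) in enumerate(zip(null, observed)):
            if null_value >= obs_value:
                exceed[j] += 1
    # Add-one correction keeps the estimated p-values strictly positive.
    return [(count + 1) / (n_boot + 1) for count in exceed]


if __name__ == "__main__":
    # Toy data: six "spectra" with two data points each, two groups of three.
    toy_spectra = [[1.0, 5.0], [2.0, 5.5], [3.0, 4.5],
                   [11.0, 5.0], [12.0, 5.5], [13.0, 4.5]]
    toy_labels = ["a", "a", "a", "b", "b", "b"]
    print(bw_profile(toy_spectra, toy_labels))
    print(bootstrap_p_values(toy_spectra, toy_labels, n_boot=500, seed=1))
```

In the toy data the first data point separates the two groups and the second does not, so the first should receive a large BW-ratio and a small resampled p-value. Because the null distribution is built from the data themselves, no normality assumption is needed, which is the property the abstract contrasts with the F-statistic.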
© 2011 Vu et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

A crucial and often underappreciated aspect in this process is peak alignment, which aims to compensate for small variations in the position of corresponding peaks between spectra. A number of spectral alignment approaches have previously been proposed. However, most of them come with particular disadvantages. For example, some methods use dynamic programming, like Correlation Optimized Warping (COW) and Dynamic Time Warping (DTW) [1,2]. Due to their computational complexity, an alignment task based on these techniques may take hours. Several authors worked towards solutions to speed up this alignment process. The authors of [3] used a Fast Fourier Transformation (FFT) cross-correlation engine to improve the alignment speed (PAFFT). They also introduced an advanced extension, called recursive peak alignment by FFT (RAFFT), which recursively divides the spectrum into meaningful segments and aligns them until a certain degree of goodness is obtained. Some advanced peak picking approaches are Recursive Segment-wise Peak Alignment (RSPA) [4] and Generalized Fuzzy Hough Transform (GFHT) [5]. Other authors applied search algorithms to peak alignment, such as genetic algorithms in PAGA [6] and beam searching in PABS [7]. Recently, the authors of [8] introduced the interval-correlation-shifting (Icoshift) algorithm, which aligns spectra by maximizing the cross-correlation between user-defined intervals. Another approach that is commonly employed for the peak alignment of mass spectral data is based on hierarchical clustering and could also be applied to NMR spectral data [9-14].
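The FFT cross-correlation idea behind PAFFT and RAFFT, mentioned above, can be sketched in a few lines. The Python code below is an illustrative sketch and not the implementation from [3] or from speaq. The function names (`cross_correlation`, `estimate_shift`, `apply_shift`), the zero padding, and the zero fill at the edges after shifting are all assumptions. A small radix-2 FFT is written out so that the example depends only on the standard library.

```python
import cmath


def _fft(x, sign):
    """Recursive radix-2 FFT; len(x) must be a power of two.
    sign=-1 gives the forward transform, sign=+1 the unscaled inverse."""
    n = len(x)
    if n == 1:
        return list(x)
    even = _fft(x[0::2], sign)
    odd = _fft(x[1::2], sign)
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def cross_correlation(reference, target):
    """Circular cross-correlation of two zero-padded signals via the FFT.
    Entry k holds sum_j reference[j + k] * target[j]; negative lags wrap
    around to the end of the returned list."""
    n = max(len(reference), len(target))
    m = 1
    while m < 2 * n:  # pad so that wrapped-around terms cannot overlap
        m *= 2
    ref = list(reference) + [0.0] * (m - len(reference))
    tgt = list(target) + [0.0] * (m - len(target))
    f_ref = _fft(ref, -1)
    f_tgt = _fft(tgt, -1)
    product = [a * b.conjugate() for a, b in zip(f_ref, f_tgt)]
    return [value.real / m for value in _fft(product, +1)]


def estimate_shift(reference, target, max_shift):
    """Lag (in data points) by which the target should be moved to best match
    the reference; a negative lag means 'move the target to the left'."""
    corr = cross_correlation(reference, target)
    m = len(corr)
    best_lag, best_value = 0, corr[0]
    for lag in range(-max_shift, max_shift + 1):
        value = corr[lag % m]
        if value > best_value:
            best_lag, best_value = lag, value
    return best_lag


def apply_shift(signal, lag, fill=0.0):
    """Move a signal by `lag` points, filling the vacated edge with `fill`."""
    if lag > 0:
        return [fill] * lag + list(signal[:-lag])
    if lag < 0:
        return list(signal[-lag:]) + [fill] * (-lag)
    return list(signal)


if __name__ == "__main__":
    # Toy spectra: one triangular peak, displaced by five points in the target.
    reference = [0.0] * 64
    reference[19:22] = [0.5, 1.0, 0.5]
    target = [0.0] * 64
    target[24:27] = [0.5, 1.0, 0.5]
    lag = estimate_shift(reference, target, max_shift=10)
    print(lag, apply_shift(target, lag) == reference)
```

Computing all lags at once through the FFT costs on the order of n log n operations, rather than the n times max_shift multiplications of a direct search, which is the source of the speed-up over the dynamic programming methods. A recursive scheme in the spirit of RAFFT would apply the same shift estimate to successively smaller segments.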
Most of the clustering-based methods apply hierarchical clustering to the entire collection of peaks from the individual spectra and “cut off” the resulting dendrogram at a suitable height to produce a number of clusters used for alignment. This approach works well on NMR data that are already calibrated to some extent. However, in some datasets, the peak positions of chemical resonances are significantly shifted between the samples. Such strong shifts can make the peaks hard to separate, which may lead to the wrong clustering, i.e. alignment, of peaks. The effect of strongly shifted spectra also challenges the methods based on spectral binning, like COW and Icoshift, because peaks could mistakenly be assigned to the wrong bins.

To address the problems with misaligned spectra, we first focus on the development of a robust and highly reliable alignment algorithm. The method is based on a peak-picking approach for NMR spectra, called hierarchical Cluster-based Peak Alignment (CluPA). The alignment is embedded in a workflow (called speaq: “spectrum alignment and quanti (...truncated)