Institute for Advanced Biosciences Keio University
MathDAMP Mathematica package for differential analysis of metabolite profiles
Home Overview Examples Downloads TriDAMP References Contact
MathDAMP > Examples > 02-MathDAMP-Elements


This notebook introduces the basic functionality of the MathDAMP package. The core features of the package are described and the usage of individual functions is demonstrated with multiple options. For examples of performing common types of differential analysis of metabolite profiles, please refer to the notebooks: 03-MathDAMP-TwoDatasets.nb, 04-MathDAMP-Outliers.nb, 05-MathDAMP-TwoGroups.nb, and 06-MathDAMP-MultipleGroups.nb.

First, the MathDAMP package has to be loaded. Please assign the path leading to the MathDAMP files to the MathDAMPPath variable.

MathDAMPPath = "/home/baran/math/ms/MathDAMP.1.0.0/" ;

<< (MathDAMPPath<>"MathDAMP.m")

MathDAMP version 1.0.0 loaded (2006/04/26)

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Usage reference information for any of the functions from the MathDAMP package can be displayed by executing ? followed by the function's name. For details regarding the implementation of the functions please refer to the MathDAMP.nb notebook.

? DAMPPlotChromatogram

DAMPPlotChromatogram[{msdata,msdata,...},mz,options] plots multiple overlaid chromatograms/e ...  Automatic is specified, the SampleName from each msdata is used as a label (default: Automatic)

Data Import

MathDAMP functions operate on a relatively simple structure of datasets acquired by hyphenated mass spectrometry techniques. Several functions for the import of different data formats were implemented (please refer to the MathDAMP.nb notebook for details). Custom import functions for different data formats may be implemented by converting the data to MathDAMP's internal format (described in the MathDAMP.nb notebook) upon loading.
In this notebook, most of the operations will be performed on two datasets acquired by capillary electrophoresis coupled to a quadrupole mass spectrometer (CE-QMS) operated in a selected ion monitoring (SIM) mode. The two datafiles are among the sample data provided with the MathDAMP package. The data are stored in an Agilent MS format. DAMPImportMS function is used for importing Agilent MS files.

{ctrl, smpl} = DAMPImportMS[MathDAMPPath<>"/data/"<>#] &/@{"", ""} ;

The datasets are represented as lists with 4 elements in Mathematica.



The first element contains a matrix of signal intensities (rows being lists of signal intensities corresponding to individual chromatograms/electropherograms), the second element contains a list of m/z values, the third element contains a list of timepoints (in minutes), and the fourth element contains additional information about the dataset (as a list of rules). The dimensions of the signal intensity matrix are determined by the length of the m/z value list and the length of the list of timepoints.

ListDensityPlot[ctrl〚1〛, MeshFalse, ImageSize750, AspectRatio.4, TextStyleDAMPTextStyle] ;


Take[ctrl〚3〛, 40]//N



{131., 132., 133., 134., 135., 136., 137., 138., 139., 140., 141., 142., 143., 144., 145., 146., 147., 148., 149., 150., 151., 152., 153., 154., 155., 156., 157., 158., 159., 160., 182.}

{0.08745, 0.0919, 0.09635, 0.1008, 0.10525, 0.1097, 0.11415, 0.118617, 0.123067, 0.127517, 0 ... 20967, 0.225417, 0.229867, 0.234317, 0.238767, 0.243217, 0.247667, 0.252117, 0.256567, 0.261033}


The functions DAMPGetIntensities and DAMPGetChromatogram can be used to retrieve the list of signal intensities or the chromatogram/electropherogram corresponding to a specific m/z value. Note that the latter function returns a list of {timepoint,signal intensity} elements.

ListPlot[#[ctrl, 131], FrameTrue, PlotRangeAll, ImageSize750, Aspect ... yleDAMPTextStyle, PlotLabel#] &/@{DAMPGetIntensities, DAMPGetChromatogram} ;



Additional functions for data import include DAMPImportCSV, DAMPImportMZXML, DAMPImportCDF, and DAMPImportBDT. Please execute ?FunctionName or refer to the MathDAMP.nb notebook for details.


Plotting Chromatograms/Electropherograms

DAMPPlotChromatogram plots overlaid chromatograms/electropherograms from a list of datasets (passed as the first parameter) corresponding to a specific m/z value (passed as the second parameter).

DAMPPlotChromatogram[{ctrl, smpl}, 131] ;


The appearance of the plot may be modified by additional options (these can be listed by executing ?DAMPPlotChromatogram).

DAMPPlotChromatogram[{ctrl, smpl}, 131, PlotOptions {AspectRatio.5, ImageSiz ... /3, 1, .7]}, TextStyle {FontFamily"Helvetica", FontSize10}}] ;


Plotting the Datasets on Density Plots

The raw datasets may be plotted on density plots using the DAMPDensityPlot function. The axes represent the retention/migration time and m/z values. The peaks appear as colored bands or spots.

DAMPDensityPlot[ctrl] ;


The appearance of the density plot may be modified by options.

DAMPDensityPlot[ctrl, MaxScale200000, FrameTickFreqs {5, 5}, FrameTickOffset ... rameLabel {"Migration time (min)", "m/z"}, AspectRatio.5}] ;


To list the available options, execute ?DAMPDensityPlot. The plots may be further annotated to allow easier identification of the peaks. Plot annotations will be demonstrated further below.

? DAMPDensityPlot

Dataset cropping and range selection

The functions DAMPCrop, DAMPSelectMZs, and DAMPDropMZs can be used to select parts of interest from the datasets.

? DAMPCrop

? DAMPSelectMZs


DAMPCrop[msdata,options] reduces the msdata dataset to datapoints falling within the timeran ... fault: All)<br />TimeRange - two element list specifying the cropping time range (default: All).

DAMPSelectMZs[msdata,mzs] reduces the msdata datased to datapoints corresponding to m/z values specified in the mzs list.

DAMPDropMZs[msdata,mzs] reduces the msdata datased by eliminating datapoints corresponding to m/z values specified in the mzs list.

DAMPDensityPlot[DAMPCrop[ctrl, mzRange {135, 155}, TimeRange {5, 16}]] ;

DAMPDensityPlot[DAMPSelectMZs[ctrl, {136, 137, 143, 147, 148, 182}]] ;

DAMPDensityPlot[DAMPDropMZs[ctrl, {135, 145, 155}]] ;




Baseline subtraction, noise removal, and smoothing

DAMPSubtractBaselines performs baseline subtraction from all chromatograms/electropherograms in a MathDAMP dataset. Baselines are fitted to a polynomial by robust nonlinear regression. Execute ?DAMPSubtractBaselines for the description of options, or refer to the MathDAMP.nb notebook for details regarding the implementation. The usage reference information for additional functions related to noise removal and smoothing is shown below.

? DAMPSubtractBaselines

? DAMPRemoveNoise

? DAMPRemoveSpikes

? DAMPThreshold

? DAMPSmooth

DAMPSubtractBaselines[ctrl, SampleNameSuffix" baselines subtracted "] ;

DAMPRemoveNoise[%, SampleNameSuffix" noise removed "] ;

DAMPRemoveSpikes[%, SampleNameSuffix" spikes removed "] ;

DAMPThreshold[%, 1500, SampleNameSuffix" threshold 1500 "] ;

DAMPSmooth[ctrl, SmoothingFunction (DAMPMovingAverageFast[#, 20] &), SampleNameSuffix" smoothed "] ;

DAMPDensityPlot[#, MaxScale50000] &/@{ctrl, %%%%%, %%%%, %%%, %%, %} ;







The datasets will be preprocessed for subsequent use by subtracting the baselines and by noise removal (with default options).

{ppctrl, ppsmpl} = DAMPRemoveNoise[DAMPSubtractBaselines[#]] &/@{ctrl, smpl} ;

Peak picking

The DAMPPickPeaks function finds peaks in a MathDAMP dataset. Currently, the peak lists are used for the sole purpose of dataset alignmnet. Given the robustness of the alignment method, the peak lists do not have to contain all peaks from every dataset and may contain erroneously picked peaks as well (noise or jumping baseline related). The requirements on the quality of peak picking results are therefore not high and the algorithm is rather simple. For details regarding the implementation, please refer to the MathDAMP.nb notebook. However, the returned peak lists seem to be quite accurate (as can be seen on the next density plot below).

? DAMPPickPeaks

peaklists = DAMPPickPeaks[#, Threshold2000] &/@{ppctrl, ppsmpl} ;

TableForm[Take[peaklists〚1〛, 5], TableDepth1]


Peaks found in the first 5 electropherograms from the ctrl dataset are listed above. The layouts of picked peaks may be displayed on peak layout plots for comparison or for visual inspection of the alignment quality.

DAMPPlotPeakLayout[peaklists] ;


The peak lists may also be converted to an annotation table format for display on chromatograms or on density plots.

DAMPDensityPlot[ppctrl, MaxScale100000, AnnotationTables {DAMPPeakListToAnnotationTable[peaklists〚1〛]}] ;


Sometimes, a large number of peaks may be erroneously picked from a single chromatogram/electropherogram (noise above the peak-picking threshold) or an overwhelming number of redundant signals is present along the m/z dimension at a certain retention/migration time. The presence of these signals may bias the alignment and it could be desirable to select a representative set of peaks from a peak list. Also, the alignment procedure needs more running time for peak lists with large numbers of peaks. The function DAMPSelectRepresentativePeaks performs the representative peak selection.

? DAMPSelectRepresentativePeaks

DAMPDensityPlot[ppctrl, MaxScale100000, AnnotationTables {DAMPPeakListToAnno ... 〛, PeaksPerChromatogram5, PeaksPerInterval5, IntervalSize.5]]}] ;


Annotation table manipulation

Annotation tables are intended to facilitate easier identification of peaks. The annotation table may be constructed according to an analysis of a mixture of standard compounds. The table is expected to consist of 5 columns: m/z, retention/migration time, short compound name/id, full compound name, and relative position of the text with respect to the label on the density plots (1 - right, 2 - top, 3 - left, 4 - bottom, 1.5 - top right, etc).

annottab = DAMPLoadAnnotationTable[MathDAMPPath<>"/iab_cems_cation.csv"] ;

TableForm[Take[%, 5]]

71.0637 10.351 1 3-Aminopropionitrile monofumarate salt 4
75.0944 7.137 2 1,3-Diaminopropane dihydrochloride 4
76.0425 13.51 3 Glycine 2
76.0537 10.898 4 Semicarbazide hydrochloride 2
76.0793 11.41 5 Isopropanolamine 4

Multiple annotation tables can easily be displayed on a single density plot. Their appearance may be modified as well. The following example shows the above loaded (and unaligned) annotation table along with the labels for picked peaks (shown as gray dots). The alignment of annotation tables to MathDAMP datasets is demonstrated in the next section.

DAMPDensityPlot[ppctrl, MaxScale100000, AnnotationTables {annottab, DAMPPeak ... ; {GrayLevel[.5], AbsolutePointSize[1]}, TextStyle {FontColorGrayLevel[.5]}}}] ;


Dataset alignment

To align two datasets, parameters of a (custom) function describing the time shifts of corresponding peaks in two datasets are optimized. A combination of global optimization and dynamic programming is used for this purpose. The function is then used to rescale the timescale on one of the datasets, interpolate the chromatograms/electropherograms, and timepoints identical to timepoints in the reference datasets are selected. For details regarding the dataset alignment procedure, please refer to the MathDAMP.nb notebook.
DAMPFitShiftFunction performs the parameter optimization for the retention/migration time shift function.

? DAMPFitShiftFunction

fsrslt = DAMPFitShiftFunction[Sequence @@ (DAMPSelectRepresentativePeaks[#, PeaksPerChromato ... eaksPerInterval5, IntervalSize.5] &/@peaklists), GapPenalty {4, .5}]

{Score8.38966, BestFitFunc (1/(1/(0.99447 #1) - 0.00245727/2) &), BestFitPars {α0.99447, γ -0.00245727}}

Only a subset of peaks (selected by the DAMPSelectRepresentativePeaks) was used to achieve faster alignment. A function derived by Reijenga et al. (see the MathDAMP.nb notebook for details) is used by default as a retention/migration time shift function (due to the predominant use of capillary electrophoresis based techniques in the authors' institution - Institute for Advanced Biosciences, Keio University). Any function may be passed to DAMPFitShiftFunction as a retention/migration time shift function as demonstrated below with a second order polynomial.

DAMPFitShiftFunction[Sequence @@ (DAMPSelectRepresentativePeaks[#, PeaksPerChromatogramɳ ... # + a + b # + c #^2&), ShiftFunctionParametersAutomatic, GapPenalty {4, .5}]

{Score8.38822, BestFitFunc (#1 + 0.0140676 - 0.00849169 #1 + 0.00138 ... 1^2&), BestFitPars {a0.0140676, b -0.00849169, c0.00138294}}

The peak layouts may be shown for visual confirmation of the alignment (alignment done with the default time shift function). The layout of peaks prior to the alignment is shown in the previous section.

DAMPPlotPeakLayout[{peaklists〚1〛, DAMPAlignPeakList[peaklists〚2〛, BestFitFunc/.fsrslt]}] ;


After finding the time shift function, the aligned dataset is created using the function DAMPAlign.

? DAMPAlign

alignedsmpl = DAMPAlign[ppsmpl, BestFitFunc/.fsrslt, ctrl〚3〛] ;

DAMPDensityPlot[alignedsmpl] ;


Annotation tables may be aligned in a similar fashion. The step below demonstrates the robustness of the alignment procedure. Even a relatively small number of corresponding peaks is sufficient for finding the optimal alignment. The timeshifts of unaligned annotation labels were quite significant (over 5 min) when the density plots below are compared to the one at the end of the previous section.

? DAMPAlignAnnotationTable

alignedannottab = DAMPAlignAnnotationTable[DAMPSelectRepresentativePeaks[peaklists〚1& ... , PeaksPerChromatogram5, PeaksPerInterval5, IntervalSize.5], annottab] ;

DAMPDensityPlot[#, AnnotationTables {alignedannottab}] &/@{ppctrl, alignedsmpl} ;



Plots of chromatograms/electropherograms may be annotated as well. Full compound names are used instead of short names/ids in this case.

DAMPPlotChromatogram[{ppctrl, alignedsmpl}, #〚1〛, AnnotationTablealign ... #62754; {#〚2〛, All}}] &/@{{182, {12.5, 14}}, {147, {8, 14}}, {133, {8, 14}}} ;




DAMPNormalizeGroup function aligns and normalizes multiple datasets to a selected reference dataset. This function assembles the steps described in this section along with an intensity normalization step described in the next section. The DAMPNormalizeGroup function is used by the functions for common types of differential analysis of metabolite profiles demonstrated in the notebooks 03-MathDAMP-TwoDatasets.nb, 04-MathDAMP-Outliers.nb, 05-MathDAMP-TwoGroups, and 06-MathDAMP-MultipleGroups.nb.

Dataset normalization

Often the datasets' signal intensity values have to be normalized according to the peak of the internal standard. MathDAMP implements very simple peak integration functionality. A specified range of a chromatogram/electropherogram is integrated blindly. When normalizing multiple datasets using the DAMPNormalizeGroup function (mentioned at the end of the previous section), the location of the peak of the internal standard in the reference dataset has to be either specified explicitly or can be extrapolated from the aligned annotation table. In the latter case, only the short name/id of the internal standard is specified and the peak is located automatically. For more details about the DAMPNormalizeGroup function, please refer to the MathDAMP.nb notebook and the 03-MathDAMP-TwoDatasets.nb, 04-MathDAMP-Outliers.nb, 05-MathDAMP-TwoGroups, and 06-MathDAMP-MultipleGroups.nb notebooks.
Signal intensity normalization of the alignedsmpl dataset to the ppctrl dataset according to the area of the Methioninesulfone peak is shown below. The location of the peak is specified explicitly.

DAMPPlotChromatogram[{ppctrl, alignedsmpl}, 182, AnnotationTablealignedannottab, PlotOptions {PlotRange {{12.5, 14}, All}}] ;


? DAMPIntegrate

DAMPIntegrate[DAMPGetChromatogram[#, 182], TimeRange {13.3, 13.8}] &/@{ppctrl, alignedsmpl}

normcoefs = %〚1〛/%

{12793.3, 13437.9}

{1., 0.952029}

? DAMPNormalize

normasmpl = DAMPNormalize[alignedsmpl, normcoefs〚2〛] ;

DAMPPlotChromatogram[{ppctrl, normasmpl}, 182, AnnotationTablealignedannottab, PlotOptions {PlotRange {{12.5, 14}, All}}] ;


Comparing normalized datasets

One way to compare the normalized datasets is to interlace their chromatograms/electropherograms into each other and plot the resulting dataset on a parallel plot. Here, the electropherograms from the datasets corresponding to identical m/z values are plotted next to each other. Differences would appear as half-bands (like for m/z 136 at about 9.5 min or for m/z 137 between 12 and 13 min).

? DAMPParallelPlot

DAMPParallelPlot[{ppctrl, normasmpl}, mzGridLineFreq4] ;


Additionally, simple arithmetic operations may be performed on the signal intensity matrices of the two normalized datasets to highlight differences between them. Subtraction provides a dataset representing the difference in signal intensities. The normasmpl dataset contains identical timepoints to the ctrl dataset (via the DAMPAlign function) so the m/z value list as well as the list of timepoints is taken from the ctrl dataset.

absdif = {normasmpl〚1〛 - ppctrl〚1〛, ppctrl〚2〛, ppc ... pl〚4〛) <>" vs " <> (SampleName/.ppctrl〚4〛)}} ;

DAMPDensityPlot[absdif] ;


The result contains some signals indicating either positive (yellow/red) or negative (cyan/blue) difference between the datasets. Some ambiguous signals (red and blue in close proximity) appear on the result as well. These may be caused by partial misalignments of the corresponding peaks or small relative differences in significant peaks (the small relative difference is significant in absolute terms). For instance, there are two ambiguous signals in the topmost lane (m/z 182) around migration time of 13 min on the density plot above. These correspond to the peaks shown on the last electropherogram in the previous section. The signals on the density plot are due to an imperfect overlap of the corresponding peaks. The ambiguous signals may be ruled out as false positives either after the visual inspection of overlaid chromatograms/electropherograms (which can be generated automatically in a ranked order as described in the next section) or the presence of these signals may be suppressed using different kinds of visualization approaches described below.
In a way similar to the absolute difference, a relative difference between the two datasets can be calculated. In this case, the difference between the corresponding signal intensities is divided by the larger of the two (or an absolute value of the difference between the two signal intensities, if one of them is negative). The signal intensities in the resulting dataset fall within the range -1 to 1.

reldif = {(normasmpl〚1〛 - ppctrl〚1〛)/(Chop[Max[Join[Abs[#], {Abs ... pl〚4〛) <>" vs " <> (SampleName/.ppctrl〚4〛)}} ;

DAMPDensityPlot[reldif, MaxScale1] ;


Tiny peaks (often noise-related which evaded preprocessing) may provide significant signals on the relative result. A signal intensity threshold of 5000 suppresses these influences in the previous result. In spite of this, numerous misleading signals still remain in the result. A simple way to suppress signals originating from small relative changes in huge peaks (scoring high on the absolute difference plot) and significant relative differences in tiny noise-related peaks (scoring high on the relative difference plot) is to multiply the absolute and relative difference results (below). Differences significant in both absolute and relative terms tend to be highlighted. As shown below, ambiguous signals become suppressed and signals coming from actual differences acquire better visibility. This holds even if the threshold for the relative difference is set to 0.

absreldif = {absdif〚1〛 Abs[reldif〚1〛], ppctrl〚2〛,  ... pl〚4〛) <>" vs " <> (SampleName/.ppctrl〚4〛)}} ;

DAMPDensityPlot[absreldif] ;


Availability of replicate datasets allows the application of statistical tests to all corresponding signal intensities. Examples may be found in notebooks 04-MathDAMP-Outliers.nb (looking for outliers within multiple datasets using z-scores and by analyzing quartiles), 05-MathDAMP-TwoGroups (comparison of two groups of replicates includes the t-test), and 06-MathDAMP-MultipleGroups.nb (comparing multiple groups of replicates using F ratio). Noise removal proves not to be necessary when using these approaches. However, the resulting datasets are usually smoothed (by applying a moving average filter) to suppress strong signals originating from 'lucky' constellations of a particular set of corresponding noise related signal intensities (without any strong signals in their neighborhood).
Any resulting datasets may be further combined (in a way similar to the absolute×relative result) or used as a filtration criteria for other results. Below is an example of selecting only those datapoints from the relative difference, where at least one of the corresponding signal intensities in the ctrl and smpl datasets exceeds a threshold (10000). Using the DAMPFilter function may prove especially useful when a result of a statistical test (like the t-test for two groups of replicates) is used as the criteria dataset (to filter the absolute×relative result between the averaged groups for instance).

DAMPDensityPlot[DAMPFilter[reldif, DAMPApplyFunctionToGroup[{ppctrl, normasmpl}, Max], 10000]] ;


The DAMPApplyFunctionToGroup applies a specified pure function to corresponding signal intensities in the group of datasets. Above, the function was used to create a dataset containing maxima from corresponding signal intensities in the ctrl and the smpl datasets (by using the Max function as the specified pure function).

? DAMPApplyFunctionToGroup

? DAMPFilter

Listing the overlaid chromatograms/electropherograms in the vicinities of the most significant differences

For the visual confirmation of significant differences between the datasets (and for the rejection of false positives), overlaid chromatograms/electropherograms are plotted in descending order of significance. Below are the electropherograms of the top 12 differences from the absolute×relative difference result from above. The vertical dashed line indicates the position of the most significant difference according to the selected criteria.

DAMPPlotCandidates[{ppctrl, normasmpl}, absreldif, PlotCount12, PlotChromatogramOptions {AnnotationTablealignedannottab}] ;













This notebook demonstrated the basic core functionality of the MathDAMP package. For a more convenient usage, the core functions are assembled into modules for common types of differential analysis of metabolite profiles. Examples can be found in the additional notebooks (03-MathDAMP-TwoDatasets.nb, 04-MathDAMP-Outliers.nb, 05-MathDAMP-TwoGroups, and 06-MathDAMP-MultipleGroups.nb) from the MathDAMP package.