Institute for Advanced Biosciences Keio University
MathDAMP Mathematica package for differential analysis of metabolite profiles


This notebook provides a template for comparing multiple groups of replicate datasets with the MathDAMP package. The F ratio (one-way ANOVA) is calculated for corresponding signal intensities from the normalized datasets to locate differences among the groups.
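For orientation, the textbook definition of the one-way ANOVA F ratio is given here. The symbols are introduced only for this illustration and do not come from the MathDAMP documentation: k is the number of groups, n_i the number of replicates in group i, N the total number of datasets, y_ij the signal intensity of replicate j in group i at one position, and the bars denote the group mean and the grand mean.

```latex
F = \frac{\displaystyle \sum_{i=1}^{k} n_i \left(\bar{y}_i - \bar{y}\right)^2 \Big/ (k-1)}
         {\displaystyle \sum_{i=1}^{k} \sum_{j=1}^{n_i} \left(y_{ij} - \bar{y}_i\right)^2 \Big/ (N-k)}
```

A large F means that the variation between group means is large relative to the variation among replicates within the groups.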
Additional notebooks from the MathDAMP package (03-MathDAMP-TwoDatasets.nb, 04-MathDAMP-Outliers.nb, and 05-MathDAMP-TwoGroups.nb) provide templates for the comparison of two datasets, for the identification of outliers in a group of datasets, and for the comparison of two groups of replicate datasets. The notebook 02-MathDAMP-Elements.nb demonstrates the basic functionality of the MathDAMP package.

Step 1 : Loading the Data

First, the MathDAMP package has to be loaded. Please assign the path leading to the MathDAMP files to the MathDAMPPath variable. Because the datasets and results are large, the global variable $HistoryLength is set to 1 to save memory. 2 GB of physical memory may be necessary to execute this notebook.

$HistoryLength = 1 ;

MathDAMPPath = "/home/baran/math/ms/MathDAMP.1.0.0/" ;

<< (MathDAMPPath<>"MathDAMP.m")

MathDAMP version 1.0.0 loaded (2006/04/26)

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Datasets acquired by capillary electrophoresis coupled to a time-of-flight mass spectrometer (CE-TOFMS) are used for the demonstration in this notebook. The *.bdt datafiles were generated with separate in-house software from *.csv datafiles exported by the Analyst QS software (for Agilent TOFMS). The *.csv data were binned to a resolution of 0.02 m/z units along the m/z axis, baselines were subtracted from the individual electropherograms (as with the DAMPSubtractBaselines function with default options), and noise was removed (as with the DAMPRemoveNoise function with default options). The data were then binned to a resolution of 1 m/z unit and saved in a binary format as *.bdt datafiles.

fnames = FileNames[#<>"/*.bdt"] &/@Join[{"/home/baran/data/tissue/control"}, FileNames["/home/baran/data/tissue/*h"]] ;

fnames = fnames〚 {1, 3, 6, 2, 4} 〛

replicates = If[Equal @@ (Length[#] &/@fnames), Length[fnames〚1〛], Print[S ... rror: Number of replicates not identical within groups!", FontColor -> Hue[0]]] ; 0] ;

data = DAMPImportBDT[#, StringReplace[#, __ ~~ "/" ~~ str__ ~~ "/" ~~ __ :> str]] &/@Join @@ fnames ;

NumberForm[MemoryInUse[], DigitBlock -> 3]


Optional : Exploring the data, locating the peak of the internal standard in the reference dataset

Step 2 : Performing the Differential Analysis

The function DAMPMultiGroups performs the comparison of multiple groups of replicate datasets. The DAMPNormalizeGroup function is used internally to align and normalize the datasets along with the annotation tables. Please refer to the MathDAMP.nb notebook for more details about the implementation of the functions DAMPNormalizeGroup and DAMPMultiGroups. Execute ?FunctionName to display a brief description of the respective function and its available options.

? DAMPNormalizeGroup

? DAMPMultiGroups

Most of the options for DAMPMultiGroups and DAMPNormalizeGroup are specified explicitly in the command below so that they can be edited easily. The annotation table for the anion mode CE-MS analysis is used. This table was assembled from the CE-TOFMS analysis of a mixture of standard compounds. 2-Morpholinoethanesulfonic acid is used as the internal standard. Its short name (in the annotation table) 1 is passed to the DAMPNormalizeGroup function via the InternalStandard option. The location of the peak of the internal standard is extrapolated from the aligned annotation table. Overlaid electropherograms of the vicinities of the expected internal standard peaks are plotted, along with indicators of the beginning and the end of the blindly integrated regions, for visual confirmation. To specify the location of the peak explicitly, use the notation {mz,{starttime,endtime}} instead of the short name. In this case it would be {194,{18.3,18.9}} (according to the electropherogram at the end of the optional section).
The first dataset from the data list is used as the reference dataset (as specified by the option Reference->1).
The DAMPMultiGroups function returns the F ratio map by calculating the F ratio (one-way ANOVA) for corresponding groups of signal intensities.
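To make the F ratio calculation concrete, here is a minimal sketch that uses only built-in Wolfram Language functions. It is not the MathDAMP implementation: the names fRatio and fRatioMap are hypothetical, the toy intensities are made up, and the sketch assumes that the datasets are equally sized matrices ordered group by group with the same number of replicates in each group.

```mathematica
(* Hypothetical helper, not part of MathDAMP: one-way ANOVA F ratio
   for a list of groups, each a list of replicate intensities. *)
fRatio[groups_List] := Module[{k, n, grand, ssBetween, ssWithin},
  k = Length[groups];                  (* number of groups *)
  n = Total[Length /@ groups];         (* total number of observations *)
  grand = Mean[Flatten[groups]];       (* grand mean over all replicates *)
  ssBetween = Total[(Length[#] (Mean[#] - grand)^2) & /@ groups];
  ssWithin = Total[Total[(# - Mean[#])^2] & /@ groups];
  (ssBetween/(k - 1))/(ssWithin/(n - k))
]

(* Hypothetical helper: apply fRatio at every position of a list of
   equally sized matrices; consecutive runs of `replicates` datasets
   are treated as one group. *)
fRatioMap[datasets_List, replicates_Integer] :=
  Map[fRatio[Partition[#, replicates]] &, Transpose[datasets, {3, 1, 2}], {2}]

(* Toy intensities at a single position: 3 groups of 3 replicates. *)
fRatio[{{1, 2, 3}, {2, 3, 4}, {5, 6, 7}}]
(* 13 *)

(* Toy "datasets": four 1 x 2 matrices, two groups of two replicates. *)
fRatioMap[{{{1., 2.}}, {{2., 2.5}}, {{5., 2.}}, {{6., 2.5}}}, 2]
(* {{32., 0.}} *)
```

Positions where all replicates are identical within every group have zero within-group variance, so the sketch would divide by zero there. How MathDAMP treats such positions is not covered here.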

Clear[rslt] ;

rslt = DAMPMultiGroups[data, replicates, NormalizeGroupOptions -> {Reference -> 1,  ...  {4, 26}, ExternalNormalizationCoefficients -> None}, GroupNames -> Automatic] ;






IS normalization coefficients : {1., 0.289182, 1.31722, 1.60302, 1.21884, 1.34691, 1.25213,  ... 44326, 1.40677, 1.51483, 1.34428, 1.43056, 1.27646, 1.33602, 1.12665, 1.16868, 1.09453, 1.24563}
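One plausible reading of these coefficients is sketched below: each dataset is scaled so that the integrated internal standard peak matches that of the reference dataset, which would explain why the reference itself has a coefficient of 1. This is an assumption, not the documented behavior of DAMPNormalizeGroup. The helper names peakArea and isCoefficient and the toy electropherograms are hypothetical.

```mathematica
(* Hypothetical helper: area of a peak, taken as the sum of intensities
   whose migration time lies within the window {t1, t2}. *)
peakArea[times_List, intensities_List, {t1_, t2_}] :=
  Total[Pick[intensities, (t1 <= # <= t2) & /@ times]]

(* Hypothetical helper: factor that scales a dataset so that its internal
   standard peak area equals that of the reference (assumed definition). *)
isCoefficient[refArea_, area_] := refArea/area

(* Toy electropherograms at the m/z of the internal standard. *)
times = {18.0, 18.2, 18.4, 18.6, 18.8, 19.0};
ref = {0, 1, 40, 90, 30, 2};
smp = {0, 2, 20, 45, 15, 1};

isCoefficient[peakArea[times, ref, {18.3, 18.9}], peakArea[times, smp, {18.3, 18.9}]]
(* 2 *)
```

The window {18.3, 18.9} mirrors the explicit {mz,{starttime,endtime}} notation mentioned earlier. The integration bounds that MathDAMP determines automatically may differ.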

Step 3 : Exploring the Results, Listing the Candidates

The individual normalized datasets can be explored by plotting them on density plots along with the aligned annotation. The resulting F ratio map may be plotted as well to show peaks that differ significantly among the groups.

DAMPDensityPlot[(NormalizedDatasets/.rslt) 〚1〛, MaxScale -> 20000, AnnotationTables -> (AlignedAnnotationTables/.rslt), Sequence @@ DAMPCETOFMSDensityPlotOptions] ;

DAMPDensityPlot[(NormalizedDatasets/.rslt) 〚7〛, MaxScale -> 20000, AnnotationTables -> (AlignedAnnotationTables/.rslt), Sequence @@ DAMPCETOFMSDensityPlotOptions] ;

DAMPDensityPlot[DAMPSmooth[FRatios/.rslt], MaxScale -> 100, AnnotationTables -> (AlignedAnnotationTables/.rslt), Sequence @@ DAMPCETOFMSDensityPlotOptions] ;




Overlaid electropherograms in the vicinities of the most significant differences may be plotted in descending order of significance for visual confirmation (and for the rejection of false positives). Below are the electropherograms of the top 24 candidate differences from the F ratio result. The vertical dashed line indicates the position of the most significant difference according to the selected criteria.

plotcolors = DAMPGenColors[Length[fnames]] ;

DAMPPlotCandidates[(NormalizedDatasets/.rslt), DAMPSmooth[FRatios/.rslt], PlotCount2 ... oupNames/.rslt)}], AnnotationTable -> (AlignedAnnotationTables/.rslt) 〚1〛}] ;
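The idea of listing candidates in descending order of significance can be illustrated with a small sketch that returns the positions of the largest values in a matrix. The helper topPositions is hypothetical and is not how DAMPPlotCandidates is implemented. In particular, it does not merge neighboring points that belong to the same peak.

```mathematica
(* Hypothetical helper: {row, column} positions of the n largest entries
   of a matrix, listed in descending order of value. *)
topPositions[m_?MatrixQ, n_Integer] := Module[{cols, idx},
  cols = Last[Dimensions[m]];
  (* Ordering with -n returns the indices of the n largest elements in
     ascending order of value; Reverse puts the largest first. *)
  idx = Reverse[Ordering[Flatten[m], -n]];
  (* Convert indices into the flattened list back to {row, column}. *)
  {Quotient[# - 1, cols] + 1, Mod[# - 1, cols] + 1} & /@ idx
]

(* Toy F ratio map with two rows and three columns. *)
topPositions[{{1, 9, 2}, {7, 3, 8}}, 3]
(* {{1, 2}, {2, 3}, {2, 1}} *)
```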