Institute for Advanced Biosciences Keio University
MathDAMP Mathematica package for differential analysis of metabolite profiles

05-MathDAMP-TwoGroups

This notebook provides a template for the comparison of two groups of replicate datasets with the MathDAMP package. First, the datasets of each group are averaged and the averages are compared in a similar way as demonstrated for two datasets in the 03-MathDAMP-TwoDatasets notebook. Additionally, a t-test is performed on the groups of corresponding signal intensities from all datasets to locate statistically significant differences.
Additional notebooks from the MathDAMP package (04-MathDAMP-Outliers.nb and 06-MathDAMP-MultipleGroups.nb) provide templates for identifying outliers in a group of datasets and for the comparison of multiple groups of replicate datasets. The notebook 02-MathDAMP-Elements.nb demonstrates the basic functionality of the MathDAMP package.

Step 1 : Loading the Data

First, the MathDAMP package has to be loaded. Please assign the path leading to the MathDAMP files to the MathDAMPPath variable. Because the datasets and results are large, the global variable $HistoryLength is set to 1 to save memory. 1 GB of physical memory may be necessary to execute this notebook.

$HistoryLength = 1 ;

MathDAMPPath = "/home/baran/math/ms/MathDAMP.1.0.0/" ;

<< (MathDAMPPath<>"MathDAMP.m")

MathDAMP version 1.0.0 loaded (2006/04/26)

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Datasets acquired by capillary electrophoresis coupled to a time-of-flight mass spectrometer (CE-TOFMS) are used for the demonstration in this notebook. The *.bdt datafiles were generated by separate in-house software from *.csv datafiles exported by the Analyst QS software (for Agilent TOFMS). The csv data were first binned to a resolution of 0.02 m/z units along the m/z axis. Baselines were then subtracted from the individual electropherograms (as by the DAMPSubtractBaselines function with default options) and noise was removed (as with the DAMPRemoveNoise function with default options). Finally, the data were binned to a resolution of 1 m/z unit and saved in a binary format as *.bdt datafiles.
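The final binning step can be pictured with a short sketch. It is written in plain Python for illustration only and is neither the in-house software nor a MathDAMP function. The name bin_spectrum, the assignment of each point to the nearest bin centre, and the summing of intensities within a bin are all assumptions, and the numbers are made up.

```python
from collections import defaultdict

def bin_spectrum(points, width=1.0):
    """Sum the intensities of (mz, intensity) pairs into bins of the given width.

    Assumption: bins are centred on multiples of `width`, and each point
    is assigned to the nearest centre.
    """
    bins = defaultdict(float)
    for mz, intensity in points:
        centre = round(mz / width) * width
        bins[centre] += intensity
    return dict(sorted(bins.items()))

# One scan at 0.02 m/z resolution (made-up values).
scan = [(181.98, 10.0), (182.00, 250.0), (182.02, 40.0), (183.04, 5.0)]
binned = bin_spectrum(scan)
print(binned)
```

In this toy scan the three points around m/z 182 collapse into a single bin, which is why the 1 m/z unit datasets are much smaller than the 0.02 m/z unit data they were derived from.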

fnames = FileNames["/home2/baran/data/examples/"<>#<>"/*.bdt"] &/@{"1h", "4h"} ;

{ctrl, smpl} = DAMPImportBDT[#, StringReplace[#, __ ~~ "/" ~~ str__ ~~ "/" ~~ __ -> str]] &/@#&/@fnames ;

NumberForm[MemoryInUse[], DigitBlock -> 3]

301,988,872

Optional : Exploring the data, locating the peak of the internal standard in the reference dataset

Step 2 : Performing the Differential Analysis

The function DAMPTwoGroups performs the comparison of two groups of replicate datasets. The DAMPNormalizeGroup function is used internally to align and normalize the datasets along with the annotation tables. Please refer to the MathDAMP.nb notebook for more details about the implementation of the functions DAMPNormalizeGroup and DAMPTwoGroups. Execute ?FunctionName to list a brief description of the respective function and its available options.

? DAMPNormalizeGroup

? DAMPTwoGroups

Most of the options for the DAMPTwoGroups and DAMPNormalizeGroup functions are specified explicitly in the command below to allow easy editing of the options. The annotation table for the cation mode CE-MS analysis is used. This table was assembled according to the CE-TOFMS analysis of a mixture of standard compounds. Methionine sulfone is used as the internal standard. Its short name in the annotation table, 363, is passed to the DAMPNormalizeGroup function via the InternalStandard option. The location of the peak of the internal standard will be extrapolated from the aligned annotation table. Overlaid electropherograms of the vicinities of the expected peaks of the internal standard are plotted along with indicators of the beginning and the end of the blindly integrated regions for visual confirmation. To specify the location of the peak explicitly, use the notation {mz,{starttime,endtime}} instead of the short name. In this case it would be {182,{14,14.6}} (according to the electropherogram at the end of the optional section).
The first dataset from the ctrl list will be used as the reference dataset (as specified by the option Reference->1).
The DAMPTwoGroups function returns the absolute, relative, and absolute×relative difference between the averages of the datasets of both groups. Additionally, a t-score map is generated by performing a t-test for the groups of corresponding signal intensities. In the case below, only peaks picked in the migration time range 8 - 23 min in the reference sample are used for the alignment. This leads to a better alignment of the peaks in this range at the expense of the alignment of the stack of peaks with migration times around 25 min.
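The idea of a t-score map can be illustrated with a small standalone sketch in plain Python. It does not reproduce the DAMPTwoGroups implementation: the text above does not say which t-test variant is used, so Welch's unequal-variance statistic is assumed here, and the names welch_t and t_score_map, the handling of zero variance, and the toy data are all hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t-statistic for two samples of replicate intensities."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0.0:
        return 0.0  # assumption: no spread in either group means no score
    return (mean(b) - mean(a)) / se

def t_score_map(group_a, group_b):
    """Apply welch_t at every (m/z, time) position.

    Each group is a list of equally shaped 2D lists, one per replicate.
    """
    rows, cols = len(group_a[0]), len(group_a[0][0])
    return [[welch_t([d[i][j] for d in group_a],
                     [d[i][j] for d in group_b])
             for j in range(cols)]
            for i in range(rows)]

# Toy data: three replicates per group, one m/z row, two time points.
group1 = [[[10.0, 5.0]], [[12.0, 5.0]], [[11.0, 5.0]]]
group2 = [[[20.0, 5.0]], [[22.0, 5.0]], [[21.0, 5.0]]]
t_map = t_score_map(group1, group2)
print(t_map)
```

The first position differs consistently between the groups and receives a large score, while the second position is identical in every replicate and scores zero.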

Clear[rslt] ;

rslt = DAMPTwoGroups[ctrl, smpl, NormalizeGroupOptions -> {Reference -> 1, Alignmen ... None}, ThresholdForRelative -> 0, GroupNames -> Automatic, AbsRelTrendFilter -> 2] ;

[Graphics: HTMLFiles/index_22.gif to HTMLFiles/index_26.gif]

IS normalization coefficients : {1., 1.10901, 1.02771, 1.04268, 1.16882, 1.18705, 1.15713, 1.24596}
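One way to read these coefficients is as the factors that scale each dataset so that its internal-standard peak area matches that of the reference dataset, which would explain why the reference itself gets 1. The sketch below, in plain Python, follows that reading. It is an assumption about the direction of the ratio and not the DAMPNormalizeGroup code; the names is_coefficients and normalize and the peak areas are made up.

```python
def is_coefficients(is_areas, reference=0):
    """Scale factors that bring each dataset's internal-standard peak area
    to the area observed in the reference dataset (0-based index)."""
    ref_area = is_areas[reference]
    return [ref_area / area for area in is_areas]

def normalize(dataset, coefficient):
    """Multiply every intensity of a 2D dataset by its coefficient."""
    return [[value * coefficient for value in row] for row in dataset]

# Made-up integrated peak areas of the internal standard in four datasets.
areas = [5000.0, 4000.0, 5000.0, 2500.0]
coeffs = is_coefficients(areas)
scaled = normalize([[100.0, 200.0]], coeffs[1])
print(coeffs, scaled)
```

Note that index 0 in this sketch plays the role of Reference->1 in the notebook, because Python lists are 0-based while Mathematica lists are 1-based.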

Step 3 : Exploring the Results, Listing the Candidates

The normalized datasets may be explored on annotated density plots. The average of the normalized datasets from the first group is shown on the plot below.

DAMPDensityPlot[AveragedGroup1/.rslt, MaxScale -> 20000, AnnotationTables -> (AlignedAnnotationTables/.rslt), Sequence @@ DAMPCETOFMSDensityPlotOptions] ;

[Graphics:HTMLFiles/index_29.gif]

The t-score map or any of the result datasets calculated using the averages of the groups may be plotted to show differences between the groups. Additionally, the result datasets may be combined or used as filters against each other. For example, the absolute×relative difference (of the averages) result may be filtered to allow only those signals for which the t-score is above a certain threshold.

filtset = DAMPFilter[AbsRel/.rslt, DAMPSmooth[TScores/.rslt], 5] ;

DAMPDensityPlot[filtset, MaxScale -> 20000, AnnotationTables -> (AlignedAnnotationTables/.rslt), Sequence @@ DAMPCETOFMSDensityPlotOptions] ;

[Graphics:HTMLFiles/index_32.gif]
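The filtering of one result by another can be sketched in a few lines of plain Python. This is only an illustration of the idea and not the DAMPFilter implementation: comparing the absolute value of the score with the threshold and setting rejected signals to zero are assumptions, as are the function name filter_by_score and the numbers.

```python
def filter_by_score(values, scores, threshold):
    """Keep values[i][j] where |scores[i][j]| >= threshold, zero elsewhere."""
    return [[v if abs(s) >= threshold else 0.0
             for v, s in zip(v_row, s_row)]
            for v_row, s_row in zip(values, scores)]

# Toy maps with one m/z row and three time points.
abs_rel = [[120.0, -80.0, 15.0]]
t_scores = [[6.2, -7.5, 1.1]]
filtered = filter_by_score(abs_rel, t_scores, 5)
print(filtered)
```

Only the third signal is removed here, because its score stays below the threshold of 5 in magnitude.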

Overlaid electropherograms in the vicinities of the most significant differences (according to a particular result) may be plotted in descending order of significance for visual confirmation (and for the rejection of false positives). Below are the electropherograms of the top 50 candidates from the filtered absolute×relative difference result. The vertical dashed line indicates the position of the most significant difference according to the result dataset. (Further below are the top 50 candidates from the t-test result).

plotcolors = Transpose[{DAMPGenColors[2], GroupCounts/.rslt}] ;

DAMPPlotCandidates[NormalizedDatasets/.rslt, filtset, PlotCount -> 50, PlotChromatogramO ... 〛, LegendData -> Transpose[{plotcolors〚All, 1〛, (GroupNames/.rslt)}]}] ;

[Graphics: HTMLFiles/index_35.gif to HTMLFiles/index_84.gif, overlaid electropherograms of the top 50 candidates from the filtered absolute×relative difference result]

Different result datasets provide different rankings. For example, differences in small peaks usually do not score high in the absolute×relative difference result. On the other hand, the presence of an outlying peak (in terms of signal intensities) in one dataset causes the difference to drop in the t-score ranking. Since the different results have their strengths and weaknesses, it may prove beneficial to generate the lists of candidates based on multiple results. Below are electropherograms of candidate differences ranked according to the t-test result. Smaller peaks achieve a higher ranking in the list below than in the ranking according to the absolute×relative result. However, the difference for L-Alanine (ranked second according to the absolute×relative result) is not among the first 50 candidates from the t-test result. Also note that the vertical dashed line indicating the most significant difference often does not correspond to the peak top for the t-test result.
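Building a candidate list from more than one result can be sketched as follows, in plain Python. This is a hypothetical illustration and not a MathDAMP function: top_positions and merge_candidates are made-up names, every cell of a map is treated as its own candidate (a real peak spans many cells), and interleaving the two rankings is just one possible way of combining them.

```python
def top_positions(score_map, n):
    """Return the n (row, col) positions with the largest |score|, best first."""
    cells = [(abs(v), (i, j))
             for i, row in enumerate(score_map)
             for j, v in enumerate(row)]
    cells.sort(key=lambda c: -c[0])
    return [pos for _, pos in cells[:n]]

def merge_candidates(*rankings):
    """Interleave several rankings, dropping positions already listed."""
    merged, seen = [], set()
    for tier in zip(*rankings):
        for pos in tier:
            if pos not in seen:
                seen.add(pos)
                merged.append(pos)
    return merged

# Toy result maps with two m/z rows and three time points.
abs_rel = [[900.0, 20.0, 400.0], [5.0, 30.0, 10.0]]
t_scores = [[2.0, 9.0, 8.0], [1.0, 0.5, 7.0]]
by_abs_rel = top_positions(abs_rel, 2)
by_t = top_positions(t_scores, 2)
candidates = merge_candidates(by_abs_rel, by_t)
print(candidates)
```

In this toy case position (0, 1) is a small signal that only the t-score ranking picks up, while (0, 0) is a large difference that only the absolute×relative ranking picks up; the merged list contains both.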

plotcolors = Transpose[{DAMPGenColors[2], GroupCounts/.rslt}] ;

DAMPPlotCandidates[NormalizedDatasets/.rslt, DAMPSmooth[TScores/.rslt], PlotCount -> 50, ... 〛, LegendData -> Transpose[{plotcolors〚All, 1〛, (GroupNames/.rslt)}]}] ;

[Graphics: HTMLFiles/index_87.gif to HTMLFiles/index_136.gif, overlaid electropherograms of the top 50 candidates from the t-test result]