Institute for Advanced Biosciences Keio University
MathDAMP Mathematica package for differential analysis of metabolite profiles

Dataset alignment

Datasets are aligned using a combination of global optimization and dynamic programming (DP). The DP score measures the goodness of the alignment between two datasets (more precisely, between their peak lists). A mathematical function is assumed to describe the time shifts of corresponding peaks between the two datasets as a function of retention/migration time in one of the samples. The parameters of this function are optimized to achieve the lowest DP score.
For a discussion on using DP in Mathematica, please refer to [3].
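
The idea of scoring an alignment by DP can be sketched as follows. This is a minimal illustration and not the package code: the name dpScore, the use of the absolute time difference as the cost of matching two peaks, and the restriction to plain sorted lists of peak times are all assumptions made here. The memoized helper mpos is a local symbol of this sketch.

```mathematica
(* Hypothetical sketch: DP alignment score between two sorted lists of peak times. *)
(* Assumed cost model: matching two peaks costs |t1 - t2|, leaving a peak unmatched costs gap. *)
ClearAll[dpScore];
dpScore[t1_List, t2_List, gap_?NumericQ] := Module[{mpos},
  (* boundary cases: aligning against an empty list costs one gap per peak *)
  mpos[0, 0] = 0;
  mpos[i_Integer, 0] := i gap;
  mpos[0, j_Integer] := j gap;
  (* general case, memoized so that each subproblem is solved only once *)
  mpos[i_Integer, j_Integer] := mpos[i, j] = Min[
     mpos[i - 1, j - 1] + Abs[t1[[i]] - t2[[j]]], (* match peak i with peak j *)
     mpos[i - 1, j] + gap,                        (* peak i of t1 left unmatched *)
     mpos[i, j - 1] + gap];                       (* peak j of t2 left unmatched *)
  mpos[Length[t1], Length[t2]]
]

dpScore[{1.0, 2.0, 3.0}, {1.1, 3.1}, 0.5]
```

Under these assumptions the call above evaluates to approximately 0.7: two matched pairs at a distance of 0.1 each, plus one gap of 0.5 for the unmatched peak at 2.0. Because the recursion is memoized top-down, long peak lists may exceed the default $RecursionLimit; an iterative table-filling variant avoids this.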

Options[DAMPDPChromatogramScore] = {Global`GapPenalty → .5} ; 

DAMPDPChromatogramScore[pl1_, pl2_, opts___] := Module[{mpos, gappenalty}, gappenalt ... enalty, mpos[i, j - 1] + gappenalty] ; mpos[Length[pl1], Length[pl2]] ] 

Options[DAMPDPScore] = {Global`GapPenalty → .5} ; 

DAMPDPScore[pls1_, pls2_, opts___] := Module[{gappenalty, itm}, gappenalty = (Global ... &[Select[pls2, #〚1〛itm〚1〛&]]) &/@pls1) ]

DAMPFitShiftFunction optimizes the parameters of a retention/migration time shift function to find the optimum alignment between two peak lists. The parameters are optimized with the NMinimize function: the timescale of one of the peak lists is transformed according to the current parameter values, and the goodness of the resulting alignment is evaluated with DAMPDPScore, which serves as the objective function to be minimized.
The default function for fitting the migration time shifts is the one derived by Reijenga [4] for normalizing migration times in capillary electrophoresis. The default choice is influenced by the predominant use of CE-based approaches in the authors' institute (Institute for Advanced Biosciences, Keio University). Any custom migration time shift function can be passed to DAMPFitShiftFunction via the ShiftFunction option. Parameters of the retention/migration time shift function, together with lower and upper bounds for the optimization range, can be passed via the ShiftFunctionParameters option. If ShiftFunctionParameters is set to Automatic, the parameter names are extracted from the shift function automatically and are assigned no bounds for the optimization range. The peaks used for fitting the time shift function may be limited to those (in the peak list pls1) falling within a specific time range set by the TimeRange option.
The GapPenalty option may hold a list of gap penalty values to be used iteratively for fitting the time shift function. This may be desirable when a small gap penalty value is required for good alignment. If a large gap penalty value is used, noncorresponding peaks from the two peak lists that lie close enough to fall within the gap penalty value may negatively affect the alignment. However, if only a small gap penalty value is used and the time shifts between the two peak lists are significant, NMinimize may not find the region of convergence to the global minimum. Over a wide range of shift function parameters, the subproblem scores are then simply assigned the gap penalty value, so the objective function either does not change (if all scores are assigned the gap penalty value) or converges to a local minimum (if noncorresponding peaks are within the gap penalty distance). When time shifts are significant, a reasonable approach appears to be a two-step fitting of the time shift function. First, the fitting is performed with a large gap penalty value to find an approximate alignment. Then a second fitting is performed with a small gap penalty value, with the initial parameter regions set to the neighborhood of the optimized parameter values from the first fitting.
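
The two-step fitting described above can be sketched with NMinimize directly. This is an illustration under stated assumptions rather than the package implementation: the helper names score and objective, the linear shift a t + b (the package default is the Reijenga function), the example peak times, and the chosen gap penalties and region widths are all hypothetical.

```mathematica
(* Hypothetical helper: iterative DP score between two sorted lists of peak times. *)
(* Assumed cost model: |t1 - t2| for a match, gap for an unmatched peak. *)
ClearAll[score, objective, a, b];
score[t1_List, t2_List, gap_] := Module[{m, n1 = Length[t1], n2 = Length[t2]},
  m = Table[(i + j) gap, {i, 0, n1}, {j, 0, n2}]; (* border cells hold pure-gap costs *)
  Do[m[[i + 1, j + 1]] = Min[
      m[[i, j]] + Abs[t1[[i]] - t2[[j]]],
      m[[i, j + 1]] + gap,
      m[[i + 1, j]] + gap], {i, n1}, {j, n2}];
  m[[-1, -1]]]

(* Assumed linear shift applied to the sample times; evaluates only for numeric a, b *)
objective[a_?NumericQ, b_?NumericQ, t1_, t2_, gap_] := score[t1, a t2 + b, gap]

ref = {10., 20., 30., 40., 50.};  (* made-up reference peak times *)
smp = 0.9 ref + 3.;               (* made-up sample peak times with a known distortion *)

(* Step 1: large gap penalty and a wide initial region give an approximate alignment *)
{s1, r1} = NMinimize[objective[a, b, ref, smp, 5.], {{a, 0.5, 2.}, {b, -20., 20.}}];
{a1, b1} = {a, b} /. r1;

(* Step 2: small gap penalty, initial region restricted to the neighborhood of step 1 *)
{s2, r2} = NMinimize[objective[a, b, ref, smp, 0.5],
   {{a, a1 - 0.1, a1 + 0.1}, {b, b1 - 2., b1 + 2.}}];
{a, b} /. r2
```

The variable specification {a, min, max} sets the initial search region for NMinimize and is not a hard bound. If hard bounds are needed, they can be supplied as constraints in the first argument instead.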

DAMPAlignPeakList[peaklist_, shiftfunc_] := Module[{}, Transpose[{peaklist〚Al ... 1〛], #〚All, 2〛}] &/@peaklist〚All, 2〛}] ] 

Options[DAMPFitShiftFunction] = {Global`ShiftFunction → (1/(1/(Global`α #) + Glob ... mizeOptions → {MaxIterations → 1000}, Global`TimeRange → {0, ∞}} 

DAMPFitShiftFunction[pls1_, pls2_, opts___] := Module[{shiftfunc, params, gappenalty, autopa ... FitShiftFunction], Global`NMinimizeOptions/.Options[DAMPFitShiftFunction]]]] ] ]

An annotation table may be converted to the peak list format, both for displaying it on the peak layout plots and for aligning it to a reference dataset. DAMPAlignAnnotationTable has no default options of its own. All options passed to DAMPAlignAnnotationTable are passed on to DAMPFitShiftFunction, which is used internally.

Options[DAMPAnnotationTableToPeakList] = {Global`Resolution → 1} 

DAMPAnnotationTableToPeakList[atbl_, opts___] := Module[{resol, tmpannot, mz}, resol ... 〚1〛mz&]}) &/@Union[tmpannot〚All, 1〛] ] 

DAMPAlignAnnotationTable[peaklist_, atbl_, opts___] := Module[{atpl, shiftfunc, newatbl}, ... atbl〚All, 2〛 = shiftfunc[newatbl〚All, 2〛] ; newatbl]

A sample dataset is aligned to a reference dataset by applying the migration time shift function to the timepoints of the sample dataset, interpolating all resulting chromatograms in the sample dataset, and selecting from this interpolation the timepoints identical to those in the reference dataset. The resulting datasets have corresponding datapoints, so direct datapoint-by-datapoint arithmetic operations become possible. Signal intensities are adjusted to compensate for peak broadening/compression and to conserve peak areas. Because of the interpolation and the discrete nature of the data, minor differences in the areas before and after rescaling are observed. These stayed within 2% for our datasets, which may be acceptable for a visual approach.
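
The steps above can be sketched for a single chromatogram as follows. This is a hedged illustration and not the code of DAMPAlign: the function name alignTrace, the finite-difference estimate of the local stretch of the time axis, the linear interpolation order, and the choice to return zero outside the shifted time range are assumptions of this sketch. The shift function is assumed to be strictly increasing.

```mathematica
(* Hypothetical sketch: align one chromatogram to reference timepoints. *)
ClearAll[alignTrace];
alignTrace[times_List, intens_List, shift_, refTimes_List] := Module[
  {newTimes, stretch, ifun},
  newTimes = shift /@ times;                        (* shifted time axis *)
  (* local stretch of the time axis, estimated by finite differences *)
  stretch = Differences[newTimes]/Differences[times];
  stretch = Append[stretch, Last[stretch]];         (* pad to the original length *)
  (* dividing by the stretch keeps the peak area approximately constant *)
  ifun = Interpolation[Transpose[{newTimes, intens/stretch}],
    InterpolationOrder -> 1];
  (* sample at the reference timepoints; no extrapolation outside the data range *)
  If[First[newTimes] <= # <= Last[newTimes], ifun[#], 0.] & /@ refTimes
]

times = Range[0., 10., 0.5];           (* made-up timepoints *)
intens = Exp[-(times - 5.)^2];         (* made-up Gaussian peak *)
aligned = alignTrace[times, intens, (1.2 # + 1. &), times];

(* compare the approximate areas before and after alignment *)
{Total[intens] 0.5, Total[aligned] 0.5}
```

The two areas in the final list should be close but not identical, since the interpolation and the discrete sampling introduce small deviations of the kind mentioned above.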

Options[DAMPAlign] = {Global`SampleNameSuffix → "a"} 

DAMPAlign[msdata_, shiftfunc_, timepoints_, opts___] := Module[{compensationcoefs, ifunc, rs ... opts}/.Options[DAMPAlign]]} ; On[InterpolatingFunction :: dmval] ; rslt]