m/z trace extraction in LC-MS
This guide describes the theoretical background of the ROI extraction algorithm for LC-MS datasets,
implemented by the LCTraceExtractor operator.
A ROI is a region extracted from raw sample data that may contain features. In LC-MS datasets, a ROI is an m/z trace. m/z traces are similar to chromatograms, with two differences: they include the m/z value detected in each scan, and they are defined only in the time window where m/z values were detected.
An m/z trace comprises three arrays storing m/z, time and intensity.
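As an illustration of the three parallel arrays, here is a minimal sketch of a trace container. The class name `MZTrace` and its `mean_mz` helper are hypothetical and are not the TidyMS2 API; plain Python lists stand in for numeric arrays, and NaN marks scans where no m/z value was detected.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class MZTrace:
    """Hypothetical container for one m/z trace (not the TidyMS2 class)."""

    mz: List[float]         # m/z value detected in each scan (NaN if missing)
    time: List[float]       # acquisition time of each scan
    intensity: List[float]  # intensity detected in each scan (NaN if missing)

    def __post_init__(self) -> None:
        # The three arrays are parallel: one entry per scan in the trace.
        if not (len(self.mz) == len(self.time) == len(self.intensity)):
            raise ValueError("mz, time and intensity must have the same length")

    def mean_mz(self) -> float:
        """Mean m/z over the scans where a value was detected."""
        found = [x for x in self.mz if not math.isnan(x)]
        return sum(found) / len(found)


trace = MZTrace(
    mz=[200.001, 200.003, math.nan, 200.002],
    time=[10.0, 10.5, 11.0, 11.5],
    intensity=[1500.0, 4200.0, math.nan, 2100.0],
)
```

Unlike a chromatogram, the trace keeps the per-scan m/z values and only spans the scans between its first and last entry.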
In TidyMS2, ROI extraction uses an approach similar to the one described by Tautenhahn et al. in [1], with some modifications. m/z traces are created and extended by connecting close m/z values across successive scans, using the following method:
1. The m/z values in the first scan are used to initialize a list of ROIs. If targeted_mz is used, the ROIs are initialized using this list.
2. m/z values from the next scan extend the ROIs if they are closer than tolerance to the mean m/z of the ROI. Values that don’t match any ROI are used to create new ROIs and are appended to the ROI list. If targeted_mz is used, these values are discarded.
3. If more than one m/z value is within the tolerance threshold, m/z and intensity values are computed according to the multiple_match strategy. Two strategies are available: merge multiple peaks into an average peak, or use only the closest peak to extend the ROI and create new ROIs with the others.
4. If a ROI can’t be extended with any m/z value from the new scan, it is extended using NaNs.
5. If there are more than max_missing consecutive NaNs in a ROI, then the ROI is flagged as completed. If the maximum intensity of a completed ROI is greater than min_intensity and the number of points is greater than min_length, then the ROI is flagged as valid. Otherwise, the ROI is discarded.
6. Repeat from step 2 until no more new scans are available.
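The steps above can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the LCTraceExtractor implementation: the `Roi` class, the `extract_rois` function, the scan layout `(time, [(mz, intensity), ...])` and the strategy names `"closest"` and `"reduce"` are all hypothetical. The sketch also assumes that merging averages both m/z and intensity, that the number of points counts only detected (non-NaN) values, and that ROIs still open after the last scan are validated like completed ones.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Tuple

Peak = Tuple[float, float]       # (m/z, intensity)
Scan = Tuple[float, List[Peak]]  # (time, peaks detected in the scan)


@dataclass
class Roi:
    seed_mz: float  # reference m/z used while no value has been stored yet
    mz: List[float] = field(default_factory=list)
    time: List[float] = field(default_factory=list)
    intensity: List[float] = field(default_factory=list)
    missing: int = 0  # length of the current run of consecutive NaNs

    def mean_mz(self) -> float:
        found = [x for x in self.mz if not math.isnan(x)]
        return sum(found) / len(found) if found else self.seed_mz

    def extend(self, time: float, mz: float = math.nan,
               intensity: float = math.nan) -> None:
        self.mz.append(mz)
        self.time.append(time)
        self.intensity.append(intensity)
        self.missing = self.missing + 1 if math.isnan(mz) else 0

    def is_valid(self, min_intensity: float, min_length: int) -> bool:
        # Assumption: only detected (non-NaN) points count towards the length.
        found = [x for x in self.intensity if not math.isnan(x)]
        return bool(found) and max(found) > min_intensity and len(found) > min_length


def extract_rois(scans: Sequence[Scan], tolerance: float, max_missing: int,
                 min_intensity: float, min_length: int,
                 multiple_match: str = "closest",
                 targeted_mz: Optional[Sequence[float]] = None) -> List[Roi]:
    targeted = targeted_mz is not None
    # Step 1: in targeted mode the ROI list starts from the target m/z values;
    # otherwise the first scan creates every ROI through the "unmatched" path.
    active = [Roi(seed_mz=m) for m in targeted_mz] if targeted else []
    valid: List[Roi] = []
    for time, peaks in scans:
        unmatched = list(peaks)
        for roi in active:
            center = roi.mean_mz()
            hits = [p for p in unmatched if abs(p[0] - center) < tolerance]
            if not hits:
                roi.extend(time)  # step 4: no match, extend with NaN
            elif multiple_match == "closest":
                # Step 3a: keep the closest peak; the others stay unmatched.
                best = min(hits, key=lambda p: abs(p[0] - center))
                roi.extend(time, *best)
                unmatched.remove(best)
            else:
                # Step 3b: merge all matching peaks into one average peak.
                n = len(hits)
                roi.extend(time, sum(p[0] for p in hits) / n,
                           sum(p[1] for p in hits) / n)
                unmatched = [p for p in unmatched if p not in hits]
        # Step 2: unmatched values create new ROIs, or are dropped if targeted.
        new = [] if targeted else [
            Roi(seed_mz=mz, mz=[mz], time=[time], intensity=[sp])
            for mz, sp in unmatched
        ]
        # Step 5: flag ROIs with too many consecutive NaNs as completed.
        still_active = []
        for roi in active:
            if roi.missing > max_missing:
                if roi.is_valid(min_intensity, min_length):
                    valid.append(roi)
            else:
                still_active.append(roi)
        active = still_active + new  # step 6: continue with the next scan
    # Assumption: ROIs still open at the end are validated the same way.
    valid.extend(r for r in active if r.is_valid(min_intensity, min_length))
    return valid


scans = [
    (0.0, [(100.000, 50.0), (200.000, 500.0)]),
    (1.0, [(100.001, 60.0), (200.002, 900.0)]),
    (2.0, [(200.001, 700.0)]),
    (3.0, [(200.000, 300.0)]),
    (4.0, []),
    (5.0, []),
]
rois = extract_rois(scans, tolerance=0.01, max_missing=1,
                    min_intensity=100.0, min_length=2)
```

In this toy input, the trace near m/z 100 is completed after two consecutive missing scans and discarded because its maximum intensity does not exceed min_intensity; only the trace near m/z 200 is returned.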