LC-MS#

Utilities to process LC-MS datasets.

pydantic model tidyms2.lcms.LCFeatureMatcher#

Bases: AnnotationPatcher, FeatureCorrespondenceParameters

Perform feature correspondence on LC-MS datasets using a cluster-based approach.

Features are initially grouped by m/z and Rt similarity using DBSCAN. In a second step, these clusters are further processed using a GMM approach, obtaining clusters where each sample contributes only one feature.

See the Feature Correspondence guide for a detailed description of the algorithm.

field mz_tolerance: Annotated[float] = 0.01#

m/z tolerance used to group close features. Sets the eps parameter in the DBSCAN algorithm.

field rt_tolerance: Annotated[float] = 3.0#

Rt tolerance in seconds used to group close features. Sets the eps parameter in the DBSCAN algorithm.
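Both tolerances feed a single DBSCAN neighborhood radius. One way to picture this, as a minimal sketch and not the library's implementation, is to divide each coordinate by its tolerance so that one radius of 1.0 covers both dimensions. The helper name `scaled_distance` and the choice of a Chebyshev metric are assumptions made for illustration.

```python
def scaled_distance(f1, f2, mz_tolerance=0.01, rt_tolerance=3.0):
    """Chebyshev distance between two (mz, rt) pairs after scaling.

    Hypothetical helper: each axis is divided by its tolerance, so two
    features are neighbors when the returned value is <= 1.0.
    """
    dmz = abs(f1[0] - f2[0]) / mz_tolerance
    drt = abs(f1[1] - f2[1]) / rt_tolerance
    return max(dmz, drt)


a = (200.005, 120.0)  # (m/z, Rt in seconds)
b = (200.010, 121.5)  # within both tolerances of a
c = (200.050, 120.0)  # same Rt, but m/z is too far from a

close = scaled_distance(a, b) <= 1.0
far = scaled_distance(a, c) <= 1.0
```

With the default tolerances, `a` and `b` fall in the same neighborhood while `a` and `c` do not.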

classmethod from_defaults(instrument, separation, polarity)#

Set the processor default parameters.

Parameters:
  • instrument – the instrument type used in the experimental setup

  • separation – the LC platform used in the experimental setup

  • polarity – the MS polarity used in the experiment

compute_patches(data)#

Compute annotation patches for feature matching.

Return type:

list[AnnotationPatch]

pydantic model tidyms2.lcms.LCPeakExtractor#

Bases: FeatureExtractor[MZTrace, Peak]

Extract peaks from LC m/z traces.

A complete description of the algorithm used for peak extraction can be found here.

classmethod from_defaults(instrument, separation, polarity)#

Set the processor default parameters.

Parameters:
  • instrument – the instrument type used in the experimental setup

  • separation – the LC platform used in the experimental setup

  • polarity – the MS polarity used in the experiment

extract_features(roi)#

Detect peaks in an LC trace.

pydantic model tidyms2.lcms.LCTraceBaselineEstimator#

Bases: RoiTransformer[MZTrace, Peak]

Estimate the noise level and baseline in an m/z trace.

The default values for this filter produce good results for most LC traces. Do not modify these values unless you know what you are doing. See here for a description of the noise estimation and baseline estimation algorithms.

field min_proba: float = 0.05#

The minimum probability for a signal chunk to be considered baseline.

field min_slice_size: pydantic.PositiveInt = 200#

The minimum size of a signal slice for local noise estimation. If the signal size is smaller than this value, the noise is estimated using the whole array.

field n_slices: pydantic.PositiveInt = 5#

Number of slices to create. The size of each slice must be greater than min_slice_size.

field robust: bool = True#

If True, use the median absolute deviation as an estimator of the noise standard deviation. If False, use the standard deviation.
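The difference between the two estimators can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the function name `estimate_noise` is hypothetical, the factor 1.4826 is the usual scaling that makes the median absolute deviation consistent with the standard deviation of Gaussian noise, and the population standard deviation is used for the non-robust branch.

```python
import statistics


def estimate_noise(values, robust=True):
    """Estimate the noise standard deviation of a signal slice.

    Hypothetical helper. With robust=True the scaled median absolute
    deviation (MAD) is used, which is insensitive to a few large values.
    """
    values = list(values)
    if robust:
        med = statistics.median(values)
        mad = statistics.median([abs(v - med) for v in values])
        return 1.4826 * mad  # assumed Gaussian consistency factor
    return statistics.pstdev(values)


# A flat signal with one outlier, e.g. a point that belongs to a peak.
signal = [1.0, 2.0, 3.0, 4.0, 100.0]
robust_noise = estimate_noise(signal, robust=True)
plain_noise = estimate_noise(signal, robust=False)
```

The robust estimate stays close to the spread of the flat region, while the plain standard deviation is dominated by the outlier.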

field smoothing_strength: pydantic.PositiveFloat | None = 1.0#

If specified, apply a temporary Gaussian smoothing to the trace intensity. This step usually improves baseline estimation.

classmethod from_defaults(instrument, separation, polarity)#

Set the processor default parameters.

Parameters:
  • instrument – the instrument type used in the experimental setup

  • separation – the LC platform used in the experimental setup

  • polarity – the MS polarity used in the experiment

Return type:

Self

transform_roi(roi)#

Add noise and baseline to an LC trace.

pydantic model tidyms2.lcms.LCTraceExtractor#

Bases: RoiExtractor[MZTrace, Peak], MakeRoiParameters

Extract regions of interest (ROIs) from raw data represented as m/z traces.

Traces are created by connecting values across consecutive scans based on the closeness in m/z.

Refer to the Processing LC-MS datasets guide for examples on how to use this operator. See the m/z trace extraction in LC-MS for a description of the algorithm used.
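The idea of connecting values across consecutive scans can be illustrated with a simplified greedy sketch. It is not the library's algorithm: the function `build_traces`, its `tolerance` argument, and the rule of closing a trace as soon as one scan has no match are all assumptions chosen to keep the example short.

```python
def build_traces(scans, tolerance=0.01):
    """Connect (mz, intensity) points across consecutive scans.

    Hypothetical helper. `scans` is a list of scans, each a list of
    (mz, intensity) tuples. Returns a list of traces, each a list of
    (scan_index, mz, intensity) tuples.
    """
    active = []    # traces that were extended in the previous scan
    finished = []  # traces that found no match and were closed
    for scan_index, scan in enumerate(scans):
        next_active = []
        used = set()
        for trace in active:
            last_mz = trace[-1][1]
            best = None
            for k, (mz, _) in enumerate(scan):
                if k in used or abs(mz - last_mz) > tolerance:
                    continue
                if best is None or abs(mz - last_mz) < abs(scan[best][0] - last_mz):
                    best = k  # keep the closest unused point
            if best is None:
                finished.append(trace)
            else:
                used.add(best)
                trace.append((scan_index, scan[best][0], scan[best][1]))
                next_active.append(trace)
        # Points that extended no trace start new traces.
        for k, (mz, intensity) in enumerate(scan):
            if k not in used:
                next_active.append([(scan_index, mz, intensity)])
        active = next_active
    return finished + active


scans = [
    [(100.000, 5.0), (200.000, 3.0)],
    [(100.002, 8.0), (200.001, 4.0)],
    [(100.001, 6.0)],
]
traces = build_traces(scans)
```

Here the points near m/z 100 form one trace spanning three scans, and the points near m/z 200 form a second trace that ends after two scans.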

See also

lcms.MZTrace : Representation of a ROI using m/z traces.

lcms.MakeRoiParameters : Parameters used by the ROI extraction algorithm.

classmethod from_defaults(instrument, separation, polarity)#

Set the processor default parameters.

Parameters:
  • instrument – the instrument type used in the experimental setup

  • separation – the LC platform used in the experimental setup

  • polarity – the MS polarity used in the experiment

Return type:

Self

extract_rois(sample)#

Apply ROI extraction to a sample with LC data.

Return type:

list[MZTrace]

pydantic model tidyms2.lcms.LCTraceSmoother#

Bases: RoiTransformer[MZTrace, Peak]

Smooth the intensity of LC traces using a Gaussian kernel.

field strength: pydantic.PositiveFloat = 1.0#

The smoothing strength, defined as the standard deviation of the Gaussian kernel.
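The role of the kernel standard deviation can be seen in a small, self-contained sketch. It assumes the strength is expressed in units of trace samples, truncates the kernel at four standard deviations, and repeats edge values at the boundaries; none of these choices is confirmed by the library, and `gaussian_smooth` is a hypothetical name.

```python
import math


def gaussian_smooth(y, strength=1.0):
    """Smooth a sequence with a normalized Gaussian kernel.

    Hypothetical helper: `strength` is the kernel standard deviation,
    assumed here to be measured in samples.
    """
    radius = max(1, math.ceil(4 * strength))
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-0.5 * (k / strength) ** 2) for k in offsets]
    total = sum(kernel)
    kernel = [w / total for w in kernel]  # weights sum to one
    n = len(y)
    smoothed = []
    for i in range(n):
        acc = 0.0
        for k, w in zip(offsets, kernel):
            j = min(max(i + k, 0), n - 1)  # repeat edge values
            acc += w * y[j]
        smoothed.append(acc)
    return smoothed


spike = [0.0] * 5 + [10.0] + [0.0] * 5
smoothed = gaussian_smooth(spike, strength=1.0)
```

A larger strength spreads the spike over more samples and lowers its maximum, while the normalized kernel keeps the summed intensity of an interior peak unchanged.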

classmethod from_defaults(instrument, separation, polarity)#

Set the processor default parameters.

Parameters:
  • instrument – the instrument type used in the experimental setup

  • separation – the LC platform used in the experimental setup

  • polarity – the MS polarity used in the experiment

Return type:

Self

transform_roi(roi)#

Smooth the intensity of an LC trace.

pydantic model tidyms2.lcms.Peak#

Bases: AnnotableFeature[MZTrace]

Representation of a chromatographic peak.

field apex: pydantic.PositiveInt [Required]#

Index in the m/z trace where the apex of the peak is located. Must be smaller than end.

field end: pydantic.PositiveInt [Required]#

Index in the m/z trace where the peak ends. The slice from start to end defines the peak region.

field start: pydantic.NonNegativeInt [Required]#

Index in the m/z trace where the peak begins. Must be smaller than apex.
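The relationship between the three indices can be shown on a plain list. The intensity values below are made up for illustration, and Python slice semantics (end excluded) are assumed for the peak region.

```python
# Hypothetical trace intensity; the peak sits on indices 2 to 6.
intensity = [1.0, 1.0, 2.0, 6.0, 9.0, 5.0, 2.0, 1.0, 1.0]
start, apex, end = 2, 4, 7

# Ordering required by the field descriptions.
assert start < apex < end

# start and end used as a slice select the peak region (end is excluded).
peak_region = intensity[start:end]

# Position of the apex relative to the beginning of the region.
apex_in_region = apex - start
```
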

static compute_isotopic_envelope(*features)#

Compute the isotopic envelope (m/z and abundance) of a list of peaks.

Parameters:

features – the peaks that make up the envelope

Return type:

IsotopicEnvelope

compare(other)#

Compute the similarity between a pair of peaks.

The similarity is defined as the cosine distance between the overlapping regions of the two peaks.
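A cosine-based comparison restricted to the overlapping region can be sketched as below. The sketch assumes the reported value is the cosine of the angle between the two intensity vectors (1.0 for identical shapes) and that peaks without overlap score 0.0; whether the library returns this value or its complement is not stated here, and `overlap_cosine` is a hypothetical name.

```python
import math


def overlap_cosine(start1, y1, start2, y2):
    """Cosine between two peaks over the trace indices they share.

    Hypothetical helper: y1 holds intensities beginning at trace index
    start1, and y2 holds intensities beginning at trace index start2.
    """
    lo = max(start1, start2)
    hi = min(start1 + len(y1), start2 + len(y2))
    if hi <= lo:
        return 0.0  # assumed score for peaks that do not overlap
    a = y1[lo - start1:hi - start1]
    b = y2[lo - start2:hi - start2]
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return dot / norm if norm > 0 else 0.0


# Two co-eluting peaks with the same shape and different abundance.
same_shape = overlap_cosine(3, [1.0, 4.0, 9.0, 4.0, 1.0], 3, [2.0, 8.0, 18.0, 8.0, 2.0])
# Two peaks located in different regions of the chromatogram.
disjoint = overlap_cosine(0, [1.0, 2.0, 1.0], 10, [1.0, 2.0, 1.0])
```

Because the cosine ignores overall scale, peaks that differ only in abundance compare as identical in shape.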

Return type:

float

property area: float#

The peak area.

property extension: float#

The peak extension, defined as the length of the peak region.

property height: float#

Peak height, defined as the difference between the peak intensity and the peak baseline at the apex.

property mz: float#

The peak m/z, defined as the weighted average of the trace m/z in the peak region.

The trace height is used as weights.
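As a minimal sketch of the weighted average, assuming the weights are the baseline-corrected intensities in the peak region (the function name `weighted_mean` and the sample values are hypothetical):

```python
def weighted_mean(values, weights):
    """Weighted average of `values` using non-negative `weights`."""
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total


# m/z values and heights of the trace points inside the peak region.
region_mz = [100.001, 100.002, 100.003]
region_height = [1.0, 2.0, 1.0]

peak_mz = weighted_mean(region_mz, region_height)
```

Points near the apex carry the largest weights, so they dominate the reported m/z. The same calculation applied to the trace time gives the retention time.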

property mz_std: float#

The peak m/z standard deviation.

property rt: float#

Peak retention time, defined as the weighted average of the trace time in the peak region.

The trace height is used as weights.

property rt_end: float#

The peak end time.

property rt_start: float#

The peak start time.

property snr: float#

The peak signal-to-noise ratio.

The SNR is defined as the quotient between the peak height and the noise level at the apex. If the noise level is not available, the SNR is set to nan.
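The definition translates into a few lines. This sketch assumes the trace intensity, baseline, and noise are stored as sequences indexed like the trace, and that a missing noise estimate is represented by None; `peak_snr` is a hypothetical name.

```python
import math


def peak_snr(intensity, baseline, noise, apex):
    """Signal-to-noise ratio of a peak, evaluated at its apex.

    Hypothetical helper. The height is the intensity above the baseline
    at the apex; nan is returned when no noise estimate is available.
    """
    height = intensity[apex] - baseline[apex]
    if noise is None:
        return math.nan
    return height / noise[apex]


intensity = [1.0, 5.0, 11.0, 4.0, 1.0]
baseline = [1.0] * 5
noise = [0.5] * 5

snr = peak_snr(intensity, baseline, noise, apex=2)
snr_missing = peak_snr(intensity, baseline, None, apex=2)
```
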

property width: float#

Compute the peak width.

The peak width is defined as the extent of the region that contains 95 % of the total peak area.
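One way to compute such a width, shown here as a sketch under stated assumptions, is to accumulate the area with the trapezoid rule and measure the time between the points where the cumulative area reaches 2.5 % and 97.5 % of the total. The symmetric split of the excluded 5 %, the linear interpolation, and all function names are assumptions.

```python
def cumulative_area(time, height):
    """Cumulative trapezoid-rule area, starting at zero."""
    cum = [0.0]
    for i in range(1, len(time)):
        step = 0.5 * (height[i] + height[i - 1]) * (time[i] - time[i - 1])
        cum.append(cum[-1] + step)
    return cum


def time_at_area(target, cum, time):
    """Linearly interpolate the time at which `cum` reaches `target`."""
    for i in range(1, len(cum)):
        if cum[i] >= target:
            span = cum[i] - cum[i - 1]
            frac = 0.0 if span == 0 else (target - cum[i - 1]) / span
            return time[i - 1] + frac * (time[i] - time[i - 1])
    return time[-1]


def peak_width(time, height, coverage=0.95):
    """Time span that holds `coverage` of the peak area (assumed symmetric)."""
    cum = cumulative_area(time, height)
    total = cum[-1]
    tail = (1.0 - coverage) / 2.0
    t_lo = time_at_area(tail * total, cum, time)
    t_hi = time_at_area((1.0 - tail) * total, cum, time)
    return t_hi - t_lo


# A rectangular "peak" makes the result easy to verify by hand.
time = [0.0, 1.0, 2.0, 3.0, 4.0]
height = [1.0, 1.0, 1.0, 1.0, 1.0]
width = peak_width(time, height)
```

For the rectangular example the total area is 4.0, so 2.5 % is trimmed from each side and the width is 3.8 time units.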

Returns#

width : positive number.

tidyms2.lcms.create_lcms_assay(id, *, instrument, separation, polarity, annotate_isotopologues=True, on_disk=False, max_workers=1, storage_path=None)#

Create a new Assay instance for LC-MS data.

Parameters:
  • id (str) – the assay name

  • instrument (MSInstrument | str) – the instrument used in the experimental measurements. Used to define operator defaults.

  • separation (SeparationMode | str) – the separation mode used in the experimental measurements. Used to define operator defaults.

  • polarity (Polarity | str) – the instrument polarity. Used to define operator defaults.

  • annotate_isotopologues (bool) – If set to True, an isotopologue annotation step is included in the sample pipeline.

  • on_disk (bool) – store assay results on disk to reduce memory consumption. Recommended for large datasets.

  • storage_path (str | None) – path to the DB file to store assay data. Only used if on_disk is set to True.

Return type:

Assay[MZTrace, Peak]