Base utilities#

Base utilities for simulation.

pydantic model tidyms2.simulation.base.AbundanceSpec#

Bases: BaseModel

Define the abundance of a chemical species in a sample.

The abundance is computed as follows:

  1. Draw a value \(u\) from the uniform distribution on \([0, 1]\).

  2. If \(u\) is greater than the prevalence field, set the abundance to zero.

  3. Otherwise, set the abundance to a value sampled from a Gaussian distribution.
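The sampling procedure can be sketched with the standard library alone. This is a minimal illustration and not the library's implementation: `sample_abundance_sketch` is a hypothetical helper, and it assumes that prevalence is the probability of the species being present, so the abundance is zero whenever the uniform draw is not below it.

```python
import random


def sample_abundance_sketch(mean=100.0, std=0.0, prevalence=1.0, rng=None):
    """Hypothetical stand-in for the sampling procedure (not the library code)."""
    if rng is None:
        rng = random.Random()
    u = rng.random()  # step 1: uniform draw in [0, 1)
    if u >= prevalence:
        return 0.0  # step 2: the species is absent from this sample
    # step 3: the species is present, sample its abundance from a Gaussian
    return rng.gauss(mean, std)


rng = random.Random(42)
always_present = sample_abundance_sketch(mean=100.0, std=0.0, prevalence=1.0, rng=rng)
never_present = sample_abundance_sketch(mean=100.0, std=5.0, prevalence=0.0, rng=rng)
```

With the default values (prevalence of 1.0 and std of 0.0) every draw yields the mean, which matches the idea of a species that is always present at a fixed abundance.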

field mean: pydantic.PositiveFloat = 100.0#

The mean of the Gaussian distribution used to sample the abundance.

field prevalence: float = 1.0#

The probability that the species is present in a sample.

field std: pydantic.PositiveFloat = 0.0#

The standard deviation of the Gaussian distribution used to sample the abundance.

sample_abundance()#

Get a realization of the abundance.

Return type:

float

pydantic model tidyms2.simulation.base.BaseChemicalSpeciesSpec#

Bases: BaseModel

Define how the signals generated by a chemical species are computed.

\[x_{j} = c \cdot p_{j} \cdot f + \epsilon\]

Where \(x_{j}\) is the signal intensity for the j-th isotopologue, \(c\) is the abundance of the species that generates the adducts, \(p_{j}\) is the abundance of the j-th isotopologue included in the simulation, \(f\) is the response factor of the instrument and \(\epsilon\) is an additive error term.
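As a worked example of the signal equation, the sketch below evaluates it for a list of isotopologues. The function name `isotopologue_signals` and all numeric values are illustrative assumptions, and the error term is passed in explicitly, one value per isotopologue, rather than sampled.

```python
def isotopologue_signals(c, p, f, eps):
    """Evaluate x_j = c * p_j * f + eps_j for every isotopologue j."""
    return [c * p_j * f + e_j for p_j, e_j in zip(p, eps)]


# Illustrative values only: abundance 100, two isotopologues with
# abundances 0.9 and 0.1, unit response factor and no additive error.
signals = isotopologue_signals(c=100.0, p=[0.9, 0.1], f=1.0, eps=[0.0, 0.0])
```

In this noise-free case the two signals are simply the species abundance split according to the isotopologue abundances, scaled by the response factor.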

field abundance: dict[str, AbundanceSpec] | AbundanceSpec = AbundanceSpec(mean=100.0, std=0.0, prevalence=1.0)#

Define the abundance \(c\) of the chemical species that generates the ion. Multiple abundance specifications may be defined for different sample groups. In this case, the corresponding specification is selected based on the simulated sample group. If the sample group is not found, a ValueError is raised.
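The per-group selection described for this field can be sketched as a dictionary lookup. The helper `select_abundance_spec` and the `SpecStandIn` class are hypothetical and only mirror the documented fields of AbundanceSpec. How a default group is picked when no group is given is not described here, so the sketch only covers explicit group names.

```python
from typing import NamedTuple


class SpecStandIn(NamedTuple):
    """Hypothetical stand-in mirroring the documented AbundanceSpec fields."""

    mean: float = 100.0
    std: float = 0.0
    prevalence: float = 1.0


def select_abundance_spec(abundance, group):
    """Pick the specification that applies to a sample group."""
    if not isinstance(abundance, dict):
        return abundance  # a single specification applies to every group
    if group not in abundance:
        raise ValueError(f"Sample group {group!r} not found in the abundance specification.")
    return abundance[group]


per_group = {
    "control": SpecStandIn(mean=100.0, std=5.0),
    "treated": SpecStandIn(mean=200.0, std=10.0, prevalence=0.8),
}
treated = select_abundance_spec(per_group, "treated")
```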

field formula: str [Required]#

The ion formula. Used as a tidyms.chem.Formula argument.

field n_isotopologues: pydantic.PositiveInt = 1#

The number of isotopologues to simulate.

field noise: MeasurementNoiseSpec = MeasurementNoiseSpec(base_snr=None, min_snr=10.0)#

Define how the additive noise \(\epsilon\) is computed.

field response: InstrumentResponseSpec = InstrumentResponseSpec(base_response_factor=1.0, max_sensitivity_loss=0.0, sensitivity_decay=0.0, interbatch_variation=1.0)#

Define how the response factor \(f\) is computed.

compute_abundance(group=None)#

Compute a realization of the species abundance in the specified group.

Parameters:

group (str | None) – the group name if multiple groups were provided for the abundance specification. If not provided, a default group is chosen.

Raises:

ValueError – if the group is not found in the specification

Return type:

float

compute_intensity(group=None, order=0, batch=0)#

Compute a realization of the feature intensity for all isotopologues.

Return type:

list[float]

get_mz()#

Compute the m/z of features in the adduct.

Return type:

list[float]

pydantic model tidyms2.simulation.base.DataAcquisitionSpec#

Bases: BaseModel

Define the acquisition parameters of a simulated sample.

field grid: MZGridSpec | None = None#

The m/z grid specification. If not specified, a grid is created using the m/z values of the features.

field int_std: pydantic.NonNegativeFloat = 0.0#

Additive noise added to the spectral intensity in each scan.

field min_int: pydantic.PositiveFloat | None = None#

If specified, elements in a spectrum with intensity values lower than this parameter are removed.

field ms_level: pydantic.PositiveInt = 1#

The MS level of the spectra.

field mz_std: pydantic.NonNegativeFloat = 0.0#

Additive noise added to the m/z values in each scan.

field mz_width: pydantic.PositiveFloat = 0.005#

The peak width in the m/z domain. Used only when a grid specification is provided.

field n_scans: pydantic.PositiveInt = 100#

The number of scans in the sample.

field time_resolution: pydantic.PositiveFloat = 1.0#

The time spacing between scans.

pydantic model tidyms2.simulation.base.InstrumentResponseSpec#

Bases: BaseModel

Define the instrument response factor for an adduct.

Computed as the product of the base response factor, the inter-batch effect factor and the sensitivity loss factor.

The default parameters of this specification generate a response factor with no sensitivity loss over time and no inter-batch variation.

Refer to the Simulating data guide for more details.

field base_response_factor: pydantic.PositiveFloat = 1.0#

The adduct base response when no inter-batch or sensitivity loss effects are present.

field interbatch_variation: float = 1.0#

A factor applied to all samples from the same analytical batch. The inter-batch variation factor is random but equal for all observations within a batch. The value for a given batch is sampled from a uniform distribution with minimum value equal to this parameter and maximum value equal to 1.0. In the default configuration, the inter-batch factor is always 1.0.
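The behavior described for this field (one uniform draw per batch, shared by every sample in that batch) can be sketched as follows. `make_interbatch_sampler` is a hypothetical helper. Caching the draw in a dictionary is an assumption made for illustration, since how the library keeps the factor constant within a batch is not described here.

```python
import random


def make_interbatch_sampler(interbatch_variation=1.0, seed=None):
    """Return a function mapping a batch number to its inter-batch factor."""
    rng = random.Random(seed)
    factors = {}  # batch number -> factor, so all samples in a batch share it

    def factor_for(batch):
        if batch not in factors:
            # uniform draw between the field value and 1.0
            factors[batch] = rng.uniform(interbatch_variation, 1.0)
        return factors[batch]

    return factor_for


factor_for = make_interbatch_sampler(interbatch_variation=0.8, seed=0)
first_call = factor_for(0)
second_call = factor_for(0)  # same batch, same factor
other_batch = factor_for(1)  # a different batch gets its own draw
default_factor = make_interbatch_sampler()(0)
```

With the default value of 1.0 the sampling interval collapses to a single point, so the factor has no effect on the response.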

field max_sensitivity_loss: float = 0.0#

The maximum sensitivity loss in an analytical batch.

field sensitivity_decay: float = 0.0#

The decay parameter for the time-dependent sensitivity loss. We suggest using values between 0.001 and 1.0, as larger values reach the maximum sensitivity loss too fast. This value should also be selected based on the batch size, since the effect of smaller decay values becomes visible in longer batches.
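The functional form of the sensitivity loss is not given in this section. To illustrate how the decay parameter interacts with the batch size, the sketch below assumes an exponential approach to the maximum loss. This form is an assumption and may differ from the library's; `assumed_loss_factor` is a hypothetical name.

```python
import math


def assumed_loss_factor(order, max_sensitivity_loss, sensitivity_decay):
    """Factor multiplying the base response under the ASSUMED exponential model."""
    loss = max_sensitivity_loss * (1.0 - math.exp(-sensitivity_decay * order))
    return 1.0 - loss


batch_size = 100
# Small decay: the factor is still falling at the end of the batch.
slow = [assumed_loss_factor(k, 0.5, 0.01) for k in range(batch_size)]
# Large decay: the factor reaches its floor within the first few samples.
fast = [assumed_loss_factor(k, 0.5, 5.0) for k in range(batch_size)]
```

Under this assumed model, a decay of 5.0 leaves almost no visible trend after the first couple of samples, while a decay of 0.01 produces a gradual drift across the whole 100-sample batch.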

compute_response_factor(order, batch)#

Compute the response factor for a specific sample run order and analytical batch.

Parameters:
  • order (int) – the relative run order within a batch.

  • batch (int) – the analytical batch number.

Return type:

float

get_sensitivity_loss_factor(order)#

Compute the sensitivity loss factor applied to the base response.

Return type:

float

pydantic model tidyms2.simulation.base.MZGridSpec#

Bases: BaseModel

Define the minimum, maximum and spacing of m/z values in spectra.

field high: pydantic.PositiveFloat = 1200.0#

The maximum m/z value in the grid.

field low: pydantic.PositiveFloat = 100.0#

The minimum m/z value in the grid.

field size: pydantic.PositiveInt = 10000#

The number of elements in the grid.

create()#

Create an m/z grid.

Return type:

ndarray[tuple[Any, ...], dtype[TypeVar(FloatDtype, bound= floating)]]
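To see how the three fields relate, the sketch below builds a grid under the assumption that values are evenly spaced and include both end points. That spacing rule is an assumption, and `create_grid_sketch` is a hypothetical standard-library helper that returns a list, whereas create() returns a NumPy array.

```python
def create_grid_sketch(low=100.0, high=1200.0, size=10000):
    """Evenly spaced m/z values from low to high (assumed layout)."""
    if size == 1:
        return [low]
    step = (high - low) / (size - 1)
    return [low + i * step for i in range(size)]


grid = create_grid_sketch()
spacing = grid[1] - grid[0]
```

Under this assumption the default fields give a spacing of about 0.11 m/z units between consecutive grid points.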

pydantic model tidyms2.simulation.base.MeasurementNoiseSpec#

Bases: BaseModel

Define an additive error term added to the measured signal of an adduct.

The noise for a sample is computed as follows:

  1. Compute the snr using the base_snr field and the isotopologue abundance as a scaling factor. Set the snr to the maximum of this value and the min_snr field.

  2. Compute the noise level \(\sigma\) as the quotient between the signal and the snr.

  3. Sample the noise from the distribution \(N(0, \sigma)\).
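The noise procedure can be sketched with the standard library. Two details are assumptions made for illustration: the scaling is taken to be a plain product of base_snr and the isotopologue abundance, and an unset base_snr is taken to mean that no noise is added. The function names are hypothetical.

```python
import random


def compute_snr_sketch(pk, base_snr=None, min_snr=10.0):
    """Step 1: scale the base snr by pk and apply the lower bound."""
    if base_snr is None:
        return None  # assumed: no noise model is configured
    return max(base_snr * pk, min_snr)


def sample_noise_sketch(signal, pk, base_snr=None, min_snr=10.0, rng=None):
    """Steps 2 and 3: derive sigma from the snr and draw a Gaussian noise term."""
    snr = compute_snr_sketch(pk, base_snr, min_snr)
    if snr is None:
        return 0.0  # assumed: an unset base_snr yields a noise-free signal
    if rng is None:
        rng = random.Random()
    sigma = signal / snr
    return rng.gauss(0.0, sigma)


snr_major = compute_snr_sketch(pk=0.5, base_snr=100.0)   # scaled value wins
snr_minor = compute_snr_sketch(pk=0.01, base_snr=100.0)  # lower bound wins
noise = sample_noise_sketch(1000.0, pk=0.5, base_snr=100.0, rng=random.Random(0))
```

The lower bound matters for low-abundance isotopologues: without it, the scaled snr in the second call would drop to 1, making the noise level as large as the signal itself.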

field base_snr: pydantic.PositiveFloat | None = None#

The base snr of the additive noise applied to the isotopologue signal.

field min_snr: pydantic.PositiveFloat = 10.0#

The minimum snr of the additive noise applied to the isotopologue signal. This value sets a lower bound on the snr for low-intensity features. If the base_snr field is not set, this value is ignored.

compute_snr(pk)#

Compute the snr level for an isotopologue signal.

Parameters:

pk (float) – the isotopologue relative abundance, used to scale the base snr

Return type:

float | None

sample_noise(signal, pk)#

Draw a noise term sample for an isotopologue signal.

Parameters:
  • signal (float) – the observed signal, as the product of the instrument response, the base abundance and the isotopologue relative abundance.

  • pk (float) – the relative abundance of an isotopologue, used to scale the snr for isotopologues with lower signals.

Return type:

float