Data models#

TidyMS core data models.

The models defined here specify how data is shared across data pipelines.

Refer to the architecture overview for an introduction to the TidyMS data model.

For considerations on customizing these models, refer to the Extending TidyMS2 guide.

pydantic model tidyms2.core.models.AnnotableFeature#

Bases: Feature, ABC

Abstract feature class which inherits from Feature.

Extends the base feature with methods to perform feature annotation.

field annotation: Annotation | None = None#

Annotation data of the feature.

field id: UUID [Optional]#

A unique id for the model.

field roi: RoiType [Required]#

The ROI where the feature was detected.

classmethod descriptor_names(cls)#

Retrieve the available descriptor names.

Return type:

set[str]

Returns:

the descriptor names.

classmethod from_str(s, roi, annotation)#

Create a feature instance from a string.

Parameters:
  • s (str) – feature string generated with to_str.

  • roi (TypeVar(RoiType, bound= Roi)) – ROI where the feature was detected.

  • annotation (Annotation) – the feature annotation.

Return type:

Self

Returns:

a new feature instance.

abstractmethod static compute_isotopic_envelope(*features)#

Compute the isotopic envelope from a list of isotopologue features.

Parameters:

features – the Collection of features used to compute the envelope.

Return type:

IsotopicEnvelope

Returns:

The normalized isotopic envelope

abstractmethod compare(other)#

Compare the similarity between two features.

Must be a symmetric function that returns a number between 0.0 and 1.0.

Parameters:

other – feature to compare with.

Return type:

float

Returns:

the similarity between the feature pair.
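The contract above (symmetric, bounded between 0.0 and 1.0) can be illustrated with a minimal sketch. The function name mz_similarity, the tolerance parameter and the exponential decay are all assumptions made for illustration; they are not part of the TidyMS API, and a real compare implementation may use entirely different criteria.

```python
import math


def mz_similarity(mz_a: float, mz_b: float, tolerance: float = 0.01) -> float:
    """Hypothetical similarity: 1.0 for identical m/z, decaying towards 0.0."""
    # abs() makes the score independent of the argument order (symmetry).
    delta = abs(mz_a - mz_b)
    # exp(-x) maps [0, inf) into (0, 1], so the score stays within bounds.
    return math.exp(-delta / tolerance)


a, b = 200.0500, 200.0550
assert mz_similarity(a, b) == mz_similarity(b, a)  # symmetric
assert 0.0 <= mz_similarity(a, b) <= 1.0  # bounded
assert mz_similarity(a, a) == 1.0  # a feature is fully similar to itself
```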

describe()#

Compute all available descriptors for the feature.

A descriptor is any method that starts with get_.

Return type:

dict[str, float]

Returns:

a dictionary that maps descriptor names to descriptor values.

get(descriptor)#

Compute a descriptor value.

Parameters:

descriptor (str) – the descriptor name.

Return type:

float

Returns:

the descriptor value.

Raises:

ValueError – if an invalid descriptor name is passed.

has_descriptors_in_range(**bounds)#

Check if feature descriptors fall between lower and upper bounds.

Parameters:

bounds (tuple[float, float]) – descriptor lower and upper bound values.

Return type:

bool

Returns:

True if all descriptors fall between the bounds. False otherwise.
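As a rough illustration of the check described above, the following sketch applies keyword bounds to a plain dictionary of descriptor values. The helper descriptors_in_range is hypothetical, and treating both bounds as inclusive is an assumption; the library may handle the interval limits differently.

```python
def descriptors_in_range(
    descriptors: dict[str, float], **bounds: tuple[float, float]
) -> bool:
    """Return True only if every bounded descriptor lies within its bounds."""
    for name, (lower, upper) in bounds.items():
        # Assumption: bounds are inclusive on both ends.
        if not lower <= descriptors[name] <= upper:
            return False
    return True


descriptors = {"mz": 301.14, "area": 1500.0, "height": 320.0}
print(descriptors_in_range(descriptors, mz=(300.0, 302.0), area=(1000.0, 2000.0)))
print(descriptors_in_range(descriptors, height=(500.0, 1000.0)))
```

Each keyword names a descriptor and maps it to a (lower, upper) tuple, which mirrors the documented bounds type.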

to_str()#

Serialize the feature data into a string.

Return type:

str

Returns:

a string serialization of the feature.

property area: float#

The feature area.

property height: float#

The feature height.

property mz: float#

The feature m/z.

pydantic model tidyms2.core.models.Annotation#

Bases: TidyMSBaseModel

Store feature annotation data.

field charge: int = -1#

Feature charge state. If set to -1, the charge state is not defined.

field group: int = -1#

The feature group id. Group features from different samples based on their chemical identity. Used to create a data matrix. If set to -1 the feature is not assigned to any group.

field id: UUID [Optional]#

A unique id for the model.

field isotopologue_index: int = -1#

Position of the feature in an isotopic envelope. If set to -1 the feature is not associated with any group of isotopologues in a sample.

field isotopologue_label: int = -1#

Group features from the same isotopic envelope in a sample. If set to -1 the feature is not associated with any group of isotopologues in a sample.

field roi_id: UUID [Required]#

The id of the ROI from which the feature was extracted.

field sample_id: str = ''#

The id of the sample from which the feature was extracted.
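The role of the group field and its -1 sentinel can be sketched with plain Python. The triplets below are hypothetical stand-ins for annotated features (sample id, group id and an area value), and the dictionary layout of the data matrix is an assumption made only for this example.

```python
from collections import defaultdict

# Hypothetical (sample_id, group, area) triplets taken from annotated features.
features = [
    ("sample-1", 0, 1200.0),
    ("sample-2", 0, 1150.0),
    ("sample-1", 1, 300.0),
    ("sample-2", -1, 80.0),  # -1: the feature is not assigned to any group
]

# Rows are samples and columns are feature groups.
data_matrix: dict[str, dict[int, float]] = defaultdict(dict)
for sample_id, group, area in features:
    if group == -1:
        # Ungrouped features cannot be placed in a column, so they are skipped.
        continue
    data_matrix[sample_id][group] = area
```

Features that share a group id end up in the same column, one row per sample.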

pydantic model tidyms2.core.models.AnnotationPatch#

Bases: BaseModel

Store an annotation patch.

field field: str [Required]#

The annotation field to patch.

field id: UUID [Required]#

The feature id to patch.

field value: int [Required]#

The new value.
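One way to picture how such a patch could be applied is shown below. AnnotationRecord, Patch and apply_patches are hypothetical stand-ins written with dataclasses; only the field names id, field and value come from the model documented above, and the way the library actually applies patches is not shown here.

```python
from dataclasses import dataclass
from uuid import UUID, uuid4


@dataclass
class AnnotationRecord:
    """Stand-in for a feature annotation (a subset of the documented fields)."""

    charge: int = -1
    group: int = -1


@dataclass
class Patch:
    """Stand-in mirroring the documented patch fields: id, field and value."""

    id: UUID
    field: str
    value: int


def apply_patches(
    annotations: dict[UUID, AnnotationRecord], patches: list[Patch]
) -> None:
    for patch in patches:
        record = annotations[patch.id]
        # Reject fields that do not exist on the annotation.
        if not hasattr(record, patch.field):
            raise ValueError(f"unknown annotation field: {patch.field}")
        setattr(record, patch.field, patch.value)


feature_id = uuid4()
annotations = {feature_id: AnnotationRecord()}
apply_patches(annotations, [Patch(id=feature_id, field="group", value=7)])
```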

pydantic model tidyms2.core.models.Chromatogram#

Bases: TidyMSBaseModel

Chromatogram representation.

field id: UUID [Optional]#

A unique id for the model.

field index: int = -1#

The chromatogram order in a data file

field int: FloatArray1D [Required]#

The intensity data

field name: str | None = None#

The chromatogram name in a data file

field time: FloatArray1D [Required]#

The time data

pydantic model tidyms2.core.models.DescriptorPatch#

Bases: BaseModel

Store a descriptor patch.

field descriptor: str [Required]#

The descriptor to patch.

field id: UUID [Required]#

The feature id to patch.

field value: float [Required]#

The new descriptor value to apply

pydantic model tidyms2.core.models.Feature#

Bases: TidyMSBaseModel, Generic[RoiType]

Base class to represent a feature extracted from a ROI.

Feature inherits from pydantic BaseModel and supports most of its functionality. New Feature subclasses are created by inheriting from this class and setting data fields using Pydantic’s standard approach.

There are two field types for features:

Data fields

contain the information that represents the feature, e.g., the start and end position of a chromatographic peak. These fields are represented as standard pydantic fields.

Descriptors

describe feature characteristics, e.g., the peak width or peak area of a chromatographic peak. ALL descriptors MUST be floats. Descriptors are represented as pydantic computed fields and MUST be decorated with pydantic.computed_field. It is also recommended to use the functools.cached_property decorator to cache the descriptor value. As an example:

from functools import cached_property
from pydantic import computed_field

from tidyms2.core.models import Feature

class MyFeature(Feature):

    data_field: float = 1.0
    '''A feature data field.'''

    @computed_field
    @cached_property
    def custom_descriptor(self) -> float:
        # Descriptors return a float; cached_property stores it after the first access.
        return 100.0

The mz, area and height descriptors are set as abstract methods and need to be implemented for all concrete Feature classes.

Finally, three attributes are defined for the Feature class: id, roi and annotation. These attributes are managed internally by the library and MUST NOT be set directly by the user.

Refer to the developer guides for an example on how to create a new Feature class.

field annotation: Annotation | None = None#

Annotation data of the feature.

field id: UUID [Optional]#

A unique id for the model.

field roi: RoiType [Required]#

The ROI where the feature was detected.

classmethod descriptor_names(cls)#

Retrieve the available descriptor names.

Return type:

set[str]

Returns:

the descriptor names.

classmethod from_str(s, roi, annotation)#

Create a feature instance from a string.

Parameters:
  • s (str) – feature string generated with to_str.

  • roi (TypeVar(RoiType, bound= Roi)) – ROI where the feature was detected.

  • annotation (Annotation) – the feature annotation.

Return type:

Self

Returns:

a new feature instance.

describe()#

Compute all available descriptors for the feature.

A descriptor is any method that starts with get_.

Return type:

dict[str, float]

Returns:

a dictionary that maps descriptor names to descriptor values.

get(descriptor)#

Compute a descriptor value.

Parameters:

descriptor (str) – the descriptor name.

Return type:

float

Returns:

the descriptor value.

Raises:

ValueError – if an invalid descriptor name is passed.

has_descriptors_in_range(**bounds)#

Check if feature descriptors fall between lower and upper bounds.

Parameters:

bounds (tuple[float, float]) – descriptor lower and upper bound values.

Return type:

bool

Returns:

True if all descriptors fall between the bounds. False otherwise.

to_str()#

Serialize the feature data into a string.

Return type:

str

Returns:

a string serialization of the feature.

property area: float#

The feature area.

property height: float#

The feature height.

property mz: float#

The feature m/z.

pydantic model tidyms2.core.models.FeatureGroup#

Bases: BaseModel

Store feature group information.

field annotation: GroupAnnotation [Required]#

The feature group annotation.

field descriptors: dict[str, float] [Required]#

Aggregated feature descriptors.

field group: int = -1#

The group id.

has_descriptors_in_range(**filters)#

Check if feature descriptors fall between lower and upper bounds.

Parameters:

filters – descriptor lower and upper bound values.

Return type:

bool

Returns:

True if all descriptors fall between the bounds. False otherwise.

property mz: float#

The feature m/z.

pydantic model tidyms2.core.models.FillValue#

Bases: BaseModel

Container class that stores values to fill missing entries in the data matrix.

field feature_group: int [Required]#

The feature group of the missing entry to fill.

field sample_id: str [Required]#

The id of the sample of the missing entry to fill.

field value: float [Required]#

The fill value.
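The three fields above identify one cell of the data matrix and the value to place in it. The sketch below shows the idea with a hypothetical Fill dataclass and a dictionary-based matrix where None marks a missing entry; both are assumptions for illustration and do not reflect how the library stores its data matrix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fill:
    """Stand-in with the documented fields: sample_id, feature_group, value."""

    sample_id: str
    feature_group: int
    value: float


# Hypothetical data matrix: rows are samples, columns are feature groups,
# and None marks a missing entry.
matrix = {
    "sample-1": {0: 1200.0, 1: None},
    "sample-2": {0: None, 1: 830.0},
}

fills = [Fill("sample-1", 1, 15.0), Fill("sample-2", 0, 40.0)]

for fill in fills:
    row = matrix[fill.sample_id]
    # Only fill entries that are actually missing; detected values are kept.
    if row[fill.feature_group] is None:
        row[fill.feature_group] = fill.value
```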

pydantic model tidyms2.core.models.GroupAnnotation#

Bases: BaseModel

Store annotation of a feature group.

field charge: int | None = None#

The numerical charge of the feature. Only defined if an isotopologue annotation algorithm was applied to the dataset.

field envelope: IsotopicEnvelope | None = None#

The m/z and abundance values of the envelope members. Only defined if an isotopologue annotation algorithm was applied to the dataset and isotopologue_index is 0.

field isotopologue_group: int | None = None#

Label shared between isotopologue features. Only defined if an isotopologue annotation algorithm was applied to the dataset.

field isotopologue_index: int | None = None#

The position of the feature in an envelope. Only defined if an isotopologue annotation algorithm was applied to the dataset.

field label: int [Required]#

The feature group label. Identifies features across samples that originate from the same ionic species.

field name: str | None = None#

An optional name for the feature group.

pydantic model tidyms2.core.models.IsotopicEnvelope#

Bases: BaseModel

Store the m/z and normalized abundance of an isotopic envelope.

field mz: list[float] [Required]#

The envelope sorted m/z

field p: list[float] [Required]#

The envelope normalized abundance
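A minimal sketch of how sorted m/z values and normalized abundances could be derived from raw isotopologue data follows. The helper normalized_envelope is hypothetical, and normalizing the abundances so that they sum to 1.0 is an assumption; the library may use another convention, such as scaling to the most abundant peak.

```python
def normalized_envelope(
    mz: list[float], abundance: list[float]
) -> tuple[list[float], list[float]]:
    """Sort isotopologues by m/z and normalize their abundances."""
    # Sorting the pairs keeps each abundance attached to its m/z value.
    pairs = sorted(zip(mz, abundance))
    total = sum(ab for _, ab in pairs)
    if total <= 0:
        raise ValueError("total abundance must be positive")
    sorted_mz = [m for m, _ in pairs]
    # Assumption: abundances are normalized to sum to 1.0.
    p = [ab / total for _, ab in pairs]
    return sorted_mz, p


env_mz, env_p = normalized_envelope(
    [181.074, 180.071, 182.076], [65.0, 1000.0, 10.0]
)
```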

pydantic model tidyms2.core.models.MSSpectrum#

Bases: TidyMSBaseModel

Representation of a Mass Spectrum.

field centroid: bool = True#

Set to True if the spectrum was converted to centroid mode. False otherwise.

field id: UUID [Optional]#

A unique id for the model.

field int: FloatArray1D [Required]#

Spectral intensity

field ms_level: pydantic.PositiveInt = 1#

MS level of the current spectrum

field mz: FloatArray1D [Required]#

Sorted m/z data

field time: pydantic.NonNegativeFloat = 0.0#

Acquisition time of the spectrum

get_nbytes()#

Get the number of bytes stored in m/z and intensity arrays.

Return type:

int

pydantic model tidyms2.core.models.MZTrace#

Bases: Roi

ROI implementation using m/z traces.

An m/z trace is a 1D trace containing m/z, time and intensity information across scans.

field baseline: FloatArray1D | None = None#

If provided, represents the baseline level at each time point.

field id: UUID [Optional]#

A unique id for the model.

field mz: FloatArray1D [Required]#

m/z in each scan. All values are assumed to be non-negative.

field noise: FloatArray1D | None = None#

If provided, represents the noise level at each time point.

field sample: Sample [Required]#

The sample where the ROI was extracted from.

field scan: IntArray1D [Required]#

Scan numbers where the ROI is defined. All values are assumed to be non-negative.

field spint: FloatArray1D [Required]#

Intensity in each scan. All values are assumed to be non-negative.

field time: FloatArray1D [Required]#

Time in each scan. All values are assumed to be non-negative.

classmethod from_str(ser, sample)#

Create a ROI instance from a JSON string.

Parameters:
  • ser (str) – a serialized ROI obtained using the to_str method

  • sample (Sample) – a sample to associate with the ROI

Return type:

Self

Returns:

a new ROI instance.

equals(other)#

Check if two m/z traces are equal.

Return type:

bool

get_slice_height(start, end)#

Compute the trace height in a slice.

Parameters:
  • start (int) – slice start index.

  • end (int) – slice end index.

Return type:

ndarray[tuple[Any, ...], dtype[TypeVar(FloatDtype, bound= floating)]]

to_str()#

Serialize a ROI into a string.

Return type:

str

Returns:

a string serialization of the ROI.

pydantic model tidyms2.core.models.Roi#

Bases: TidyMSBaseModel

Base class for ROIs extracted from raw MS data.

Roi inherits from pydantic BaseModel and supports most of its functionality. New ROI subclasses are created by inheriting from this class and setting data fields using Pydantic’s standard approach.

For Numpy array fields, check out the tidyms.utils.numpy.FloatingArray and tidyms.utils.numpy.IntArray types which provide type checking for arrays and efficient serialization/deserialization.

The id field contains a unique identifier for the ROI and is managed internally by the library. It MUST NOT be set directly by the user.

Refer to the developer guides for an example on how to create a new ROI class.

field id: UUID [Optional]#

A unique id for the model.

field sample: Sample [Required]#

The sample where the ROI was extracted from.

classmethod from_str(ser, sample)#

Create a ROI instance from a JSON string.

Parameters:
  • ser (str) – a serialized ROI obtained using the to_str method

  • sample (Sample) – a sample to associate with the ROI

Return type:

Self

Returns:

a new ROI instance.

to_str()#

Serialize a ROI into a string.

Return type:

str

Returns:

a string serialization of the ROI.
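The to_str and from_str pair describes a round trip in which the sample is supplied again on deserialization. The sketch below mimics that pattern with dataclasses and the standard json module. RoiStub and SampleStub are hypothetical stand-ins, and leaving the sample out of the serialized string is an assumption inferred from the from_str signature, not a confirmed detail of the library.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class SampleStub:
    """Hypothetical stand-in for a sample."""

    id: str


@dataclass
class RoiStub:
    """Hypothetical stand-in for a ROI with two data fields."""

    sample: SampleStub
    time: list[float]
    spint: list[float]

    def to_str(self) -> str:
        data = asdict(self)
        # Assumption: the sample is not embedded in the serialized string.
        data.pop("sample")
        return json.dumps(data)

    @classmethod
    def from_str(cls, ser: str, sample: SampleStub) -> "RoiStub":
        # The sample is re-associated with the ROI when it is deserialized.
        return cls(sample=sample, **json.loads(ser))


sample = SampleStub(id="sample-1")
roi = RoiStub(sample=sample, time=[0.0, 1.0], spint=[10.0, 12.5])
restored = RoiStub.from_str(roi.to_str(), sample)
```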

pydantic model tidyms2.core.models.Sample#

Bases: BaseModel

Store information required to load data from a raw data file.

field end_time: pydantic.NonNegativeFloat | None = None#

Maximum acquisition time of MS scans to include. If None, end at the last scan

field id: str [Required]#

A unique sample identifier

field meta: SampleMetadata = SampleMetadata(type='', group='', order=0)#

Sample metadata.

field ms_data_mode: MSDataMode = MSDataMode.CENTROID#

The mode in which the sample data is stored.

field ms_level: pydantic.PositiveInt = 1#

The sample MS level.

field path: Path [Required]#

Path to a raw data file

field reader: str | None = None#

The name of a registered data reader to read sample data. If None, the optimal reader is inferred from the file extension.

field start_time: pydantic.NonNegativeFloat = 0.0#

Minimum acquisition time of MS scans to include. The default value, 0.0, starts from the first scan.
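The effect of start_time and end_time on scan selection can be sketched as a simple time window filter. The helper in_time_window is hypothetical, and treating both limits as inclusive is an assumption; the data readers may apply the limits differently.

```python
from typing import Optional


def in_time_window(
    time: float, start_time: float = 0.0, end_time: Optional[float] = None
) -> bool:
    """Check whether a scan acquisition time falls inside the window."""
    # Assumption: both limits are inclusive.
    if time < start_time:
        return False
    # An end_time of None means there is no upper limit.
    return end_time is None or time <= end_time


scan_times = [0.0, 30.0, 60.0, 90.0, 120.0]
selected = [t for t in scan_times if in_time_window(t, start_time=30.0, end_time=90.0)]
unbounded = [t for t in scan_times if in_time_window(t, start_time=60.0)]
```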

serialize_path(path, _info)#

Serialize path into a string.

Return type:

str

pydantic model tidyms2.core.models.SampleMetadata#

Bases: BaseModel

Sample metadata container.

field batch: pydantic.NonNegativeInt = 0#

The sample analytical batch number in an assay.

field group: str = ''#

The sample group.

field order: pydantic.NonNegativeInt = 0#

The sample measurement order in an assay.

field type: str = ''#

The sample type.

pydantic model tidyms2.core.models.TidyMSBaseModel#

Bases: BaseModel

Base model that all other library models inherit from.

field id: UUID [Optional]#

A unique id for the model.