Data models#
TidyMS core data models.
All models defined here define how data is shared in data pipelines.
Refer to the architecture overview for an introduction to the TidyMS data model.
For considerations on customizing these models, refer to the Extending TidyMS2 guide.
- pydantic model tidyms2.core.models.AnnotableFeature#
Bases:
Feature, ABC
Abstract base feature class that inherits from Feature and also implements methods for feature annotation.
- field annotation: Annotation | None = None#
Annotation data of the feature.
- field id: UUID [Optional]#
A unique id for the model.
- field roi: RoiType [Required]#
The ROI where the feature was detected.
- classmethod descriptor_names(cls)#
Retrieve the available descriptor names.
- Return type:
set[str]
- Returns:
the descriptor names.
- classmethod from_str(s, roi, annotation)#
Create a feature instance from a string.
- Parameters:
s (str) – feature string generated with to_str.
roi (TypeVar(RoiType, bound=Roi)) – ROI where the feature was detected.
annotation (Annotation) – the feature annotation.
- Return type:
Self
- Returns:
a new feature instance.
- abstractmethod static compute_isotopic_envelope(*features)#
Compute the isotopic envelope from a list of isotopologue features.
- Parameters:
features – the Collection of features used to compute the envelope.
- Return type:
- Returns:
The normalized isotopic envelope
- abstractmethod compare(other)#
Compare the similarity between two features.
Must be a symmetric function that returns a number between 0.0 and 1.0.
- Parameters:
other – feature to compare with.
- Return type:
float
- Returns:
the similarity between the feature pair.
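The contract stated for compare (symmetric, bounded between 0.0 and 1.0) can be illustrated with a minimal sketch. The function name `mz_similarity` and the `tolerance` parameter are assumptions made for this example; they are not part of TidyMS, and a real implementation would likely use more than the m/z value.

```python
def mz_similarity(mz1: float, mz2: float, tolerance: float = 0.01) -> float:
    """Hypothetical similarity: 1.0 for equal m/z, decaying linearly to 0.0 at `tolerance`."""
    if tolerance <= 0.0:
        raise ValueError("tolerance must be positive")
    # abs() makes the result independent of the argument order (symmetry).
    distance = abs(mz1 - mz2)
    # Clamp so the result always stays inside [0.0, 1.0].
    return max(0.0, 1.0 - distance / tolerance)
```

Swapping the two arguments yields the same value, which is the property a concrete compare implementation has to preserve.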
- describe()#
Compute all available descriptors for the feature.
A descriptor is any method that starts with get_.
- Return type:
dict[str, float]
- Returns:
a dictionary that maps descriptor names to descriptor values.
- get(descriptor)#
Compute a descriptor value.
- Parameters:
descriptor (str) – the descriptor name.
- Return type:
float
- Returns:
the descriptor value.
- Raises:
ValueError – if an invalid descriptor name is passed.
- has_descriptors_in_range(**bounds)#
Check if feature descriptors fall between lower and upper bounds.
- Parameters:
bounds (tuple[float, float]) – descriptor lower and upper bound values.
- Return type:
bool
- Returns:
True if all descriptors fall between the bounds. False otherwise.
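The range check described above can be sketched on a plain dictionary of descriptor values. The helper `descriptors_in_range` is hypothetical, and both the inclusive bounds and the ValueError raised for unknown names are assumptions of this sketch rather than documented behavior.

```python
def descriptors_in_range(
    descriptors: dict[str, float], **bounds: tuple[float, float]
) -> bool:
    """Return True if every bounded descriptor lies inside its (lower, upper) pair."""
    for name, (lower, upper) in bounds.items():
        if name not in descriptors:
            raise ValueError(f"{name} is not a valid descriptor name.")
        # Inclusive bounds are an assumption of this sketch.
        if not lower <= descriptors[name] <= upper:
            return False
    return True


# Made-up descriptor values, for illustration only.
values = {"mz": 181.07, "area": 1500.0, "height": 320.0}
ok = descriptors_in_range(values, area=(1000.0, 5000.0), height=(100.0, 1000.0))
rejected = descriptors_in_range(values, height=(500.0, 1000.0))
```

Each keyword argument names a descriptor and carries its lower and upper bound, mirroring the `**bounds` signature above.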
- to_str()#
Serialize the feature data into a string.
- Return type:
str
- Returns:
a string serialization of the feature.
- property area: float#
The feature area.
- property height: float#
The feature height.
- property mz: float#
The feature m/z.
- pydantic model tidyms2.core.models.Annotation#
Bases:
TidyMSBaseModel
Store feature annotation data.
- field charge: int = -1#
Feature charge state. If set to -1, the feature charge state is not defined.
- field group: int = -1#
The feature group id. Groups features from different samples based on their chemical identity. Used to create a data matrix. If set to -1, the feature is not assigned to any group.
- field id: UUID [Optional]#
A unique id for the model.
- field isotopologue_index: int = -1#
Position of the feature in an isotopic envelope. If set to -1, the feature is not associated with any group of isotopologues in a sample.
- field isotopologue_label: int = -1#
Groups features from the same isotopic envelope in a sample. If set to -1, the feature is not associated with any group of isotopologues in a sample.
- field roi_id: UUID [Required]#
The ROI id where the feature was extracted from
- field sample_id: str = ''#
The sample id where the feature was extracted from
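How the group and sample_id fields can drive the construction of a data matrix is sketched below with plain tuples. The records and the choice of area as the matrix value are made up for illustration, and skipping features with group -1 is an assumption of this sketch, not a description of the library internals.

```python
from collections import defaultdict

# Each tuple is (sample_id, group, area); values are made up for illustration.
records = [
    ("sample-1", 0, 1200.0),
    ("sample-1", 1, 300.0),
    ("sample-2", 0, 1100.0),
    ("sample-2", -1, 50.0),  # group == -1: not assigned to any group
]

# matrix[sample_id][group] -> descriptor value
matrix: dict[str, dict[int, float]] = defaultdict(dict)
for sample_id, group, area in records:
    if group == -1:
        continue  # assumption: ungrouped features are left out of the matrix
    matrix[sample_id][group] = area
```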
- pydantic model tidyms2.core.models.AnnotationPatch#
Bases:
BaseModel
Store an annotation patch.
- field field: str [Required]#
The annotation field to patch.
- field id: UUID [Required]#
The feature id to patch.
- field value: int [Required]#
The new value.
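A patch of this shape can be read as "set this field of that feature's annotation to this value". The sketch below shows that reading with dataclass stand-ins; `AnnotationStandIn`, `PatchStandIn` and `apply_patch` are hypothetical names, and the way TidyMS actually applies patches is not described here.

```python
from dataclasses import dataclass
from uuid import UUID, uuid4


@dataclass
class AnnotationStandIn:
    """Hypothetical stand-in holding the integer annotation fields."""

    charge: int = -1
    group: int = -1
    isotopologue_index: int = -1
    isotopologue_label: int = -1


@dataclass
class PatchStandIn:
    """Hypothetical stand-in mirroring the three patch fields."""

    id: UUID
    field: str
    value: int


def apply_patch(annotations: dict[UUID, AnnotationStandIn], patch: PatchStandIn) -> None:
    """Set one annotation field of the feature referenced by the patch."""
    annotation = annotations[patch.id]
    if not hasattr(annotation, patch.field):
        raise ValueError(f"{patch.field} is not an annotation field.")
    setattr(annotation, patch.field, patch.value)


feature_id = uuid4()
annotations = {feature_id: AnnotationStandIn()}
apply_patch(annotations, PatchStandIn(id=feature_id, field="group", value=3))
```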
- pydantic model tidyms2.core.models.Chromatogram#
Bases:
TidyMSBaseModel
Chromatogram representation.
- field id: UUID [Optional]#
A unique id for the model.
- field index: int = -1#
The chromatogram order in a data file
- field int: FloatArray1D [Required]#
The intensity data
- field name: str | None = None#
The chromatogram name in a data file
- field time: FloatArray1D [Required]#
The time data
- pydantic model tidyms2.core.models.DescriptorPatch#
Bases:
BaseModel
Store a descriptor patch.
- field descriptor: str [Required]#
The descriptor to patch.
- field id: UUID [Required]#
The feature id to patch.
- field value: float [Required]#
The new descriptor value to apply
- pydantic model tidyms2.core.models.Feature#
Bases:
TidyMSBaseModel, Generic[RoiType]
Base class to represent a feature extracted from a ROI.
Feature inherits from pydantic BaseModel and supports most of its functionality. New Feature subclasses are created by inheriting from this class and setting data fields using Pydantic’s standard approach.
There are two field types for features:
- Data fields
contain information to represent the feature. e.g. the start and end position of a chromatographic peak. These fields are represented as standard pydantic fields.
- Descriptors
describe feature characteristics, e.g. the peak width or peak area in a chromatographic peak. ALL descriptors MUST be floats. These fields are represented as pydantic computed fields. Descriptors MUST be decorated with pydantic.computed_field. It is also recommended to use the functools.cached_property decorator to cache the descriptor value. As an example:

    from functools import cached_property

    from pydantic import computed_field

    from tidyms2.core.models import Feature


    class MyFeature(Feature):
        data_field: float = 1.0
        '''A feature data field.'''

        @computed_field
        @cached_property
        def custom_descriptor(self) -> float:
            # Return the value; cached_property stores it after the first access.
            return 100.0

The mz, area and height descriptors are set as abstract methods and need to be implemented for all concrete Feature classes.
Finally, three attributes are defined for the Feature class: id, roi and annotation. These parameters are managed internally by the library and they MUST never be set directly by the user.
Refer to the developer guides for an example on how to create a new Feature class.
- field annotation: Annotation | None = None#
Annotation data of the feature.
- field id: UUID [Optional]#
A unique id for the model.
- field roi: RoiType [Required]#
The ROI where the feature was detected.
- classmethod descriptor_names(cls)#
Retrieve the available descriptor names.
- Return type:
set[str]
- Returns:
the descriptor names.
- classmethod from_str(s, roi, annotation)#
Create a feature instance from a string.
- Parameters:
s (str) – feature string generated with to_str.
roi (TypeVar(RoiType, bound=Roi)) – ROI where the feature was detected.
annotation (Annotation) – the feature annotation.
- Return type:
Self
- Returns:
a new feature instance.
- describe()#
Compute all available descriptors for the feature.
A descriptor is any method that starts with get_.
- Return type:
dict[str, float]
- Returns:
a dictionary that maps descriptor names to descriptor values.
- get(descriptor)#
Compute a descriptor value.
- Parameters:
descriptor (str) – the descriptor name.
- Return type:
float
- Returns:
the descriptor value.
- Raises:
ValueError – if an invalid descriptor name is passed.
- has_descriptors_in_range(**bounds)#
Check if feature descriptors fall between lower and upper bounds.
- Parameters:
bounds (tuple[float, float]) – descriptor lower and upper bound values.
- Return type:
bool
- Returns:
True if all descriptors fall between the bounds. False otherwise.
- to_str()#
Serialize the feature data into a string.
- Return type:
str
- Returns:
a string serialization of the feature.
- property area: float#
The feature area.
- property height: float#
The feature height.
- property mz: float#
The feature m/z.
- pydantic model tidyms2.core.models.FeatureGroup#
Bases:
BaseModel
Store feature group information.
- field annotation: GroupAnnotation [Required]#
The feature group annotation.
- field descriptors: dict[str, float] [Required]#
Aggregated feature descriptors.
- field group: int = -1#
The group id.
- has_descriptors_in_range(**filters)#
Check if feature descriptors fall between lower and upper bounds.
- Parameters:
filters – descriptor lower and upper bound values.
- Return type:
bool
- Returns:
True if all descriptors fall between the bounds. False otherwise.
- property mz: float#
The feature m/z.
- pydantic model tidyms2.core.models.FillValue#
Bases:
BaseModel
Container class that stores values to fill missing entries in the data matrix.
- field feature_group: int [Required]#
The feature group whose missing value is filled.
- field sample_id: str [Required]#
The id of the sample to input the missing value.
- field value: float [Required]#
The fill value.
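The role of the three fields above can be sketched with a small dictionary-based data matrix. The matrix layout, the use of None to mark missing entries and the sample values are all assumptions made for this illustration; they do not describe how TidyMS stores its data matrix.

```python
from typing import Optional

# matrix[sample_id][feature_group] -> value; None marks a missing entry.
matrix: dict[str, dict[int, Optional[float]]] = {
    "sample-1": {0: 1200.0, 1: None},
    "sample-2": {0: None, 1: 310.0},
}

# Each tuple mirrors the three fields above: (sample_id, feature_group, value).
fill_values = [("sample-1", 1, 25.0), ("sample-2", 0, 40.0)]

for sample_id, feature_group, value in fill_values:
    # Only entries that are actually missing are filled (an assumption of this sketch).
    if matrix[sample_id][feature_group] is None:
        matrix[sample_id][feature_group] = value
```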
- pydantic model tidyms2.core.models.GroupAnnotation#
Bases:
BaseModel
Store annotation of a feature group.
- field charge: int | None = None#
The numerical charge of the feature. Only defined if an isotopologue annotation algorithm was applied to the dataset.
- field envelope: IsotopicEnvelope | None = None#
The m/z and abundance values of the envelope members. Only defined if an isotopologue annotation algorithm was applied to the dataset and isotopologue_index is 0.
- field isotopologue_group: int | None = None#
Label shared between isotopologue features. Only defined if an isotopologue annotation algorithm was applied to the dataset.
- field isotopologue_index: int | None = None#
The position of the feature in an envelope. Only defined if an isotopologue annotation algorithm was applied to the dataset.
- field label: int [Required]#
The feature group label. Identifies features across samples that originate from the same ionic species.
- field name: str | None = None#
An optional name for the feature group.
- pydantic model tidyms2.core.models.IsotopicEnvelope#
Bases:
BaseModel
Store the m/z and normalized abundance values of an isotopic envelope.
- field mz: list[float] [Required]#
The envelope sorted m/z
- field p: list[float] [Required]#
The envelope normalized abundance
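A sorted m/z list paired with a normalized abundance list can be derived from raw peak values as sketched below. The helper `build_envelope` is hypothetical, the input values are made up, and normalizing the abundances so that they sum to 1 is an assumption; the library may use a different normalization.

```python
def build_envelope(peaks: list[tuple[float, float]]) -> tuple[list[float], list[float]]:
    """Return m/z values in ascending order and abundances normalized to sum 1."""
    if not peaks:
        raise ValueError("At least one peak is required.")
    ordered = sorted(peaks)  # tuples sort by m/z first
    total = sum(abundance for _, abundance in ordered)
    if total <= 0.0:
        raise ValueError("Total abundance must be positive.")
    mz = [m for m, _ in ordered]
    p = [abundance / total for _, abundance in ordered]
    return mz, p


# (m/z, abundance) pairs given out of order on purpose.
mz, p = build_envelope([(182.08, 10.0), (181.07, 90.0)])
```

Sorting the pairs together keeps each abundance aligned with its m/z value, so the two output lists can be indexed in parallel.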
- pydantic model tidyms2.core.models.MSSpectrum#
Bases:
TidyMSBaseModel
Representation of a mass spectrum.
- field centroid: bool = True#
Set to True if the spectrum was converted to centroid mode. False otherwise.
- field id: UUID [Optional]#
A unique id for the model.
- field int: FloatArray1D [Required]#
Spectral intensity
- field ms_level: pydantic.PositiveInt = 1#
MS level of the current spectrum
- field mz: FloatArray1D [Required]#
Sorted m/z data
- field time: pydantic.NonNegativeFloat = 0.0#
Acquisition time of the spectrum
- pydantic model tidyms2.core.models.MZTrace#
Bases:
Roi
ROI implementation using m/z traces.
An m/z trace is a 1D trace containing m/z, time and intensity information across scans.
- field baseline: FloatArray1D | None = None#
If provided, represents the baseline level at each time point.
- field id: UUID [Optional]#
A unique id for the model.
- field mz: FloatArray1D [Required]#
m/z in each scan. All values are assumed to be non-negative.
- field noise: FloatArray1D | None = None#
If provided, represents the noise level at each time point.
- field sample: Sample [Required]#
The sample where the ROI was extracted from.
- field scan: IntArray1D [Required]#
scan numbers where the ROI is defined. All values are assumed to be non-negative.
- field spint: FloatArray1D [Required]#
intensity in each scan. All values are assumed to be non-negative.
- field time: FloatArray1D [Required]#
time in each scan. All values are assumed to be non-negative.
- classmethod from_str(ser, sample)#
Create a ROI instance from a JSON string.
- Parameters:
ser (str) – a serialized ROI obtained using the to_str method.
sample (Sample) – a sample to associate with the ROI.
- Return type:
Self
- Returns:
a new ROI instance.
- equals(other)#
Check if two m/z traces are equal.
- Return type:
bool
- get_slice_height(start, end)#
Compute the trace height in a slice.
- Parameters:
start (
int) – slice start index.end (
int) – slice end index.
- Return type:
ndarray[tuple[Any,...],dtype[TypeVar(FloatDtype, bound=floating)]]
- to_str()#
Serialize a ROI into a string.
- Return type:
str
- Returns:
a string serialization of the ROI.
- pydantic model tidyms2.core.models.Roi#
Bases:
TidyMSBaseModel
Base class for ROIs extracted from raw MS data.
Roi inherits from pydantic BaseModel and supports most of its functionality. New ROI subclasses are created by inheriting from this class and setting data fields using Pydantic’s standard approach.
For Numpy array fields, check out the tidyms.utils.numpy.FloatingArray and tidyms.utils.numpy.IntArray types which provide type checking for arrays and efficient serialization/deserialization.
The id field contains a unique identifier for the ROI and is managed internally by the library. It MUST not be set directly by the user.
Refer to the developer guides for an example on how to create a new ROI class.
- field id: UUID [Optional]#
A unique id for the model.
- field sample: Sample [Required]#
The sample where the ROI was extracted from.
- classmethod from_str(ser, sample)#
Create a ROI instance from a JSON string.
- Parameters:
ser (str) – a serialized ROI obtained using the to_str method.
sample (Sample) – a sample to associate with the ROI.
- Return type:
Self
- Returns:
a new ROI instance.
- to_str()#
Serialize a ROI into a string.
- Return type:
str
- Returns:
a string serialization of the ROI.
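The to_str and from_str pair describes a round trip through a JSON string in which the sample is supplied again on load. The sketch below reproduces that pattern with dataclass stand-ins; `SampleStandIn` and `RoiStandIn` are hypothetical, and leaving the sample out of the serialized payload is an assumption inferred from the from_str signature, not confirmed library behavior.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class SampleStandIn:
    """Hypothetical stand-in for a sample."""

    id: str


@dataclass
class RoiStandIn:
    """Hypothetical stand-in for a ROI; not the TidyMS implementation."""

    sample: SampleStandIn
    time: list[float]
    spint: list[float]

    def to_str(self) -> str:
        # The sample is left out of the payload and supplied again on load.
        data = asdict(self)
        data.pop("sample")
        return json.dumps(data)

    @classmethod
    def from_str(cls, ser: str, sample: SampleStandIn) -> "RoiStandIn":
        return cls(sample=sample, **json.loads(ser))


sample = SampleStandIn(id="sample-1")
roi = RoiStandIn(sample=sample, time=[0.0, 0.5], spint=[10.0, 12.5])
restored = RoiStandIn.from_str(roi.to_str(), sample)
```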
- pydantic model tidyms2.core.models.Sample#
Bases:
BaseModel
Store information required to load data from a raw data file.
- field end_time: pydantic.NonNegativeFloat | None = None#
Maximum acquisition time of MS scans to include. If None, end at the last scan.
- field id: str [Required]#
A unique sample identifier
- field meta: SampleMetadata = SampleMetadata(type='', group='', order=0)#
Sample metadata.
- field ms_data_mode: MSDataMode = MSDataMode.CENTROID#
the mode in which the sample data is stored.
- field ms_level: pydantic.PositiveInt = 1#
the sample MS level
- field path: Path [Required]#
Path to a raw data file
- field reader: str | None = None#
The name of a registered data reader to read sample data. If None, the optimal reader is inferred from the file extension.
- field start_time: pydantic.NonNegativeFloat = 0.0#
Minimum acquisition time of MS scans to include. If None, start from the first scan.
- serialize_path(path, _info)#
Serialize path into a string.
- Return type:
str
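The start_time and end_time fields of Sample describe an acquisition time window for the scans to include. A minimal sketch of that selection follows; the function `in_time_window` is hypothetical and treating both bounds as inclusive is an assumption of this example.

```python
from typing import Optional


def in_time_window(
    time: float, start_time: float = 0.0, end_time: Optional[float] = None
) -> bool:
    """Return True if a scan acquired at `time` falls inside the window."""
    if time < start_time:
        return False
    # end_time=None means no upper limit: scans are kept up to the last one.
    return end_time is None or time <= end_time


# Made-up acquisition times, for illustration only.
scan_times = [0.0, 30.0, 60.0, 90.0]
selected = [t for t in scan_times if in_time_window(t, start_time=30.0, end_time=60.0)]
open_ended = [t for t in scan_times if in_time_window(t, start_time=30.0)]
```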
- pydantic model tidyms2.core.models.SampleMetadata#
Bases:
BaseModel
Sample metadata container.
- field batch: pydantic.NonNegativeInt = 0#
the sample analytical batch number in an assay.
- field group: str = ''#
the sample group
- field order: pydantic.NonNegativeInt = 0#
the sample measurement order in an assay
- field type: str = ''#
the sample type