Data IO#

Raw data#

class tidyms2.MSData(src, reader=None, mode=MSDataMode.CENTROID, centroider=None, cache=-1, ms_level=1, start_time=0.0, end_time=None, **kwargs)#

Provide access to raw MS data.

Data is read from disk in a lazy manner and cached in memory.

Parameters:
  • src (Path | Sample) – raw data source. It may be the path to a raw data file or a sample model. If the latter is provided, the path to the data source is fetched from the tidyms2.core.models.Sample.path field.

  • reader (type[Reader] | None) – the Reader to read raw data. If None, the reader is inferred using the file extension. If src is a sample model and reader is defined, the reader will be fetched from the reader registry.

  • mode (MSDataMode) – the mode in which the data is stored. If src is a sample instance, this parameter is ignored and is fetched from the sample data.

  • centroider (Optional[Callable[[MSSpectrum], MSSpectrum]]) – a function that takes a spectrum in profile mode and converts it to centroid mode. Only used if mode is set to profile mode.

  • cache (int) – the maximum cache size, in bytes. The cache stores spectrum data until it surpasses this value. At that point, old entries are deleted from the cache. If set to -1, the cache can grow indefinitely.

  • ms_level (int) – skip spectra without this MS level when iterating over spectra. If src is a sample instance, this parameter is ignored and is fetched from the sample data.

  • start_time (float) – skip spectra with time lower than this value when iterating over data. If src is a sample instance, this parameter is ignored and is fetched from the sample data.

  • end_time (float | None) – skip spectra with time greater than this value when iterating over data. If src is a sample instance, this parameter is ignored and is fetched from the sample data.

  • kwargs – keyword arguments passed to the reader.
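The behavior of the cache parameter can be pictured with a small sketch. This is not the library's implementation: SpectrumCache and its methods are hypothetical names, raw bytes stand in for spectrum data, and evicting the oldest entry first is an assumption based on the statement that old entries are deleted.

```python
from collections import OrderedDict
from typing import Optional


class SpectrumCache:
    """Hypothetical size-bounded cache; not part of tidyms2."""

    def __init__(self, max_size: int = -1):
        self.max_size = max_size  # maximum size in bytes; -1 means unbounded
        self.size = 0  # current size in bytes
        self._entries = OrderedDict()  # spectrum index -> cached bytes

    def add(self, index: int, data: bytes) -> None:
        """Store an entry, then evict the oldest entries while over the limit."""
        if index in self._entries:
            self.size -= len(self._entries.pop(index))
        self._entries[index] = data
        self.size += len(data)
        while self.max_size > -1 and self.size > self.max_size and self._entries:
            _, evicted = self._entries.popitem(last=False)
            self.size -= len(evicted)

    def get(self, index: int) -> Optional[bytes]:
        """Return the cached entry, or None on a cache miss."""
        return self._entries.get(index)


cache = SpectrumCache(max_size=10)
cache.add(0, b"aaaa")  # 4 bytes stored
cache.add(1, b"bbbb")  # 8 bytes stored
cache.add(2, b"cccc")  # 12 bytes exceeds the limit, so entry 0 is evicted
```

With the default value of -1 the eviction loop never runs, which matches the description of a cache that can grow indefinitely.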

get_chromatogram(index)#

Retrieve a chromatogram by index.

Return type:

Chromatogram

get_n_chromatograms()#

Retrieve the total number of chromatograms stored in the source.

Return type:

int

get_n_spectra()#

Retrieve the total number of spectra stored in the source.

Return type:

int

get_sample()#

Retrieve the sample associated with the data.

Return type:

Sample

get_spectrum(index)#

Retrieve a spectrum by index.

Return type:

MSSpectrum

using_tmp_config(ms_level=None, start_time=None, end_time=None)#

Context manager that temporarily modifies the MS level and scan time range.

Parameters:
  • ms_level (int | None) – temporary value for the MS level. If set to None the original value is not modified.

  • start_time (float | None) – temporary value for the start time. If set to None the original value is not modified.

  • end_time (float | None) – temporary value for the end time. If set to None the original value is not modified.
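The override-and-restore pattern that such a context manager follows can be sketched as below. DataConfig is a hypothetical stand-in for the object that holds the iteration settings; only the parameter names and the rule that None leaves a value unmodified come from the description above.

```python
from contextlib import contextmanager
from typing import Iterator, Optional


class DataConfig:
    """Hypothetical stand-in for the object holding the iteration settings."""

    def __init__(
        self,
        ms_level: int = 1,
        start_time: float = 0.0,
        end_time: Optional[float] = None,
    ):
        self.ms_level = ms_level
        self.start_time = start_time
        self.end_time = end_time

    @contextmanager
    def using_tmp_config(
        self,
        ms_level: Optional[int] = None,
        start_time: Optional[float] = None,
        end_time: Optional[float] = None,
    ) -> Iterator[None]:
        """Temporarily override settings; None leaves a setting untouched."""
        original = (self.ms_level, self.start_time, self.end_time)
        if ms_level is not None:
            self.ms_level = ms_level
        if start_time is not None:
            self.start_time = start_time
        if end_time is not None:
            self.end_time = end_time
        try:
            yield
        finally:
            # Restore the original values even if the body raises.
            self.ms_level, self.start_time, self.end_time = original


config = DataConfig(ms_level=1, start_time=0.0, end_time=None)
with config.using_tmp_config(ms_level=2, end_time=60.0):
    inside = (config.ms_level, config.start_time, config.end_time)
after = (config.ms_level, config.start_time, config.end_time)
```

Restoring the values in a finally clause is a design choice of this sketch: it keeps the original configuration intact when the body of the with block fails.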

Matrix data readers#

Utilities to read matrix data in a variety of formats.

exception tidyms2.io.matrix.ProgenesisReaderError#

Exception raised when parsing a Progenesis file fails.

tidyms2.io.matrix.read_progenesis(path)#

Read Progenesis CSV data into a data matrix.

Parameters:

path (Path) – Path to the Progenesis data file.

Return type:

DataMatrix

Storage classes#

Data storage classes.

class tidyms2.storage.OnMemoryAssayStorage(id, roi_type, feature_type)#

Store assay data in memory.

add_feature_groups(*feature_groups)#

Add feature groups to the assay.

Return type:

None

add_fill_values(*fill_values)#

Add values to fill missing data matrix entries.

Return type:

None

add_sample_data(data)#

Add samples to the assay.

Return type:

None

create_snapshot(snapshot_id)#

Create a new sample data snapshot.

Parameters:

snapshot_id – the id for the new snapshot.

Raises:

RepeatedIdError – if a snapshot with this id already exists.

Return type:

None

fetch_annotations(sample_id=None)#

Fetch a copy of the feature annotations.

Parameters:

sample_id (str | None) – If provided, only fetch annotations from this sample. By default, fetch annotations from all samples.

Raises:

SampleNotFound – if a sample id that is not in the assay storage is provided.

Return type:

list[Annotation]

fetch_descriptors(sample_id=None, descriptors=None)#

Fetch a copy of the feature descriptors.

Parameters:
  • sample_id (str | None) – If provided, only fetch descriptors from this sample. Otherwise, fetch descriptors from all samples.

  • descriptors (Optional[Iterable[str]]) – If provided, only fetch values from these descriptors. By default, all descriptors are fetched.

Raises:
  • SampleNotFound – if a sample id that is not in the assay storage is provided.

  • InvalidFeatureDescriptor – If an undefined descriptor name for the assay feature type is provided.

Return type:

dict[str, list[float]]
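A minimal sketch of the column-oriented return value and of descriptor-name validation follows. It is not the storage implementation: the toy rows, the descriptor names mz, rt and area, and the module-level function are assumptions made for illustration, and the exception class is a local stand-in for the library exception of the same name.

```python
from typing import Iterable, Optional


class InvalidFeatureDescriptor(ValueError):
    """Local stand-in for the library exception of the same name."""


# Assumed toy storage: one row of descriptor values per feature.
ROWS = [
    {"mz": 200.05, "rt": 31.2, "area": 1500.0},
    {"mz": 350.10, "rt": 75.8, "area": 820.0},
]
VALID_DESCRIPTORS = {"mz", "rt", "area"}


def fetch_descriptors(
    descriptors: Optional[Iterable[str]] = None,
) -> dict[str, list[float]]:
    """Return a column-oriented copy of the requested descriptors."""
    names = sorted(VALID_DESCRIPTORS) if descriptors is None else list(descriptors)
    for name in names:
        if name not in VALID_DESCRIPTORS:
            raise InvalidFeatureDescriptor(name)
    # Each key maps to one value per feature, in storage order.
    return {name: [row[name] for row in ROWS] for name in names}


only_mz = fetch_descriptors(["mz"])
everything = fetch_descriptors()
```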

fetch_feature_groups()#

Fetch feature groups from the assay.

Return type:

list[FeatureGroup]

fetch_features_by_group(group)#

Retrieve all features belonging to a feature group.

Return type:

list[TypeVar(FeatureType, bound= Feature)]

fetch_features_by_id(*feature_ids)#

Fetch features using their ids.

Return type:

list[TypeVar(FeatureType, bound= Feature)]

fetch_features_by_sample(sample_id)#

Retrieve all features from a sample.

Return type:

list[TypeVar(FeatureType, bound= Feature)]

fetch_fill_values()#

Fetch fill values for missing data matrix entries.

Return type:

dict[str, dict[int, float]]
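The nested dictionary can be applied to a data matrix as sketched below. The key layout is an assumption that the reference does not state: the outer key is taken to be a sample id and the inner key a feature group id. The toy matrix and the fill_missing helper are hypothetical.

```python
import math

# Assumed layout: sample id -> feature group id -> fill value.
fill_values = {"sample-1": {0: 12.5}, "sample-2": {1: 3.0}}

# Toy data matrix with the same layout; NaN marks a missing entry.
matrix = {
    "sample-1": {0: math.nan, 1: 40.0},
    "sample-2": {0: 22.0, 1: math.nan},
}


def fill_missing(matrix, fill_values):
    """Return a copy of the matrix with NaN entries replaced by fill values."""
    filled = {}
    for sample_id, row in matrix.items():
        sample_fill = fill_values.get(sample_id, {})
        filled[sample_id] = {
            # Entries without a fill value keep their original NaN.
            group: sample_fill.get(group, value) if math.isnan(value) else value
            for group, value in row.items()
        }
    return filled


filled = fill_missing(matrix, fill_values)
```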

fetch_rois_by_id(*roi_ids)#

Fetch ROIs using their ids.

Return type:

list[TypeVar(RoiType, bound= Roi)]

fetch_rois_by_sample(sample_id)#

Retrieve all ROIs from a sample.

Return type:

list[TypeVar(RoiType, bound= Roi)]

fetch_sample(sample_id)#

Fetch a sample from the assay using its id.

Return type:

Sample

fetch_sample_data(sample_id)#

Fetch the data of a sample from the assay using the sample id.

Return type:

OnMemorySampleStorage[TypeVar(RoiType, bound= Roi), TypeVar(FeatureType, bound= Feature)]

get_feature_type()#

Retrieve the Feature class used.

Return type:

type[TypeVar(FeatureType, bound= Feature)]

get_n_features()#

Get the total number of features in the assay.

Return type:

int

get_n_rois()#

Get the total number of ROIs in the assay.

Return type:

int

get_process_status()#

Retrieve the current process status.

Return type:

AssayProcessStatus

get_roi_type()#

Retrieve the ROI class used.

Return type:

type[TypeVar(RoiType, bound= Roi)]

get_snapshot_id()#

Get the current snapshot id.

Return type:

str

has_feature(feature_id)#

Check if a Feature with the provided id is in the storage.

Return type:

bool

has_feature_group(feature_group)#

Check if a group with the provided id is in the assay.

Return type:

bool

has_roi(roi_id)#

Check if a ROI is in the storage.

Return type:

bool

has_sample(sample_id)#

Check if the assay contains a sample with the provided id.

Return type:

bool

list_feature_groups()#

List all feature groups in the assay.

Return type:

list[int]

list_samples()#

Fetch all samples in the assay.

Return type:

list[Sample]

list_snapshots()#

List all snapshot ids.

Return type:

list[str]

patch_annotations(*patches)#

Update feature annotation values.

Return type:

None

patch_descriptors(*patches)#

Update feature descriptor values.

Return type:

None

set_process_status(status)#

Set the assay process status.

Return type:

None

set_snapshot(snapshot_id=None, reset=False)#

Set the snapshot from which the storage will fetch data.

Parameters:
  • snapshot_id (str | None) – the snapshot to set.

  • reset (bool) – set the selected snapshot as the latest and delete later snapshots. Note that the selected snapshot id will be set to head.

Raises:

SnapshotNotFoundError – if the provided snapshot_id is not in the storage.

Return type:

None
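The interplay between create_snapshot and set_snapshot can be illustrated with a toy store. Everything here is a sketch under stated assumptions rather than the storage implementation: SnapshotStore, its add and fetch methods and the internal bookkeeping are hypothetical; treating None as the latest snapshot and renaming the selected snapshot to head on reset are readings of the descriptions in this reference; the exception classes are local stand-ins.

```python
import copy


class RepeatedIdError(ValueError):
    """Local stand-in for the library exception of the same name."""


class SnapshotNotFoundError(KeyError):
    """Local stand-in for the library exception of the same name."""


class SnapshotStore:
    """Toy snapshot store; the head bookkeeping below is an assumption."""

    HEAD = "head"

    def __init__(self):
        self._ids = [self.HEAD]  # snapshot ids, oldest first, head always last
        self._states = {self.HEAD: []}  # snapshot id -> stored data
        self._current = self.HEAD

    def add(self, item):
        """Add data to the head snapshot."""
        self._states[self.HEAD].append(item)

    def fetch(self):
        """Return a copy of the data in the current snapshot."""
        return list(self._states[self._current])

    def list_snapshots(self):
        return list(self._ids)

    def get_snapshot_id(self):
        return self._current

    def create_snapshot(self, snapshot_id):
        """Freeze a copy of the head data under a new id."""
        if snapshot_id in self._states:
            raise RepeatedIdError(snapshot_id)
        self._states[snapshot_id] = copy.deepcopy(self._states[self.HEAD])
        self._ids.insert(-1, snapshot_id)

    def set_snapshot(self, snapshot_id=None, reset=False):
        """Select a snapshot; None selects the latest one (head)."""
        snapshot_id = self.HEAD if snapshot_id is None else snapshot_id
        if snapshot_id not in self._states:
            raise SnapshotNotFoundError(snapshot_id)
        if reset and snapshot_id != self.HEAD:
            # The selected snapshot becomes head; later snapshots are dropped.
            position = self._ids.index(snapshot_id)
            new_head = self._states[snapshot_id]
            for stale in self._ids[position:]:
                del self._states[stale]
            self._ids = self._ids[:position] + [self.HEAD]
            self._states[self.HEAD] = new_head
            snapshot_id = self.HEAD
        self._current = snapshot_id


store = SnapshotStore()
store.add("roi-a")
store.create_snapshot("after-extraction")
store.add("roi-b")
store.set_snapshot("after-extraction")
viewed = store.fetch()  # data as it was when the snapshot was created
store.set_snapshot("after-extraction", reset=True)
```

After the reset in this sketch, the data added after the snapshot is gone and the only remaining snapshot id is head.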

class tidyms2.storage.OnMemorySampleStorage(sample, roi_type, feature_type, **kwargs)#

Store sample data in memory.

Provides O(1)-time access to the sample and its ROIs. Adding features and adding ROIs are both atomic and consistent operations.

add_features(*features)#

Add features to the sample storage.

Parameters:

features (TypeVar(FeatureType, bound= Feature)) – the features to add.

Raises:
  • RepeatedIdError – if a feature with an existing id is provided.

  • RoiNotFoundError – if trying to add a feature associated with a ROI that is not in the storage.

Return type:

None
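One way to obtain an atomic add operation is to validate the whole batch before storing anything. The sketch below shows that approach and is not the library's implementation: SampleStore is hypothetical, features are reduced to (feature id, ROI id) pairs, and the exception classes are local stand-ins for the library exceptions of the same names.

```python
class RepeatedIdError(ValueError):
    """Local stand-in for the library exception of the same name."""


class RoiNotFoundError(KeyError):
    """Local stand-in for the library exception of the same name."""


class SampleStore:
    """Toy storage where features are (feature id, ROI id) pairs."""

    def __init__(self):
        self._rois = set()  # ids of stored ROIs
        self._features = {}  # feature id -> id of the parent ROI

    def add_rois(self, *roi_ids):
        # Validate the whole batch first so a failure stores nothing.
        if len(set(roi_ids)) != len(roi_ids) or self._rois & set(roi_ids):
            raise RepeatedIdError(roi_ids)
        self._rois.update(roi_ids)

    def add_features(self, *features):
        seen = set()
        # First pass: validation only, no state is modified.
        for feature_id, roi_id in features:
            if feature_id in self._features or feature_id in seen:
                raise RepeatedIdError(feature_id)
            if roi_id not in self._rois:
                raise RoiNotFoundError(roi_id)
            seen.add(feature_id)
        # Second pass: commit; reached only if every feature is valid.
        for feature_id, roi_id in features:
            self._features[feature_id] = roi_id

    def list_feature_ids(self):
        return sorted(self._features)


store = SampleStore()
store.add_rois("r1")
store.add_features(("f1", "r1"))

rejected = False
try:
    # The second feature points to a missing ROI, so the batch is rejected.
    store.add_features(("f2", "r1"), ("f3", "missing"))
except RoiNotFoundError:
    rejected = True
```

Because validation happens before any write, the valid feature f2 in the rejected batch is not stored either.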

add_rois(*rois)#

Add ROIs to the sample storage.

Parameters:

rois (TypeVar(RoiType, bound= Roi)) – the ROIs to add.

Raises:

RepeatedIdError – if a ROI with this id already exists.

Return type:

None

create_snapshot(snapshot_id)#

Create a new sample data snapshot.

Parameters:

snapshot_id (str) – the id for the new snapshot.

Raises:

RepeatedIdError – if a snapshot with this id already exists.

Return type:

None

delete_features(*feature_ids)#

Delete features using their ids.

Non-existing ids are ignored.

Return type:

None

delete_rois(*roi_ids)#

Delete ROIs using their ids.

Non-existing ids are ignored.

Return type:

None

classmethod from_dict(sample, rois, features, snapshots, states, roi_type, feature_type)#

Create a new instance from sample, Roi and Feature data.

Parameters:
  • sample (Sample) – the sample associated with the data

  • rois (dict[str, list[TypeVar(RoiType, bound= Roi)]]) – a dictionary that maps snapshot ids to a list of ROIs in the snapshot

  • features (dict[str, list[TypeVar(FeatureType, bound= Feature)]]) – a dictionary that maps snapshot ids to a list of features in the snapshot

  • snapshots (list[str]) – the list of snapshots to create

  • states – a mapping from snapshot id to sample data state

Return type:

OnMemorySampleStorage[TypeVar(RoiType, bound= Roi), TypeVar(FeatureType, bound= Feature)]

classmethod from_sample_storage(sample_storage)#

Create a new instance using the provided sample storage.

Return type:

OnMemorySampleStorage[TypeVar(RoiType, bound= Roi), TypeVar(FeatureType, bound= Feature)]

get_feature(feature_id)#

Retrieve a feature by id.

Raises:

FeatureNotFoundError – if the provided feature_id is not in the storage

Return type:

TypeVar(FeatureType, bound= Feature)

get_n_features()#

Get the total number of features in the storage.

Return type:

int

get_n_rois()#

Get the total number of ROIs in the storage.

Return type:

int

get_roi(roi_id)#

Retrieve a ROI by id.

Raises:

RoiNotFoundError – if the provided roi_id is not in the storage

Return type:

TypeVar(RoiType, bound= Roi)

get_sample()#

Retrieve the storage sample.

Return type:

Sample

get_snapshot_id()#

Get the current snapshot id.

Return type:

str

get_status()#

Get the current process status.

Return type:

SampleProcessStatus

has_feature(feature_id)#

Check the existence of a feature using its id.

Return type:

bool

has_roi(roi_id)#

Check the existence of a ROI with the specified id.

Return type:

bool

list_features(roi_id=None)#

List stored features.

Parameters:

roi_id (UUID | None) – if provided, only features associated with this ROI are listed

Raises:

RoiNotFoundError – if the provided roi_id is not in the storage

Return type:

list[TypeVar(FeatureType, bound= Feature)]

list_rois()#

List all stored ROIs.

Return type:

list[TypeVar(RoiType, bound= Roi)]

list_snapshots()#

List all snapshots.

Return type:

list[str]

set_snapshot(snapshot_id=None, reset=False)#

Set the snapshot from which the storage will fetch data.

Parameters:
  • snapshot_id (str | None) – the snapshot to set.

  • reset (bool) – set the selected snapshot as the latest and delete later snapshots. The selected snapshot id will be set to head.

Raises:

SnapshotNotFoundError – if the provided snapshot_id is not in the storage.

Return type:

None

set_status(status)#

Set the current process status.

Return type:

None

class tidyms2.storage.SQLiteAssayStorage(id, host, roi_type, feature_type)#

Assay storage class that persists data using a SQLite backend.

Parameters:
  • id (str) – an identifier for the storage

  • host (str | None) – the DB host string. If not provided, an in-memory database is used.

  • roi_type (type[TypeVar(RoiType, bound= Roi)]) – the ROI class stored in the DB.

  • feature_type (type[TypeVar(FeatureType, bound= Feature)]) – the feature class stored in the DB.
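The fallback to an in-memory database when no host is given can be sketched with the standard library sqlite3 module. This only illustrates the idea: the reference does not say how the storage connects to SQLite or which host string format it expects, and the connect helper and the sample table are hypothetical.

```python
import sqlite3
from typing import Optional


def connect(host: Optional[str] = None) -> sqlite3.Connection:
    """Open a SQLite database, using an in-memory one when host is None."""
    # ":memory:" is the sqlite3 name for a database that lives only in RAM.
    return sqlite3.connect(":memory:" if host is None else host)


connection = connect()
connection.execute("CREATE TABLE sample (id TEXT PRIMARY KEY)")
connection.execute("INSERT INTO sample (id) VALUES (?)", ("sample-1",))
sample_ids = [row[0] for row in connection.execute("SELECT id FROM sample")]
connection.close()
```

An in-memory database disappears when its connection is closed, so a file path is needed whenever the data must outlive the process.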

add_feature_groups(*groups)#

Add feature groups data to the assay storage.

Return type:

None

add_fill_values(*fill_values)#

Add values to fill missing entries in the data matrix.

Return type:

None

add_sample_data(data)#

Add sample data to DB.

Return type:

None

create_snapshot(snapshot_id)#

Create a new assay data snapshot.

Return type:

None

fetch_annotations(sample_id=None)#

Fetch the feature annotations.

Return type:

list[Annotation]

fetch_descriptors(sample_id=None, descriptors=None)#

Fetch the feature descriptors.

Return type:

dict

fetch_feature_groups()#

Fetch feature groups stored in the assay.

Return type:

list[FeatureGroup]

fetch_features_by_group(group)#

Fetch features using the feature group id.

Return type:

list[TypeVar(FeatureType, bound= Feature)]

fetch_features_by_id(*feature_ids)#

Fetch features using their ids.

Return type:

list[TypeVar(FeatureType, bound= Feature)]

fetch_features_by_sample(sample_id)#

Fetch features using the sample id.

Return type:

list[TypeVar(FeatureType, bound= Feature)]

fetch_fill_values()#

Fetch fill values for data matrix.

Return type:

dict[str, dict[int, float]]

fetch_rois_by_id(*roi_ids)#

Fetch ROIs using their ids.

Parameters:

roi_ids (UUID) – a list of ROI ids to fetch

Return type:

list[TypeVar(RoiType, bound= Roi)]

fetch_rois_by_sample(sample_id)#

Fetch all ROIs from a sample.

Return type:

list[TypeVar(RoiType, bound= Roi)]

fetch_sample(sample_id)#

Retrieve a sample from the assay.

Parameters:

sample_id (str) – the id of the sample to retrieve

Raises:

SampleNotFound – if the provided id is not found in the DB

Return type:

Sample

fetch_sample_data(sample_id)#

Fetch the data of a sample from the assay using the sample id.

Return type:

OnMemorySampleStorage[TypeVar(RoiType, bound= Roi), TypeVar(FeatureType, bound= Feature)]

get_feature_type()#

Retrieve the Feature class used.

Return type:

type[TypeVar(FeatureType, bound= Feature)]

get_n_features()#

Retrieve the number of features in the assay.

Return type:

int

get_n_rois()#

Retrieve the number of ROIs in the assay.

Return type:

int

get_n_samples()#

Retrieve the number of samples in the assay.

Return type:

int

get_process_status()#

Get the current process status.

Return type:

AssayProcessStatus

get_roi_type()#

Retrieve the ROI class used.

Return type:

type[TypeVar(RoiType, bound= Roi)]

get_snapshot_id()#

Retrieve the current snapshot id.

Return type:

str

has_feature(feature_id)#

Check if a feature with the provided id is stored in the DB.

Parameters:

feature_id (UUID) – the id of the feature to check

Return type:

bool

has_feature_group(feature_group)#

Check if a group with the provided id is in the assay.

Return type:

bool

has_roi(roi_id)#

Check if a ROI with the provided id is stored in the DB.

Parameters:

roi_id (UUID) – the id of the ROI to check

Return type:

bool

has_sample(sample_id)#

Check if a sample with the provided id is stored in the DB.

Parameters:

sample_id (str) – the id of the sample to check

Return type:

bool

list_feature_groups()#

List all group ids in the assay.

Return type:

list[int]

list_samples()#

List samples in the assay.

Return type:

list[Sample]

list_snapshots()#

Retrieve the list of all snapshots.

Return type:

list[str]

patch_annotations(*patches)#

Update feature annotation values.

Return type:

None

patch_descriptors(*patches)#

Update feature descriptor values.

Return type:

None

set_process_status(status)#

Set the new process status.

Return type:

None

set_snapshot(snapshot_id=None, reset=False)#

Set the assay storage data to the specified snapshot.

If snapshot_id is None, data is fetched from the latest snapshot.

Return type:

None