Data IO#
Raw data#
- class tidyms2.MSData(src, reader=None, mode=MSDataMode.CENTROID, centroider=None, cache=-1, ms_level=1, start_time=0.0, end_time=None, **kwargs)#
Provide access to raw MS data.
Data is read from disk in a lazy manner and cached in memory.
- Parameters:
src (Path | Sample) – raw data source. It may be the path to a raw data file or a sample model. If the latter is provided, the path to the data source is fetched from the tidyms2.core.models.Sample.path field.
reader (type[Reader] | None) – the Reader used to read raw data. If None, the reader is inferred from the file extension. If src is a sample model and reader is defined, the reader will be fetched from the reader registry.
mode (MSDataMode) – the mode in which the data is stored. If src is a sample instance, this parameter is ignored and is fetched from the sample data.
centroider (Optional[Callable[[MSSpectrum], MSSpectrum]]) – a function that takes a spectrum in profile mode and converts it to centroid mode. Only used if mode is set to profile mode.
cache (int) – the maximum cache size, in bytes (default: -1). The cache stores spectrum data until it surpasses this value; at that point, old entries are deleted from the cache. If set to -1, the cache can grow indefinitely.
ms_level (int) – skip spectra without this MS level when iterating over spectra. If src is a sample instance, this parameter is ignored and is fetched from the sample data.
start_time (float) – skip spectra with time lower than this value when iterating over data. If src is a sample instance, this parameter is ignored and is fetched from the sample data.
end_time (float | None) – skip spectra with time greater than this value when iterating over data. If src is a sample instance, this parameter is ignored and is fetched from the sample data.
kwargs – keyword arguments passed to the reader.
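The behaviour described for the cache parameter (store entries until a byte budget is exceeded, then drop old ones, with -1 meaning no limit) can be illustrated with a small stand-alone sketch. This is not the tidyms2 implementation: the ByteBoundedCache class and its methods are hypothetical names, and payloads are plain bytes rather than spectra.

```python
from __future__ import annotations

from collections import OrderedDict


class ByteBoundedCache:
    """Toy cache that evicts its oldest entries once a byte budget is exceeded."""

    def __init__(self, max_size: int = -1) -> None:
        self.max_size = max_size  # -1 means the cache may grow indefinitely
        self._entries: OrderedDict[int, bytes] = OrderedDict()
        self._size = 0

    def add(self, index: int, payload: bytes) -> None:
        # Replace an existing entry so that its size is not counted twice.
        if index in self._entries:
            self._size -= len(self._entries.pop(index))
        self._entries[index] = payload
        self._size += len(payload)
        self._prune()

    def get(self, index: int) -> bytes | None:
        return self._entries.get(index)

    def _prune(self) -> None:
        if self.max_size < 0:
            return
        # Drop the oldest entries until the budget is respected again.
        while self._size > self.max_size and self._entries:
            _, old = self._entries.popitem(last=False)
            self._size -= len(old)
```

With a 10-byte budget, adding three 4-byte payloads evicts the first one, while a cache created with the default -1 keeps everything.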
- get_chromatogram(index)#
Retrieve a chromatogram by index.
- Return type:
- get_n_chromatograms()#
Retrieve the total number of chromatograms stored in the source.
- Return type:
int
- get_n_spectra()#
Retrieve the total number of spectra stored in the source.
- Return type:
int
- get_spectrum(index)#
Retrieve a spectrum by index.
- Return type:
- using_tmp_config(ms_level=None, start_time=None, end_time=None)#
Context manager that temporarily modifies the MS level and the scan time range.
- Parameters:
ms_level (int | None) – temporary value for the MS level. If set to None, the original value is not modified.
start_time (float | None) – temporary value for the start time. If set to None, the original value is not modified.
end_time (float | None) – temporary value for the end time. If set to None, the original value is not modified.
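The override-then-restore pattern behind a temporary configuration can be sketched with contextlib. The ReaderConfig dataclass and the tmp_config function below are hypothetical stand-ins written for illustration; they only mirror the documented rule that a None argument leaves the original value untouched, and they say nothing about how MSData implements it internally.

```python
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReaderConfig:
    """Hypothetical container for the three values being overridden."""

    ms_level: int = 1
    start_time: float = 0.0
    end_time: Optional[float] = None


@contextmanager
def tmp_config(config, ms_level=None, start_time=None, end_time=None):
    # Remember the original values so they can be restored on exit.
    original = (config.ms_level, config.start_time, config.end_time)
    try:
        if ms_level is not None:
            config.ms_level = ms_level
        if start_time is not None:
            config.start_time = start_time
        if end_time is not None:
            config.end_time = end_time
        yield config
    finally:
        # Restore even if the body raised an exception.
        config.ms_level, config.start_time, config.end_time = original
```

Inside the with block only the arguments that were passed differ from the original configuration; on exit, including after an error, the original values are back in place.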
Matrix data readers#
Utilities to read matrix data in a variety of formats.
- exception tidyms2.io.matrix.ProgenesisReaderError#
Exception raised when parsing a Progenesis file fails.
- tidyms2.io.matrix.read_progenesis(path)#
Read Progenesis CSV data into a data matrix.
- Parameters:
path (Path) – Path to the Progenesis data file.
- Return type:
Storage classes#
Data storage classes.
- class tidyms2.storage.OnMemoryAssayStorage(id, roi_type, feature_type)#
Store assay data in memory.
- add_feature_groups(*feature_groups)#
Add feature groups to the assay.
- Return type:
None
- add_fill_values(*fill_values)#
Add values to fill missing data matrix entries.
- Return type:
None
- add_sample_data(data)#
Add samples to the assay.
- Return type:
None
- create_snapshot(snapshot_id)#
Create a new sample data snapshot.
- Parameters:
snapshot_id – the id for the new snapshot.
- Raises:
RepeatedIdError – if a snapshot with this id already exists.
- Return type:
None
- fetch_annotations(sample_id=None)#
Fetch a copy of the feature annotations.
- Parameters:
sample_id (str | None) – If provided, only fetch annotations from this sample. By default, fetch annotations from all samples.
- Raises:
SampleNotFound – if a sample id that is not in the assay storage is provided.
- Return type:
list[Annotation]
- fetch_descriptors(sample_id=None, descriptors=None)#
Fetch a copy of the feature descriptors.
- Parameters:
sample_id (str | None) – If provided, only fetch descriptors from this sample. Otherwise, fetch descriptors from all samples.
descriptors (Optional[Iterable[str]]) – If provided, only fetch values from these descriptors. By default, all descriptors are fetched.
- Raises:
SampleNotFound – if a sample id that is not in the assay storage is provided.
InvalidFeatureDescriptor – If an undefined descriptor name for the assay feature type is provided.
- Return type:
dict[str,list[float]]
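The return type suggests a column-oriented mapping from descriptor name to a list of values. The sketch below works on such a mapping with plain dictionaries; it assumes that all lists are aligned (position i in every list refers to the same feature), which this page does not state. The function names and the descriptor names "mz", "rt" and "area" are illustrative only.

```python
from __future__ import annotations

from typing import Iterable, Optional


def select_descriptors(
    table: dict[str, list[float]],
    descriptors: Optional[Iterable[str]] = None,
) -> dict[str, list[float]]:
    """Return a copy of the requested columns, or of all columns if None."""
    names = list(table) if descriptors is None else list(descriptors)
    unknown = [name for name in names if name not in table]
    if unknown:
        raise ValueError(f"Undefined descriptors: {unknown}")
    # Copy the lists so that callers cannot modify the stored values.
    return {name: list(table[name]) for name in names}


def to_rows(table: dict[str, list[float]]) -> list[dict[str, float]]:
    """Turn the column-oriented mapping into one dictionary per feature."""
    names = list(table)
    columns = (table[name] for name in names)
    return [dict(zip(names, values)) for values in zip(*columns)]
```

Selecting ["mz", "area"] from a three-column table returns only those two columns, and to_rows yields one record per position in the lists.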
- fetch_feature_groups()#
Fetch feature groups from the assay.
- Return type:
list[FeatureGroup]
- fetch_features_by_group(group)#
Retrieve all features belonging to a feature group.
- Return type:
list[TypeVar(FeatureType, bound=Feature)]
- fetch_features_by_id(*feature_ids)#
Fetch features using their ids.
- Return type:
list[TypeVar(FeatureType, bound=Feature)]
- fetch_features_by_sample(sample_id)#
Retrieve all features from a sample.
- Return type:
list[TypeVar(FeatureType, bound=Feature)]
- fetch_fill_values()#
Fetch fill values for missing data matrix entries.
- Return type:
dict[str,dict[int,float]]
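A nested mapping of this shape can be applied to a data matrix roughly as follows. The sketch assumes that the outer key is a sample id and the inner key a feature group id (the page gives only the types, not their meaning), and it represents the data matrix as a plain nested dictionary with NaN marking missing entries. The fill_missing function is a hypothetical helper, not part of tidyms2.

```python
from __future__ import annotations

import math


def fill_missing(
    matrix: dict[str, dict[int, float]],
    fill_values: dict[str, dict[int, float]],
) -> dict[str, dict[int, float]]:
    """Return a copy of matrix where NaN entries are replaced when a fill value exists."""
    filled: dict[str, dict[int, float]] = {}
    for sample_id, row in matrix.items():
        sample_fill = fill_values.get(sample_id, {})
        filled[sample_id] = {
            # Keep observed values; fall back to the NaN itself if no fill value is known.
            group: sample_fill.get(group, value) if math.isnan(value) else value
            for group, value in row.items()
        }
    return filled
```

Entries without a matching fill value stay missing, so the caller can still tell which gaps were never filled.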
- fetch_rois_by_id(*roi_ids)#
Fetch ROIs using their ids.
- Return type:
list[TypeVar(RoiType, bound=Roi)]
- fetch_rois_by_sample(sample_id)#
Retrieve ROIs from the storage.
- Return type:
list[TypeVar(RoiType, bound=Roi)]
- fetch_sample_data(sample_id)#
Fetch sample data from the assay using the sample id.
- Return type:
OnMemorySampleStorage[TypeVar(RoiType, bound=Roi),TypeVar(FeatureType, bound=Feature)]
- get_feature_type()#
Retrieve the Feature class used.
- Return type:
type[TypeVar(FeatureType, bound=Feature)]
- get_n_features()#
Get the total number of features in the assay.
- Return type:
int
- get_n_rois()#
Get the total number of ROIs in the assay.
- Return type:
int
- get_process_status()#
Retrieve the current process status.
- Return type:
- get_snapshot_id()#
Get the current snapshot id.
- Return type:
str
- has_feature(feature_id)#
Check if a Feature with the provided id is in the storage.
- Return type:
bool
- has_feature_group(feature_group)#
Check if a group with the provided id is in the assay.
- Return type:
bool
- has_roi(roi_id)#
Check if a ROI is in the storage.
- Return type:
bool
- has_sample(sample_id)#
Check if the assay contains a sample with the provided id.
- Return type:
bool
- list_feature_groups()#
List all feature groups in the assay.
- Return type:
list[int]
- list_snapshots()#
List all snapshot ids.
- Return type:
list[str]
- patch_annotations(*patches)#
Update feature annotation values.
- Return type:
None
- patch_descriptors(*patches)#
Update feature descriptor values.
- Return type:
None
- set_process_status(status)#
Set the assay process status.
- Return type:
None
- set_snapshot(snapshot_id=None, reset=False)#
Set the snapshot from which the storage will fetch data.
- Parameters:
snapshot_id (str | None) – the snapshot to set.
reset (bool) – set the selected snapshot as the latest and delete all later snapshots. Note that the selected snapshot id will be set to head.
- Raises:
SnapshotNotFoundError – if the provided snapshot_id is not in the storage
- Return type:
None
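One possible reading of the snapshot methods (create_snapshot, set_snapshot, list_snapshots, get_snapshot_id) is sketched below with a toy class. It rests on several assumptions that are not confirmed by this page: the latest state is identified by the id "head", creating a snapshot freezes a copy of the latest state, and reset=True promotes the selected snapshot to head while discarding it and every later snapshot. SnapshotLog and its add and fetch methods are hypothetical names.

```python
import copy

HEAD = "head"  # assumed id of the latest state


class SnapshotLog:
    """Toy store that keeps one list of items per snapshot."""

    def __init__(self):
        self._ids = [HEAD]  # creation order, HEAD is always last
        self._states = {HEAD: []}
        self._current = HEAD

    def add(self, item):
        # New data is always written to the latest state.
        self._states[HEAD].append(item)

    def create_snapshot(self, snapshot_id):
        if snapshot_id in self._states:
            raise ValueError(f"Snapshot {snapshot_id!r} already exists.")
        self._states[snapshot_id] = copy.deepcopy(self._states[HEAD])
        self._ids.insert(len(self._ids) - 1, snapshot_id)

    def set_snapshot(self, snapshot_id=None, reset=False):
        snapshot_id = HEAD if snapshot_id is None else snapshot_id
        if snapshot_id not in self._states:
            raise KeyError(f"Snapshot {snapshot_id!r} not found.")
        if reset and snapshot_id != HEAD:
            position = self._ids.index(snapshot_id)
            # The selected snapshot becomes the latest state ...
            self._states[HEAD] = self._states[snapshot_id]
            # ... and it, plus every later snapshot, is removed.
            for stale in self._ids[position:-1]:
                del self._states[stale]
            self._ids = self._ids[:position] + [HEAD]
            snapshot_id = HEAD
        self._current = snapshot_id

    def fetch(self):
        return list(self._states[self._current])

    def list_snapshots(self):
        return list(self._ids)

    def get_snapshot_id(self):
        return self._current
```

In this model, selecting a snapshot without reset only changes which state is read, whereas reset=True rewrites history so that the selected state becomes the new head.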
- class tidyms2.storage.OnMemorySampleStorage(sample, roi_type, feature_type, **kwargs)#
Store sample data in memory.
Provides access to sample data and ROIs in O(1) time. Both the add-features and add-ROIs operations are atomic and consistent.
- add_features(*features)#
Add features to the sample storage.
- Parameters:
features (TypeVar(FeatureType, bound=Feature)) – the features to be added.
- Raises:
RepeatedIdError – if a feature with an existing id is provided.
RoiNotFoundError – if trying to add a feature associated with a ROI not in the storage
- Return type:
None
- add_rois(*rois)#
Add ROIs to the sample storage.
- Parameters:
rois (TypeVar(RoiType, bound=Roi)) – the ROIs to be added.
- Raises:
RepeatedIdError – if a ROI with this id already exists.
- Return type:
None
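The atomicity of the two add operations described above can be pictured as "validate the whole batch first, then commit". The following toy storage is a sketch under that assumption, not the library code: ROIs and features are reduced to tuples, and RepeatedIdError and RoiNotFoundError are local stand-ins defined here so that the example is self-contained.

```python
from __future__ import annotations

from uuid import UUID, uuid4


class RepeatedIdError(ValueError):
    """Local stand-in for the library exception of the same name."""


class RoiNotFoundError(KeyError):
    """Local stand-in for the library exception of the same name."""


class ToySampleStorage:
    def __init__(self) -> None:
        self.rois: dict[UUID, str] = {}  # ROI id -> payload
        self.features: dict[UUID, tuple[UUID, float]] = {}  # feature id -> (ROI id, value)

    def add_rois(self, *rois: tuple[UUID, str]) -> None:
        new_ids = [roi_id for roi_id, _ in rois]
        # Reject the whole batch if any id repeats or is already stored.
        if len(set(new_ids)) != len(new_ids) or any(i in self.rois for i in new_ids):
            raise RepeatedIdError("ROI id already in storage.")
        self.rois.update(dict(rois))

    def add_features(self, *features: tuple[UUID, UUID, float]) -> None:
        new_ids = [feature_id for feature_id, _, _ in features]
        if len(set(new_ids)) != len(new_ids) or any(i in self.features for i in new_ids):
            raise RepeatedIdError("Feature id already in storage.")
        for _, roi_id, _ in features:
            if roi_id not in self.rois:
                raise RoiNotFoundError(f"ROI {roi_id} not in storage.")
        # Nothing is written until every item in the batch has passed validation.
        for feature_id, roi_id, value in features:
            self.features[feature_id] = (roi_id, value)
```

If a batch mixes a valid feature with one that points to an unknown ROI, the call raises and the storage is left exactly as it was before.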
- create_snapshot(snapshot_id)#
Create a new sample data snapshot.
- Parameters:
snapshot_id (str) – the id for the new snapshot.
- Raises:
RepeatedIdError – if a snapshot with this id already exists.
- Return type:
None
- delete_features(*feature_ids)#
Delete features using their ids.
Non-existing ids are ignored.
- Return type:
None
- delete_rois(*roi_ids)#
Delete ROIs using their ids.
Non-existing ids are ignored.
- Return type:
None
- classmethod from_dict(sample, rois, features, snapshots, states, roi_type, feature_type)#
Create a new instance from sample, Roi and Feature data.
- Parameters:
sample (Sample) – the sample associated with the data.
rois (dict[str, list[TypeVar(RoiType, bound=Roi)]]) – a dictionary that maps snapshot ids to a list of ROIs in the snapshot.
features (dict[str, list[TypeVar(FeatureType, bound=Feature)]]) – a dictionary that maps snapshot ids to a list of features in the snapshot.
snapshots (list[str]) – the list of snapshots to create.
states – a mapping from snapshot id to sample data state.
- Return type:
OnMemorySampleStorage[TypeVar(RoiType, bound=Roi),TypeVar(FeatureType, bound=Feature)]
- classmethod from_sample_storage(sample_storage)#
Create a new instance using the provided sample storage.
- Return type:
OnMemorySampleStorage[TypeVar(RoiType, bound=Roi),TypeVar(FeatureType, bound=Feature)]
- get_feature(feature_id)#
Retrieve a feature by id.
- Raises:
FeatureNotFoundError – if the provided feature_id is not in the storage
- Return type:
TypeVar(FeatureType, bound=Feature)
- get_n_features()#
Get the total number of features in the storage.
- Return type:
int
- get_n_rois()#
Get the total number of ROIs in the storage.
- Return type:
int
- get_roi(roi_id)#
Retrieve a ROI by id.
- Raises:
RoiNotFoundError – if the provided roi_id is not in the storage
- Return type:
TypeVar(RoiType, bound=Roi)
- get_snapshot_id()#
Get the current snapshot id.
- Return type:
str
- get_status()#
Get the current process status.
- Return type:
- has_feature(feature_id)#
Check the existence of a feature using its id.
- Return type:
bool
- has_roi(roi_id)#
Check the existence of a ROI with the specified id.
- Return type:
bool
- list_features(roi_id=None)#
List stored features.
- Parameters:
roi_id (UUID | None) – if provided, only features associated with this ROI are listed.
- Raises:
RoiNotFoundError – if the provided roi_id is not in the storage
- Return type:
list[TypeVar(FeatureType, bound=Feature)]
- list_snapshots()#
List all snapshots.
- Return type:
list[str]
- set_snapshot(snapshot_id=None, reset=False)#
Set the snapshot from which the storage will fetch data.
- Parameters:
snapshot_id (str | None) – the snapshot to set.
reset (bool) – set the selected snapshot as the latest and delete all later snapshots. The selected snapshot id will be set to head.
- Raises:
SnapshotNotFoundError – if the provided snapshot_id is not in the storage
- Return type:
None
- set_status(status)#
Set the current process status.
- Return type:
None
- class tidyms2.storage.SQLiteAssayStorage(id, host, roi_type, feature_type)#
Assay storage class that persists data using a SQLite backend.
- Parameters:
id (str) – an identifier for the storage.
host (str | None) – the DB host string. If not provided, an in-memory database is used.
roi_type (type[TypeVar(RoiType, bound=Roi)]) – the ROI class stored in the DB.
feature_type (type[TypeVar(FeatureType, bound=Feature)]) – the feature class stored in the DB.
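The practical difference between an in-memory database and a persisted one can be shown with the standard library sqlite3 module. This sketch does not use SQLiteAssayStorage and makes no claim about how tidyms2 connects to SQLite or what format its host string takes; here a file path stands in for the host, and open_db, write_status and read_status are hypothetical helpers.

```python
import sqlite3
from pathlib import Path
from typing import Optional


def open_db(host: Optional[str] = None) -> sqlite3.Connection:
    # ":memory:" is SQLite's name for a private, non-persistent database.
    return sqlite3.connect(host if host is not None else ":memory:")


def write_status(conn: sqlite3.Connection, status: str) -> None:
    with conn:  # commits on success, rolls back on error
        conn.execute("CREATE TABLE IF NOT EXISTS status (value TEXT)")
        conn.execute("INSERT INTO status VALUES (?)", (status,))


def read_status(conn: sqlite3.Connection) -> list:
    conn.execute("CREATE TABLE IF NOT EXISTS status (value TEXT)")
    return [row[0] for row in conn.execute("SELECT value FROM status")]
```

Data written through a file-backed connection is visible to a later connection to the same file, while each in-memory connection starts empty and loses its content when closed.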
- add_feature_groups(*groups)#
Add feature groups data to the assay storage.
- Return type:
None
- add_fill_values(*fill_values)#
Add values to fill missing entries in the data matrix.
- Return type:
None
- add_sample_data(data)#
Add sample data to DB.
- Return type:
None
- create_snapshot(snapshot_id)#
Create a new assay data snapshot.
- Return type:
None
- fetch_annotations(sample_id=None)#
Fetch the feature annotations.
- Return type:
list[Annotation]
- fetch_descriptors(sample_id=None, descriptors=None)#
Fetch the feature descriptors.
- Return type:
dict
- fetch_feature_groups()#
Fetch feature groups stored in the assay.
- Return type:
list[FeatureGroup]
- fetch_features_by_group(group)#
Fetch features using the feature group id.
- Return type:
list[TypeVar(FeatureType, bound=Feature)]
- fetch_features_by_id(*feature_ids)#
Fetch features using their ids.
- Return type:
list[TypeVar(FeatureType, bound=Feature)]
- fetch_features_by_sample(sample_id)#
Fetch features using the sample id.
- Return type:
list[TypeVar(FeatureType, bound=Feature)]
- fetch_fill_values()#
Fetch fill values for data matrix.
- Return type:
dict[str,dict[int,float]]
- fetch_rois_by_id(*roi_ids)#
Fetch ROIs using their ids.
- Parameters:
roi_ids (UUID) – a list of ROI ids to fetch.
- Return type:
list[TypeVar(RoiType, bound=Roi)]
- fetch_rois_by_sample(sample_id)#
Fetch all ROIs from a sample.
- Return type:
list[TypeVar(RoiType, bound=Roi)]
- fetch_sample(sample_id)#
Retrieve a sample from the assay.
- Parameters:
sample_id (str) – the id of the sample to retrieve.
- Raises:
SampleNotFound – if the provided id is not found in the DB
- Return type:
- fetch_sample_data(sample_id)#
Fetch sample data from the assay using the sample id.
- Return type:
OnMemorySampleStorage[TypeVar(RoiType, bound=Roi),TypeVar(FeatureType, bound=Feature)]
- get_feature_type()#
Retrieve the Feature class used.
- Return type:
type[TypeVar(FeatureType, bound=Feature)]
- get_n_features()#
Retrieve the number of features in the assay.
- Return type:
int
- get_n_rois()#
Retrieve the number of ROIs in the assay.
- Return type:
int
- get_n_samples()#
Retrieve the number of samples in the assay.
- Return type:
int
- get_process_status()#
Get the current process status.
- Return type:
- get_snapshot_id()#
Retrieve the current snapshot id.
- Return type:
str
- has_feature(feature_id)#
Check if a feature with the provided id is stored in the DB.
- Parameters:
feature_id (UUID) – the id of the feature to check.
- Return type:
bool
- has_feature_group(feature_group)#
Check if a group with the provided id is in the assay.
- Return type:
bool
- has_roi(roi_id)#
Check if a ROI with the provided id is stored in the DB.
- Parameters:
roi_id (UUID) – the id of the ROI to check.
- Return type:
bool
- has_sample(sample_id)#
Check if a sample with the provided id is stored in the DB.
- Parameters:
sample_id (str) – the id of the sample to check.
- Return type:
bool
- list_feature_groups()#
List all group ids in the assay.
- Return type:
list[int]
- list_snapshots()#
Retrieve the list of all snapshots.
- Return type:
list[str]
- patch_annotations(*patches)#
Update feature annotation values.
- Return type:
None
- patch_descriptors(*patches)#
Update feature descriptor values.
- Return type:
None
- set_process_status(status)#
Set the new process status.
- Return type:
None
- set_snapshot(snapshot_id=None, reset=False)#
Set assay storage data to specified snapshot.
If None, fetch data from the latest snapshot.
- Return type:
None