subset
- class deimos.subset.MultiSamplePartitions(features, split_on='mz', size=500, tol=2.5e-05)[source]
Generator object that will lazily build and return each partition constructed from multiple samples.
- features
Input feature coordinates and intensities.
- Type:
DataFrame or DataFrame
- split_on
Dimension to partition the data.
- Type:
str
- size
Target partition size.
- Type:
int
- tol
Largest allowed distance between unique split_on observations.
- Type:
float
- map(func, processes=1, **kwargs)[source]
Maps func to each partition, then returns the combined result.
- Parameters:
func (function) – Function to apply to partitions.
processes (int) – Number of parallel processes. If less than 2, a serial mapping is applied.
kwargs – Keyword arguments passed to func.
- Returns:
Combined result of func applied to partitions.
- Return type:
DataFrame
- class deimos.subset.Partitions(features, split_on='mz', size=1000, overlap=0.05)[source]
Generator object that will lazily build and return each partition.
- features
Input feature coordinates and intensities.
- Type:
DataFrame
- split_on
Dimension to partition the data.
- Type:
str
- size
Target partition size.
- Type:
int
- overlap
Amount of overlap between partitions to ameliorate edge effects.
- Type:
float
- map(func, processes=1, **kwargs)[source]
Maps func to each partition, then returns the combined result, accounting for overlap regions.
- Parameters:
func (function) – Function to apply to partitions.
processes (int) – Number of parallel processes. If less than 2, a serial mapping is applied.
kwargs – Keyword arguments passed to func.
- Returns:
Combined result of func applied to partitions.
- Return type:
DataFrame
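The source does not spell out how map accounts for overlap regions. As a rough sketch of one plausible approach, the plain-Python example below (no deimos or pandas) applies a function to each partition and keeps only the results that fall inside that partition's non-overlapping core range, so a feature in an overlap region is reported once. The helper names `map_partitions` and `keep_intense`, the `(core_low, core_high, rows)` layout, and the trimming rule are all assumptions made for illustration, and the sketch runs serially only.

```python
def map_partitions(partitions, func, split_on="mz", **kwargs):
    """Apply func to each partition and merge the trimmed results (hypothetical helper)."""
    combined = []
    for core_low, core_high, rows in partitions:
        result = func(rows, **kwargs)
        # Assumed rule: discard results that belong to a neighbor's core range.
        combined.extend(
            r for r in result if core_low <= r[split_on] <= core_high
        )
    return combined


def keep_intense(rows, minimum):
    """Example func: keep rows at or above a minimum intensity."""
    return [r for r in rows if r["intensity"] >= minimum]


rows = [
    {"mz": 100.0, "intensity": 5.0},
    {"mz": 101.0, "intensity": 1.0},
    {"mz": 102.0, "intensity": 8.0},
    {"mz": 103.0, "intensity": 9.0},
]
# Two partitions whose row sets overlap by one neighbor on each side.
partitions = [
    (100.0, 101.0, rows[0:3]),
    (102.0, 103.0, rows[1:4]),
]
combined = map_partitions(partitions, keep_intense, minimum=2.0)
combined_mz = [r["mz"] for r in combined]
```

Without the trimming step, the row at mz 102.0 would appear twice in the combined output, once from each partition.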
- zipmap(func, b, processes=1, **kwargs)[source]
Maps func to each partition pair resulting from the zip operation of self and b, then returns the combined result, accounting for overlap regions.
- Parameters:
func (function) – Function to apply to zipped partitions. Must accept and return two DataFrame instances.
b (DataFrame) – Input feature coordinates and intensities.
processes (int) – Number of parallel processes. If less than 2, a serial mapping is applied.
kwargs – Keyword arguments passed to func.
- Returns:
a, b – Result of func applied to paired partitions.
- Return type:
DataFrame
- deimos.subset.collapse(features, keep=['mz', 'drift_time', 'retention_time'], how=<function sum>)[source]
Collapses input data such that only the specified dimensions remain, aggregating intensities with the supplied aggregation function.
- Parameters:
features (DataFrame) – Input feature coordinates and intensities.
keep (str or list) – Dimensions to keep during collapse operation.
how (function or str) – Aggregation function for collapse operation.
- Returns:
Collapsed feature coordinates and aggregated intensities.
- Return type:
DataFrame
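The collapse operation can be pictured as a group-by over the kept dimensions followed by an aggregation of intensity. The sketch below is plain Python rather than a call into deimos. The row layout (dicts with an `intensity` key) and the helper name `collapse_rows` are assumptions made for illustration.

```python
from collections import defaultdict


def collapse_rows(rows, keep=("mz",), how=sum):
    """Group rows by the kept dimensions and aggregate their intensities."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[dim] for dim in keep)
        groups[key].append(row["intensity"])
    # One output row per unique coordinate in the kept dimensions.
    return [
        {**dict(zip(keep, key)), "intensity": how(values)}
        for key, values in groups.items()
    ]


rows = [
    {"mz": 100.0, "drift_time": 1.0, "intensity": 5.0},
    {"mz": 100.0, "drift_time": 2.0, "intensity": 7.0},
    {"mz": 200.0, "drift_time": 1.0, "intensity": 3.0},
]
collapsed_sum = collapse_rows(rows, keep=("mz",), how=sum)
collapsed_max = collapse_rows(rows, keep=("mz",), how=max)
```

Here the drift_time dimension is dropped, and the two rows sharing mz 100.0 merge into one.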
- deimos.subset.locate(features, by=['mz', 'drift_time', 'retention_time'], loc=[0, 0, 0], tol=[0, 0, 0], return_index=False)[source]
Given a coordinate and tolerances, return a subset of the data.
- Parameters:
features (DataFrame) – Input feature coordinates and intensities.
by (str or list) – Dimension(s) by which to subset the data.
loc (float or list) – Coordinate location.
tol (float or list) – Tolerance in each dimension.
return_index (bool) – Return boolean index of subset if True.
- Returns:
DataFrame – Subset of feature coordinates and intensities.
array – If return_index is True, boolean index of subset elements, i.e. features[index] = subset.
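To make the coordinate-plus-tolerance selection concrete, here is a minimal plain-Python sketch that does not call deimos. It assumes the window is symmetric and inclusive, that is, a row is kept when it lies within loc ± tol in every listed dimension. The helper name `locate_rows` and the inclusive comparison are illustrative assumptions.

```python
def locate_rows(rows, by, loc, tol):
    """Return rows lying within loc +/- tol in every dimension listed in `by`."""
    return [
        row for row in rows
        if all(
            abs(row[dim] - center) <= width
            for dim, center, width in zip(by, loc, tol)
        )
    ]


rows = [
    {"mz": 100.0, "drift_time": 10.0, "intensity": 5.0},
    {"mz": 100.5, "drift_time": 12.0, "intensity": 7.0},
    {"mz": 101.0, "drift_time": 10.0, "intensity": 3.0},
]
subset = locate_rows(
    rows, by=["mz", "drift_time"], loc=[100.25, 10.0], tol=[0.5, 1.0]
)
```

Only the first row satisfies both tolerances: the second misses in drift_time and the third misses in mz.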
- deimos.subset.locate_asym(features, by=['mz', 'drift_time', 'retention_time'], loc=[0, 0, 0], low=[0, 0, 0], high=[0, 0, 0], relative=[False, False, False], return_index=False)[source]
Given a coordinate and asymmetrical tolerances, return a subset of the data.
- Parameters:
features (DataFrame) – Input feature coordinates and intensities.
by (str or list) – Dimension(s) by which to subset the data.
loc (float or list) – Coordinate location.
low (float or list) – Lower tolerance(s) in each dimension.
high (float or list) – Upper tolerance(s) in each dimension.
relative (bool or list) – Whether to use relative or absolute tolerance per dimension.
return_index (bool) – Return boolean index of subset if True.
- Returns:
DataFrame – Subset of feature coordinates and intensities.
array – If return_index is True, boolean index of subset elements, i.e. features[index] = subset.
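The parameter list does not state how low, high, and relative combine, so the following sketch rests on explicit assumptions: tolerances are treated as signed offsets from loc (low is typically negative), and a relative tolerance is taken as a fraction of the coordinate. The helper name `asym_bounds` is hypothetical, and the actual deimos behavior should be checked against its source.

```python
def asym_bounds(loc, low, high, relative):
    """Compute (lower, upper) bounds per dimension under the assumed convention."""
    bounds = []
    for center, lower, upper, rel in zip(loc, low, high, relative):
        if rel:
            # Assumed: relative tolerances scale with the coordinate.
            bounds.append((center * (1 + lower), center * (1 + upper)))
        else:
            # Assumed: absolute tolerances are added to the coordinate.
            bounds.append((center + lower, center + upper))
    return bounds


bounds = asym_bounds(
    loc=[100.0, 200.0],
    low=[-0.5, -0.25],
    high=[1.0, 0.5],
    relative=[False, True],
)
```

Under these assumptions the first dimension spans 99.5 to 101.0 (absolute), and the second spans 150.0 to 300.0 (relative to 200.0).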
- deimos.subset.multi_sample_partition(features, split_on='mz', size=500, tol=2.5e-05)[source]
Partitions data along a given dimension. For use with features across multiple samples, e.g. in alignment.
- Parameters:
features (DataFrame or DataFrame) – Input feature coordinates and intensities.
split_on (str) – Dimension to partition the data.
size (int) – Target partition size.
tol (float) – Largest allowed distance between unique split_on observations.
- Returns:
A generator object that will lazily build and return each partition.
- Return type:
- deimos.subset.partition(features, split_on='mz', size=1000, overlap=0.05)[source]
Partitions data along a given dimension.
- Parameters:
features (DataFrame) – Input feature coordinates and intensities.
split_on (str) – Dimension to partition the data.
size (int) – Target partition size.
overlap (float) – Amount of overlap between partitions to ameliorate edge effects.
- Returns:
A generator object that will lazily build and return each partition.
- Return type:
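As an illustration of lazy partitioning with overlap, the generator below yields one partition at a time in plain Python. It is not the deimos implementation. It assumes that size counts unique split_on values and that overlap is an absolute distance in the units of split_on. The helper name `iter_partitions` is hypothetical.

```python
def iter_partitions(rows, split_on="mz", size=2, overlap=0.0):
    """Lazily yield lists of rows, one per partition along `split_on`."""
    unique = sorted({row[split_on] for row in rows})
    for start in range(0, len(unique), size):
        chunk = unique[start:start + size]
        # Widen the window on both sides so edge features keep their neighbors.
        low, high = chunk[0] - overlap, chunk[-1] + overlap
        yield [row for row in rows if low <= row[split_on] <= high]


rows = [{"mz": float(v), "intensity": 1.0} for v in (100, 101, 102, 103, 104)]

sizes_no_overlap = [len(p) for p in iter_partitions(rows, size=2, overlap=0.0)]
sizes_overlap = [len(p) for p in iter_partitions(rows, size=2, overlap=1.0)]
```

Because the function is a generator, no partition is built until it is requested, which keeps memory use bounded for large inputs.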
- deimos.subset.slice(features, by=['mz', 'drift_time', 'retention_time'], low=[0, 0, 0], high=[0, 0, 0], return_index=False)[source]
Given lower and upper bounds in each dimension, return a subset of the data.
- Parameters:
features (DataFrame) – Input feature coordinates and intensities.
by (str or list) – Dimension(s) by which to subset the data.
low (float or list) – Lower bound(s) in each dimension.
high (float or list) – Upper bound(s) in each dimension.
return_index (bool) – Return boolean index of subset if True.
- Returns:
DataFrame – Subset of feature coordinates and intensities.
array – If return_index is True, boolean index of subset elements, i.e. features[index] = subset.
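The relationship between the subset and the boolean index can be sketched in plain Python as follows. This does not call deimos. The helper name `slice_rows` is hypothetical, and inclusive bounds are an assumption.

```python
def slice_rows(rows, by, low, high):
    """Return (subset, mask) for rows inside [low, high] in every listed dimension."""
    mask = [
        all(lo <= row[dim] <= hi for dim, lo, hi in zip(by, low, high))
        for row in rows
    ]
    subset = [row for row, hit in zip(rows, mask) if hit]
    return subset, mask


rows = [
    {"mz": 100.0, "drift_time": 10.0, "intensity": 5.0},
    {"mz": 100.5, "drift_time": 12.0, "intensity": 7.0},
    {"mz": 101.0, "drift_time": 10.0, "intensity": 3.0},
]
subset, mask = slice_rows(rows, by=["mz"], low=[100.5], high=[101.0])
```

The mask has one entry per input row, so applying it to the original rows reproduces the subset.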
- deimos.subset.threshold(features, by='intensity', threshold=0)[source]
Thresholds input DataFrame using the by keyword, keeping values greater than the value passed to threshold.
- Parameters:
features (DataFrame) – Input feature coordinates and intensities.
by (str) – Variable to threshold by.
threshold (float) – Threshold value.
- Returns:
Thresholded feature coordinates.
- Return type:
DataFrame
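Since the description says "greater than", the comparison is read here as strict, so rows equal to the threshold are dropped. The plain-Python sketch below illustrates that reading. The helper name `threshold_rows` is hypothetical and the code does not call deimos.

```python
def threshold_rows(rows, by="intensity", threshold=0):
    """Keep rows whose `by` value is strictly greater than `threshold`."""
    return [row for row in rows if row[by] > threshold]


rows = [
    {"mz": 100.0, "intensity": 0.0},
    {"mz": 101.0, "intensity": 2.0},
    {"mz": 102.0, "intensity": 5.0},
]
kept_default = threshold_rows(rows)
kept_above_two = threshold_rows(rows, by="intensity", threshold=2.0)
```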