deimos

deimos.build_factors(data, dims='detect')[source]

Determine sorted unique elements (factors) for each dimension in data.

Parameters:
  • data (DataFrame) – Feature coordinates and intensities.

  • dims (str or list) – Dimensions to determine factors for. Attempts to autodetect by default.

Returns:

Unique sorted values per dimension.

Return type:

dict of array
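
As an illustration of what "sorted unique elements per dimension" means, the following sketch computes a comparable result with NumPy and pandas. It does not call deimos; the helper name sketch_factors and the toy column values are assumptions made for this example, and the library's own implementation may differ.

```python
import numpy as np
import pandas as pd


def sketch_factors(data, dims):
    """Return a dict mapping each dimension to its sorted unique values."""
    # np.unique both deduplicates and sorts in ascending order.
    return {dim: np.unique(data[dim].to_numpy()) for dim in dims}


# Toy feature table (values are illustrative only).
data = pd.DataFrame({
    "mz": [301.2, 150.1, 301.2, 150.1],
    "drift_time": [20.5, 20.5, 18.0, 22.0],
    "intensity": [10.0, 4.0, 7.0, 1.0],
})

factors = sketch_factors(data, ["mz", "drift_time"])
```

Each entry of the resulting dictionary is a one-dimensional array, matching the documented return type.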

deimos.build_index(data, factors)[source]

Construct data index from precomputed factors.

Parameters:
  • data (DataFrame) – Feature coordinates and intensities.

  • factors (dict) – Per-dimension arrays of unique values.

Returns:

Index per dimension.

Return type:

dict of array
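
One way to picture the per-dimension index is as the position of each row's value within the sorted factor array. The sketch below shows that idea with np.searchsorted; the helper name sketch_index is hypothetical, and whether deimos builds its index this way is an assumption.

```python
import numpy as np
import pandas as pd


def sketch_index(data, factors):
    """Map each row's value to its position in the sorted factor array."""
    # searchsorted returns the exact position because every value in the
    # data is, by construction, present in the corresponding factor array.
    return {dim: np.searchsorted(values, data[dim].to_numpy())
            for dim, values in factors.items()}


data = pd.DataFrame({
    "mz": [301.2, 150.1, 301.2],
    "drift_time": [20.5, 18.0, 22.0],
})

# Factors: sorted unique values per dimension.
factors = {dim: np.unique(data[dim].to_numpy()) for dim in data.columns}
index = sketch_index(data, factors)
```

Indexing a factor array with its index array reproduces the original column, which is what makes such an index useful for building dense grids.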

deimos.collapse(features, keep=['mz', 'drift_time', 'retention_time'], how=<function sum>)[source]

Collapses input data such that only the specified dimensions remain, according to the supplied aggregation function.

Parameters:
  • features (DataFrame) – Input feature coordinates and intensities.

  • keep (str or list) – Dimensions to keep during collapse operation.

  • how (function or str) – Aggregation function for collapse operation.

Returns:

Collapsed feature coordinates and aggregated intensities.

Return type:

DataFrame
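
Conceptually, collapsing resembles a group-by over the kept dimensions followed by an aggregation of the intensities. The sketch below shows this with plain pandas; it assumes an intensity column named "intensity" and a hypothetical helper sketch_collapse, and is not the library's implementation.

```python
import pandas as pd


def sketch_collapse(features, keep, how="sum"):
    """Group by the kept dimensions and aggregate the intensity column."""
    # as_index=False keeps the grouping dimensions as ordinary columns.
    return features.groupby(keep, as_index=False).agg({"intensity": how})


features = pd.DataFrame({
    "mz": [100.0, 100.0, 200.0, 200.0],
    "drift_time": [1.0, 2.0, 1.0, 2.0],
    "intensity": [5.0, 3.0, 2.0, 4.0],
})

# Keep only m/z: intensities sharing an m/z value are summed together.
collapsed = sketch_collapse(features, keep=["mz"], how="sum")
```

Here the drift_time column disappears and rows that share an m/z value are merged into one.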

deimos.get_accessions(path)[source]

Determines accession fields available in the mzML file.

Parameters:

path (str) – Path to mzML file.

Returns:

Dictionary of accession fields.

Return type:

dict of str

deimos.load(path, key='ms1', columns=None, chunksize=10000000.0, meta=None, accession={}, dtype=<class 'numpy.float32'>)[source]

Loads data from HDF5 or mzML file.

Parameters:
  • path (str or list of str) – Path to input file (or files if HDF5).

  • key (str) – Access this level (group) of the HDF5 container. E.g., “ms1” or “ms2” for MS levels 1 or 2, respectively. HDF5 format only.

  • columns (list) – A list of column names to return. HDF5 format only.

  • chunksize (int) – Dask partition chunksize. HDF5 format only. Unused when loading single file.

  • meta (dict) – Dictionary of metadata per path. HDF5 format only. Unused when loading single file.

  • accession (dict) – Key-value pairs specifying which features to parse from the mzML file. mzML format only. See get_accessions() to obtain available values.

  • dtype (data type) – Data type to encode values. mzML format only.

Returns:

Feature coordinates and intensities for the specified level. Pandas is used when loading a single file, Dask for multiple files. Loading an mzML file returns a dictionary with keys per MS level.

Return type:

DataFrame or dict of DataFrame

deimos.locate(features, by=['mz', 'drift_time', 'retention_time'], loc=[0, 0, 0], tol=[0, 0, 0], return_index=False)[source]

Given a coordinate and tolerances, return a subset of the data.

Parameters:
  • features (DataFrame) – Input feature coordinates and intensities.

  • by (str or list) – Dimension(s) by which to subset the data.

  • loc (float or list) – Coordinate location.

  • tol (float or list) – Tolerance in each dimension.

  • return_index (bool) – Return boolean index of subset if True.

Returns:

  • DataFrame – Subset of feature coordinates and intensities.

  • array – If return_index is True, boolean index of subset elements, i.e. features[index] yields the subset.
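
The selection described above can be pictured as a window of loc ± tol in each dimension. The following sketch reproduces that idea with a pandas boolean mask; the helper name sketch_locate is hypothetical, and treating the window edges as inclusive is an assumption rather than documented behavior.

```python
import pandas as pd


def sketch_locate(features, by, loc, tol):
    """Select rows within loc +/- tol along every listed dimension."""
    mask = pd.Series(True, index=features.index)
    for dim, center, width in zip(by, loc, tol):
        # Assumption: both window edges are inclusive.
        mask &= features[dim].between(center - width, center + width)
    return features[mask], mask.to_numpy()


features = pd.DataFrame({
    "mz": [100.0, 100.4, 101.0],
    "drift_time": [10.0, 10.2, 10.1],
    "intensity": [3.0, 8.0, 5.0],
})

subset, index = sketch_locate(
    features, by=["mz", "drift_time"], loc=[100.2, 10.0], tol=[0.3, 0.25]
)
```

The boolean array plays the role of the index returned when return_index is True: applying it to the input reproduces the subset.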

deimos.locate_asym(features, by=['mz', 'drift_time', 'retention_time'], loc=[0, 0, 0], low=[0, 0, 0], high=[0, 0, 0], relative=[False, False, False], return_index=False)[source]

Given a coordinate and asymmetrical tolerances, return a subset of the data.

Parameters:
  • features (DataFrame) – Input feature coordinates and intensities.

  • by (str or list) – Dimension(s) by which to subset the data.

  • loc (float or list) – Coordinate location.

  • low (float or list) – Lower tolerance(s) in each dimension.

  • high (float or list) – Upper tolerance(s) in each dimension.

  • relative (bool or list) – Whether to use relative or absolute tolerance per dimension.

  • return_index (bool) – Return boolean index of subset if True.

Returns:

  • DataFrame – Subset of feature coordinates and intensities.

  • array – If return_index is True, boolean index of subset elements, i.e. features[index] yields the subset.
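
To make the difference between relative and absolute tolerances concrete, the sketch below computes the lower and upper bounds for each dimension. It rests on two assumptions not stated in this reference: that low and high are signed offsets (so low is typically negative), and that a relative tolerance scales with the coordinate as loc * (1 + offset). Check both against the library source before relying on them. The helper name sketch_bounds is hypothetical.

```python
def sketch_bounds(loc, low, high, relative):
    """Compute per-dimension (lower, upper) bounds from signed offsets."""
    lower, upper = [], []
    for x, lo, hi, rel in zip(loc, low, high, relative):
        if rel:
            # Assumed relative form: offset is a fraction of the coordinate.
            lower.append(x * (1 + lo))
            upper.append(x * (1 + hi))
        else:
            # Assumed absolute form: offset is added to the coordinate.
            lower.append(x + lo)
            upper.append(x + hi)
    return lower, upper


# First dimension uses a relative window, second an absolute one.
lower, upper = sketch_bounds(
    loc=[500.0, 20.0],
    low=[-0.01, -0.5],
    high=[0.02, 1.0],
    relative=[True, False],
)
```

Under these assumptions the first dimension spans roughly 1% below to 2% above 500, while the second spans 0.5 below to 1.0 above 20.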

deimos.multi_sample_partition(features, split_on='mz', size=500, tol=2.5e-05)[source]

Partitions data along a given dimension. For use with features across multiple samples, e.g. in alignment.

Parameters:
  • features (DataFrame) – Input feature coordinates and intensities.

  • split_on (str) – Dimension to partition the data.

  • size (int) – Target partition size.

  • tol (float) – Largest allowed distance between unique split_on observations.

Returns:

A generator object that will lazily build and return each partition.

Return type:

Partitions
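
The role of tol can be illustrated with a gap-based splitting rule: a partition boundary is only placed where neighboring unique values are farther apart than tol, so that closely spaced observations from different samples stay together. The sketch below is an illustration of that idea only. The helper name sketch_split_points, the treatment of tol as an absolute distance, and the counting rule for size are all assumptions, not the library's algorithm.

```python
import numpy as np


def sketch_split_points(values, size, tol):
    """Place boundaries after at least `size` unique values, only at gaps > tol."""
    unique = np.unique(values)
    bounds = []
    count = 0
    for prev, curr in zip(unique[:-1], unique[1:]):
        count += 1
        # Split only once the target size is reached and the gap is wide enough.
        if count >= size and (curr - prev) > tol:
            bounds.append(float((prev + curr) / 2))
            count = 0
    return bounds


values = [1.0, 1.01, 1.02, 5.0, 5.01, 9.0]
bounds = sketch_split_points(values, size=2, tol=0.5)
```

With these toy values, boundaries fall in the two wide gaps and never between the tightly clustered observations.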

deimos.partition(features, split_on='mz', size=1000, overlap=0.05)[source]

Partitions data along a given dimension.

Parameters:
  • features (DataFrame) – Input feature coordinates and intensities.

  • split_on (str) – Dimension to partition the data.

  • size (int) – Target partition size.

  • overlap (float) – Amount of overlap between partitions to ameliorate edge effects.

Returns:

A generator object that will lazily build and return each partition.

Return type:

Partitions
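
The idea of lazily generated, overlapping partitions can be sketched with a plain Python generator. In the sketch below, size counts unique values along split_on and overlap is an absolute padding added to both edges of each window. Both interpretations are assumptions, as is the helper name sketch_partition; the Partitions object returned by deimos may behave differently.

```python
import numpy as np
import pandas as pd


def sketch_partition(features, split_on="mz", size=1000, overlap=0.05):
    """Yield overlapping chunks of the data along one dimension."""
    unique = np.unique(features[split_on].to_numpy())
    for start in range(0, len(unique), size):
        chunk = unique[start:start + size]
        # Pad each window so features near an edge appear in both neighbors.
        low, high = chunk[0] - overlap, chunk[-1] + overlap
        yield features[features[split_on].between(low, high)]


features = pd.DataFrame({
    "mz": [100.0, 100.02, 100.04, 100.06],
    "intensity": [1.0, 2.0, 3.0, 4.0],
})

# Nothing is computed until the generator is consumed.
parts = list(sketch_partition(features, split_on="mz", size=2, overlap=0.03))
```

Rows close to a partition edge appear in two partitions, which is how overlap reduces edge effects at the cost of some duplicated work.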

deimos.save(path, data, key='ms1', **kwargs)[source]

Saves DataFrame to HDF5 or MGF container.

Parameters:
  • path (str) – Path to output file.

  • data (DataFrame) – Feature coordinates and intensities to be saved. Precursor m/z and intensities should be paired to MS2 spectra for MGF format.

  • key (str) – Save to this level (group) of the HDF5 container. E.g., “ms1” or “ms2” for MS levels 1 or 2, respectively. HDF5 format only.

  • kwargs – Keyword arguments exposed by to_hdf() or save_mgf().

deimos.slice(features, by=['mz', 'drift_time', 'retention_time'], low=[0, 0, 0], high=[0, 0, 0], return_index=False)[source]

Given lower and upper bounds in each dimension, return a subset of the data.

Parameters:
  • features (DataFrame) – Input feature coordinates and intensities.

  • by (str or list) – Dimension(s) by which to subset the data.

  • low (float or list) – Lower bound(s) in each dimension.

  • high (float or list) – Upper bound(s) in each dimension.

  • return_index (bool) – Return boolean index of subset if True.

Returns:

  • DataFrame – Subset of feature coordinates and intensities.

  • array – If return_index is True, boolean index of subset elements, i.e. features[index] yields the subset.

deimos.threshold(features, by='intensity', threshold=0)[source]

Thresholds the input DataFrame on the variable named by the by keyword, keeping rows whose value is greater than the value passed to threshold.

Parameters:
  • features (DataFrame) – Input feature coordinates and intensities.

  • by (str) – Variable to threshold by.

  • threshold (float) – Threshold value.

Returns:

Thresholded feature coordinates.

Return type:

DataFrame
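
In pandas terms, thresholding amounts to a strict greater-than comparison on one column. The sketch below illustrates this; the helper name sketch_threshold and the toy values are assumptions for the example, and it does not call deimos.

```python
import pandas as pd


def sketch_threshold(features, by="intensity", threshold=0):
    """Keep rows whose `by` value is strictly greater than `threshold`."""
    return features[features[by] > threshold]


features = pd.DataFrame({
    "mz": [100.0, 200.0, 300.0],
    "intensity": [0.0, 5.0, 12.0],
})

# A row exactly equal to the threshold is dropped by the strict comparison.
kept = sketch_threshold(features, by="intensity", threshold=5)
```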