io

deimos.io._load_hdf(path, level='ms1')

Deprecated version. Loads data frame from HDF5 container.

Parameters:
  • path (str) – Path to input HDF5 file.

  • level (str) – Access this level (group) of the HDF5 container. E.g., “ms1” or “ms2” for MS levels 1 or 2, respectively.

Returns:

Feature coordinates and intensities for the specified level.

Return type:

DataFrame

deimos.io._save_hdf(path, data, dtype={}, compression_level=5)

Deprecated version. Saves dictionary of DataFrame to HDF5 container.

Parameters:
  • path (str) – Path to output file.

  • data (dict of DataFrame) – Dictionary of feature coordinates and intensities to be saved. Dictionary keys are saved as “groups” (e.g., MS level) and data frame columns are saved as “datasets” in the HDF5 container.

  • dtype (dict) – Specifies the data type used to save each column, provided as column:dtype pairs. Defaults to 32-bit float if unspecified.

  • compression_level (int) – Compression level, from 0 to 9. Higher values result in greater compression at the expense of computational overhead.

deimos.io.build_factors(data, dims='detect')

Determine sorted unique elements (factors) for each dimension in data.

Parameters:
  • data (DataFrame) – Feature coordinates and intensities.

  • dims (str or list) – Dimensions to determine factors for. Attempts to autodetect by default.

Returns:

Unique sorted values per dimension.

Return type:

dict of array
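
As a rough illustration of what "factors" means here, the sketch below computes the sorted unique values for each requested dimension. It is not the DEIMoS implementation: the helper name `sketch_factors` is hypothetical, and a plain dict of lists stands in for the DataFrame so the example has no dependencies.

```python
def sketch_factors(data, dims):
    """Return sorted unique values for each requested dimension.

    `data` is a dict of equal-length lists standing in for a DataFrame
    (an assumption made to keep the sketch dependency-free).
    """
    return {dim: sorted(set(data[dim])) for dim in dims}


# Hypothetical feature table with two coordinate dimensions.
data = {
    "mz": [200.1, 100.5, 200.1, 150.0],
    "drift_time": [3.2, 1.1, 1.1, 3.2],
    "intensity": [10, 20, 30, 40],
}

factors = sketch_factors(data, dims=["mz", "drift_time"])
print(factors)
```

Only the coordinate dimensions are passed in `dims`; the intensity column is left out because it is a measured value, not an axis.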

deimos.io.build_index(data, factors)

Construct data index from precomputed factors.

Parameters:
  • data (DataFrame) – Feature coordinates and intensities.

  • factors (dict) – Per-dimension arrays of unique values.

Returns:

Index per dimension.

Return type:

dict of array
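
One plausible reading of "index per dimension" is the position of each row's coordinate within the sorted factor values. The sketch below shows that mapping with a binary search. The helper name `sketch_index` and the dict-of-lists input are assumptions for illustration, not the library's code.

```python
from bisect import bisect_left


def sketch_index(data, factors):
    """Map each row's coordinate to its position in the sorted factor list."""
    return {
        dim: [bisect_left(values, x) for x in data[dim]]
        for dim, values in factors.items()
    }


# Hypothetical coordinates and their precomputed sorted unique values.
data = {"mz": [200.1, 100.5, 200.1, 150.0]}
factors = {"mz": [100.5, 150.0, 200.1]}

index = sketch_index(data, factors)
print(index)
```

Because the factors are sorted and every coordinate is present among them, the binary search returns the exact position of each value.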

deimos.io.get_accessions(path)

Determines accession fields available in the mzML file.

Parameters:

path (str) – Path to mzML file.

Returns:

Dictionary of accession fields.

Return type:

dict of str
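
To make "accession fields" concrete: in mzML, measured quantities are annotated with `cvParam` elements that carry a `name` and a controlled-vocabulary `accession`. The sketch below collects those pairs from a small, hypothetical mzML-like fragment using only the standard library. Which elements DEIMoS actually inspects is not stated above, so treat `sketch_accessions` as an assumption-laden illustration.

```python
import xml.etree.ElementTree as ET

# Minimal mzML-like fragment (hypothetical; real files carry far more structure).
SNIPPET = """<mzML xmlns="http://psi.hupo.org/ms/mzml">
  <run>
    <spectrumList>
      <spectrum index="0">
        <cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="1"/>
        <scanList>
          <scan>
            <cvParam cvRef="MS" accession="MS:1000016" name="scan start time" value="0.5"/>
          </scan>
        </scanList>
      </spectrum>
    </spectrumList>
  </run>
</mzML>"""


def sketch_accessions(xml_text):
    """Collect {cvParam name: accession} pairs found anywhere in the document."""
    root = ET.fromstring(xml_text)
    found = {}
    for elem in root.iter():
        # Tags are namespace-qualified, e.g. "{http://psi.hupo.org/ms/mzml}cvParam".
        if elem.tag.endswith("}cvParam") or elem.tag == "cvParam":
            found[elem.get("name")] = elem.get("accession")
    return found


accessions = sketch_accessions(SNIPPET)
print(accessions)
```

A dictionary of this shape is the kind of input the `accession` argument of the mzML loaders expects: a key naming the output column and a value giving the accession to parse.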

deimos.io.load(path, key='ms1', columns=None, chunksize=10000000.0, meta=None, accession={}, dtype=numpy.float32)

Loads data from HDF5 or mzML file.

Parameters:
  • path (str or list of str) – Path to input file (or files if HDF5).

  • key (str) – Access this level (group) of the HDF5 container. E.g., “ms1” or “ms2” for MS levels 1 or 2, respectively. HDF5 format only.

  • columns (list) – A list of column names to return. HDF5 format only.

  • chunksize (int) – Dask partition chunksize. HDF5 format only. Unused when loading single file.

  • meta (dict) – Dictionary of metadata per path. HDF5 format only. Unused when loading single file.

  • accession (dict) – Key-value pairs specifying which features to parse from the mzML file. mzML format only. See get_accessions() to obtain available values.

  • dtype (data type) – Data type to encode values. mzML format only.

Returns:

Feature coordinates and intensities for the specified level. Pandas is used when loading a single file, Dask for multiple files. Loading an mzML file returns a dictionary with keys per MS level.

Return type:

DataFrame or dict of DataFrame
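
The return type depends on what is passed in, which the following sketch makes explicit. It only reports which of the loaders documented in this module a dispatcher might select. The helper name `sketch_choose_loader` and the recognized file extensions are assumptions for illustration, not behavior confirmed by this reference.

```python
import os


def sketch_choose_loader(path):
    """Return the name of the loader a dispatcher might pick for `path`.

    The extension lists below are assumptions made for this sketch.
    """
    if isinstance(path, (list, tuple)):
        # Several files: HDF5 only, combined into a Dask data frame.
        return "load_hdf_multi"
    ext = os.path.splitext(path)[1].lower()
    if ext in (".h5", ".hdf", ".hdf5"):
        # Single HDF5 file: returned as a pandas DataFrame.
        return "load_hdf_single"
    if ext == ".mzml":
        # mzML file: returned as a dict of DataFrames keyed by MS level.
        return "load_mzml"
    raise ValueError(f"unsupported file type: {ext!r}")


print(sketch_choose_loader("sample.h5"))
print(sketch_choose_loader(["a.h5", "b.h5"]))
print(sketch_choose_loader("sample.mzML"))
```

Callers that accept either a single path or a list should therefore be prepared for a pandas DataFrame, a Dask DataFrame, or a dictionary of DataFrames.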

deimos.io.load_hdf(path, key='ms1', columns=None, chunksize=10000000.0, meta=None)

Loads data frame from HDF5 container(s).

Parameters:
  • path (str or list of str) – Path to input HDF5 file or files.

  • key (str) – Access this level (group) of the HDF5 container. E.g., “ms1” or “ms2” for MS levels 1 or 2, respectively.

  • columns (list) – A list of column names to return.

  • chunksize (int) – Dask partition chunksize. Unused when loading single file.

  • meta (dict) – Dictionary of metadata per path. Unused when loading single file.

Returns:

Feature coordinates and intensities for the specified level. Pandas is used when loading a single file, Dask for multiple files.

Return type:

DataFrame

deimos.io.load_hdf_multi(paths, key='ms1', columns=None, chunksize=10000000.0, meta=None)

Loads data frame from HDF5 containers using Dask. Appends column to indicate source filenames.

Parameters:
  • paths (list of str) – Paths to input HDF5 files.

  • key (str) – Access this level (group) of the HDF5 container. E.g., “ms1” or “ms2” for MS levels 1 or 2, respectively.

  • columns (list) – A list of column names to return.

  • chunksize (int) – Dask partition chunksize.

  • meta (dict) – Dictionary of metadata per path.

Returns:

Feature coordinates and intensities for the specified level.

Return type:

DataFrame

deimos.io.load_hdf_single(path, key='ms1', columns=None)

Loads data frame from HDF5 container.

Parameters:
  • path (str) – Path to input HDF5 file.

  • key (str) – Access this level (group) of the HDF5 container. E.g., “ms1” or “ms2” for MS levels 1 or 2, respectively.

  • columns (list) – A list of column names to return.

Returns:

Feature coordinates and intensities for the specified level.

Return type:

DataFrame

deimos.io.load_mzml(path, accession={}, dtype=numpy.float32)

Loads an mzML file, parsing the requested accession values, to yield a dictionary of DataFrames keyed by MS level.

Parameters:
  • path (str) – Path to input mzML file.

  • accession (dict) – Key-value pairs specifying which features to parse from the mzML file. See get_accessions() to obtain available values. Scan, frame, m/z, and intensity are parsed by default.

  • dtype (data type) – Data type to encode values.

Returns:

Dictionary containing parsed feature coordinates and intensities, indexed by keys per MS level.

Return type:

dict of DataFrame

deimos.io.save(path, data, key='ms1', **kwargs)

Saves DataFrame to HDF5 or MGF container.

Parameters:
  • path (str) – Path to output file.

  • data (DataFrame) – Feature coordinates and intensities to be saved. Precursor m/z and intensities should be paired to MS2 spectra for MGF format.

  • key (str) – Save to this level (group) of the HDF5 container. E.g., “ms1” or “ms2” for MS levels 1 or 2, respectively. HDF5 format only.

  • kwargs – Keyword arguments exposed by to_hdf() or save_mgf().

deimos.io.save_hdf(path, data, key='ms1', complevel=5, **kwargs)

Saves DataFrame to HDF5 container.

Parameters:
  • path (str) – Path to output file.

  • data (DataFrame) – Feature coordinates and intensities to be saved.

  • key (str) – Save to this level (group) of the HDF5 container. E.g., “ms1” or “ms2” for MS levels 1 or 2, respectively.

  • complevel (int) – Compression level for the HDF5 output. Defaults to 5.

  • kwargs – Keyword arguments exposed by to_hdf().

deimos.io.save_mgf(path, features, groupby='index_ms1', precursor_mz='mz_ms1', fragment_mz='mz_ms2', fragment_intensity='intensity_ms2', precursor_metadata=None, sample_metadata=None)

Saves data to MGF format.

Parameters:
  • path (str) – Path to output file.

  • features (DataFrame) – Precursor m/z and intensities paired to MS2 spectra.

  • groupby (str or list of str) – Column(s) to group fragments by.

  • precursor_mz (str) – Column containing precursor m/z values.

  • fragment_mz (str) – Column containing fragment m/z values.

  • fragment_intensity (str) – Column containing fragment intensity values.

  • precursor_metadata (dict) – Precursor metadata key:value pairs of {MGF entry name}:{column name}.

  • sample_metadata (dict) – Sample metadata key:value pairs of {MGF entry name}:{value}.
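
To show how the column arguments relate to the output, here is a minimal sketch of writing grouped fragments as MGF blocks (`BEGIN IONS`, `PEPMASS=`, peak lines, `END IONS`). It is not the DEIMoS writer: `sketch_write_mgf` is a hypothetical helper, a list of dicts stands in for the DataFrame, and metadata entries are omitted for brevity.

```python
import io
from itertools import groupby


def sketch_write_mgf(rows, handle, groupby_key="index_ms1",
                     precursor_mz="mz_ms1", fragment_mz="mz_ms2",
                     fragment_intensity="intensity_ms2"):
    """Write one BEGIN IONS/END IONS block per precursor group."""
    # itertools.groupby only merges adjacent rows, so sort by the key first.
    ordered = sorted(rows, key=lambda r: r[groupby_key])
    for _, group in groupby(ordered, key=lambda r: r[groupby_key]):
        group = list(group)
        handle.write("BEGIN IONS\n")
        handle.write(f"PEPMASS={group[0][precursor_mz]}\n")
        for row in group:
            handle.write(f"{row[fragment_mz]} {row[fragment_intensity]}\n")
        handle.write("END IONS\n\n")


# Hypothetical precursor/fragment pairs: two fragments for precursor 0, one for 1.
rows = [
    {"index_ms1": 0, "mz_ms1": 301.14, "mz_ms2": 85.03, "intensity_ms2": 120.0},
    {"index_ms1": 0, "mz_ms1": 301.14, "mz_ms2": 153.02, "intensity_ms2": 45.0},
    {"index_ms1": 1, "mz_ms1": 455.29, "mz_ms2": 97.1, "intensity_ms2": 300.0},
]

buffer = io.StringIO()
sketch_write_mgf(rows, buffer)
mgf_text = buffer.getvalue()
print(mgf_text)
```

Each distinct value of the grouping column becomes one spectrum, which is why precursor and fragment values must already be paired row by row in the input.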

deimos.io.save_msp(path, features, groupby='index_ms1', precursor_mz='mz_ms1', fragment_mz='mz_ms2', fragment_intensity='intensity_ms2', precursor_metadata=None, sample_metadata=None)

Saves data to MSP format.

Parameters:
  • path (str) – Path to output file.

  • features (DataFrame) – Precursor m/z and intensities paired to MS2 spectra.

  • groupby (str or list of str) – Column(s) to group fragments by.

  • precursor_mz (str) – Column containing precursor m/z values.

  • fragment_mz (str) – Column containing fragment m/z values.

  • fragment_intensity (str) – Column containing fragment intensity values.

  • precursor_metadata (dict) – Precursor metadata key:value pairs of {MSP entry name}:{column name}.

  • sample_metadata (dict) – Sample metadata key:value pairs of {MSP entry name}:{value}.
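
MSP differs from MGF mainly in its header style: each entry lists `Field: value` lines, states the peak count, and is followed by the peaks and a blank line. The sketch below illustrates that layout. The field names `Name` and `PrecursorMZ` follow common NIST-style MSP files and are assumptions here, as are the helper name `sketch_write_msp` and the list-of-dicts input. The exact fields DEIMoS writes are not given in this reference.

```python
import io
from collections import defaultdict


def sketch_write_msp(rows, handle, groupby_key="index_ms1",
                     precursor_mz="mz_ms1", fragment_mz="mz_ms2",
                     fragment_intensity="intensity_ms2"):
    """Write one MSP entry per precursor group, separated by blank lines."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[groupby_key]].append(row)

    for key, group in groups.items():
        handle.write(f"Name: {key}\n")  # assumed field name
        handle.write(f"PrecursorMZ: {group[0][precursor_mz]}\n")  # assumed field name
        handle.write(f"Num Peaks: {len(group)}\n")
        for row in group:
            handle.write(f"{row[fragment_mz]} {row[fragment_intensity]}\n")
        handle.write("\n")


# Hypothetical precursor/fragment pairs: two fragments for precursor 0, one for 1.
rows = [
    {"index_ms1": 0, "mz_ms1": 301.14, "mz_ms2": 85.03, "intensity_ms2": 120.0},
    {"index_ms1": 0, "mz_ms1": 301.14, "mz_ms2": 153.02, "intensity_ms2": 45.0},
    {"index_ms1": 1, "mz_ms1": 455.29, "mz_ms2": 97.1, "intensity_ms2": 300.0},
]

buffer = io.StringIO()
sketch_write_msp(rows, buffer)
msp_text = buffer.getvalue()
print(msp_text)
```

Unlike MGF, the peak count must be known before the peaks are written, so fragments are collected per group first.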