MS2 Extraction

With MS1 features of interest determined by peak detection, corresponding tandem mass spectra, if available, must be extracted and assigned to the MS1 parent ion feature. For data independent acquisition, we use non-m/z dimensions to assign fragments; for instance, drift time and retention time are used to match fragments in LC-IMS-MS/MS. These additional separations enable better attribution of MS2 ions to parent ions, a form of deconvolution inherent in the acquisition, but convolution artifacts can still occur.

Explicit, algorithmic deconvolution has been implemented such that MS1 and MS2 features overlapping in non-m/z separation dimensions are disambiguated to minimize false assignments. In this form of deconvolution, the profile(s) of non-m/z separation dimensions are used to identify only those ions in the MS2 with distributions that correspond to the precursor ion distribution. This technique simultaneously excludes MS2 ions arising from noise or chemical background, while also attributing MS2 ions only to precursor ions with similar separation distributions.

Drift Time Offset

In these data, the mass analyzer connected to the drift tube acts in itself like a miniature ion mobility separation aparatus, wherein separations are a function of both the m/z of the ion as well as the voltage applied in the collision cell. As such, when an ion in the MS1 experiences a collsion energy of 0 eV, while its fragment ions experience higher collision energies (in this example, 20 eV), their reported drift times will be “offset” proportional to the collision energy difference and neutral loss. That is, the MS2 ions, under higher voltage, will travel faster than the corresponding MS1 ions, and thus will have a lower reported drift time.

To demonstrate, we’ll visualize the drift time dimension of a selected precursor and its putative MS2 fragments.

[1]:

import deimos
import matplotlib.pyplot as plt
import numpy as np

[2]:

ms1 = deimos.load('example_data.h5', key='ms1', columns=['mz', 'drift_time', 'retention_time', 'intensity'])
ms2 = deimos.load('example_data.h5', key='ms2', columns=['mz', 'drift_time', 'retention_time', 'intensity'])

[3]:

ms1 = deimos.threshold(ms1, threshold=100)
ms2 = deimos.threshold(ms2, threshold=100)

[4]:

# get maximal data point
mz_i, dt_i, rt_i, intensity_i = ms1.loc[ms1['intensity'] == ms1['intensity'].max(), :].values[0]

# subset the raw data
precursor = deimos.slice(ms1,
                       by=['mz', 'drift_time', 'retention_time'],
                       low=[mz_i - 0.1, dt_i - 1, rt_i - 1],
                       high=[mz_i + 0.2, dt_i + 1, rt_i + 2])

# putative fragments
fragment_profile = deimos.slice(ms2,
                                by=['drift_time', 'retention_time'],
                                low=[dt_i - 1.5, rt_i - 1],
                                high=[dt_i + 1, rt_i + 2])

[5]:

fragment_dt = deimos.collapse(fragment_profile, keep='drift_time').sort_values(by='drift_time')
precursor_dt = deimos.collapse(precursor, keep='drift_time').sort_values(by='drift_time')

[6]:

fig, ax = plt.subplots(1, dpi=150, facecolor='w')

ax.fill_between(precursor_dt['drift_time'],
                precursor_dt['intensity'] / precursor_dt['intensity'].max(),
                color='C0', alpha=0.4, label='Precursor')

ax.fill_between(fragment_dt['drift_time'],
                fragment_dt['intensity'] / fragment_dt['intensity'].max(),
                color='C3', alpha=0.4, label='Fragments')

ax.set_xlabel('Drift Time', fontweight='bold')
ax.set_ylabel('Normalized Intensity', fontweight='bold')
ax.set_ylim(0, None)
ax.legend()

plt.tight_layout()
plt.show()

../_images/user_guide_ms2_extraction_11_0.png

Fortunately, fragment offset correction is handled within the DEIMoS’s deconvolution module. The model to correct for offset was built from TuneMix data acquired on both our IMS instrument and an Orbitrap. Orbitrap MS1:MS2 pairs were used as ground truth, matching drift time information from the same MS1:MS2 pairs collected with IMS. Ordinary least squares (OLS) regression was used to relate model inputs to predicted values. The following model is supplied as default, but may be swapped out as needed.

[7]:

def offset_correction_model(dt_ms2, mz_ms2, mz_ms1, ce=0,
                            params=[1.02067031, -0.02062323,  0.00176694]):
    # Cast params as array
    params = np.array(params).reshape(-1, 1)

    # Convert collision energy to array
    ce = np.ones_like(dt_ms2) * np.log(ce)

    # Create constant vector
    const = np.ones_like(dt_ms2)

    # Sqrt
    mu_ms1 = np.sqrt(mz_ms1)
    mu_ms2 = np.sqrt(mz_ms2)

    # Ratio
    mu_ratio = mu_ms2 / mu_ms1

    # Create dependent array
    x = np.stack((const, mu_ratio, ce), axis=1)

    # Predict
    y = np.dot(x, params).flatten() * dt_ms2

    return y

Deconvolution

[8]:

ms1_peaks = deimos.load('example_data_peaks.h5', key='ms1')
ms2_peaks = deimos.load('example_data_peaks.h5', key='ms2')

For computational efficiency, we will only consider MS1 features of high intensity (>1E4) and MS2 features at least 10% of the MS1 intensity (>1E3).

[9]:

ms1_peaks = deimos.threshold(ms1_peaks, threshold=1E4)
ms2_peaks = deimos.threshold(ms2_peaks, threshold=1E3)

First, we’ll instantiate the MS2Deconvolution class.

[10]:

decon = deimos.deconvolution.MS2Deconvolution(ms1_peaks, ms1, ms2_peaks, ms2)

Next, putatively match MS1 and MS2 features to determine groupings in non-m/z dimensions (i.e. drift and retention time) for deconvolution. The follwoing will match MS2 features to MS1 with offset between -0.12 ms and +1.4 ms in drift time and ±0.1 minutes in retention time. The offset correction model is supplied here (we are using the pre-trained default). The flag require_ms1_greater_than_ms1 ensures that MS1 intensity must exceed MS2 intensity for putatively matched pairs. Finally, error_tolerance signals the amount of acceptable error after the offset correction model is applied.

[11]:

decon.construct_putative_pairs(dims=['drift_time', 'retention_time'],
                               low=[-0.12, -0.1], high=[1.4, 0.1], ce=20,
                               model=offset_correction_model,
                               require_ms1_greater_than_ms2=True,
                               error_tolerance=0.12)

[11]:

	index_ms1	mz_ms1	drift_time_ms1	retention_time_ms1	intensity_ms1	persistence_ms1	index_ms2	mz_ms2	drift_time_ms2	retention_time_ms2	intensity_ms2	persistence_ms2	drift_time_raw_ms2	drift_time_error
0	0	144.101166	14.674946	7.258184	34796.207031	506.324219	11	144.101120	14.702256	7.238883	17817.498047	38.572266	14.657956	0.027310
1	0	144.101166	14.674946	7.258184	34796.207031	506.324219	13	144.102509	14.702866	7.238883	17105.628906	142.082031	14.658571	0.027920
2	0	144.101166	14.674946	7.258184	34796.207031	506.324219	14	144.102448	14.721763	7.277484	18313.193359	1513.085938	14.677596	0.046817
3	0	144.101166	14.674946	7.258184	34796.207031	506.324219	16	144.102463	14.690876	7.354833	15995.425781	21.912109	14.646501	0.015930
4	0	144.101166	14.674946	7.258184	34796.207031	506.324219	23	58.064362	14.713208	7.354834	16546.968750	2007.302734	14.577117	0.038263
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
1146	106	761.014832	29.930386	13.595733	12693.996094	12193.595199	1880	434.006134	29.956604	13.615034	1016.653931	124.465027	29.656542	0.026218
1147	106	761.014832	29.930386	13.595733	12693.996094	12193.595199	2269	515.005676	29.814772	13.615035	4266.412109	3606.785522	29.552746	0.115613
1148	106	761.014832	29.930386	13.595733	12693.996094	12193.595199	3113	434.005402	29.935779	13.499251	1186.000000	685.648323	29.635576	0.005394
1149	106	761.014832	29.930386	13.595733	12693.996094	12193.595199	3131	1007.023560	29.862866	13.537850	1132.000000	631.529305	29.790583	0.067520
1150	106	761.014832	29.930386	13.595733	12693.996094	12193.595199	3146	351.000610	30.015176	13.615034	1301.600586	800.714608	29.671553	0.084791

1151 rows × 14 columns

The per-dimension bounds to consider for each feature must be specified using the configure_profile_extraction method. Keyword arguments low and high indicate the upper and lower bound of each feature per dimension, and relative indicates whether to use relative or absolute tolerances per dimension. No computation is performed here.

[12]:

decon.configure_profile_extraction(dims=['mz', 'drift_time', 'retention_time'],
                                   low=[-200E-6, -0.05, -0.1],
                                   high=[600E-6, 0.05, 0.1],
                                   relative=[True, True, False])

Finally, we apply the configured deconvolution routine to yield a deconvolved result, signaling which dimensions to consider for the deconvolution with the dims keyword argument. Specifying resolution signals the sampling interval of the interpolated profile, per dimension. Computational cost scales with the dimensionality of the data and the number of putative MS1:MS2 pairs, and the number of data points in the raw data (since each profile requires a slice operation). Below we will only evaluate quality of drift time profiles.

[13]:

%%time
res = decon.apply(dims='drift_time', resolution=0.01)

CPU times: user 1min 6s, sys: 327 ms, total: 1min 6s
Wall time: 1min 6s

The data frame returned is in long format. To yield the deconvolution result per MS1 feature, simply perform a groupby operation. We filter by drift_time_score to keep only those fragment profiles with high agreement with the MS1 precursor. Appropriate selection here depends on the quality of the offset correction.

[14]:

res = res.loc[res['drift_time_score'] > 0.9].groupby(by=[x for x in res.columns if x.endswith('_ms1')],
                                                     as_index=False).agg(list).sort_values(by='persistence_ms1',
                                                                                           ascending=False).head(5)
res

[14]:

	index_ms1	mz_ms1	drift_time_ms1	retention_time_ms1	intensity_ms1	persistence_ms1	index_ms2	mz_ms2	drift_time_ms2	retention_time_ms2	intensity_ms2	persistence_ms2	drift_time_raw_ms2	drift_time_error	drift_time_score
61	66	496.337463	28.336977	1.926034	47870.839844	47370.407699	[9, 84, 112, 121, 191, 234, 381, 512, 572, 578...	[496.3372497558594, 478.3252868652344, 497.339...	[28.359545463198298, 28.280074184849394, 28.33...	[1.9067332744598389, 1.9453333616256714, 1.945...	[19621.005859375, 7959.99560546875, 6133.04687...	[375.21484375, 96.2119140625, 149.302734375, 3...	[28.264205932617188, 28.175649642944336, 28.23...	[0.022568458193415353, 0.05690282015548931, 0....	[0.994691902938733, 0.9912188303898247, 0.9856...
92	97	192.094498	16.632273	9.035781	37407.832031	36907.446761	[79, 207, 301, 326, 406, 433, 438, 586, 760, 9...	[148.10479736328125, 118.0657730102539, 118.06...	[16.645597499988735, 16.514533011318445, 16.51...	[8.977734565734863, 8.97773265838623, 9.093682...	[8243.1357421875, 4113.98193359375, 3061.89868...	[148.38232421875, 157.131103515625, 19.0856933...	[16.553136825561523, 16.393863677978516, 16.39...	[0.013324779651821217, 0.11773970901846909, 0....	[0.9863538882208794, 0.989537784254673, 0.9886...
86	91	204.123215	17.011349	8.494816	36476.816406	35976.005355	[128, 277, 699, 812, 1127, 1403, 2904, 2907, 2...	[85.05628967285156, 85.07498168945312, 85.0947...	[17.025378545289982, 17.01229256969094, 17.050...	[8.514116287231445, 8.514117240905762, 8.51411...	[5424.8583984375, 3419.12255859375, 1757.81372...	[64.27490234375, 113.87548828125, 87.561401367...	[16.858814239501953, 16.845661163330078, 16.88...	[0.014029820924747582, 0.000943845325704018, 0...	[0.9887982891868636, 0.9859095879326736, 0.976...
85	90	118.086433	13.758590	8.224499	36248.109375	35747.372542	[45, 159, 310, 549, 836, 945, 1102, 1173, 1191...	[59.07223129272461, 118.0858383178711, 58.0918...	[13.790750275478697, 13.789786160412664, 13.75...	[8.320982933044434, 8.166616439819336, 8.20520...	[12710.3955078125, 4892.4521484375, 3134.25, 2...	[476.71484375, 77.00244140625, 129.80932617187...	[13.689501762390137, 13.755182266235352, 13.64...	[0.03216053091082571, 0.031196415844792824, 0....	[0.9782737511259236, 0.975511013881835, 0.9826...
82	87	144.101120	14.669243	7.181000	34877.378906	34377.263167	[11, 13, 14, 28, 342, 468, 536, 541, 612, 645,...	[144.1011199951172, 144.10250854492188, 144.10...	[14.702256094352368, 14.702865879642017, 14.72...	[7.2388834953308105, 7.2388834953308105, 7.277...	[17817.498046875, 17105.62890625, 18313.193359...	[38.572265625, 142.08203125, 1513.0859375, 233...	[14.65795612335205, 14.658571243286133, 14.677...	[0.03301323546564916, 0.03362302075529833, 0.0...	[0.9893036466315329, 0.9893071239016104, 0.990...

Visualize fragmentation spectra of the highest intensity MS1 feature:

[15]:

deimos.plot.stem(np.array(res.loc[61, 'mz_ms2']),
                 np.array(res.loc[61, 'intensity_ms2']),
                 width=1)
plt.tight_layout()
plt.show()

../_images/user_guide_ms2_extraction_29_0.png