Quick Start
As a quick start, we here load MS/MS spectra from a mgf file, perform molecular formula annotations, and retrieve the annotation result summary:
from msbuddy import Msbuddy, MsbuddyConfig
# create a MsbuddyConfig object
msb_config = MsbuddyConfig(ms_instr="orbitrap", # supported: "qtof", "orbitrap", "fticr" or None
# custom MS1 and MS2 tolerance will be used if None
# highly recommended to fill in the instrument type
ppm=True, # use ppm for mass tolerance
ms1_tol=5, # MS1 tolerance in ppm or Da
ms2_tol=10, # MS2 tolerance in ppm or Da
halogen=True,
timeout_secs=200)
# instantiate a Msbuddy object with the parameter set
msb_engine = Msbuddy(msb_config)
# load data, here we use a mgf file as an example
msb_engine.load_mgf('input_file.mgf')
# annotate molecular formula
msb_engine.annotate_formula()
# retrieve the annotation result summary
results = msb_engine.get_summary()
# print the result, results is a list of dictionaries
for individual_result in results:
for key, value in individual_result.items():
print(key, value)
Note
It is highly recommended to set up the ms_instr parameter in the msbuddy.MsbuddyConfig to obtain the best annotation performance.
Please see Configuration session for more details.
Within the result summary, results is a list of Python dictionaries. individual_result is a dictionary containing the following keys:
identifier: Identifier of the metabolic featuremz: Precursor m/zrt: Retention time in secondsadduct: Adduct typeformula_rank_1: Molecular formula annotation ranked in the first placeestimated_fdr: Estimated false discovery rate (FDR)formula_rank_2: Molecular formula annotation ranked in the second placeformula_rank_3: Molecular formula annotation ranked in the third placeformula_rank_4: Molecular formula annotation ranked in the fourth placeformula_rank_5: Molecular formula annotation ranked in the fifth place
MS/MS spectra can also be loaded via their USIs:
# you can load multiple USIs at once
msb_engine.load_usi(['mzspec:GNPS:GNPS-LIBRARY:accession:CCMSLIB00003740036',
'mzspec:GNPS:GNPS-LIBRARY:accession:CCMSLIB00003740037'])
# load USIs with adducts specified, otherwise the default adducts ([M+H]+, [M-H]-) will be used
msb_engine.load_usi(usi_list=['mzspec:GNPS:GNPS-LIBRARY:accession:CCMSLIB00003740036',
'mzspec:GNPS:GNPS-LIBRARY:accession:CCMSLIB00000845027'],
adduct_list=['[M+H]+', '[M-H2O+H]+'])
Note
msbuddy does not perform adduct annotation. Please make sure the adduct type is correctly specified in the input file if necessary, otherwise default adducts ([M+H]+, [M-H]-) will be used. We claim that adduct annotation should be performed on the MS1 level, where chromatographic peak profiles must be involved.
If parallel computing is needed, you can specify the number of CPUs to be used, but the code has to be run in if __name__ == '__main__': block:
if __name__ == '__main__':
from msbuddy import Msbuddy, MsbuddyConfig
# create a MsbuddyConfig object
msb_config = MsbuddyConfig(ms_instr="orbitrap", # supported: "qtof", "orbitrap" and "fticr"
# highly recommended to fill in the instrument type
halogen=True,
parallel=True, # enable parallel computing
n_cpu=12) # number of CPUs to be used
...(other code remains the same)