Quick Start

As a quick start, we load MS/MS spectra from an MGF file, perform molecular formula annotation, and retrieve the annotation result summary:

from msbuddy import Msbuddy, MsbuddyConfig

# create a MsbuddyConfig object
msb_config = MsbuddyConfig(ms_instr="orbitrap", # supported: "qtof", "orbitrap", "fticr" or None
                                                # custom MS1 and MS2 tolerance will be used if None
                                                # highly recommended to fill in the instrument type
                           ppm=True,  # use ppm for mass tolerance
                           ms1_tol=5,  # MS1 tolerance in ppm or Da
                           ms2_tol=10,  # MS2 tolerance in ppm or Da
                           halogen=True,  # include halogen atoms in candidate formulas
                           timeout_secs=200)  # timeout, in seconds

# instantiate a Msbuddy object with the parameter set
msb_engine = Msbuddy(msb_config)

# load data; here we use an MGF file as an example
msb_engine.load_mgf('input_file.mgf')

# annotate molecular formula
msb_engine.annotate_formula()

# retrieve the annotation result summary
results = msb_engine.get_summary()

# print the results (a list of dictionaries)
for individual_result in results:
    for key, value in individual_result.items():
        print(key, value)

Note

It is highly recommended to set the ms_instr parameter in msbuddy.MsbuddyConfig to obtain the best annotation performance. Please see the Configuration section for more details.

The result summary, results, is a list of Python dictionaries. Each individual_result is a dictionary containing the following keys:

  • identifier: Identifier of the metabolic feature

  • mz: Precursor m/z

  • rt: Retention time in seconds

  • adduct: Adduct type

  • formula_rank_1: Molecular formula annotation ranked in the first place

  • estimated_fdr: Estimated false discovery rate (FDR)

  • formula_rank_2: Molecular formula annotation ranked in the second place

  • formula_rank_3: Molecular formula annotation ranked in the third place

  • formula_rank_4: Molecular formula annotation ranked in the fourth place

  • formula_rank_5: Molecular formula annotation ranked in the fifth place
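Because the summary is a plain list of dictionaries, it can be post-processed with the standard library alone. The sketch below keeps only entries whose estimated_fdr is below a cutoff and serialises them as tab-separated text. The example_results list is a hypothetical stand-in for the output of get_summary() with made-up values, the helper names keep_confident and to_tsv are ours, and the 0.05 cutoff is an arbitrary illustration. The code also assumes that estimated_fdr may be missing or None for some features, which is why it is checked explicitly.

```python
import csv
import io

# Hypothetical stand-in for msb_engine.get_summary(); all values are made up.
example_results = [
    {"identifier": "feature_1", "mz": 181.0707, "rt": 65.2, "adduct": "[M+H]+",
     "formula_rank_1": "C6H12O6", "estimated_fdr": 0.01},
    {"identifier": "feature_2", "mz": 195.0877, "rt": 312.8, "adduct": "[M+H]+",
     "formula_rank_1": "C8H10N4O2", "estimated_fdr": 0.35},
    {"identifier": "feature_3", "mz": 250.1234, "rt": 420.0, "adduct": "[M-H]-",
     "formula_rank_1": None, "estimated_fdr": None},
]


def keep_confident(results, max_fdr=0.05):
    """Return the entries whose estimated_fdr is a number <= max_fdr."""
    kept = []
    for entry in results:
        fdr = entry.get("estimated_fdr")
        # Skip entries without a numeric FDR (assumed possible, see lead-in).
        if isinstance(fdr, (int, float)) and fdr <= max_fdr:
            kept.append(entry)
    return kept


def to_tsv(results, columns=("identifier", "mz", "rt", "adduct",
                             "formula_rank_1", "estimated_fdr")):
    """Serialise the selected columns of each entry as tab-separated text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(columns), delimiter="\t",
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(results)  # None values are written as empty cells
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_tsv(keep_confident(example_results)))
```

In a real session, the list returned by msb_engine.get_summary() would replace example_results.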

MS/MS spectra can also be loaded via their USIs:

# you can load multiple USIs at once
msb_engine.load_usi(['mzspec:GNPS:GNPS-LIBRARY:accession:CCMSLIB00003740036',
                     'mzspec:GNPS:GNPS-LIBRARY:accession:CCMSLIB00003740037'])

# load USIs with adducts specified, otherwise the default adducts ([M+H]+, [M-H]-) will be used
msb_engine.load_usi(usi_list=['mzspec:GNPS:GNPS-LIBRARY:accession:CCMSLIB00003740036',
                              'mzspec:GNPS:GNPS-LIBRARY:accession:CCMSLIB00000845027'],
                    adduct_list=['[M+H]+', '[M-H2O+H]+'])

Note

msbuddy does not perform adduct annotation. Make sure the adduct type is correctly specified in the input file where needed; otherwise the default adducts ([M+H]+, [M-H]-) will be used. In our view, adduct annotation should be performed at the MS1 level, where chromatographic peak profiles can be taken into account.
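Since adducts are supplied by the user, it can help to sanity-check the inputs before calling load_usi. The following is a minimal sketch, not part of msbuddy: the helper pair_usis_with_adducts is hypothetical, and it assumes one adduct per USI, as in the load_usi example above. The USI check only tests the general `mzspec:<collection>:<run>:<index type>:<index>` shape and does not contact any repository.

```python
def pair_usis_with_adducts(usi_list, adduct_list=None):
    """Return (usi, adduct) pairs after basic validation.

    An adduct of None means "fall back to the default adduct".
    """
    if adduct_list is None:
        adduct_list = [None] * len(usi_list)
    # Assumption: exactly one adduct is given for each USI.
    if len(adduct_list) != len(usi_list):
        raise ValueError(
            f"{len(usi_list)} USIs but {len(adduct_list)} adducts were given"
        )
    pairs = []
    for usi, adduct in zip(usi_list, adduct_list):
        parts = usi.split(":")
        # A USI starts with "mzspec" and has at least five colon-separated fields.
        if parts[0] != "mzspec" or len(parts) < 5:
            raise ValueError(f"not a well-formed USI: {usi!r}")
        pairs.append((usi, adduct))
    return pairs


if __name__ == "__main__":
    usis = ["mzspec:GNPS:GNPS-LIBRARY:accession:CCMSLIB00003740036",
            "mzspec:GNPS:GNPS-LIBRARY:accession:CCMSLIB00000845027"]
    for usi, adduct in pair_usis_with_adducts(usis, ["[M+H]+", "[M-H2O+H]+"]):
        print(usi, adduct)
```

Once the pairs pass these checks, the original lists can be handed to msb_engine.load_usi(usi_list=..., adduct_list=...) as shown earlier.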

If parallel computing is needed, you can specify the number of CPUs to use, but the code must be run inside an if __name__ == '__main__': block:

if __name__ == '__main__':
    from msbuddy import Msbuddy, MsbuddyConfig
    # create a MsbuddyConfig object
    msb_config = MsbuddyConfig(ms_instr="orbitrap", # supported: "qtof", "orbitrap" and "fticr"
                                                    # highly recommended to fill in the instrument type
                               halogen=True,
                               parallel=True, # enable parallel computing
                               n_cpu=12) # number of CPUs to be used
    # ... (other code remains the same)
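The guard matters for any Python program that starts worker processes: on platforms where workers are started with the "spawn" method, each worker re-imports the main module, so unguarded top-level code would run again in every worker. We assume here that msbuddy's parallel mode relies on such worker processes. The sketch below also shows one way to choose a value for n_cpu that does not exceed what the machine offers. The helper pick_n_cpu and its reserve argument are hypothetical and not part of msbuddy.

```python
import os


def pick_n_cpu(requested, reserve=1):
    """Clamp the requested worker count to the CPUs available.

    `reserve` CPUs are left free for the rest of the system; at least
    one worker is always returned.
    """
    available = os.cpu_count() or 1  # cpu_count() may return None
    usable = max(1, available - reserve)
    return max(1, min(requested, usable))


if __name__ == "__main__":
    # Only the main process reaches this point; re-imported workers skip it.
    n_cpu = pick_n_cpu(12)
    print(f"would pass n_cpu={n_cpu} to MsbuddyConfig")
```

The returned value can then be used as the n_cpu argument of MsbuddyConfig in place of the hard-coded 12.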