Quick Start
===========

As a quick start, we here load MS/MS spectra from a mgf file, perform molecular formula annotations, and retrieve the annotation result summary:

.. code-block:: python

   from msbuddy import Msbuddy, MsbuddyConfig

   # create a MsbuddyConfig object
   msb_config = MsbuddyConfig(ms_instr="orbitrap", # supported: "qtof", "orbitrap", "fticr" or None
                                                   # custom MS1 and MS2 tolerance will be used if None
                                                   # highly recommended to fill in the instrument type
                              ppm=True,  # use ppm for mass tolerance
                              ms1_tol=5,  # MS1 tolerance in ppm or Da
                              ms2_tol=10,  # MS2 tolerance in ppm or Da
                              halogen=True,
                              timeout_secs=200)

   # instantiate a Msbuddy object with the parameter set
   msb_engine = Msbuddy(msb_config)

   # load data, here we use a mgf file as an example
   msb_engine.load_mgf('input_file.mgf')

   # annotate molecular formula
   msb_engine.annotate_formula()

   # retrieve the annotation result summary
   results = msb_engine.get_summary()

   # print the result, results is a list of dictionaries
   for individual_result in results:
       for key, value in individual_result.items():
           print(key, value)


.. note::
    It is **highly recommended** to set up the ``ms_instr`` parameter in the :class:`msbuddy.MsbuddyConfig` to obtain the best annotation performance.
    Please see `Configuration <config.html>`_ session for more details.


Within the result summary, ``results`` is a list of Python dictionaries. ``individual_result`` is a dictionary containing the following keys:

- ``identifier``: Identifier of the metabolic feature
- ``mz``: Precursor m/z
- ``rt``: Retention time in seconds
- ``adduct``: Adduct type
- ``formula_rank_1``: Molecular formula annotation ranked in the first place
- ``estimated_fdr``: Estimated false discovery rate (FDR)
- ``formula_rank_2``: Molecular formula annotation ranked in the second place
- ``formula_rank_3``: Molecular formula annotation ranked in the third place
- ``formula_rank_4``: Molecular formula annotation ranked in the fourth place
- ``formula_rank_5``: Molecular formula annotation ranked in the fifth place


MS/MS spectra can also be loaded via their USIs:

.. code-block:: python

   # you can load multiple USIs at once
   msb_engine.load_usi(['mzspec:GNPS:GNPS-LIBRARY:accession:CCMSLIB00003740036',
                        'mzspec:GNPS:GNPS-LIBRARY:accession:CCMSLIB00003740037'])

   # load USIs with adducts specified, otherwise the default adducts ([M+H]+, [M-H]-) will be used
   msb_engine.load_usi(usi_list=['mzspec:GNPS:GNPS-LIBRARY:accession:CCMSLIB00003740036',
                                 'mzspec:GNPS:GNPS-LIBRARY:accession:CCMSLIB00000845027'],
                       adduct_list=['[M+H]+', '[M-H2O+H]+'])

.. note::
    **msbuddy** does not perform adduct annotation. Please make sure the adduct type is correctly specified in the input file if necessary, otherwise default adducts ([M+H]+, [M-H]-) will be used.
    We claim that adduct annotation should be performed on the MS1 level, where chromatographic peak profiles must be involved.


If parallel computing is needed, you can specify the number of CPUs to be used, but the code has to be run in ``if __name__ == '__main__':`` block:


.. code-block:: python

   if __name__ == '__main__':
       from msbuddy import Msbuddy, MsbuddyConfig
       # create a MsbuddyConfig object
       msb_config = MsbuddyConfig(ms_instr="orbitrap", # supported: "qtof", "orbitrap" and "fticr"
                                                       # highly recommended to fill in the instrument type
                                  halogen=True,
                                  parallel=True, # enable parallel computing
                                  n_cpu=12) # number of CPUs to be used
       ...(other code remains the same)