4. Generating Pickaxe Inputs

4.1. Compound Inputs

Pickaxe takes a few input files that specify the compounds and rules for an expansion. One group of these files contains only compounds; some are required and others are optional, depending on the desired functionality of a given Pickaxe run.

Required:

  1. Compounds to react.

Optional:

  1. Targets to filter for.

  2. Metabolomic data to filter with (see met_data_path parameter in Built-In Filters).

4.1.1. Compound Input

Pickaxe accepts a .csv or .tsv file with two columns: an id field and a structure field. The id field is used to label the final output, and the structure field contains the SMILES representation of each compound.

Here is an example of a valid compound input file:

id,SMILES
glucose,C(C1C(C(C(C(O1)O)O)O)O)O
TAL,C/C1=CC(\O)=C/C(=O)O1
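
A compound file like the one above can also be generated programmatically. The following is a minimal sketch using only the Python standard library; the helper names write_compound_csv and read_compound_csv are hypothetical and are not part of minedatabase.

```python
import csv
import io

# Ids and SMILES copied from the example file above. A raw string keeps
# the backslash in the TAL SMILES intact.
compounds = [
    ("glucose", "C(C1C(C(C(C(O1)O)O)O)O)O"),
    ("TAL", r"C/C1=CC(\O)=C/C(=O)O1"),
]


def write_compound_csv(rows, handle):
    """Write (id, SMILES) pairs as a two-column .csv with a header row."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["id", "SMILES"])
    writer.writerows(rows)


def read_compound_csv(handle):
    """Read the file back and return a list of (id, SMILES) tuples."""
    reader = csv.DictReader(handle)
    return [(row["id"], row["SMILES"]) for row in reader]


# An in-memory buffer stands in for a file on disk.
buffer = io.StringIO()
write_compound_csv(compounds, buffer)
buffer.seek(0)
round_trip = read_compound_csv(buffer)
```

Reading the file back is a cheap way to confirm that the header and both columns survive unchanged before the file is handed to Pickaxe.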

4.1.2. Target Input

The target compound input file takes the same form as the compound input file:

id,SMILES
1,C=C(O)COCC(C)O

4.2. Reaction Operator Inputs

There are two files required for the application of reactions:

  1. Reaction operators to use.

  2. Coreactants required by the reaction operators.

Default rules are supplied with Pickaxe; however, custom rules can also be written and used.

4.2.1. Default Rules

4.2.1.1. Overview

A set of biological reaction rules and cofactors is provided by default. These consist of approximately 70,000 MetaCyc reactions condensed into generic rules. Selecting all of these rules will result in a large expansion, but the set can be trimmed down significantly while still retaining high coverage of MetaCyc reactions.

Number of Rules    Percent Coverage of MetaCyc Reactions
20                 50
100                78
272                90
500                95
956                99
1221               100
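
To illustrate the trade-off between rule count and coverage, the sketch below looks up the smallest listed number of rules that reaches a requested coverage. The values are copied from the coverage figures above; the function name rules_needed is hypothetical and is not part of minedatabase.

```python
# (number of rules, percent coverage of MetaCyc reactions), as listed above.
COVERAGE_TABLE = [
    (20, 50),
    (100, 78),
    (272, 90),
    (500, 95),
    (956, 99),
    (1221, 100),
]


def rules_needed(percent_coverage):
    """Return the smallest listed rule count that reaches the coverage."""
    for n_rules, coverage in COVERAGE_TABLE:
        if coverage >= percent_coverage:
            return n_rules
    raise ValueError("coverage must be at most 100 percent")


# 272 rules are the first listed entry that reaches 90 percent coverage.
n_for_90 = rules_needed(90)
```

Only the listed points are considered, so a request that falls between two entries is rounded up to the next listed rule count.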

A set of intermediate reaction rule operators is also provided. These operators are less generalized than the generalized rule set and provide UniProt information for each operator.

4.2.1.2. Generating Default Rule Inputs

Default rules are imported from the rules module of minedatabase and have a few options to specify what is loaded:

  1. Number of Rules

  2. Fractional Coverage of MetaCyc

  3. Anaerobic Rules only

  4. Groups to Include

  5. Groups to Ignore

Possible groups to include or ignore are: aromatic, aromatic_oxygen, carbonyl, nitrogen, oxygen, fluorine, phosphorus, sulfur, chlorine, bromine, iodine, halogen. Examples of defining rules are given below.

The provided code returns the rule_list and coreactant_list that are passed to the Pickaxe object.

4.2.1.3. Generalized Rules Mapping 90% MetaCyc

from minedatabase.rules import metacyc_generalized
rule_list, coreactant_list, rule_name = metacyc_generalized(
    fraction_coverage=0.9
)

4.2.1.4. Generalized Rules with 200 Anaerobic and Halogens

from minedatabase.rules import metacyc_generalized
rule_list, coreactant_list, rule_name = metacyc_generalized(
    n_rules=200,
    anaerobic=True,
    include_containing=["halogen"]
)

4.2.1.5. Intermediate Rules with all Halogens except Chlorine

from minedatabase.rules import metacyc_intermediate
rule_list, coreactant_list, rule_name = metacyc_intermediate(
    include_containing=["halogen"],
    exclude_containing=["chlorine"]
)

4.2.2. Generating Custom Rules

In the event that the default rules do not contain a reaction of interest, it is possible to generate your own rules. Outlined below is the process to generate rules for deesterification reactions, which consists of three parts:

  1. Writing the reaction SMARTS.

  2. Writing the reaction rule.

  3. Writing the coreactant list.

4.2.2.1. Writing Reaction SMARTS

Rules are generated using reaction SMARTS, which represent reactions as a string. Importantly, these reaction rules specify atom mapping, which keeps track of each atom throughout the reaction. To highlight how a simple reaction rule is generated, a deesterification reaction will be used.

_images/full_rule.png

The reaction SMARTS is highlighted in the same color as the corresponding molecule in the reaction above. Ensuring correct atom mapping is important when writing these rules. This is an exact reaction rule: it matches the exact pattern of the reaction, which is not useful because it will not match many molecules.

Instead of using an exact reaction, a generic reaction rule can be used to match more molecules. In this case, the rule is generalized by decreasing the radius of atoms considered around the reactive site.

_images/generic_rule.png
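
Because correct atom mapping matters, a quick sanity check can catch mistakes before a rule is used. The sketch below is an illustration only, not part of Pickaxe: it uses a regular expression to compare the atom-map numbers on each side of the deesterification SMARTS from section 4.2.2.2. Requiring every mapped atom to appear on both sides is stricter than reaction SMARTS itself requires, and the function names are hypothetical.

```python
import re


def atom_maps(smarts_side):
    """Return the sorted atom-map numbers found in one side of a rule."""
    return sorted(int(n) for n in re.findall(r":(\d+)\]", smarts_side))


def maps_are_balanced(reaction_smarts):
    """True when both sides carry the same map numbers, each used once."""
    reactants, products = reaction_smarts.split(">>")
    left, right = atom_maps(reactants), atom_maps(products)
    return left == right and len(left) == len(set(left))


# Deesterification rule from section 4.2.2.2.
rule = (
    "[#6:2]-(=[#8:1])-[#8:4]-[#6:5].[#8:3]"
    ">>[#6:2]-(=[#8:1])-[#8:3].[#8:4]-[#6:5]"
)
balanced = maps_are_balanced(rule)
```

This check only inspects the text of the rule. It does not confirm that the SMARTS is chemically sensible or that a cheminformatics toolkit can parse it.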

4.2.2.2. Writing Reaction Rules

With the reaction SMARTS written, the full rule for Pickaxe can now be written. Rules are written in a .tsv as follows:

RULE_ID REACTANTS   RULE    PRODUCTS   NOTES

The rule_id is an arbitrary, unique value; the reactants and products specify how many compounds a rule should expect; and the rule is the reaction SMARTS. Notes can be provided, but they have no effect on the running of Pickaxe. The reactants and products are specified as a generic compound, “Any”, or as a predefined coreactant.

Below is an example of a reaction rule made for a deesterification reaction.

_images/deesterification.png

RULE_ID REACTANTS   RULE    PRODUCTS   NOTES
rule1   Any;WATER     [#6:2]-(=[#8:1])-[#8:4]-[#6:5].[#8:3]>>[#6:2]-(=[#8:1])-[#8:3].[#8:4]-[#6:5]    Any;Any

Note

Currently only one “Any” is allowed as a reactant and any other reactant must be defined as a coreactant.
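
The constraints described in this section can be checked before a rule file is loaded. The sketch below is a hypothetical helper, not part of Pickaxe. It assumes that reactants and products are separated by ";" as in the example row, and it counts SMARTS components by naively splitting on ".", which would not handle component-level grouping.

```python
def check_rule(reactants, smarts, products, coreactant_ids):
    """Return a list of problems found in one rule row (empty if none)."""
    problems = []
    reactant_names = reactants.split(";")
    product_names = products.split(";")

    # Only one "Any" is allowed as a reactant.
    if reactant_names.count("Any") > 1:
        problems.append("more than one 'Any' reactant")

    # Every other reactant must be a defined coreactant.
    for name in reactant_names:
        if name != "Any" and name not in coreactant_ids:
            problems.append(f"undefined coreactant: {name}")

    # The listed compounds should match the SMARTS component counts.
    left, right = smarts.split(">>")
    if len(left.split(".")) != len(reactant_names):
        problems.append("reactant count does not match the SMARTS")
    if len(right.split(".")) != len(product_names):
        problems.append("product count does not match the SMARTS")
    return problems


smarts = (
    "[#6:2]-(=[#8:1])-[#8:4]-[#6:5].[#8:3]"
    ">>[#6:2]-(=[#8:1])-[#8:3].[#8:4]-[#6:5]"
)
ok = check_rule("Any;WATER", smarts, "Any;Any", {"WATER"})
bad = check_rule("Any;Any", smarts, "Any;Any", {"WATER"})
```

Here ok is an empty list for the example rule, while bad reports the second "Any" reactant.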

4.2.2.3. Defining Coreactants

Coreactants are defined in their own file that the Pickaxe object will load and use alongside the reaction rules. The coreactant file for the example deesterification reaction is:

#ID Name    SMILES
WATER       WATER   O

4.2.2.4. Reaction Rule Example Summary

Summarized here are the input files for a deesterification reaction.

Reaction

_images/deesterification.png

Reaction Rule Input

RULE_ID REACTANTS   RULE    PRODUCTS   NOTES
rule1   Any;WATER     [#6:2]-(=[#8:1])-[#8:4]-[#6:5].[#8:3]>>[#6:2]-(=[#8:1])-[#8:3].[#8:4]-[#6:5]    Any;Any

Coreactant Input

#ID Name    SMILES
WATER       WATER   O
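
Both input files can be written as tab-separated text with the Python standard library. The sketch below is one way to do so under stated assumptions: the file names and the write_tsv helper are hypothetical, and the column names follow the headers shown in this section. Whether Pickaxe requires these exact header names is not confirmed here.

```python
import csv
import tempfile
from pathlib import Path

RULE_HEADER = ["RULE_ID", "REACTANTS", "RULE", "PRODUCTS", "NOTES"]
COREACTANT_HEADER = ["#ID", "Name", "SMILES"]

# Deesterification rule from this section.
smarts = (
    "[#6:2]-(=[#8:1])-[#8:4]-[#6:5].[#8:3]"
    ">>[#6:2]-(=[#8:1])-[#8:3].[#8:4]-[#6:5]"
)


def write_tsv(path, header, rows):
    """Write a header and rows to a tab-separated file."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows)


# Hypothetical file names in a temporary directory.
out_dir = Path(tempfile.mkdtemp())
rule_path = out_dir / "deesterification_rules.tsv"
coreactant_path = out_dir / "deesterification_coreactants.tsv"

# The NOTES column is left empty because it has no effect on Pickaxe.
write_tsv(rule_path, RULE_HEADER, [["rule1", "Any;WATER", smarts, "Any;Any", ""]])
write_tsv(coreactant_path, COREACTANT_HEADER, [["WATER", "WATER", "O"]])
```

Using the csv module with a tab delimiter avoids the alignment spaces that tend to creep into hand-edited files and guarantees one tab between columns.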