Using the `drugforge-alchemy` CLI ============================= The `drugforge-alchemy` CLI provides a series of automated workflows and convince functions that when combined create and end-to-end pipeline enabling the routine running of state-of-the-art alchemical free energy calculations at (Alchemi)scale! The CLI is designed to get you up and running as quickly as possible and has tried and tested defaults, but also allows you to customise every part of the workflow if required. To build custom workflows see the Alchemy API tutorial which explains the API in detail including the customisation options available. Here we will give a very quick over view of the CLI and how they should be used in production. ## drugforge-alchemy Pipeline The `drugforge-alchemy` allows for the preparation, planning and prediction of alchemical free energy calculations at scale. Each step of the pipeline can be run via the command line. The commands can be viewed at any time by running: ```shell drugforge-alchemy --help ``` Now lets walk through a typical application starting with `prep`. ## drugforge-alchemy Prep `Prep` offers a pipeline of tools to prepare our ligand series for binding free energy calculations including state enumeration, constrained pose and partial charge generation. To view the default prep workflow we can use the following command to write workflow to file where it can be edited although this is much easier using the API: ```shell drugforge-alchemy prep create -f "prep-workflow.json" ``` The prep workflow can then be executed on a set of ligands (in a local file smi/sdf) using the following command: ```shell drugforge-alchemy prep run --factory-file "prep-workflow.json" \ --dataset-name "example-dataset" \ --ligands "ligand_file.sdf" \ --receptor-complex "receptor.json" \ --processors 4 ``` or if you use postera you can provide the name of the molecule set to pull the ligands from provided your `POSTERA_API_KEY` is exported as an environment variable: ```shell drugforge-alchemy prep run --factory-file "prep-workflow.json" \ --dataset-name "example-dataset" \ --postera-molset-name "ligand-series" \ --receptor-complex "receptor.json" \ --processors 4 ``` ```{eval-rst} .. warning:: This feature is highly experimental and it is recommended that you check the reference structure carefully ``` If you are not sure which reference crystal you would like to use when generating the poses for the ligands you can provide a directory of prepared structures using the `drugforge-prep` CLI and one will be selected for you. ```shell drugforge-alchemy prep run --factory-file "prep-workflow.json" \ --dataset-name "example-dataset" \ --postera-molset-name "ligand-series" \ --structure-dir "receptor-cache" \ --processors 4 ``` ```{eval-rst} .. warning:: This feature is highly experimental and it is recommended that you check the injected experimental compounds carefully ``` ```{eval-rst} .. note:: You must export the ``CDD_API_KEY`` and ``CDD_VAULT_NUMBER`` as environment varibales to enable the CDD interface. ``` Experimentally measured ligands can also be injected into the series at this stage via an interface to the CDD vault. By providing a protocol name the prep workflow will automatically download all ligands screened as part of this protocol and filter for ligands with an activity within the assay sensitivity range, fully defined stereochemistry and no covalent warhead. These will then be posed using the same protocol as the target ligands and marked as experimental via an SD tag. ```shell drugforge-alchemy prep run --factory-file "prep-workflow.json" \ --dataset-name "example-dataset" \ --postera-molset-name "ligand-series" \ --structure-dir "receptor-cache" \ --processors 4 \ --experimental-protocol "assay-1" ``` Once the prep workflow has finished you will find a new directory has been created named after the `--dataset-name` argument. Within this you will find a PDB file of the receptor along with an SDF of ligands in their constrained pose along with a csv detailing any ligand for which a pose could not be generated and the reason why. An `prepared_alchemy_dataset.json` file will also be present which can be used in the next stage of the workflow. ## drugforge-alchemy Plan We are now ready to plan an alchemical free energy network using a state-of-the-art workflow built on the [OpenFE](https://docs.openfree.energy/en/stable/) infrastructure. Our default workflow plans a minimal spanning tree network with redundancy to ensure each ligand is connected to at least two other ligands in the network, using the Lomap atom mapping and scoring function. Again this can be configured via the API or via manually editing the workflow file which can be generated using: ```shell drugforge-alchemy create "alchemy-factory.json" ``` We can now plan our network using the default workflow and the ligands we have just posed using the `prep` pipeline from the previous stage. The `prepared_alchemy_dataset.json` file contains everything needed for this next stage including the ligands, a dataset name and the receptor. The network is then generated by running: ```shell drugforge-alchemy plan --alchemy-dataset "prepared_alchemy_dataset.json" ``` Or if you have posed the ligands using some other pipeline you can provide them as an SDF file and the receptor can be provided as a PDB and should already be protonated: ```shell drugforge-alchemy plan --name "my-network" \ --ligands "ligands.sdf" \ --receptor "protein.pdb" ``` If you use the CDD vault to store experimental data and wish to upload your results to postera later you can also set the name of the assay protocol and biological target which should be associated with this network to save having to supply them each time you make a prediction later in the workflow: ```shell drugforge-alchemy plan --name "my-network" \ --ligands "ligands.sdf" \ --receptor "protein.pdb" \ --experimental-protocol "assay-2" \ --target "SARS-CoV-2-Mac1" ``` After running the `plan` workflow you will find another new directory has been created named after the `--name` argument which contains a free energy calculation network in a file named `planned_network.json` and an `ligand_network.graphml` file which can be viewed as an interactive network using the `OpenFE` CLI: ```shell openfe view-ligand-network ligand_network.graphml ``` ## drugforge-alchemy Submit ```{eval-rst} .. note:: The commands ``submit``, ``status``, ``restart``, ``stop``, ``gather`` and ``predict`` assume the network file is in the working directory allowing you to avoid passing the argument explicitly. ``` At ASAP we make extensive use of the fantastic [Alchemiscale](https://github.com/openforcefield/alchemiscale): > a high-throughput alchemical free energy execution system for use with HPC, cloud, bare metal, and Folding@Home This allows us to plan and execute thousands of `OpenFE` based calculations on distributed compute simultaneously, and provides a convent API to track and manage calculations rather than having to manually sort though hundreds of local files. ```{eval-rst} .. note:: Make sure to have your ``ALCHEMISCALE_ID`` and ``ALCHEMISCALE_KEY`` exported as environment variables ``` We can now submit our `planned_network.json` and execute the tasks on Alchemiscale using: ```shell drugforge-alchemy submit --network "planned_network.json" \ --organization "my_org" \ --campaign "testing_asap_alchemy" \ --project "target_1" ``` This command has created the network on Alchemiscale under a Scope defined by the combination of the organization, campaign and project, then created tasks for each transformation and submitted them to be executed! A unique network key is generated during this process which allows you to quickly look up the network on Alchemiscale and is stored in the `planned_network.json` file. ## drugforge-alchemy Status To track to progress of the alchemical network on Alchemiscale you can use the following command: ```shell drugforge-alchemy status ``` If your network has some errored tasks we can also retrieve the errors and tracebacks using: ```shell drugforge-alchemy status --errors --with-traceback ``` or if you would like to view the status of all currently actioned networks on Alchemiscale under your scope you can use: ```shell drugforge-alchemy status --all-networks ``` ## drugforge-alchemy Restart Sometimes calculations can fail due to a verity of reasons, some of which can be cleared by simply restarting the tasks. Until automatic restarting is built into Alchemiscale we provide a command which allows you to restart all the errored tasks in a network: ```shell drugforge-alchemy restart ``` ## drugforge-alchemy Stop If for any reason you want to stop a network, which removes all currently actioned tasks, you will need the network key which can be found in the `status` command: ```shell drugforge-alchemy stop --network-key "network-key" ``` ## drugforge-alchemy Gather Once our network has completed all its tasks we can gather the results and store them locally for analysis using: ```shell drugforge-alchemy gather ``` if the network has some incomplete edges this command will fail, you can however bypass this check using: ```shell drugforge-alchemy gather --allow-missing ``` This will create a new copy of the network with the results called `result_network.json`. ## drugforge-alchemy Predict Finally, with our local results we can now estimate the binding affinity of our ligands using: ```shell drugforge-alchemy predict ``` This will produce two `CSV` files one containing the relative and the other the absolute binding affinity predictions. If you provided the `experimental-protocol` during the plan stage, experimental data will be extracted from the named protocol in the CDD vault and automatically used to assess the accuracy of the calculations. The absolute estimates will also be shifted to be centred around the mean of the experimental values and interactive `HTML` reports will be generated to help analyse the results in more detail. If you did not provide the protocol earlier you can provide it as an argument to the prediction command: ```shell drugforge-alchemy predict --experimental-protocol "assay-1" ``` or if you keep you experimental data in a different source you can provide it as a formated csv file which matches the CDD data: ```shell drugforge-alchemy predict --reference-dataset "assay_data.csv" --reference-units "pIC50" ``` If you use postera and would like to upload the results you can provide the molecule set name and a biological target if not provided earlier: ```shell drugforge-alchemy predict --target "SARS-CoV-2-Mac1" --postera-molset-name "alchemy-ligands-1" ```