Interfacing with databases and systems
ASAP’s workflows interface with many different services, data providers and integrations. As with our base-level abstractions, we aim to provide high-level abstractions that make working with these databases and integrations seamless.
Reading lots of molecules from files
One often wants to read a large file of molecule data, e.g. an SDF or mol2 file. We provide a MolFileFactory to quickly read such files into a list of Ligands.
[1]:
from drugforge.data.readers.molfile import MolFileFactory
from drugforge.data.testing.test_resources import fetch_test_file
big_sdf_file = fetch_test_file("Mpro_combined_labeled.sdf") # SDF file filled with COVID Moonshot compounds
factory = MolFileFactory(filename=big_sdf_file)
ligands = factory.load()
Warning: OE3DToAtomStereo is unable to perceive atom stereo from a flat geometry on atom 8 of molecule 'MAT-POS-fa06b69f-6'
Warning: OE3DToAtomStereo had a problem during OEMolToSmiles when writing 'MAT-POS-fa06b69f-6'
Warning: OE3DToAtomStereo is unable to perceive atom stereo from a flat geometry on atom 8 of molecule 'MAT-POS-fa06b69f-6'
Warning: OE3DToAtomStereo had a problem during OEMolToSTDInChI when writing 'MAT-POS-fa06b69f-6'
Warning: OE3DToAtomStereo is unable to perceive atom stereo from a flat geometry on atom 8 of molecule 'MAT-POS-fa06b69f-6'
Warning: OE3DToAtomStereo had a problem during OEMolToSTDInChIKey when writing 'MAT-POS-fa06b69f-6'
Warning: OE3DToAtomStereo is unable to perceive atom stereo from a flat geometry on atom 8 of molecule 'MAT-POS-fa06b69f-6'
Warning: OE3DToAtomStereo had a problem during OEMolToInChI when writing 'MAT-POS-fa06b69f-6'
Warning: OE3DToAtomStereo is unable to perceive atom stereo from a flat geometry on atom 8 of molecule 'MAT-POS-fa06b69f-6'
Warning: OE3DToAtomStereo had a problem during OEMolToInChIKey when writing 'MAT-POS-fa06b69f-6'
Warning: OE3DToAtomStereo is unable to perceive atom stereo from a flat geometry on atom 8 of molecule 'AAR-POS-0daf6b7e-2'
Warning: OE3DToAtomStereo is unable to perceive atom stereo from a flat geometry on atom 11 of molecule 'AAR-POS-0daf6b7e-2'
Warning: OE3DToAtomStereo had a problem during OEMolToSmiles when writing 'AAR-POS-0daf6b7e-2'
Warning: OE3DToAtomStereo is unable to perceive atom stereo from a flat geometry on atom 8 of molecule 'AAR-POS-0daf6b7e-2'
Warning: OE3DToAtomStereo is unable to perceive atom stereo from a flat geometry on atom 11 of molecule 'AAR-POS-0daf6b7e-2'
Warning: OE3DToAtomStereo had a problem during OEMolToSTDInChI when writing 'AAR-POS-0daf6b7e-2'
Warning: OE3DToAtomStereo is unable to perceive atom stereo from a flat geometry on atom 8 of molecule 'AAR-POS-0daf6b7e-2'
Warning: OE3DToAtomStereo is unable to perceive atom stereo from a flat geometry on atom 11 of molecule 'AAR-POS-0daf6b7e-2'
Warning: OE3DToAtomStereo had a problem during OEMolToSTDInChIKey when writing 'AAR-POS-0daf6b7e-2'
Warning: OE3DToAtomStereo is unable to perceive atom stereo from a flat geometry on atom 8 of molecule 'AAR-POS-0daf6b7e-2'
Warning: OE3DToAtomStereo is unable to perceive atom stereo from a flat geometry on atom 11 of molecule 'AAR-POS-0daf6b7e-2'
Warning: OE3DToAtomStereo had a problem during OEMolToInChI when writing 'AAR-POS-0daf6b7e-2'
Warning: OE3DToAtomStereo is unable to perceive atom stereo from a flat geometry on atom 8 of molecule 'AAR-POS-0daf6b7e-2'
Warning: OE3DToAtomStereo is unable to perceive atom stereo from a flat geometry on atom 11 of molecule 'AAR-POS-0daf6b7e-2'
Warning: OE3DToAtomStereo had a problem during OEMolToInChIKey when writing 'AAR-POS-0daf6b7e-2'
Warning: OE3DToAtomStereo is unable to perceive atom stereo from a flat geometry on atom 2 of molecule 'TAT-ENA-80bfd3e5-7'
Warning: OE3DToAtomStereo had a problem during OEMolToSmiles when writing 'TAT-ENA-80bfd3e5-7'
Warning: OE3DToAtomStereo is unable to perceive atom stereo from a flat geometry on atom 2 of molecule 'TAT-ENA-80bfd3e5-7'
Warning: OE3DToAtomStereo had a problem during OEMolToSTDInChI when writing 'TAT-ENA-80bfd3e5-7'
Warning: OE3DToAtomStereo is unable to perceive atom stereo from a flat geometry on atom 2 of molecule 'TAT-ENA-80bfd3e5-7'
Warning: OE3DToAtomStereo had a problem during OEMolToSTDInChIKey when writing 'TAT-ENA-80bfd3e5-7'
Warning: OE3DToAtomStereo is unable to perceive atom stereo from a flat geometry on atom 2 of molecule 'TAT-ENA-80bfd3e5-7'
Warning: OE3DToAtomStereo had a problem during OEMolToInChI when writing 'TAT-ENA-80bfd3e5-7'
Warning: OE3DToAtomStereo is unable to perceive atom stereo from a flat geometry on atom 2 of molecule 'TAT-ENA-80bfd3e5-7'
Warning: OE3DToAtomStereo had a problem during OEMolToInChIKey when writing 'TAT-ENA-80bfd3e5-7'
Warning: OE3DToAtomStereo is unable to perceive atom stereo from a flat geometry on atom 8 of molecule 'LON-WEI-8f408cad-5'
Warning: OE3DToAtomStereo had a problem during OEMolToSmiles when writing 'LON-WEI-8f408cad-5'
Warning: OE3DToAtomStereo is unable to perceive atom stereo from a flat geometry on atom 8 of molecule 'LON-WEI-8f408cad-5'
Warning: OE3DToAtomStereo had a problem during OEMolToSTDInChI when writing 'LON-WEI-8f408cad-5'
Warning: OE3DToAtomStereo is unable to perceive atom stereo from a flat geometry on atom 8 of molecule 'LON-WEI-8f408cad-5'
Warning: OE3DToAtomStereo had a problem during OEMolToSTDInChIKey when writing 'LON-WEI-8f408cad-5'
Warning: OE3DToAtomStereo is unable to perceive atom stereo from a flat geometry on atom 8 of molecule 'LON-WEI-8f408cad-5'
Warning: OE3DToAtomStereo had a problem during OEMolToInChI when writing 'LON-WEI-8f408cad-5'
Warning: OE3DToAtomStereo is unable to perceive atom stereo from a flat geometry on atom 8 of molecule 'LON-WEI-8f408cad-5'
Warning: OE3DToAtomStereo had a problem during OEMolToInChIKey when writing 'LON-WEI-8f408cad-5'
[2]:
print(len(ligands)) # loaded 576 ligands into a list
576
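As a toolkit-free sanity check on a count like the one above, note that each record in an SD file ends with a line containing only `$$$$`, so counting those delimiter lines gives the number of records. The following is a minimal sketch on a made-up two-record string rather than the real test file; `count_sdf_records` is a hypothetical helper and not part of drugforge.

```python
import io


def count_sdf_records(handle):
    """Count records in an SD file by counting '$$$$' delimiter lines."""
    return sum(1 for line in handle if line.strip() == "$$$$")


# Two tiny placeholder records (not valid chemistry, only the record layout).
fake_sdf = "mol-1\n  header\n\nM  END\n$$$$\nmol-2\n  header\n\nM  END\n$$$$\n"
n_records = count_sdf_records(io.StringIO(fake_sdf))
print(n_records)
```

For a real file, pass an open file handle instead of the `io.StringIO` wrapper. The count can differ from `len(ligands)` if the reader skips records it cannot parse.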
Reading structures from Fragalysis
Diamond Light Source uses the Fragalysis platform to display its crystallography results. ASAP makes extensive use of Diamond’s high-throughput crystallography pipeline and has therefore developed easy ways to download and parse Fragalysis data in our workflows.
To get a Fragalysis format dump, use the Download button on the desired target in the Fragalysis UI. For ease of use, we have vendored a SARS-CoV-2-Mpro Fragalysis file in our testing suite.
[3]:
import shutil
from drugforge.data.testing.test_resources import fetch_test_file
mpro_fragalysis_zipped = fetch_test_file("mpro_fragalysis-04-01-24_zipped.zip")
extract_dir = "."
# unzip the archive into extract_dir
shutil.unpack_archive(mpro_fragalysis_zipped, extract_dir)
Downloading file 'mpro_fragalysis-04-01-24_zipped.zip' from 'https://asap-discovery-test-files.s3.amazonaws.com/mpro_fragalysis-04-01-24_zipped.zip' to '/Users/joshua/Library/Caches/asapdiscovery_testing'.
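If you would rather not unpack into the current working directory, `shutil.unpack_archive` accepts any target directory. The sketch below builds a small placeholder zip in a temporary directory and unpacks it into a dedicated folder. The file name `metadata.csv` and the directory names are illustrative only and do not reflect the actual layout of a Fragalysis dump.

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)

    # Build a small placeholder archive to stand in for the downloaded dump.
    src = tmp_path / "payload"
    src.mkdir()
    (src / "metadata.csv").write_text("placeholder\n")
    archive = shutil.make_archive(str(tmp_path / "example_dump"), "zip", root_dir=src)

    # Unpack into a dedicated directory instead of the working directory.
    extract_dir = tmp_path / "unpacked"
    extract_dir.mkdir()
    shutil.unpack_archive(archive, extract_dir)
    extracted = sorted(p.name for p in extract_dir.iterdir())

print(extracted)
```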
[4]:
from drugforge.data.services.fragalysis.fragalysis_reader import FragalysisFactory
frag_factory = FragalysisFactory.from_dir("mpro_fragalysis-04-01-24_zipped")
complexes = frag_factory.load(use_dask=True) # we can use dask to speed this up a lot
# we now have a list of roughly 800 complexes from Fragalysis to use (803 in this dump)
print(len(complexes))
Warning: OE3DToAtomStereo is unable to perceive atom stereo from a flat geometry on atom 8 of molecule 'LIG'
Warning: OE3DToAtomStereo is unable to perceive atom stereo from a flat geometry on atom 11 of molecule 'LIG'
Warning: OE3DToAtomStereo had a problem during OEWriteMolecule when writing 'LIG'
Warning: OE3DToAtomStereo is unable to perceive atom stereo from a flat geometry on atom 2 of molecule 'LIG'
Warning: OE3DToAtomStereo had a problem during OEWriteMolecule when writing 'LIG'
Warning: OE3DToAtomStereo is unable to perceive atom stereo from a flat geometry on atom 8 of molecule 'LIG'
Warning: OE3DToAtomStereo had a problem during OEWriteMolecule when writing 'LIG'
Warning: OE3DToAtomStereo is unable to perceive atom stereo from a flat geometry on atom 8 of molecule 'LIG'
Warning: OE3DToAtomStereo had a problem during OEWriteMolecule when writing 'LIG'
Warning: OECreateInChI: InChI only supports molecules with between 1 and 1023 atoms! (note: large molecule support is experimental)
Warning: OECreateInChI: InChI only supports molecules with between 1 and 1023 atoms! (note: large molecule support is experimental)
Warning: OECreateInChI: InChI only supports molecules with between 1 and 1023 atoms! (note: large molecule support is experimental)
Warning: OECreateInChI: InChI only supports molecules with between 1 and 1023 atoms! (note: large molecule support is experimental)
Warning: OECreateInChI: InChI only supports molecules with between 1 and 1023 atoms! (note: large molecule support is experimental)
Warning: OECreateInChI: InChI only supports molecules with between 1 and 1023 atoms! (note: large molecule support is experimental)
Warning: OECreateInChI: InChI only supports molecules with between 1 and 1023 atoms! (note: large molecule support is experimental)
Warning: OECreateInChI: InChI only supports molecules with between 1 and 1023 atoms! (note: large molecule support is experimental)
Warning: OECreateInChI: InChI only supports molecules with between 1 and 1023 atoms! (note: large molecule support is experimental)
Warning: OECreateInChI: InChI only supports molecules with between 1 and 1023 atoms! (note: large molecule support is experimental)
Warning: OECreateInChI: InChI only supports molecules with between 1 and 1023 atoms! (note: large molecule support is experimental)
Warning: OECreateInChI: InChI only supports molecules with between 1 and 1023 atoms! (note: large molecule support is experimental)
803
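Loading benefits from parallelism because each structure can be parsed independently of the others. The sketch below illustrates that idea with the standard library's `concurrent.futures` as a stand-in. It is not how `load(use_dask=True)` is implemented, and `parse_structure` and the file names are hypothetical placeholders.

```python
from concurrent.futures import ThreadPoolExecutor


def parse_structure(name):
    """Hypothetical stand-in for parsing one structure file."""
    return {"name": name, "n_chars": len(name)}


names = [f"structure_{i}.pdb" for i in range(8)]  # placeholder file names

with ThreadPoolExecutor(max_workers=4) as pool:
    # map() returns results in input order even though tasks run concurrently.
    parsed = list(pool.map(parse_structure, names))

print(len(parsed))
```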
Loading compounds from Postera
Postera’s Manifold platform is the primary home for much of our DMTA (design-make-test-analyze) cycle, so we need to be able to push and pull data from it with ease. To use this example you will need a valid POSTERA_API_KEY, which you can create after making an account.
[5]:
%env POSTERA_API_KEY=EXAMPLE
from drugforge.data.services.postera.postera_factory import PosteraFactory, PosteraSettings
ps = PosteraSettings()
ps
env: POSTERA_API_KEY=EXAMPLE
[5]:
PosteraSettings(POSTERA_API_KEY='EXAMPLE', POSTERA_API_URL='https://api.asap.postera.ai', POSTERA_API_VERSION='v1')
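The pattern on display here is a settings object whose fields default to environment variables, so credentials never have to appear in code. The sketch below mimics that behaviour with a plain dataclass. `ExampleSettings`, `EXAMPLE_API_KEY` and `EXAMPLE_API_URL` are hypothetical names, and this is not how `PosteraSettings` is implemented.

```python
import os
from dataclasses import dataclass, field


@dataclass
class ExampleSettings:
    """Hypothetical settings object whose fields default to environment values."""

    # A missing EXAMPLE_API_KEY raises KeyError, so a forgotten export fails early.
    api_key: str = field(default_factory=lambda: os.environ["EXAMPLE_API_KEY"])
    # Optional value with a placeholder fallback URL.
    api_url: str = field(
        default_factory=lambda: os.environ.get("EXAMPLE_API_URL", "https://example.invalid")
    )


os.environ["EXAMPLE_API_KEY"] = "EXAMPLE"
settings_from_env = ExampleSettings()                 # picks the key up from the environment
settings_explicit = ExampleSettings(api_key="OVERRIDE")  # an explicit argument wins
print(settings_from_env.api_key, settings_explicit.api_key)
```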
[6]:
pf = PosteraFactory(settings=ps, molecule_set_name="MY_MOLSET")
# pf.pull() will return a list of Ligands
Pushing data to Postera
Pushing data to Postera is similar to loading. However, we only allow tags that follow our design specification to be updated in Manifold. These can be queried with ManifoldAllowedTags.
[7]:
from drugforge.data.services.postera.manifold_data_validation import ManifoldAllowedTags
ManifoldAllowedTags.get_values()[::10]
[7]:
['SMILES',
'biochemical-activity_EV-A71-3Cpro_computed-SchNet-pIC50_msk',
'biochemical-activity_EV-D68-3Cpro_computed-SchNet-pIC50_msk',
'biochemical-activity_MERS-CoV-Mpro_computed-SchNet-pIC50_msk',
'biochemical-activity_SARS-CoV-2-Mpro_computed-SchNet-pIC50_msk',
'biochemical-activity_ZIKV-NS2B-NS3pro_computed-SchNet-pIC50_msk',
'in-silico_DENV-NS2B-NS3pro_ligand-conformer-strain-szybki-kcal-mol_msk',
'in-silico_EV-A71-3Cpro_docking-structure-POSIT_msk',
'in-silico_EV-A71-Capsid_docking-pose-fitness-POSIT_msk',
'in-silico_EV-D68-3Cpro_docking-hit_msk',
'in-silico_EV-D68-3Cpro_md-pose_msk',
'in-silico_EV-D68-Capsid_ligand-local-strain-szybki-kcal-mol_msk',
'in-silico_MERS-CoV-Mpro_ligand-conformer-strain-szybki-kcal-mol_msk',
'in-silico_SARS-CoV-2-Mac1_docking-structure-POSIT_msk',
'in-silico_SARS-CoV-2-Mpro_docking-pose-fitness-POSIT_msk',
'in-silico_SARS-CoV-2-N-protein_docking-hit_msk',
'in-silico_SARS-CoV-2-N-protein_md-pose_msk',
'in-silico_ZIKV-NS2B-NS3pro_ligand-local-strain-szybki-kcal-mol_msk']
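Before preparing an upload, it can help to check which column names are allowed tags. The following is a minimal sketch using plain Python and three tag names copied from the output above. In practice the allowed set would come from `ManifoldAllowedTags.get_values()`. `split_columns` and the `my-custom-score` column are hypothetical.

```python
# A few tag names copied from the output above; the real list is much longer.
allowed_tags = {
    "SMILES",
    "in-silico_SARS-CoV-2-Mac1_docking-structure-POSIT_msk",
    "in-silico_SARS-CoV-2-Mpro_docking-pose-fitness-POSIT_msk",
}


def split_columns(columns, allowed):
    """Partition column names into (allowed, rejected), preserving order."""
    ok = [c for c in columns if c in allowed]
    bad = [c for c in columns if c not in allowed]
    return ok, bad


columns = [
    "SMILES",
    "in-silico_SARS-CoV-2-Mac1_docking-structure-POSIT_msk",
    "my-custom-score",  # hypothetical column that is not an allowed tag
]
ok, bad = split_columns(columns, allowed_tags)
print(bad)
```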
Let’s push some mock data to Postera! You have to provide a SMILES and also a ligand_id, which will be propagated to the Postera backend if it matches a UUID already present in Postera.
[8]:
from drugforge.workflows.postera.postera_uploader import PosteraUploader
import pandas as pd
data = {
    "SMILES": ["CCC", "CCCC"],
    "ligand_id": ["abcderf1244134jasdasda", "asidaosidasdnalsd"],
    "in-silico_SARS-CoV-2-Mac1_docking-structure-POSIT_msk": ["structure1", "structure2"],
}
df = pd.DataFrame(data)
[9]:
df
[9]:
| | SMILES | ligand_id | in-silico_SARS-CoV-2-Mac1_docking-structure-POSIT_msk |
|---|---|---|---|
| 0 | CCC | abcderf1244134jasdasda | structure1 |
| 1 | CCCC | asidaosidasdnalsd | structure2 |
[10]:
pu = PosteraUploader(settings=ps, molecule_set_name="MY_MOLSET")
# pu.push(df) # will push data to remote
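The mock `ligand_id` values above are not well-formed UUIDs, so they could not match anything already present in Postera. A quick local check is possible with the standard `uuid` module, as sketched below. `is_uuid` is a hypothetical helper that only tests the format. It cannot tell you whether a given UUID actually exists in Manifold.

```python
import uuid


def is_uuid(value):
    """Return True if value parses as a UUID string, False otherwise."""
    try:
        uuid.UUID(str(value))
    except ValueError:
        return False
    return True


mock_ids = ["abcderf1244134jasdasda", "asidaosidasdnalsd"]
fresh_id = str(uuid.uuid4())  # a freshly generated, well-formed UUID

print([is_uuid(i) for i in mock_ids], is_uuid(fresh_id))
```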
Reading data from CDD
At ASAP we use the CDD vault to store assay information on tested molecules, and we often need to search and pull data from it. To use this service, export your CDD_API_KEY and CDD_VAULT_NUMBER, which will be picked up automatically by our CDDSettings:
[11]:
%env CDD_API_KEY=EXAMPLE
%env CDD_VAULT_NUMBER=1
from drugforge.data.services.cdd.cdd_api import CDDAPI, CDDSettings
settings = CDDSettings()
settings
env: CDD_API_KEY=EXAMPLE
env: CDD_VAULT_NUMBER=1
[11]:
CDDSettings(CDD_API_KEY='EXAMPLE', CDD_VAULT_NUMBER=1, CDD_API_URL='https://app.collaborativedrug.com', CDD_API_VERSION='v1')
We can now use the CDDAPI interface to query our vault. Let’s start by searching for molecules. Note that they are returned as raw dictionary data from CDD, which can be converted into Ligand objects using the CXSmiles or Smiles data:
[ ]:
cdd_api = CDDAPI.from_settings(settings=settings)
# Search for a specific molecule in the vault, can only do one search at a time using smiles
benzene = cdd_api.get_molecules(smiles="c1ccccc1")
# search for a list of molecules by their name in CDD
molecules_by_name = cdd_api.get_molecules(names=['org-id-1', 'org-id-2'])
# or search for molecules using the CDD compound-id
molecules_by_id = cdd_api.get_molecules(compound_ids=[1, 2, 3, 4])
Another common task is to download all IC50 data for a given protocol, for use in ML model development or in benchmarking binding affinity calculations. This takes a single call with the API:
[ ]:
ic50_dataframe = cdd_api.get_ic50_data(protocol_name='assay-1')
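For ML work, IC50 values are commonly converted to pIC50, defined as the negative base-10 logarithm of the IC50 in molar units. The sketch below shows that conversion, assuming the input is in micromolar. The units and column names of the dataframe returned by `get_ic50_data` are not shown here, so check them before applying it. `ic50_um_to_pic50` is a hypothetical helper.

```python
import math


def ic50_um_to_pic50(ic50_um):
    """Convert an IC50 in micromolar to pIC50 = -log10(IC50 in molar)."""
    if ic50_um <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50_um * 1e-6)


# 1 uM is 1e-6 M, so the pIC50 is 6 (up to floating-point rounding).
pic50 = ic50_um_to_pic50(1.0)
print(round(pic50, 3))
```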
We also provide a utility function that quickly downloads all of the molecules in a given protocol, optionally keeps only non-covalent ligands with fully defined stereochemistry, and returns the results as Ligand objects:
[ ]:
from drugforge.alchemy.cli.utils import get_cdd_molecules
molecules = get_cdd_molecules(protocol_name='assay-1', defined_stereo_only=True, remove_covalent=True)