drugforge.ml.dataset.GroupedDockedDataset

class drugforge.ml.dataset.GroupedDockedDataset(*args: Any, **kwargs: Any)[source]

Bases: Dataset

Version of DockedDataset where data is grouped by compound_id, so all poses for a given compound can be accessed together.

__init__(compound_ids: list[str] = [], structures: dict[str, dict] = {}, random_iter=False)[source]

Constructor for GroupedDockedDataset object.

Parameters:
  • compound_ids (list[str]) – List of compound ids. Each entry in this list must have a corresponding entry in structures.

  • structures (dict[str, dict]) – Dict mapping compound_id to a pose dict

  • random_iter (bool, default=False) – Whether to iterate through the dataset in random order on each pass
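The requirement that every entry in compound_ids has a matching key in structures can be checked before constructing the dataset. The snippet below is a minimal sketch in plain Python that does not import drugforge. The helper name missing_structures, the compound ids, and the contents of each pose dict are hypothetical placeholders, since the layout of a pose dict is not specified here.

```python
# Hypothetical inputs: the ids and the contents of each pose dict are
# placeholders, not the layout drugforge actually uses.
compound_ids = ["cmpd-A", "cmpd-B"]
structures = {
    "cmpd-A": {"poses": ["pose-0", "pose-1"]},
    "cmpd-B": {"poses": ["pose-0"]},
}


def missing_structures(compound_ids, structures):
    """Return the compound ids that have no entry in structures."""
    return [cid for cid in compound_ids if cid not in structures]


# Fail early if the two arguments are inconsistent.
missing = missing_structures(compound_ids, structures)
if missing:
    raise ValueError(f"No structures entry for: {missing}")
```

With inputs that pass this check, the two objects could then be handed to the constructor as compound_ids and structures.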

Methods

__init__([compound_ids, structures, random_iter])

Constructor for GroupedDockedDataset object.

from_complexes(complexes[, exp_dict, ...])

Build from a list of Complex objects.

from_files(str_fns, compounds[, ignore_h, ...])

classmethod from_complexes(complexes: list[Complex], exp_dict={}, ignore_h=True, random_iter=False)[source]

Build from a list of Complex objects.

Parameters:
  • complexes (list[Complex]) – List of Complex schema objects to build into a GroupedDockedDataset object

  • exp_dict (dict[str, dict[str, int | float]], optional) – Dict mapping compound_id to an experimental results dict. The dict for a compound will be added to the pose representation of each Complex containing a ligand with that compound_id

  • ignore_h (bool, default=True) – Whether to remove hydrogens from the loaded structure

  • random_iter (bool, default=False) – Whether to iterate through the dataset in random order on each pass

Return type:

GroupedDockedDataset
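The way exp_dict is described above (one experimental results dict per compound, copied into every pose of that compound) can be illustrated in plain Python. This is a sketch of the described behavior, not the library's implementation. The function attach_experimental_data, the pose keys compound_id and pose_index, and the pIC50 value are all hypothetical.

```python
def attach_experimental_data(poses, exp_dict):
    """Copy each compound's experimental dict into every pose of that compound.

    Poses whose compound id is absent from exp_dict are returned unchanged.
    """
    merged = []
    for pose in poses:
        new_pose = dict(pose)  # shallow copy so the input is not mutated
        new_pose.update(exp_dict.get(pose["compound_id"], {}))
        merged.append(new_pose)
    return merged


# Hypothetical pose representations: two poses of cmpd-A, one of cmpd-B.
poses = [
    {"compound_id": "cmpd-A", "pose_index": 0},
    {"compound_id": "cmpd-A", "pose_index": 1},
    {"compound_id": "cmpd-B", "pose_index": 0},
]
exp_dict = {"cmpd-A": {"pIC50": 6.2}}  # made-up value for illustration

merged = attach_experimental_data(poses, exp_dict)
```

Both poses of cmpd-A receive the same experimental entry, while the cmpd-B pose is left without one because exp_dict has no entry for it.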

classmethod from_files(str_fns, compounds, ignore_h=True, extra_dict=None, num_workers=1, random_iter=False)[source]

Parameters:
  • str_fns (list[str]) – List of paths for the PDB files. Should correspond 1:1 with the names in compounds

  • compounds (list[tuple[str]]) – List of (crystal structure, ligand compound id) tuples

  • ignore_h (bool, default=True) – Whether to remove hydrogens from the loaded structure

  • extra_dict (dict[str, dict], optional) – Extra information to add to each structure. Keys should be compounds, and each value dict can contain anything as long as it doesn’t use the keys [“z”, “pos”, “lig”, “compound”]

  • num_workers (int, default=1) – Number of cores to use to load structures

  • random_iter (bool, default=False) – Whether to iterate through the dataset in random order on each pass
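
Two of the constraints listed above lend themselves to a quick pre-flight check: str_fns and compounds should correspond 1:1, and the dicts in extra_dict should not use the keys "z", "pos", "lig", or "compound". The following is a minimal sketch in plain Python that does not call drugforge. The helper check_from_files_inputs, the file names, and the choice to key extra_dict by the same tuples used in compounds are assumptions made for illustration.

```python
# Keys that the documentation says extra_dict entries must not contain.
RESERVED_KEYS = {"z", "pos", "lig", "compound"}


def check_from_files_inputs(str_fns, compounds, extra_dict=None):
    """Raise ValueError if the inputs violate the documented constraints."""
    if len(str_fns) != len(compounds):
        raise ValueError(
            f"Got {len(str_fns)} files but {len(compounds)} compounds"
        )
    for compound, extra in (extra_dict or {}).items():
        clash = RESERVED_KEYS & set(extra)
        if clash:
            raise ValueError(
                f"extra_dict[{compound!r}] uses reserved keys: {sorted(clash)}"
            )


# Hypothetical file names and compound tuples.
str_fns = ["xtal-1_cmpd-A.pdb", "xtal-2_cmpd-B.pdb"]
compounds = [("xtal-1", "cmpd-A"), ("xtal-2", "cmpd-B")]
extra_dict = {("xtal-1", "cmpd-A"): {"docking_score": -7.5}}

check_from_files_inputs(str_fns, compounds, extra_dict)
```

Running a check like this before calling from_files surfaces mismatched inputs before any structures are loaded, which matters most when num_workers is large and loading is slow.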