drugforge.ml.dataset.GroupedDockedDataset

class drugforge.ml.dataset.GroupedDockedDataset(*args: Any, **kwargs: Any)

Bases: Dataset

A version of DockedDataset in which data is grouped by compound_id, so that all poses for a given compound can be accessed at once.
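The grouping idea can be pictured with plain dictionaries. The sketch below is illustrative only and does not reproduce drugforge's code: the helper `group_poses` is hypothetical, and it assumes each pose dict stores its identity under a "compound" key as a (crystal structure, compound_id) tuple, mirroring the tuples that from_files accepts.

```python
from collections import defaultdict
from typing import Any


def group_poses(
    poses: list[dict[str, Any]],
) -> tuple[list[str], dict[str, list[dict[str, Any]]]]:
    """Group pose dicts by compound_id (hypothetical helper, for illustration).

    Assumes pose["compound"] is a (crystal_structure, compound_id) tuple.
    """
    compound_ids: list[str] = []
    grouped: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for pose in poses:
        _, compound_id = pose["compound"]
        if compound_id not in grouped:
            # Record first-seen order so integer indexing is deterministic
            compound_ids.append(compound_id)
        grouped[compound_id].append(pose)
    return compound_ids, dict(grouped)


poses = [
    {"compound": ("xtal_1", "cmpd_A")},
    {"compound": ("xtal_2", "cmpd_A")},
    {"compound": ("xtal_1", "cmpd_B")},
]
ids, grouped = group_poses(poses)
print(ids)  # ['cmpd_A', 'cmpd_B']
print(len(grouped["cmpd_A"]))  # 2
```

One index then yields every pose of a compound, which is the access pattern the grouped dataset is meant to provide.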

__init__(compound_ids: list[str] = [], structures: dict[str, dict] = {}, random_iter=False)

Constructor for GroupedDockedDataset object.

Parameters:

compound_ids (list[str]) – List of compound ids. Each entry in this list must have a corresponding entry in structures.
structures (dict[str, dict]) – Dict mapping each compound_id to a pose dict
random_iter (bool, default=False) – Iterate through the dataset randomly each time
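The constructor contract can be sketched with a small stand-in class. This is a hypothetical illustration, not the drugforge implementation: the class name is invented, and the assumption that indexing returns a (compound_id, structures[compound_id]) pair is ours. It also uses None defaults instead of the documented [] and {} so that instances never share one mutable default object.

```python
import random
from typing import Any, Iterator, Optional


class GroupedDatasetSketch:
    """Hypothetical stand-in illustrating the documented constructor contract."""

    def __init__(
        self,
        compound_ids: Optional[list[str]] = None,
        structures: Optional[dict[str, Any]] = None,
        random_iter: bool = False,
    ) -> None:
        # None defaults avoid sharing one mutable list/dict across instances
        self.compound_ids = list(compound_ids or [])
        self.structures = dict(structures or {})
        self.random_iter = random_iter
        # Every compound id must have a matching entry in structures
        missing = [c for c in self.compound_ids if c not in self.structures]
        if missing:
            raise ValueError(f"No structures entry for: {missing}")

    def __len__(self) -> int:
        return len(self.compound_ids)

    def __getitem__(self, idx: int) -> tuple[str, Any]:
        # Assumed return shape: the compound id plus everything stored for it
        compound_id = self.compound_ids[idx]
        return compound_id, self.structures[compound_id]

    def __iter__(self) -> Iterator[tuple[str, Any]]:
        order = list(range(len(self)))
        if self.random_iter:
            # Draw a fresh random order on every pass through the dataset
            random.shuffle(order)
        for idx in order:
            yield self[idx]


ds = GroupedDatasetSketch(["cmpd_A", "cmpd_B"], {"cmpd_A": {"pos": 0}, "cmpd_B": {"pos": 1}})
print(len(ds))  # 2
print(ds[1])  # ('cmpd_B', {'pos': 1})
```

With random_iter=False the sketch walks compounds in list order; with random_iter=True each full iteration visits every compound exactly once, in a new shuffled order.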

Methods

`__init__`([compound_ids, structures, random_iter])	Constructor for GroupedDockedDataset object.
`from_complexes`(complexes[, exp_dict, ...])	Build from a list of Complex objects.
`from_files`(str_fns, compounds[, ignore_h, ...])

classmethod from_complexes(complexes: list[Complex], exp_dict={}, ignore_h=True, random_iter=False)

Build from a list of Complex objects.

Parameters:

complexes (list[Complex]) – List of Complex schema objects to build into a GroupedDockedDataset object
exp_dict (dict[str, dict[str, int | float]], optional) – Dict mapping compound_id to an experimental results dict. The dict for a compound will be added to the pose representation of each Complex containing a ligand with that compound_id.
ignore_h (bool, default=True) – Whether to remove hydrogens from the loaded structure
random_iter (bool, default=False) – Iterate through the dataset randomly each time

Return type:

GroupedDockedDataset
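The effect of exp_dict can be sketched with plain dicts. The helper `attach_exp_data` below is hypothetical and only illustrates the documented behavior: it assumes the experimental values are merged into each pose dict as extra keys, and it takes (compound_id, pose_dict) pairs rather than Complex objects so that it runs without drugforge installed.

```python
from typing import Any


def attach_exp_data(
    pairs: list[tuple[str, dict[str, Any]]],
    exp_dict: dict[str, dict[str, Any]],
) -> list[dict[str, Any]]:
    """Merge each compound's experimental results into its pose dicts.

    Hypothetical helper: `pairs` holds (compound_id, pose_dict) entries.
    """
    merged = []
    for compound_id, pose in pairs:
        # Copy so the caller's pose dicts are left unchanged
        new_pose = dict(pose)
        # Compounds absent from exp_dict simply get no extra keys
        new_pose.update(exp_dict.get(compound_id, {}))
        merged.append(new_pose)
    return merged


pairs = [("cmpd_A", {"pos": 0}), ("cmpd_A", {"pos": 1}), ("cmpd_B", {"pos": 2})]
out = attach_exp_data(pairs, {"cmpd_A": {"pIC50": 6.5}})
print(out[0])  # {'pos': 0, 'pIC50': 6.5}
print(out[2])  # {'pos': 2}
```

Every pose sharing a compound_id receives the same experimental values, which matches the description of exp_dict above.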

classmethod from_files(str_fns, compounds, ignore_h=True, extra_dict=None, num_workers=1, random_iter=False)

Parameters:

str_fns (list[str]) – List of paths to the PDB files. Should correspond 1:1 with the entries in compounds.
compounds (list[tuple[str]]) – List of (crystal structure, ligand compound id) tuples
ignore_h (bool, default=True) – Whether to remove hydrogens from the loaded structure
extra_dict (dict[str, dict], optional) – Extra information to add to each structure. Keys should be compounds, and the dicts can contain anything as long as they don’t use the reserved keys [“z”, “pos”, “lig”, “compound”].
num_workers (int, default=1) – Number of cores to use to load structures
random_iter (bool, default=False) – Iterate through the dataset randomly each time
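The documented constraints on from_files arguments can be checked up front. The function `check_from_files_inputs` below is a hypothetical pre-flight helper, not part of drugforge: it only verifies that str_fns and compounds pair up 1:1 and that no extra_dict entry uses the reserved keys listed above. It does not assume anything about how extra_dict keys are formatted.

```python
from typing import Any, Optional

# Keys the docs reserve for the structure representation itself
RESERVED_KEYS = {"z", "pos", "lig", "compound"}


def check_from_files_inputs(
    str_fns: list[str],
    compounds: list[tuple[str, str]],
    extra_dict: Optional[dict[str, dict[str, Any]]] = None,
) -> None:
    """Hypothetical pre-flight checks for the documented from_files arguments."""
    # Each PDB path must pair with exactly one (crystal structure, compound id) tuple
    if len(str_fns) != len(compounds):
        raise ValueError(f"{len(str_fns)} files but {len(compounds)} compound entries")
    # Extra information must not overwrite the reserved structure keys
    for key, extra in (extra_dict or {}).items():
        clash = RESERVED_KEYS.intersection(extra)
        if clash:
            raise ValueError(f"extra_dict[{key!r}] uses reserved keys {sorted(clash)}")


# Passes silently: one file per compound tuple, no reserved keys used
check_from_files_inputs(["a.pdb"], [("xtal_1", "cmpd_A")], {"cmpd_A": {"pIC50": 5.0}})
```

Running such checks before a multi-worker load (num_workers > 1) surfaces mismatched inputs immediately rather than partway through parsing.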