drugforge.ml.dataset.SplitDockedDataset

class drugforge.ml.dataset.SplitDockedDataset(*args: Any, **kwargs: Any)

Bases: DockedDataset

Same layout as DockedDataset, but each entry is a dict with the keys “complex”, “protein”, and “ligand”, each storing the corresponding representation.
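As a rough illustration of the per-entry layout described above, the sketch below splits one pose dict into “complex”, “protein”, and “ligand” parts. It uses plain Python lists in place of tensors and assumes the pose keys "z", "pos", and "lig" (treated here as a per-atom ligand mask). `split_pose` is a hypothetical helper, not part of drugforge, and the library's actual splitting logic may differ.

```python
def split_pose(pose):
    """Hypothetical helper: split one pose dict into complex/protein/ligand parts."""
    lig_mask = pose["lig"]

    def select(keep_ligand):
        # Keep only the atoms whose ligand flag matches keep_ligand
        idx = [i for i, is_lig in enumerate(lig_mask) if is_lig == keep_ligand]
        return {
            "z": [pose["z"][i] for i in idx],
            "pos": [pose["pos"][i] for i in idx],
        }

    return {
        # The full pose, copied so the input is left untouched
        "complex": {"z": list(pose["z"]), "pos": list(pose["pos"]), "lig": list(lig_mask)},
        "protein": select(False),
        "ligand": select(True),
    }


# Toy pose: three protein atoms followed by one ligand atom
pose = {
    "z": [7, 6, 8, 6],
    "pos": [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [0.0, 1.2, 0.0], [3.0, 3.0, 3.0]],
    "lig": [False, False, False, True],
}
entry = split_pose(pose)
```

Here `entry["protein"]` and `entry["ligand"]` each hold only their own atoms, while `entry["complex"]` keeps all of them.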

__init__(compounds={}, structures=[], random_iter=False)

Constructor for DockedDataset object.

Parameters:

compounds (dict[(str, str), list[int]]) – Dict mapping a compound tuple (xtal_id, compound_id) to the list of indices in structures that hold the poses for that ID pair.
structures (list[dict]) – List of pose dicts, containing at minimum tensors for atomic number, atomic positions, and a ligand idx. Indices in this list should match the indices in the lists in compounds.
random_iter (bool, default=False) – Iterate through the dataset randomly each time
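To make the relationship between compounds and structures concrete, here is a minimal sketch of the two arguments using toy data. Plain lists stand in for tensors, the IDs are made up, and `check_indices` is a hypothetical helper that only verifies that every index in compounds points at an existing entry in structures.

```python
# Three poses: two for one compound, one for another (toy data)
structures = [
    {"z": [6, 8], "pos": [[0.0, 0.0, 0.0], [1.2, 0.0, 0.0]], "lig": [False, True]},
    {"z": [6, 8], "pos": [[0.1, 0.0, 0.0], [1.3, 0.0, 0.0]], "lig": [False, True]},
    {"z": [7, 6], "pos": [[0.0, 0.0, 0.0], [0.0, 1.5, 0.0]], "lig": [False, True]},
]

# (xtal_id, compound_id) -> indices into structures
compounds = {
    ("xtal_A", "cmpd_1"): [0, 1],
    ("xtal_B", "cmpd_2"): [2],
}


def check_indices(compounds, structures):
    """Hypothetical check: every referenced index exists in structures."""
    return all(
        0 <= i < len(structures)
        for indices in compounds.values()
        for i in indices
    )


indices_ok = check_indices(compounds, structures)
```

Under these assumptions, the poses for ("xtal_A", "cmpd_1") would be `structures[0]` and `structures[1]`.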

Methods

`__init__`([compounds, structures, random_iter])	Constructor for DockedDataset object.
`from_complexes`(complexes[, exp_dict, ...])	Build from a list of Complex objects.
`from_files`(str_fns, compounds[, ignore_h, ...])

classmethod from_complexes(complexes: list[Complex], exp_dict=None, ignore_h=True, random_iter=False)

Build from a list of Complex objects.

Parameters:

complexes (list[Complex]) – List of Complex schema objects to build into a DockedDataset object
exp_dict (dict[str, dict[str, int | float]], optional) – Dict mapping compound_id to an experimental results dict. The dict for a compound will be added to the pose representation of each Complex containing a ligand with that compound_id
ignore_h (bool, default=True) – Whether to remove hydrogens from the loaded structure
random_iter (bool, default=False) – Iterate through the dataset randomly each time

Return type:

DockedDataset
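The exp_dict behaviour described above can be sketched in plain Python. This is only an illustration of the merge, not the library's code: `attach_experimental` is a hypothetical helper, the "pIC50" label is invented, and the sketch assumes each pose stores its (xtal_id, compound_id) tuple under the "compound" key.

```python
def attach_experimental(poses, exp_dict=None):
    """Hypothetical helper: merge each compound's experimental dict into its poses."""
    exp_dict = exp_dict or {}
    labelled = []
    for pose in poses:
        _, compound_id = pose["compound"]
        merged = dict(pose)  # shallow copy so the input pose is not modified
        merged.update(exp_dict.get(compound_id, {}))
        labelled.append(merged)
    return labelled


poses = [
    {"compound": ("xtal_A", "cmpd_1"), "z": [6, 8]},
    {"compound": ("xtal_B", "cmpd_1"), "z": [6, 8]},
    {"compound": ("xtal_C", "cmpd_2"), "z": [7]},
]
exp_dict = {"cmpd_1": {"pIC50": 6.2}}

labelled = attach_experimental(poses, exp_dict)
```

Both poses of cmpd_1 receive the same experimental values, and the pose of cmpd_2, which has no entry in exp_dict, is left unchanged.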

classmethod from_files(str_fns, compounds, ignore_h=True, extra_dict=None, num_workers=1, random_iter=False)

Parameters:

str_fns (list[str]) – List of paths to the PDB files. Should correspond 1:1 with the entries in compounds
compounds (list[tuple[str]]) – List of (crystal structure, ligand compound id) tuples
ignore_h (bool, default=True) – Whether to remove hydrogens from the loaded structure
extra_dict (dict[str, dict], optional) – Extra information to add to each structure. Keys should be compounds, and dicts can be anything as long as they don’t have the keys [“z”, “pos”, “lig”, “compound”]
num_workers (int, default=1) – Number of cores to use to load structures
random_iter (bool, default=False) – Iterate through the dataset randomly each time
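The constraints on these arguments (a 1:1 pairing of str_fns with compounds, and no reserved keys inside extra_dict) could be checked before calling from_files with something like the sketch below. `check_from_files_inputs` is a hypothetical helper, the file names and IDs are invented, and keying extra_dict by compound ID string is an assumption based on the documented type.

```python
RESERVED_KEYS = {"z", "pos", "lig", "compound"}


def check_from_files_inputs(str_fns, compounds, extra_dict=None):
    """Hypothetical pre-flight check for the documented input constraints."""
    if len(str_fns) != len(compounds):
        raise ValueError("str_fns and compounds must correspond 1:1")
    for key, extra in (extra_dict or {}).items():
        # Extra info must not use the keys reserved for the pose itself
        clash = RESERVED_KEYS & set(extra)
        if clash:
            raise ValueError(f"extra_dict[{key!r}] uses reserved keys: {sorted(clash)}")
    return True


# Invented paths and IDs; no files are opened by this check
str_fns = ["poses/xtal_A_cmpd_1.pdb", "poses/xtal_B_cmpd_2.pdb"]
compounds = [("xtal_A", "cmpd_1"), ("xtal_B", "cmpd_2")]
extra_dict = {"cmpd_1": {"series": "lead"}}

inputs_ok = check_from_files_inputs(str_fns, compounds, extra_dict)
```

An extra_dict entry such as `{"cmpd_1": {"pos": ...}}` would be rejected by this check, since "pos" is one of the reserved keys.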