drugforge.data.services.fragalysis.fragalysis_download.parse_fragalysis

drugforge.data.services.fragalysis.fragalysis_download.parse_fragalysis(x_fn, x_dir, name_filter=None, name_filter_column='crystal_name', drop_duplicate_datasets=False)

Load all crystal structures into schema.CrystalCompoundData objects.

Parameters:
  • x_fn (str or Path) – Path to the metadata.csv file describing each crystal structure

  • x_dir (str or Path) – Path to the directory whose subdirectories contain the crystal structure PDB files

  • name_filter (str or list) – String or list of strings that must appear in the name_filter_column for a structure to be kept

  • name_filter_column (str) – Name of column in the metadata.csv that will be used to filter the dataframe

  • drop_duplicate_datasets (bool) – If True, drop the duplicate datasets (_1A, _0B, etc.) for a given crystal structure.

Returns:

List of parsed crystal structures

Return type:

List[schema.CrystalCompoundData]
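
The effect of name_filter and drop_duplicate_datasets can be illustrated with a small standalone sketch. This is not the library's implementation: filter_rows is a hypothetical helper, the sample crystal names are made up, and it assumes that filtering is a substring match (any match suffices for a list) and that a "duplicate" is any dataset whose suffix is something other than _0A.

```python
import csv
import io
import re


def filter_rows(rows, name_filter=None, name_filter_column="crystal_name",
                drop_duplicate_datasets=False):
    """Hypothetical stand-in for the filtering described above."""
    # Accept a single string or a list of strings.
    if isinstance(name_filter, str):
        name_filter = [name_filter]

    kept = []
    for row in rows:
        # Assumption: a row passes if any filter string is a substring
        # of the value in the filter column.
        if name_filter and not any(f in row[name_filter_column] for f in name_filter):
            continue
        if drop_duplicate_datasets:
            # Assumption: names end in a suffix such as _0A, _0B, _1A,
            # and only the _0A dataset is kept.
            match = re.search(r"_(\d+[A-Z])$", row["crystal_name"])
            if match and match.group(1) != "0A":
                continue
        kept.append(row)
    return kept


# Made-up metadata, standing in for the contents of metadata.csv.
SAMPLE = """crystal_name,smiles
Mpro-P0008_0A,CCO
Mpro-P0008_0B,CCO
Mpro-P0008_1A,CCO
Mpro-x0072_0A,CCN
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
filtered = filter_rows(rows, name_filter="Mpro-P", drop_duplicate_datasets=True)
names = [row["crystal_name"] for row in filtered]
print(names)
```

With the real function, the same options would be passed as keyword arguments alongside x_fn and x_dir, and the result would be a list of schema.CrystalCompoundData objects rather than dictionaries.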