drugforge.spectrum.blast.get_blast_seqs

drugforge.spectrum.blast.get_blast_seqs(seq_source: str, save_folder: Path, input_type='fasta', nhits=100, nalign=500, e_val_thresh=1e-20, database='refseq_protein', xml_file='results.xml', verbose=True, save_csv=None, email='', pdb_file=None) → DataFrame

Run a BLAST search on a protein sequence.

Parameters:
  • seq_source (str) – Source of the query sequence; how it is interpreted depends on input_type

  • save_folder (Path) – Path to the folder where BLAST results are saved

  • input_type (str, optional) – Type of sequence source, one of “pre-cal”, “fasta”, or “sequence”, by default “fasta”

  • nhits (int, optional) – Number of hits (the hitlist_size parameter in BLAST), by default 100

  • nalign (int, optional) – Number of alignments (the alignments parameter in BLAST), by default 500

  • e_val_thresh (float, optional) – Threshold to filter BLAST results, by default 1e-20

  • database (str, optional) – Name of BLAST database, by default “refseq_protein”

  • xml_file (str, optional) – Name to be given to XML with BLAST results, by default “results.xml”

  • verbose (bool, optional) – Whether to print info on BLAST search, by default True

  • save_csv (Union[str, None], optional) – CSV file name to optionally save dataframe, by default None

  • email (str, optional) – Email to use for the Entrez query, by default “”

  • pdb_file (str, optional) – Path to PDB file used to calculate pocket similarity score, by default None

Returns:

DataFrame with the BLAST results.

Return type:

pd.DataFrame
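
Example:

The following is a minimal sketch of how a call might be assembled from the parameters documented above, not a definitive recipe. The helper names build_blast_kwargs and run_blast are hypothetical and not part of drugforge. The file and folder names in the usage note are placeholders, and the sketch assumes that seq_source is a path to a FASTA file when input_type is “fasta”. The drugforge import is deferred into run_blast so that the argument-building part can be checked without the package installed.

```python
from pathlib import Path

# The three values documented for input_type.
VALID_INPUT_TYPES = ("pre-cal", "fasta", "sequence")


def build_blast_kwargs(seq_source, save_folder, input_type="fasta", email=""):
    """Collect keyword arguments for get_blast_seqs (hypothetical helper)."""
    if input_type not in VALID_INPUT_TYPES:
        raise ValueError(
            f"input_type must be one of {VALID_INPUT_TYPES}, got {input_type!r}"
        )
    return {
        "seq_source": str(seq_source),     # assumed: a FASTA path for "fasta"
        "save_folder": Path(save_folder),  # folder for the BLAST results
        "input_type": input_type,
        "nhits": 100,                      # documented default
        "nalign": 500,                     # documented default
        "e_val_thresh": 1e-20,             # documented default
        "database": "refseq_protein",      # documented default
        "xml_file": "results.xml",         # documented default
        "verbose": True,
        "save_csv": None,                  # set a file name to also save a CSV
        "email": email,                    # used for the Entrez query
    }


def run_blast(seq_source, save_folder, input_type="fasta", email=""):
    """Call get_blast_seqs with the arguments assembled above (hypothetical wrapper)."""
    # Imported here so this module loads even when drugforge is not installed.
    from drugforge.spectrum.blast import get_blast_seqs

    kwargs = build_blast_kwargs(seq_source, save_folder, input_type, email)
    return get_blast_seqs(**kwargs)
```

With drugforge installed, run_blast("query.fasta", "blast_out", email="user@example.org") would pass these arguments to get_blast_seqs and return its DataFrame. Every keyword is spelled out here to mirror the signature; in practice only the values that differ from the defaults need to be given.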