drugforge.spectrum.blast.get_blast_seqs
- drugforge.spectrum.blast.get_blast_seqs(seq_source: str, save_folder: Path, input_type='fasta', nhits=100, nalign=500, e_val_thresh=1e-20, database='refseq_protein', xml_file='results.xml', verbose=True, save_csv=None, email='', pdb_file=None) → DataFrame
Run a BLAST search on a protein sequence.
- Parameters:
seq_source (str) – Source of the query sequence, interpreted according to input_type.
save_folder (Path) – Path to the folder where BLAST results are saved.
input_type (str, optional) – Type of sequence source, one of “pre-cal”, “fasta”, or “sequence”, by default “fasta”
nhits (int, optional) – Number of hits, hitlist_size parameter in BLAST, by default 100
nalign (int, optional) – Number of alignments, alignments parameter in BLAST, by default 500
e_val_thresh (float, optional) – Threshold to filter BLAST results, by default 1e-20
database (str, optional) – Name of BLAST database, by default “refseq_protein”
xml_file (str, optional) – Name to be given to XML with BLAST results, by default “results.xml”
verbose (bool, optional) – Whether to print info on BLAST search, by default True
save_csv (Union[str, None], optional) – CSV file name to optionally save dataframe, by default None
email (str, optional) – Email to use for the Entrez query, by default “”
pdb_file (str, optional) – Path to PDB file used to calculate pocket similarity score, by default None
- Returns:
DataFrame with BLAST results.
- Return type:
pd.DataFrame
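- Example:
A minimal sketch of a call, using only the keyword names from the signature above. The file names (target.fasta, blast_out, blast_hits.csv), the email address, and the helper functions blast_kwargs and run_blast are hypothetical and chosen for illustration. It also assumes that, with input_type "fasta", seq_source is a path to a FASTA file. The import is deferred because the search may contact a remote BLAST service and requires drugforge to be installed.

```python
from pathlib import Path


def blast_kwargs(fasta_path: str, out_dir: str) -> dict:
    """Collect keyword arguments matching the documented signature."""
    return {
        "seq_source": fasta_path,        # assumed: path to a FASTA file
        "save_folder": Path(out_dir),    # folder that receives BLAST results
        "input_type": "fasta",           # documented default
        "nhits": 50,                     # fewer hits than the default of 100
        "e_val_thresh": 1e-20,           # documented default threshold
        "database": "refseq_protein",    # documented default database
        "xml_file": "results.xml",       # name of the XML file with BLAST results
        "save_csv": "blast_hits.csv",    # hypothetical CSV name for the dataframe
        "email": "user@example.org",     # placeholder address for the Entrez query
    }


def run_blast(fasta_path: str, out_dir: str):
    """Run the search and return the resulting DataFrame."""
    # Imported lazily so this sketch loads even without drugforge installed.
    from drugforge.spectrum.blast import get_blast_seqs

    return get_blast_seqs(**blast_kwargs(fasta_path, out_dir))
```

Calling run_blast("target.fasta", "blast_out") would then return the DataFrame described under Returns. Parameters left out of the dictionary (nalign, verbose, pdb_file) keep their documented defaults.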