Drug molecule screening
This tutorial builds a virtual drug-screening pipeline on Flyte. The workflow loads a library of drug SMILES strings, computes physicochemical properties with RDKit, applies Lipinski’s Rule of Five and custom target-profile filters, and ranks candidates by drug-likeness score — with rich HTML reports streamed into the Flyte UI.
Flyte provides:
- Cached molecule loading so repeated runs skip re-parsing SMILES
- Report-enabled stage tasks that stream property charts, similarity matrices, and candidate spotlights as each step completes
- Lightweight orchestration: the top-level pipeline task chains stages without its own report surface
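Lipinski's Rule of Five, mentioned in the overview, can be sketched in plain Python once the descriptors are known. This is a minimal illustration, not the tutorial's implementation: the thresholds are the standard published ones, but the dictionary keys (mw, logp, hbd, hba), the function names, the one-violation tolerance, and the sample values are assumptions made for the example. In the tutorial the descriptors are computed by RDKit.

```python
# Standard Rule of Five upper limits. The key names are hypothetical.
LIPINSKI_LIMITS = {
    "mw": 500.0,   # molecular weight in daltons
    "logp": 5.0,   # octanol-water partition coefficient
    "hbd": 5,      # hydrogen-bond donors
    "hba": 10,     # hydrogen-bond acceptors
}


def lipinski_violations(props: dict) -> list:
    """Return the names of the properties that exceed their limit."""
    return [name for name, limit in LIPINSKI_LIMITS.items() if props[name] > limit]


def passes_lipinski(props: dict, max_violations: int = 1) -> bool:
    """Accept a molecule with at most `max_violations` violations (assumed tolerance)."""
    return len(lipinski_violations(props)) <= max_violations


# Approximate, illustrative descriptor values (not computed by RDKit here).
aspirin = {"mw": 180.16, "logp": 1.2, "hbd": 1, "hba": 4}
# A made-up large molecule that breaks three of the four limits.
hypothetical_large = {"mw": 733.9, "logp": 5.8, "hbd": 5, "hba": 14}

print(lipinski_violations(aspirin))
print(lipinski_violations(hypothetical_large))
```

Returning the list of violated properties, rather than a bare boolean, keeps enough detail to explain in a report why a candidate was rejected.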
Define the task environment
The pipeline runs on CPU with RDKit and system libraries for 2D structure rendering.
main_img = flyte.Image.from_uv_script(__file__, name="drug-molecule-screening", pre=True).with_apt_packages(
    "libxrender1", "libxext6", "libexpat1",
)

env = flyte.TaskEnvironment(
    name="drug-molecule-screening",
    image=main_img,
    resources=flyte.Resources(cpu=2, memory="6Gi"),
)
# /// script
# requires-python = ">=3.12"
# dependencies = [
#    "flyte>=2.4.0",
#    "rdkit",
#    "numpy",
#    "scikit-learn",
#    "pillow",
# ]
# ///

Orchestrate the pipeline
The pipeline task is a lightweight orchestrator: it calls four stage tasks in sequence and returns a JSON summary.
@env.task
async def pipeline(
    molecules_json: str = "",
    target_profile: str = "",
) -> str:
    """Virtual drug molecule screening pipeline.

    Parses a molecular library, computes physicochemical properties,
    screens candidates against a target drug profile, and generates
    a comprehensive visual report with ranked candidates.

    Args:
        molecules_json: JSON mapping molecule names to SMILES strings.
            Defaults to a curated library of ~15 well-known drugs.
        target_profile: JSON with desired property ranges
            (e.g. {"mw": [150, 500], "logp": [-0.5, 5]}).
            Defaults to standard drug-like criteria.

    Returns:
        JSON summary of screening results.
    """
    mol_dir = await load_molecules(molecules_json=molecules_json)
    props_json = await compute_properties(molecule_dir=mol_dir)
    screening_json = await screen_candidates(
        properties_json=props_json,
        target_profile=target_profile,
    )
    summary = await generate_report(
        molecule_dir=mol_dir,
        properties_json=props_json,
        screening_json=screening_json,
    )
    return summary
Each stage task (load_molecules, compute_properties, screen_candidates, generate_report) owns its own report=True surface and updates the Flyte UI as it runs.
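The target_profile argument is a JSON string of property ranges, so the screening stage has to parse it and test each molecule against the ranges. The sketch below shows one plausible way to do that in plain Python. It is not the tutorial's screen_candidates code: the default ranges, the helper names parse_profile and matches_profile, and the sample property values are all assumptions chosen for illustration.

```python
import json

# Assumed defaults for illustration only. The tutorial's real defaults are not shown.
DEFAULT_PROFILE = {"mw": [150, 500], "logp": [-0.5, 5]}


def parse_profile(target_profile: str) -> dict:
    """Merge a JSON profile over the defaults. An empty string keeps the defaults."""
    overrides = json.loads(target_profile) if target_profile else {}
    merged = {**DEFAULT_PROFILE, **overrides}
    return {name: (float(lo), float(hi)) for name, (lo, hi) in merged.items()}


def matches_profile(props: dict, profile: dict) -> bool:
    """True when every profiled property is present and inside its inclusive range."""
    return all(
        name in props and lo <= props[name] <= hi
        for name, (lo, hi) in profile.items()
    )


profile = parse_profile('{"mw": [100, 400], "logp": [-0.5, 4.0]}')

# Approximate, illustrative property values keyed by molecule name.
library = {
    "aspirin": {"mw": 180.16, "logp": 1.2},
    "hypothetical_large": {"mw": 733.9, "logp": 5.8},
}
hits = [name for name, props in library.items() if matches_profile(props, profile)]
print(hits)
```

Treating an empty string as "use the defaults" mirrors the empty-string defaults in the pipeline signature, although how the real stage handles that case is not shown in this tutorial.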
Run the workflow
From the example directory:
cd v2/tutorials/drug_molecule_screening
uv run --script drug_molecule_screening.py

Pass a custom target profile:

flyte run drug_molecule_screening.py pipeline \
    --target_profile '{"mw": [100, 400], "logp": [-0.5, 4.0]}'

Open the run URL and follow the report panel for funnel charts, property distributions, and top-candidate spotlights.
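Hand-writing JSON inside shell quotes is error-prone, so the arguments for the command above can be generated instead. The helper below is a hypothetical convenience, not part of the tutorial. It assumes that a --molecules_json flag exists by analogy with the --target_profile flag shown above, since both are parameters of pipeline. The two SMILES strings are for aspirin and caffeine.

```python
import json
import shlex

# A small custom library: molecule name -> SMILES string.
molecules = {
    "aspirin": "CC(=O)OC1=CC=CC=C1C(=O)O",
    "caffeine": "CN1C=NC2=C1C(=O)N(C(=O)N2C)C",
}
target_profile = {"mw": [100, 400], "logp": [-0.5, 4.0]}


def build_command(molecules: dict, target_profile: dict) -> str:
    """Return a shell-safe `flyte run` command line with JSON-encoded arguments."""
    args = [
        "flyte", "run", "drug_molecule_screening.py", "pipeline",
        "--molecules_json", json.dumps(molecules),  # flag name is an assumption
        "--target_profile", json.dumps(target_profile),
    ]
    # shlex.join quotes each argument so the JSON survives the shell intact.
    return shlex.join(args)


command = build_command(molecules, target_profile)
print(command)
```

Copying the printed command into a terminal avoids mismatched quotes when the library or profile grows beyond a couple of entries.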