Drug molecule screening

Code available here.

This tutorial builds a virtual drug-screening pipeline on Flyte. The workflow loads a library of drug molecules as SMILES strings, computes physicochemical properties with RDKit, applies Lipinski’s Rule of Five and custom target-profile filters, and ranks candidates by drug-likeness score. Each stage streams a rich HTML report into the Flyte UI.
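
The Rule of Five filter mentioned above can be sketched in plain Python. This is a minimal illustration, not the tutorial's implementation: the `Properties` class, the function names, and the example values are hypothetical, and the properties are assumed to have been computed already (in the tutorial, by RDKit). The thresholds are the standard Lipinski ones; treating at most one violation as a pass is a common convention and an assumption here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Properties:
    """Precomputed descriptors for one molecule (hypothetical container)."""

    mw: float    # molecular weight in daltons
    logp: float  # octanol-water partition coefficient
    hbd: int     # hydrogen-bond donor count
    hba: int     # hydrogen-bond acceptor count


def lipinski_violations(p: Properties) -> list[str]:
    """Return the names of the Rule of Five criteria that the molecule breaks."""
    checks = {
        "mw > 500": p.mw > 500,
        "logp > 5": p.logp > 5,
        "hbd > 5": p.hbd > 5,
        "hba > 10": p.hba > 10,
    }
    return [name for name, violated in checks.items() if violated]


def passes_lipinski(p: Properties, max_violations: int = 1) -> bool:
    """Assumption: a molecule passes if it breaks at most `max_violations` rules."""
    return len(lipinski_violations(p)) <= max_violations


# Made-up example values, not measurements of real compounds.
small = Properties(mw=180.2, logp=1.3, hbd=1, hba=4)
large = Properties(mw=1200.0, logp=3.0, hbd=5, hba=12)

print(lipinski_violations(small), passes_lipinski(small))
print(lipinski_violations(large), passes_lipinski(large))
```

Returning the list of violated criteria, rather than only a boolean, makes it easier to explain in a report why a candidate was dropped.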

Flyte provides:

  • Cached molecule loading so repeated runs skip re-parsing SMILES
  • Report-enabled stage tasks that stream property charts, similarity matrices, and candidate spotlights as each step completes
  • Lightweight orchestration: the top-level pipeline task chains the stages and has no report surface of its own

Define the task environment

The pipeline runs on CPU. The image is built from the script's inline dependency header, which includes RDKit, and adds the system libraries needed for 2D structure rendering.

drug_molecule_screening.py
# Inline script metadata (PEP 723); it must sit at the top of the file.
# /// script
# requires-python = ">=3.12"
# dependencies = [
#    "flyte>=2.4.0",
#    "rdkit",
#    "numpy",
#    "scikit-learn",
#    "pillow",
# ]
# ///

import flyte

# Build the image from the dependency header above, then add the
# system libraries used for 2D structure rendering.
main_img = flyte.Image.from_uv_script(__file__, name="drug-molecule-screening", pre=True).with_apt_packages(
    "libxrender1", "libxext6", "libexpat1",
)

env = flyte.TaskEnvironment(
    name="drug-molecule-screening",
    image=main_img,
    resources=flyte.Resources(cpu=2, memory="6Gi"),
)

Orchestrate the pipeline

The pipeline task is a lightweight orchestrator: it calls four stage tasks in sequence and returns a JSON summary.

drug_molecule_screening.py
@env.task
async def pipeline(
    molecules_json: str = "",
    target_profile: str = "",
) -> str:
    """Virtual drug molecule screening pipeline.

    Parses a molecular library, computes physicochemical properties,
    screens candidates against a target drug profile, and generates
    a comprehensive visual report with ranked candidates.

    Args:
        molecules_json: JSON mapping molecule names to SMILES strings.
            Defaults to a curated library of ~15 well-known drugs.
        target_profile: JSON with desired property ranges
            (e.g. {"mw": [150, 500], "logp": [-0.5, 5]}).
            Defaults to standard drug-like criteria.

    Returns:
        JSON summary of screening results.
    """
    mol_dir = await load_molecules(molecules_json=molecules_json)
    props_json = await compute_properties(molecule_dir=mol_dir)
    screening_json = await screen_candidates(
        properties_json=props_json,
        target_profile=target_profile,
    )
    summary = await generate_report(
        molecule_dir=mol_dir,
        properties_json=props_json,
        screening_json=screening_json,
    )
    return summary

Each stage task (load_molecules, compute_properties, screen_candidates, generate_report) is report-enabled (report=True), so it renders its own report and updates the Flyte UI as it runs.

Run the workflow

From the example directory:

cd v2/tutorials/drug_molecule_screening
uv run --script drug_molecule_screening.py

Pass a custom target profile:

flyte run drug_molecule_screening.py pipeline \
  --target_profile '{"mw": [100, 400], "logp": [-0.5, 4.0]}'
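
The target profile is a JSON object that maps property names to [min, max] ranges. The sketch below shows one way such a profile could be parsed and applied to a molecule's properties. It is an illustration under stated assumptions, not the tutorial's code: `parse_profile`, `matches_profile`, and `DEFAULT_PROFILE` are hypothetical names, the default ranges are copied from the example in the pipeline docstring, and the bounds are assumed to be inclusive.

```python
import json

# Hypothetical default, taken from the docstring example; the tutorial's
# actual default criteria are not shown here.
DEFAULT_PROFILE = {"mw": (150.0, 500.0), "logp": (-0.5, 5.0)}


def parse_profile(profile_json: str) -> dict[str, tuple[float, float]]:
    """Parse a profile such as '{"mw": [100, 400]}'; empty input means the default."""
    if not profile_json.strip():
        return dict(DEFAULT_PROFILE)
    raw = json.loads(profile_json)
    return {name: (float(low), float(high)) for name, (low, high) in raw.items()}


def matches_profile(
    props: dict[str, float],
    profile: dict[str, tuple[float, float]],
) -> bool:
    """True if every profiled property is present and within its inclusive range."""
    for name, (low, high) in profile.items():
        value = props.get(name)
        if value is None or not (low <= value <= high):
            return False
    return True


profile = parse_profile('{"mw": [100, 400], "logp": [-0.5, 4.0]}')

# Made-up property values for two hypothetical candidates.
print(matches_profile({"mw": 180.2, "logp": 1.3}, profile))
print(matches_profile({"mw": 450.0, "logp": 1.0}, profile))
```

Treating a missing property as a failed match is a conservative choice: a candidate is kept only when every requested range can actually be checked.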

Open the run URL and follow the report panel for funnel charts, property distributions, and top-candidate spotlights.