LLM fine-tuning with LoRA and QLoRA


This tutorial fine-tunes a language model for SQL generation and supports three methods in one workflow: full fine-tuning, LoRA adapters, and QLoRA (a 4-bit quantized base model plus LoRA adapters). The pipeline prepares an instruction dataset from Hugging Face, trains with TRL's SFTTrainer, compares the fine-tuned model against the untuned base model, and streams training charts into Flyte reports.
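LoRA leaves the pretrained weight W frozen and learns a low-rank update, so an adapted layer computes y = Wx + (α/r)·B(Ax), with A of shape r×d_in and B of shape d_out×r. QLoRA applies the same update on top of a base model whose weights are stored in 4-bit. The following is a toy, dependency-free illustration of the technique using plain Python lists (not PEFT's implementation; matvec, lora_forward, and lora_trainable_params are hypothetical helpers). It shows two properties: when B starts at zero, as in the standard LoRA initialization, the adapted layer initially matches the base model, and the adapter adds only r(d_in + d_out) trainable parameters. The pipeline's lora_r and lora_alpha parameters presumably play the roles of r and α.

```python
# Toy sketch of a LoRA-adapted linear layer:
#   y = W x + (alpha / r) * B (A x)
# W (d_out x d_in) stays frozen; only A (r x d_in) and B (d_out x r) are trained.

Matrix = list[list[float]]
Vector = list[float]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Row-major matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: Vector, alpha: float, r: int) -> Vector:
    """Frozen base output plus the scaled low-rank update B @ (A @ x)."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


def lora_trainable_params(d_out: int, d_in: int, r: int) -> int:
    """Parameters in A plus B, versus d_out * d_in for fully training W."""
    return r * (d_in + d_out)


# r=2, alpha=4 gives the same alpha/r = 2.0 ratio as lora_r=16, lora_alpha=32.
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # d_out=2, d_in=3
A = [[0.1, 0.2, 0.3], [0.0, 0.1, 0.0]]  # r x d_in
B_zero = [[0.0, 0.0], [0.0, 0.0]]       # B starts at zero, so the adapter is a no-op
x = [1.0, 2.0, 3.0]
print(lora_forward(W, A, B_zero, x, alpha=4, r=2))  # [1.0, 2.0], same as W x
print(lora_trainable_params(1024, 1024, r=16))      # 32768, versus 1048576 for W
```

Once training moves B away from zero, the same forward pass adds the scaled update; only A and B ever receive gradients.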

Flyte provides:

  • GPU training with live loss and learning-rate charts via report=True.
  • Method switching through a single method parameter (full, lora, or qlora).
  • Cached dataset preparation for fast iteration on hyperparameters.
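Because one method string selects the whole training recipe, a typo is best rejected before any GPU time is spent. The sketch below is hypothetical (resolve_method and MethodSettings are illustrative names, not functions from the tutorial's script); it only summarizes what each value means: full updates every weight, lora freezes the base and trains adapters, and qlora additionally loads the frozen base in 4-bit.

```python
from dataclasses import dataclass

VALID_METHODS = ("full", "lora", "qlora")


@dataclass(frozen=True)
class MethodSettings:
    """Hypothetical summary of what each `method` value implies."""
    train_all_weights: bool  # full fine-tuning updates every base weight
    use_lora: bool           # attach low-rank adapters and freeze the base
    load_in_4bit: bool       # QLoRA: keep the frozen base in 4-bit (needs CUDA)


def resolve_method(method: str) -> MethodSettings:
    """Map the pipeline's `method` string to settings, rejecting unknown values early."""
    m = method.strip().lower()
    if m not in VALID_METHODS:
        raise ValueError(f"method must be one of {VALID_METHODS}, got {method!r}")
    return MethodSettings(
        train_all_weights=(m == "full"),
        use_lora=(m in ("lora", "qlora")),
        load_in_4bit=(m == "qlora"),
    )


print(resolve_method("qlora"))
# MethodSettings(train_all_weights=False, use_lora=True, load_in_4bit=True)
```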

Define the task environments

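Both environments share one image built from the script's inline dependency block. The GPU environment hosts the training work and declares the huggingface-token secret, exposed to tasks as the HF_TOKEN environment variable, which gated models need. The CPU environment hosts the orchestrating pipeline task; depends_on=[gpu_env] records that its tasks call tasks in the GPU environment, so the two are deployed together.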

llm_fine_tuning_lora_qlora.py
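# /// script
# requires-python = ">=3.12"
# dependencies = [
#    "flyte>=2.4.0",
#    "torch>=2.1.0",
#    "transformers>=4.45.0",
#    "peft>=0.13.0",
#    "trl>=0.12.0",
#    "bitsandbytes>=0.44.0",
#    ...
# ]
# ///
import json
import logging
import os

import flyte
import flyte.io
import flyte.report

# Logger used by the tasks below.
log = logging.getLogger(__name__)

# One image for both environments, built from the inline script metadata above.
main_img = flyte.Image.from_uv_script(__file__, name="llm-fine-tuning-lora-qlora", pre=True)

# GPU environment for training; the "huggingface-token" secret becomes HF_TOKEN.
gpu_env = flyte.TaskEnvironment(
    name="llm-fine-tuning-lora-qlora-gpu",
    image=main_img,
    resources=flyte.Resources(cpu=4, memory="24Gi", gpu=1),
    secrets=[flyte.Secret(key="huggingface-token", as_env_var="HF_TOKEN")],
)

# CPU environment for orchestration; it calls tasks that run in gpu_env.
cpu_env = flyte.TaskEnvironment(
    name="llm-fine-tuning-lora-qlora-cpu",
    image=main_img,
    resources=flyte.Resources(cpu=2, memory="8Gi"),
    depends_on=[gpu_env],
)

# Set inside tasks that mount the secret; None where the secret is absent.
HF_TOKEN = os.environ.get("HF_TOKEN")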

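Orchestrate the pipeline

The pipeline task runs in the CPU environment and calls three tasks in order: prepare_data, train (with the selected method), and evaluate. Before each step it replaces its own report with a progress view; at the end it summarizes base versus fine-tuned accuracy and returns the fine-tuned model directory.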

llm_fine_tuning_lora_qlora.py
@cpu_env.task(report=True)
async def pipeline(
    model_name: str = "HuggingFaceTB/SmolLM2-135M",
    dataset_name: str = "b-mc2/sql-create-context",
    method: str = "lora",
    epochs: int = 3,
    lr: float = 2e-4,
    batch_size: int = 4,
    max_train_samples: int = 5000,
    max_eval_samples: int = 500,
    num_eval_examples: int = 50,
    lora_r: int = 16,
    lora_alpha: int = 32,
) -> flyte.io.Dir:
    """
    End-to-end LLM fine-tuning pipeline.

    1. Download and format dataset
    2. Fine-tune model (full / LoRA / QLoRA)
    3. Evaluate: before/after comparison on test set

    Returns the fine-tuned model directory so it can be served directly.
    """
    log.info(f"Pipeline: {model_name} | method={method} | dataset={dataset_name}")
    steps = ["Prepare Data", "Train", "Evaluate"]

    method_badge = f'<span class="badge badge-info">{method.upper()}</span>'

    # Step 1: Prepare data
    await flyte.report.replace.aio(
        wrap_report(
            f"<h2>LLM Fine-Tuning Pipeline</h2>"
            f"<h3>{model_name} {method_badge}</h3>"
            f'{pipeline_step_indicator(0, steps)}'
            f'<div class="card"><p>Downloading and formatting dataset: <b>{dataset_name}</b>...</p></div>'
        ),
        do_flush=True,
    )

    data_dir = await prepare_data(dataset_name, max_train_samples, max_eval_samples)

    # Step 2: Train
    await flyte.report.replace.aio(
        wrap_report(
            f"<h2>LLM Fine-Tuning Pipeline</h2>"
            f"<h3>{model_name} {method_badge}</h3>"
            f'{pipeline_step_indicator(1, steps)}'
            f'<div class="card"><p>Training in progress... check the <b>train</b> task report for live charts.</p></div>'
        ),
        do_flush=True,
    )

    finetuned_dir = await train(
        model_name, data_dir, method, epochs, lr, batch_size, lora_r, lora_alpha,
    )

    # Step 3: Evaluate
    await flyte.report.replace.aio(
        wrap_report(
            f"<h2>LLM Fine-Tuning Pipeline</h2>"
            f"<h3>{model_name} {method_badge}</h3>"
            f'{pipeline_step_indicator(2, steps)}'
            f'<div class="card"><p>Evaluating base vs fine-tuned model...</p></div>'
        ),
        do_flush=True,
    )

    result = await evaluate(model_name, finetuned_dir, data_dir, num_eval_examples)
    metrics = json.loads(result)

    # Final pipeline report
    improvement = metrics["improvement"]
    improvement_badge = (
        f'<span class="badge badge-success">+{improvement:.1f}pp</span>'
        if improvement > 0
        else f'<span class="badge badge-danger">{improvement:.1f}pp</span>'
    )

    await flyte.report.replace.aio(
        wrap_report(
            f"<h2>Pipeline Complete</h2>"
            f"<h3>{model_name} {method_badge}</h3>"
            f'{pipeline_step_indicator(3, steps)}'
            f'<div class="stat-grid">'
            f'  <div class="stat"><div class="value">{metrics["base_accuracy"]}%</div><div class="label">Base Accuracy</div></div>'
            f'  <div class="stat"><div class="value">{metrics["finetuned_accuracy"]}%</div><div class="label">Fine-Tuned Accuracy</div></div>'
            f'  <div class="stat"><div class="value">{improvement:+.1f}pp</div><div class="label">Improvement {improvement_badge}</div></div>'
            f'  <div class="stat"><div class="value">{method.upper()}</div><div class="label">Method</div></div>'
            f'  <div class="stat"><div class="value">{epochs}</div><div class="label">Epochs</div></div>'
            f'  <div class="stat"><div class="value">{metrics["num_examples"]}</div><div class="label">Eval Examples</div></div>'
            f'</div>'
            f'<div class="note">'
            f'Check the <b>train</b> task report for training loss/LR charts, '
            f'and the <b>evaluate</b> task report for detailed example comparisons.'
            f'</div>'
        ),
        do_flush=True,
    )

    log.info(f"Pipeline complete. Improvement: {metrics['improvement']:+.1f}pp")
    return finetuned_dir
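The pipeline relies only on evaluate returning a JSON string with base_accuracy, finetuned_accuracy, improvement, and num_examples, where improvement is shown in percentage points (pp). How evaluate actually scores generated SQL isn't shown in this section, so the sketch below assumes a simple normalized exact-match rule. normalize_sql, exact_match_accuracy, and build_metrics are hypothetical helpers that only make the data contract concrete; the queries are toy inputs.

```python
import json


def normalize_sql(sql: str) -> str:
    """Assumed matching rule: trim, drop a trailing ';', collapse whitespace, lowercase.

    Lowercasing is lenient: it also ignores case inside string literals.
    """
    return " ".join(sql.strip().rstrip(";").split()).lower()


def exact_match_accuracy(predictions: list[str], references: list[str]) -> float:
    """Percentage (0-100) of references matched; missing predictions count as misses."""
    if not references:
        return 0.0
    hits = sum(normalize_sql(p) == normalize_sql(r) for p, r in zip(predictions, references))
    return 100.0 * hits / len(references)


def build_metrics(base_preds: list[str], tuned_preds: list[str], references: list[str]) -> str:
    """Serialize metrics with the keys the pipeline reads via json.loads."""
    base = exact_match_accuracy(base_preds, references)
    tuned = exact_match_accuracy(tuned_preds, references)
    return json.dumps({
        "base_accuracy": round(base, 1),
        "finetuned_accuracy": round(tuned, 1),
        "improvement": round(tuned - base, 1),  # percentage points, not a relative %
        "num_examples": len(references),
    })


refs = ["SELECT COUNT(*) FROM t", "SELECT name FROM t WHERE id = 1"]
base_preds = ["SELECT * FROM t", "select name from t where id = 1;"]
tuned_preds = ["SELECT COUNT(*) FROM t", "SELECT name FROM t WHERE id = 1"]

metrics = json.loads(build_metrics(base_preds, tuned_preds, refs))
print(f"{metrics['improvement']:+.1f}pp")  # +50.0pp
```

Exact match is strict: a semantically equivalent query written differently counts as a miss, so execution-based comparison is a common alternative when a database is available.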

Run the workflow

Create a Hugging Face token secret if you use a gated base model:

flyte create secret huggingface-token <YOUR_HF_TOKEN>

From the example directory, run the script with uv, which installs the dependencies declared in its inline metadata:

cd v2/tutorials/llm_fine_tuning_lora_qlora
uv run --script llm_fine_tuning_lora_qlora.py

To choose a method explicitly, pass it to the pipeline task with flyte run. For example, QLoRA:

flyte run llm_fine_tuning_lora_qlora.py pipeline --method qlora --epochs 3

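QLoRA requires CUDA, since the 4-bit base model is loaded through bitsandbytes. LoRA and full fine-tuning use the same entry point (--method lora or --method full) but differ in memory: full fine-tuning updates every weight and keeps optimizer state for each one, while LoRA trains only the small adapter matrices on top of a frozen base.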
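To put rough numbers on those memory differences, the sketch below applies common rules of thumb, which are assumptions rather than measurements from this tutorial: full fine-tuning with fp32 Adam holds about 16 bytes per parameter (weight, gradient, and two optimizer moments), while LoRA and QLoRA keep the base frozen at about 2 bytes (bf16) or 0.5 bytes (4-bit) per parameter and pay optimizer costs only for the adapters. estimate_training_memory_gb is a hypothetical helper, the adapter size is made up, and activations (which grow with batch_size and sequence length) are ignored.

```python
# Assumed bytes per parameter (rules of thumb, not measurements). Activations,
# CUDA context, and quantization constants are ignored.
ADAM_FP32_BYTES = 4 + 4 + 8  # fp32 weight + fp32 gradient + two fp32 Adam moments
FROZEN_BASE_BYTES = {"lora": 2.0, "qlora": 0.5}  # bf16 base vs 4-bit base


def estimate_training_memory_gb(n_params: int, method: str, adapter_params: int = 0) -> float:
    """Rough weight + gradient + optimizer memory in GB (1e9 bytes) for one method."""
    if method == "full":
        # Every base weight is trainable, so each carries gradient and Adam state.
        total = n_params * ADAM_FP32_BYTES
    elif method in FROZEN_BASE_BYTES:
        # The frozen base has no gradients or optimizer state; only adapters train.
        total = n_params * FROZEN_BASE_BYTES[method] + adapter_params * ADAM_FP32_BYTES
    else:
        raise ValueError(f"unknown method: {method!r}")
    return total / 1e9


n = 135_000_000       # roughly the size implied by the default model's name
adapters = 1_000_000  # hypothetical; the real count depends on lora_r and target modules
for m in ("full", "lora", "qlora"):
    print(m, round(estimate_training_memory_gb(n, m, adapters), 3))
```

For a model this small all three fit comfortably on one GPU; the gap matters as the base model grows, because the full-fine-tuning term scales with every parameter while the adapter term stays small.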