LLM fine-tuning with LoRA and QLoRA
This tutorial fine-tunes a language model for SQL generation using three methods in one workflow: full fine-tuning, LoRA adapters, and QLoRA (4-bit quantized base + LoRA). The pipeline prepares an instruction dataset from HuggingFace, trains with
TRL SFTTrainer, evaluates against a base-model baseline, and streams training charts into Flyte reports.
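To make the dataset-preparation step concrete, here is a minimal sketch of turning one record into an instruction-style prompt and completion. The question, context, and answer field names follow the b-mc2/sql-create-context dataset card; the prompt template and the format_example helper are illustrative assumptions, not the tutorial's actual formatting code.

```python
def format_example(record: dict[str, str]) -> dict[str, str]:
    """Build prompt/completion strings from one record (hypothetical template)."""
    prompt = (
        "### Schema\n"
        f"{record['context'].strip()}\n\n"
        "### Question\n"
        f"{record['question'].strip()}\n\n"
        "### SQL\n"
    )
    return {"prompt": prompt, "completion": record["answer"].strip()}


# Hand-written record in the dataset's shape, for illustration only.
sample = {
    "question": "How many users signed up in 2023?",
    "context": "CREATE TABLE users (id INTEGER, signup_year INTEGER)",
    "answer": "SELECT COUNT(*) FROM users WHERE signup_year = 2023",
}

formatted = format_example(sample)
print(formatted["prompt"] + formatted["completion"])
```

Ending the prompt at the SQL marker means the same template can be reused at evaluation time, where the model is asked to continue from that point.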
Flyte provides:
- GPU training with live loss and learning-rate charts via report=True.
- Method switching through a single method parameter (full, lora, or qlora).
- Cached dataset preparation for fast iteration on hyperparameters.
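One way the single method parameter could select a training setup is sketched below. The build_method_configs helper, the dropout value, and the target_modules choice are assumptions made for illustration; the tutorial's own train task is not shown in this section and may be organized differently. LoraConfig and BitsAndBytesConfig are the standard peft and transformers classes.

```python
from typing import Any, Optional

VALID_METHODS = ("full", "lora", "qlora")


def build_method_configs(
    method: str, lora_r: int = 16, lora_alpha: int = 32
) -> tuple[Optional[Any], Optional[Any]]:
    """Return (peft_config, quantization_config) for the chosen method.

    Hypothetical helper, not part of the tutorial's code.
    """
    if method not in VALID_METHODS:
        raise ValueError(f"method must be one of {VALID_METHODS}, got {method!r}")

    if method == "full":
        # Full fine-tuning: all weights trainable, no adapters, no quantization.
        return None, None

    # Imported lazily so the "full" path does not require peft.
    from peft import LoraConfig

    peft_config = LoraConfig(
        r=lora_r,
        lora_alpha=lora_alpha,
        lora_dropout=0.05,            # assumed value
        target_modules="all-linear",  # assumed; attach adapters to every linear layer
        task_type="CAUSAL_LM",
    )
    if method == "lora":
        return peft_config, None

    # QLoRA: the same adapters on top of a 4-bit NF4-quantized base model.
    import torch
    from transformers import BitsAndBytesConfig

    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
        bnb_4bit_use_double_quant=True,
    )
    return peft_config, quant_config
```

In a setup like this, the quantization config would typically be passed to AutoModelForCausalLM.from_pretrained via quantization_config, and the adapter config to SFTTrainer via peft_config.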
Define the task environments
The GPU environment declares a HuggingFace token secret for gated models.
import os

import flyte

main_img = flyte.Image.from_uv_script(__file__, name="llm-fine-tuning-lora-qlora", pre=True)

gpu_env = flyte.TaskEnvironment(
    name="llm-fine-tuning-lora-qlora-gpu",
    image=main_img,
    resources=flyte.Resources(cpu=4, memory="24Gi", gpu=1),
    secrets=[flyte.Secret(key="huggingface-token", as_env_var="HF_TOKEN")],
)

cpu_env = flyte.TaskEnvironment(
    name="llm-fine-tuning-lora-qlora-cpu",
    image=main_img,
    resources=flyte.Resources(cpu=2, memory="8Gi"),
    depends_on=[gpu_env],
)

# None when the HF_TOKEN environment variable is not set.
HF_TOKEN = os.environ.get("HF_TOKEN")
# /// script
# requires-python = ">=3.12"
# dependencies = [
#    "flyte>=2.4.0",
#    "torch>=2.1.0",
#    "transformers>=4.45.0",
#    "peft>=0.13.0",
#    "trl>=0.12.0",
#    "bitsandbytes>=0.44.0",
#    ...
# ]
# ///
Orchestrate the pipeline
@cpu_env.task(report=True)
async def pipeline(
    model_name: str = "HuggingFaceTB/SmolLM2-135M",
    dataset_name: str = "b-mc2/sql-create-context",
    method: str = "lora",
    epochs: int = 3,
    lr: float = 2e-4,
    batch_size: int = 4,
    max_train_samples: int = 5000,
    max_eval_samples: int = 500,
    num_eval_examples: int = 50,
    lora_r: int = 16,
    lora_alpha: int = 32,
) -> flyte.io.Dir:
    """
    End-to-end LLM fine-tuning pipeline.

    1. Download and format dataset
    2. Fine-tune model (full / LoRA / QLoRA)
    3. Evaluate: before/after comparison on test set

    Returns the fine-tuned model directory so it can be served directly.
    """
    log.info(f"Pipeline: {model_name} | method={method} | dataset={dataset_name}")
    steps = ["Prepare Data", "Train", "Evaluate"]
    method_badge = f'<span class="badge badge-info">{method.upper()}</span>'

    # Step 1: Prepare data
    await flyte.report.replace.aio(
        wrap_report(
            f"<h2>LLM Fine-Tuning Pipeline</h2>"
            f"<h3>{model_name} {method_badge}</h3>"
            f'{pipeline_step_indicator(0, steps)}'
            f'<div class="card"><p>Downloading and formatting dataset: <b>{dataset_name}</b>...</p></div>'
        ),
        do_flush=True,
    )
    data_dir = await prepare_data(dataset_name, max_train_samples, max_eval_samples)

    # Step 2: Train
    await flyte.report.replace.aio(
        wrap_report(
            f"<h2>LLM Fine-Tuning Pipeline</h2>"
            f"<h3>{model_name} {method_badge}</h3>"
            f'{pipeline_step_indicator(1, steps)}'
            f'<div class="card"><p>Training in progress... check the <b>train</b> task report for live charts.</p></div>'
        ),
        do_flush=True,
    )
    finetuned_dir = await train(
        model_name, data_dir, method, epochs, lr, batch_size, lora_r, lora_alpha,
    )

    # Step 3: Evaluate
    await flyte.report.replace.aio(
        wrap_report(
            f"<h2>LLM Fine-Tuning Pipeline</h2>"
            f"<h3>{model_name} {method_badge}</h3>"
            f'{pipeline_step_indicator(2, steps)}'
            f'<div class="card"><p>Evaluating base vs fine-tuned model...</p></div>'
        ),
        do_flush=True,
    )
    result = await evaluate(model_name, finetuned_dir, data_dir, num_eval_examples)
    metrics = json.loads(result)

    # Final pipeline report
    improvement = metrics["improvement"]
    improvement_badge = (
        f'<span class="badge badge-success">+{improvement:.1f}pp</span>'
        if improvement > 0
        else f'<span class="badge badge-danger">{improvement:.1f}pp</span>'
    )
    await flyte.report.replace.aio(
        wrap_report(
            f"<h2>Pipeline Complete</h2>"
            f"<h3>{model_name} {method_badge}</h3>"
            f'{pipeline_step_indicator(3, steps)}'
            f'<div class="stat-grid">'
            f'  <div class="stat"><div class="value">{metrics["base_accuracy"]}%</div><div class="label">Base Accuracy</div></div>'
            f'  <div class="stat"><div class="value">{metrics["finetuned_accuracy"]}%</div><div class="label">Fine-Tuned Accuracy</div></div>'
            f'  <div class="stat"><div class="value">{improvement:+.1f}pp</div><div class="label">Improvement {improvement_badge}</div></div>'
            f'  <div class="stat"><div class="value">{method.upper()}</div><div class="label">Method</div></div>'
            f'  <div class="stat"><div class="value">{epochs}</div><div class="label">Epochs</div></div>'
            f'  <div class="stat"><div class="value">{metrics["num_examples"]}</div><div class="label">Eval Examples</div></div>'
            f'</div>'
            f'<div class="note">'
            f'Check the <b>train</b> task report for training loss/LR charts, '
            f'and the <b>evaluate</b> task report for detailed example comparisons.'
            f'</div>'
        ),
        do_flush=True,
    )
    log.info(f"Pipeline complete. Improvement: {metrics['improvement']:+.1f}pp")
    return finetuned_dir
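The lora_r and lora_alpha arguments in the pipeline signature control the size and strength of the adapters. As a rough illustration of the standard LoRA arithmetic, the sketch below counts the parameters one adapter adds to a single linear layer and computes the scaling factor. The helper names and the 576 x 576 layer shape are hypothetical, chosen only to make the numbers concrete.

```python
def lora_trainable_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters added by one LoRA adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


def lora_scaling(lora_alpha: int, r: int) -> float:
    """Standard LoRA scaling factor applied to the low-rank update B @ A."""
    return lora_alpha / r


# Hypothetical square projection layer, for illustration only.
d = 576
full_layer = d * d
adapter = lora_trainable_params(d, d, r=16)

print(f"full layer: {full_layer} params")
print(f"adapter:    {adapter} params ({adapter / full_layer:.1%} of the layer)")
print(f"scaling:    {lora_scaling(32, 16)}")
```

With the pipeline defaults, the scaling factor is lora_alpha / lora_r = 32 / 16 = 2, so doubling lora_r without changing lora_alpha halves the weight given to the adapter update.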
Run the workflow
Create a HuggingFace token secret if you use a gated base model:
flyte create secret huggingface-token <YOUR_HF_TOKEN>
From the example directory:
cd v2/tutorials/llm_fine_tuning_lora_qlora
uv run --script llm_fine_tuning_lora_qlora.py
Try QLoRA on a GPU:
flyte run llm_fine_tuning_lora_qlora.py pipeline --method qlora --epochs 3
QLoRA requires CUDA; LoRA and full fine-tuning follow the same entry point with different memory requirements.
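To give a rough sense of why memory requirements differ between methods, the sketch below estimates only the storage needed for base-model weights at different precisions. The 135M parameter count is read off the default model's name, and the helper name is hypothetical. Gradients, optimizer state, activations, and quantization metadata are deliberately left out, so real usage will be higher.

```python
# Bytes needed to store one weight at each precision.
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "4bit": 0.5}


def weight_memory_mib(num_params: int, precision: str) -> float:
    """Approximate weight storage in MiB (weights only, no training state)."""
    return num_params * BYTES_PER_PARAM[precision] / (1024 ** 2)


num_params = 135_000_000  # approximate, taken from the model name
for precision in BYTES_PER_PARAM:
    print(f"{precision}: ~{weight_memory_mib(num_params, precision):.0f} MiB")
```

Full fine-tuning also keeps gradients and optimizer state for every parameter, whereas LoRA and QLoRA keep them only for the adapter weights, which is where most of the savings come from on larger models.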