Reinforcement learning

Tutorials for reinforcement-learning workloads.

RL for LLMs with GRPO and LoRA

Run a reasoning-style RL training loop for an LLM using GRPO and LoRA, orchestrated by plain Flyte async tasks on a warm vLLM pool.
