# Reinforcement learning

Tutorials for reinforcement-learning workloads.

### [RL for LLMs with GRPO and LoRA](https://www.union.ai/docs/v2/union/tutorials/reinforcement-learning/grpo-lora/page.md)

Run a reasoning-style RL training loop for an LLM with GRPO and LoRA, orchestrated by plain Flyte async tasks on a warm vLLM pool.

## Subpages

- [RL for LLMs with GRPO and LoRA](https://www.union.ai/docs/v2/union/tutorials/reinforcement-learning/grpo-lora/page.md)
  - Why Flyte with Union fits RL training
  - The idea: RL for LLMs in one loop
  - What is GRPO?
  - Why LoRA?
  - How the work is laid out
  - Warm-pool topology
  - Getting started
  - The image and environments
  - The model weights: prefetch once
  - Walkthrough
  - Rollouts on a warm vLLM pool
  - Reward
  - Pipelining generation and reward
  - The GRPO update
  - The driver loop
  - The live report
  - What this validates
  - Going further

---
**Source**: https://github.com/unionai/unionai-docs/blob/main/content/tutorials/reinforcement-learning/_index.md
**HTML**: https://www.union.ai/docs/v2/union/tutorials/reinforcement-learning/
