Inference

Tutorials for serving models and building inference applications as Flyte apps.

Voice customer-service agent

Serve an LLM with vLLM and a browser voice UI as two composed Flyte apps, with switchable text-to-speech and a live latency comparison.
