# Serving

Union.ai enables you to implement serving in various contexts:

- High-throughput batch inference with NIMs, vLLM, and Actors.
- Low-latency online inference using frameworks like vLLM and SGLang.
- Web endpoints using frameworks like FastAPI and Flask.
- Interactive web apps using your favorite Python-based front-end frameworks like
  Streamlit, Gradio, and more.
- Edge inference using MLC-LLM.

The tutorials in this section demonstrate how to implement serving in each of
these contexts using constructs like Union Actors, Serving Apps, and
Artifacts.
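At its core, online inference means wrapping a model behind an HTTP endpoint that accepts a request and returns a prediction. The sketch below illustrates that pattern using only the Python standard library; it is a hypothetical, framework-agnostic example (the `predict` function is a placeholder, not a real model), whereas the tutorials below use production-grade tools like FastAPI, vLLM, and NIM.

```python
# Minimal sketch of a model-serving HTTP endpoint, standard library only.
# In practice you would use FastAPI, Flask, or a dedicated inference
# server such as vLLM; this only illustrates the request/response shape.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(text: str) -> dict:
    # Placeholder "model": counts whitespace-separated tokens.
    # Swap in a real model call here.
    return {"input": text, "token_count": len(text.split())}


class InferenceHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read and parse the JSON request body.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")

        # Run inference and serialize the result.
        result = predict(payload.get("text", ""))
        body = json.dumps(result).encode()

        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        # Silence per-request logging for this sketch.
        pass


if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8080), InferenceHandler).serve_forever()
```

A client would then `POST` a JSON payload like `{"text": "hello world"}` to the endpoint and receive the prediction back as JSON. Frameworks like FastAPI add request validation, async handling, and OpenAPI docs on top of this same basic shape.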

## Subpages

- [Deploy Custom Webhooks to Launch Workflows](https://www.union.ai/docs/v1/union/tutorials/serving/custom-webhooks/page.md)
- [Deploy Marimo Notebooks as WASM Applications](https://www.union.ai/docs/v1/union/tutorials/serving/marimo-wasm/page.md)
- [Finetuning a Reasoning LLM with Unsloth and Serving with vLLM](https://www.union.ai/docs/v1/union/tutorials/serving/finetune-unsloth-serve/page.md)
- [Serve your LLM with MAX Serve](https://www.union.ai/docs/v1/union/tutorials/serving/modular-max-qwen/page.md)
- [Add Tracing and Guardrails to an Airbnb RAG App with Weave](https://www.union.ai/docs/v1/union/tutorials/serving/weave/page.md)
- [Trace and Evaluate Models and RAG Apps with Arize](https://www.union.ai/docs/v1/union/tutorials/serving/arize/page.md)
- [Deploying a Fine-Tuned Llama Model to an iOS App with MLC-LLM](https://www.union.ai/docs/v1/union/tutorials/serving/llama_edge_deployment/page.md)
- [Serve vLLM on Union Actors for Named Entity Recognition](https://www.union.ai/docs/v1/union/tutorials/serving/vllm-serving-on-actor/page.md)
- [Serve NVIDIA NIM Models with Union Actors](https://www.union.ai/docs/v1/union/tutorials/serving/nim-on-actor/page.md)

---
**Source**: https://github.com/unionai/unionai-docs/blob/main/content/tutorials/serving/_index.md
**HTML**: https://www.union.ai/docs/v1/union/tutorials/serving/
