Serving
Union.ai supports serving in a variety of contexts:
- High-throughput batch inference with NIMs, vLLM, and Union Actors (see the vLLM sketch below).
- Low-latency online inference using frameworks like vLLM and SGLang.
- Web endpoints using frameworks like FastAPI and Flask (see the FastAPI sketch below).
- Interactive web apps using your favorite Python-based front-end frameworks, such as Streamlit and Gradio.
- Edge inference using MLC-LLM.
 
The examples in this section demonstrate how to implement serving in each of these contexts using constructs like Union Actors, Serving Apps, and Artifacts.
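
To give a flavor of the batch inference case, here is a minimal sketch of batched generation with vLLM. The model name and prompts are illustrative placeholders, and the Union-specific wiring (tasks, Actors, deployment) is left out; it is a starting point, not a complete recipe.

```python
from vllm import LLM, SamplingParams

# Placeholder prompts; in a real batch job these would come from your data.
prompts = [
    "Summarize the plot of Hamlet in one sentence.",
    "Explain gradient descent to a five-year-old.",
]

# Sampling settings are an assumption; tune temperature and max_tokens
# for your workload.
params = SamplingParams(temperature=0.2, max_tokens=128)

# LLM() loads the model once; generate() processes all prompts as a batch,
# which is where vLLM's throughput advantage comes from. The model name
# is a placeholder.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
outputs = llm.generate(prompts, params)

for out in outputs:
    print(out.outputs[0].text)
```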
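
For the web endpoint case, the sketch below shows a bare FastAPI service. The endpoint path, payload shape, and the `predict` body are all illustrative assumptions standing in for a real model call; the later examples show how such an app is deployed on Union.ai.

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class PredictRequest(BaseModel):
    text: str

@app.post("/predict")
async def predict(req: PredictRequest) -> dict:
    # Stand-in for a real model call (e.g., a vLLM or SGLang client).
    return {"prediction": req.text.upper()}
```

Run locally with `uvicorn main:app` (assuming the file is named `main.py`) and POST JSON like `{"text": "hello"}` to `/predict`.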