Inference
Ultra-low latency, <100ms task startup, and dynamic scaling for real-time or batch workloads.
↓96% iteration time
50k+ actions/run
<100ms latency
“We get significant cost efficiency from running [...] AI inference on TPUs. Having the ability to scale dynamically—to go from zero to 500 TPUs across four regions—is unique and highly valuable. We get that from Union.ai, and I don’t know who else could give us that.”

Greg Friedland
Principal ML Engineer, Rezo