Wayve accelerates autonomous driving R&D with Flyte’s scalable orchestration

Industry

Autonomous Systems

Use Cases

Data Processing
Model Training
Inference

Challenge

Wayve needed scalable orchestration to support end-to-end autonomous vehicle (AV) workloads.

Wayve is a leader in end-to-end embodied AI for assisted and autonomous driving. Their work spans massive datasets, GPU-intensive training, offline labeling, neural simulators (like Ghost Gym), and large-scale inference workloads. These pipelines must run reliably across large clusters while enabling rapid iteration for research teams.

Before adopting Flyte, Wayve evaluated more than a dozen orchestration solutions. They needed a system that could:

  • Launch and manage thousands of data and ML workflows
  • Scale GPU workloads efficiently across cloud clusters
  • Provide strong observability, reproducibility, and caching
  • Support heterogeneous tasks (Python, Spark, containerized jobs)
  • Allow researchers to move quickly without deep Kubernetes expertise

Flyte’s strong Kubernetes foundation, flexible APIs, and focus on reproducibility aligned with Wayve’s technical stack and R&D needs—enabling multi-cluster deployments, scalable experiment execution, and simplified workflow development.

“The ability to massively parallelise computations accelerates our development lifecycle. Researchers focus on the science rather than learning compute frameworks.”

Tom Newton

Software Engineer, Wayve

Solution

Flyte enabled massive parallelism, faster iteration, and efficient resource use.

Flyte is now the orchestration engine powering Wayve’s large-scale experiments and production pipelines. It manages workflows for data processing, offline labeling, embedding generation, dataset materialization, model training, and neural simulator inference.

Researchers benefit from Flyte’s abstracted infrastructure: they write Python functions while Flyte handles Kubernetes, scheduling, type safety, versioning, caching, and large-scale parallelism.
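To make the caching and versioning idea above concrete, here is a minimal standard-library sketch. It is not Flyte's API or implementation: the `cached` decorator, the in-memory `_CACHE` store, and the `frame_count` function are all hypothetical names chosen for illustration. The sketch only shows the general principle that a result can be reused when the function, its declared version, and its inputs are unchanged.

```python
import hashlib
import json
from typing import Any, Callable, Dict

# Hypothetical in-memory store. A real orchestrator would persist results.
_CACHE: Dict[str, Any] = {}


def cached(version: str) -> Callable:
    """Memoize a function on (function name, version, keyword inputs)."""

    def decorator(fn: Callable) -> Callable:
        def wrapper(**kwargs: Any) -> Any:
            # Build a deterministic key from the function name, the declared
            # version, and the JSON-serialized keyword arguments.
            payload = json.dumps(
                {"fn": fn.__name__, "version": version, "inputs": kwargs},
                sort_keys=True,
            )
            key = hashlib.sha256(payload.encode("utf-8")).hexdigest()
            if key not in _CACHE:
                _CACHE[key] = fn(**kwargs)  # compute only on a cache miss
            return _CACHE[key]

        return wrapper

    return decorator


# Counts real executions so cache hits can be observed.
CALLS = {"count": 0}


@cached(version="1.0")
def frame_count(log_id: str, fps: int) -> int:
    """Toy stand-in for an expensive processing step (hypothetical)."""
    CALLS["count"] += 1
    return len(log_id) * fps
```

Calling `frame_count(log_id="run-01", fps=10)` twice executes the body once. Changing an input or bumping `version` produces a new key, so the step is recomputed.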

Key capabilities delivering value include:

  • Massively parallel map tasks for tens of thousands of distributed operations
  • Caching and versioning to reduce recomputation and cost
  • Multi-cluster deployments with meaningful error surfacing (OOM, resource limits)
  • Heterogeneous task support (Spark, Python, container tasks) with isolated dependencies
  • High reliability and automated retries, crucial for mission-critical AV pipelines

Wayve uses Flyte to resimulate past data—an essential technique for improving robustness. Resimulation workflows run at scale thanks to Flyte’s parallel execution and efficient compute allocation.

Embedding workflows running on Flyte also support scene summarization for similarity search, scenario classification, and dataset curation, all core capabilities for modern AV research.
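As a rough illustration of embedding-based similarity search, the sketch below ranks stored scene embeddings by cosine similarity to a query vector. The source does not describe Wayve's embedding model or search method, so everything here is an assumption: the three-dimensional vectors, the scene names in `SCENES`, and the helper `most_similar` are made up for the example.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def most_similar(
    query: Sequence[float],
    index: Dict[str, Sequence[float]],
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Return the k scenes whose embeddings are closest to the query."""
    scored = [
        (scene_id, cosine_similarity(query, vector))
        for scene_id, vector in index.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings (hypothetical). Real embeddings have far more dimensions.
SCENES: Dict[str, List[float]] = {
    "roundabout_a": [1.0, 0.0, 0.0],
    "roundabout_b": [0.9, 0.1, 0.0],
    "motorway": [0.0, 1.0, 0.0],
}
```

Querying with `[1.0, 0.0, 0.0]` ranks the two roundabout scenes ahead of the motorway scene. The same ranking step could feed scenario classification or dataset curation by grouping scenes that score above a chosen threshold.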

Flyte’s modularity and task isolation give Wayve reproducibility, cost efficiency, and simpler scaling. Wayve has even contributed upstream improvements to Flyte’s resilience and performance based on their enterprise-scale usage.

10,000+ parallel map tasks executed at once

weeks-to-days reduction in pipeline development time

significantly faster neural simulator and NeRF processing

Results

Flyte accelerated R&D velocity and scaled Wayve’s core AV workflows.

With Flyte, Wayve unlocked faster experimentation, higher reliability, and more efficient resource usage across its AV R&D pipelines.

A labeling pipeline that previously took weeks to build and deploy was delivered in a few days using Flyte’s modular workflows. Neural simulators and NeRF-based workloads scaled to new levels of parallelism, reaching speeds and reliability that Wayve could not achieve before.

Flyte also improved operational visibility, allowing teams to quickly respond to failure states and optimize compute usage. Workflow modularity and dependency isolation reduced conflicts, improved task reliability, and minimized resource waste.

Flyte has become a core part of Wayve’s AI development infrastructure, enabling research teams to iterate faster, explore more data, scale simulations, and deliver production-grade AV systems with greater efficiency.