Cradle accelerates ML development for protein design with Flyte

Challenge
Scaling protein design workflows required a stronger orchestration engine.
Cradle is a leader in synthetic biology, building ML-driven tools that dramatically accelerate protein design. But as the team grew its platform, the early engineering approach — a mix of scripts, glue code, and manual processes — began to break down. This made it difficult to implement essential ML development best practices such as reproducibility, isolation of resources and dependencies, caching, and clear data provenance.
These capabilities were mission-critical. Protein design iteration cycles can span six weeks to six months, and the team needed the ability to look back and trace exactly which workflow version produced which experimental results. Long-running, expensive experiments also meant that failures or cache misses led to unacceptable recomputation costs.
Cradle’s workflows also depended on legacy protein engineering tools that needed to run unchanged for benchmarking. The team required an orchestrator that could integrate these tools seamlessly rather than force a migration.
The team evaluated managed ML platforms such as Vertex AI and SageMaker, which lacked the Kubernetes-native flexibility Cradle needed. It also evaluated open-source tools like Argo, Kubeflow, and Airflow, which didn’t meet requirements around reproducibility, typing, local execution, and caching. Flyte emerged as the only solution that met both immediate needs and long-term ambitions.
“Without Flyte, we couldn't have done what we've done so far with the people that we have. You need a workflow orchestration engine if you're going to do ML at our level, and Flyte is the best one.”

Eli Bixby, Co-Founder
Solution
Flyte delivered scalable, reproducible, and flexible ML orchestration.
Cradle deployed Flyte on Google Kubernetes Engine and quickly unlocked value across its ML development lifecycle.
Flyte’s strong guarantees around versioning, lineage, and reproducibility allowed Cradle to trace every experiment from input to output, including all code, container images, parameters, and intermediate states. This rigor was essential for long biological feedback loops — enabling the team to compare new lab data against historical model results and diagnose weaknesses.
Flyte’s robust caching capabilities replaced Cradle’s brittle, manually implemented caching logic. With strongly typed I/O, automatic data offloading, and code hashing, Flyte ensured cache accuracy while keeping the developer experience simple. As Cradle’s co-founder Eli Bixby shared: “If you’re manually doing the caching as part of every step, the way you write code changes, and it’s not clean. And that magnifies the number of outputs in your interface quite a lot.”
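To make the idea concrete, here is a minimal, standard-library-only sketch of caching keyed on a hash of the code plus the inputs. It is not Flyte's implementation and does not use Flyte's API. The names `cached`, `code_fingerprint`, and `score_sequence`, and the scoring rule, are hypothetical and chosen only for illustration.

```python
import functools
import hashlib
import json
from typing import Any, Callable, Dict, List

# Hypothetical in-memory store; a real orchestrator persists outputs durably.
_CACHE: Dict[str, Any] = {}


def code_fingerprint(fn: Callable[..., Any]) -> str:
    """Hash the compiled body of fn so that editing the code changes the key."""
    code = fn.__code__
    # Skip nested code objects: their repr includes a memory address.
    consts = [c for c in code.co_consts if not hasattr(c, "co_code")]
    digest = hashlib.sha256()
    digest.update(code.co_code)
    digest.update(repr(consts).encode())
    digest.update(repr(code.co_names).encode())
    return digest.hexdigest()


def cached(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Memoize fn on (code fingerprint, keyword inputs)."""
    fingerprint = code_fingerprint(fn)

    @functools.wraps(fn)
    def wrapper(**inputs: Any) -> Any:
        # Assumption: inputs are JSON-serializable in this sketch.
        payload = json.dumps(inputs, sort_keys=True)
        key = hashlib.sha256((fingerprint + payload).encode()).hexdigest()
        if key not in _CACHE:
            _CACHE[key] = fn(**inputs)
        return _CACHE[key]

    return wrapper


EXECUTIONS: List[str] = []  # records real executions, for demonstration only


@cached
def score_sequence(sequence: str, weight: float) -> float:
    """Stand-in for an expensive step; the scoring rule is made up."""
    EXECUTIONS.append(sequence)
    return weight * len(sequence)


first = score_sequence(sequence="MKTAYIAK", weight=0.5)
second = score_sequence(sequence="MKTAYIAK", weight=0.5)  # served from the cache
third = score_sequence(sequence="MKT", weight=0.5)  # new inputs, so it runs again
```

The step body contains no caching logic at all, which is the point of the quote above: the function keeps a single, clean output, and the cache key changes on its own whenever either the code or the inputs change.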
Cradle also leveraged Flyte’s task-level dependency isolation and PodTemplates to run specialized workloads. For example, a C++ tool performing terabyte-scale in-memory scans automatically ran on a high-memory node that scaled to zero when unused — maximizing efficiency without any manual node management.
Flyte’s static typing enforced best practices and significantly reduced workflow errors. Engineers who previously didn’t rely on type hints came to appreciate the determinism this discipline introduced: workflows compiled cleanly, dataflow mismatches surfaced early, and complex pipelines became easier to maintain.
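The benefit of surfacing dataflow mismatches early can be illustrated with a small standard-library sketch that compares type hints before anything runs. This is an assumption-laden illustration, not how Flyte compiles workflows. The helper `check_connection` and the tasks `embed`, `rank`, and `fold` are hypothetical.

```python
from typing import Callable, List, get_type_hints


def check_connection(upstream: Callable, downstream: Callable, param: str) -> None:
    """Raise TypeError if upstream's return type differs from downstream's `param`."""
    produced = get_type_hints(upstream).get("return")
    expected = get_type_hints(downstream).get(param)
    if produced is None or expected is None:
        raise TypeError("both tasks need complete type annotations")
    if produced != expected:
        raise TypeError(
            f"{upstream.__name__} returns {produced}, "
            f"but {downstream.__name__}({param}=...) expects {expected}"
        )


# Hypothetical tasks; the bodies are placeholders and are never called here.
def embed(sequence: str) -> List[float]:
    return [float(ord(ch)) for ch in sequence]


def rank(embedding: List[float]) -> int:
    return len(embedding)


def fold(structure_path: str) -> float:
    return 0.0


check_connection(embed, rank, "embedding")  # types line up, no error

try:
    check_connection(embed, fold, "structure_path")
    mismatch = None
except TypeError as err:
    mismatch = str(err)  # the wiring error is reported without running any task
```

Because the check reads only annotations, a miswired pipeline fails in seconds at authoring time rather than hours into an expensive run.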
Through Flyte’s modularity, Cradle created internal patterns like reference tasks to promote reuse, speed up Docker builds, reduce dependency conflicts, and simplify overall workflow authoring.
Together, these capabilities gave Cradle a platform that was both opinionated enough to help them move fast and flexible enough to support any workload across hybrid cloud, on-prem, and customer environments.
Faster ML development cycles (from three days to three hours)
Significant reduction in compute costs through granular caching
No rework required for legacy protein engineering tools
Results
Flyte accelerated development and strengthened scientific rigor.
With Flyte, Cradle accelerated its ML development cycle from three days to just three hours — a foundational shift enabling more rapid experimentation and model refinement.
Caching eliminated unnecessary recomputation and ensured workflows could resume seamlessly after any failure. Strong provenance and reproducibility strengthened Cradle’s scientific discipline, allowing the team to confidently revisit historical experiments even months later.
Flyte’s declarative infrastructure, typing system, and developer-friendly tooling simplified pipeline authoring and reduced operational overhead. Engineers could focus on protein design research rather than infrastructure plumbing.
After nearly 18 months on Flyte, Cradle continues to deepen its adoption, citing both the platform’s capabilities and the responsiveness of the Flyte community as core enablers. Flyte remains the orchestration engine powering Cradle’s mission to push the boundaries of biology and AI.


