How Dragonfly scales agentic research across 250k products

Challenge
Building dynamic agentic workflows at production scale.
When journalists ask Sven Sabas, cofounder of Dragonfly, why users can't "just Google" to find the right software tools, he smiles. It's exactly the problem Dragonfly set out to solve.
Choosing the right software doesn’t happen with Google or ChatGPT. Building the right tech stack requires evaluating tools in context, including integrations, budget constraints, internal capabilities, and regulatory requirements. Additionally, public information is often misleading or outdated, and the challenge only grows as new software products proliferate.
Dragonfly is building an automated solutions architect, an AI-powered recommender engine to help businesses construct modern, AI-native tech stacks. At its core is a living knowledge graph of 250,000+ software products, their features, and interrelations.
But orchestrating a swarm of intelligent agents to research and test 250,000 products is a monumental task. To construct the knowledge graph, the platform performs deep research that adapts its strategy based on what it discovers. Each product currently requires ~190 steps and ~95 LLM calls, and the path is different every time.
"We tried static DAGs initially to orchestrate this, but it wasn't dynamic enough," explains Sabas, Dragonfly's co-founder. Static DAGs couldn't handle agentic loops that adapt based on intermediate results. Their initial approach, internally-developed scaffolding on AWS Bedrock, became an "uncomfortable risk" they couldn't maintain with a small team.
They needed orchestration that thinks like an agent but runs like enterprise infrastructure.
The Orchestration Dilemma
Consider how Dragonfly's research engine actually works. Say you ask it to find information about a startup that recently rebranded (we’ll use Dragonfly itself as an example):
- Search attempt 1: "Dragonfly Connect" → No relevant results (name changed)
- Search attempt 2: Look for the founder, Sven Sabas → Find LinkedIn profile
- Search attempt 3: Check Sven's previous company → Discover connection
- Search attempts 4 through N: Iteratively refine until the information converges
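Expressed as code, this loop is just ordinary async Python. The sketch below is illustrative only: the helper functions (`search`, `refine_query`, `has_converged`) are hypothetical stand-ins for Dragonfly's actual research tools and LLM calls.

```python
import asyncio


# Placeholder tools; in the real system these would call search APIs and LLMs.
async def search(query: str) -> dict:
    return {query: f"results for {query!r}"}


async def refine_query(query: str, findings: dict) -> str:
    # Pick the next angle to try, e.g. look for the founder instead of the product.
    return query + " founder"


async def has_converged(findings: dict) -> bool:
    # Stand-in for an LLM judgment that the gathered information is sufficient.
    return len(findings) >= 3


async def research_product(initial_query: str, max_attempts: int = 20) -> dict:
    """Iteratively refine a search until the gathered facts converge."""
    findings: dict = {}
    query = initial_query
    for _attempt in range(max_attempts):
        results = await search(query)
        findings.update(results)
        if await has_converged(findings):
            break  # enough corroborating information; stop searching
        query = await refine_query(query, findings)
    return findings


if __name__ == "__main__":
    print(asyncio.run(research_product("Dragonfly Connect")))
```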
Now multiply this by 250,000 products, each requiring different question patterns depending on category (a CRM needs different research than cloud infrastructure), and run hundreds of these workflows concurrently. Some queries overlap (multiple research paths might need the same API documentation), so you need intelligent caching to avoid paying repeatedly for the same expensive API calls.
The requirements were clear. The team needed an AI development platform with:
- Dynamic orchestration: Pure Python async patterns with loops and conditionals. No rigid DAG structure
- Intelligent caching: Detect when parallel runs converge to the same sources and stop redundant work
- Fault tolerance: Recover from minute 28 of a 30-minute run. Don’t restart from scratch
- Cost control: Make Spot Instances viable with checkpointing; reuse containers to eliminate cold-start overhead
- Speed to market: Small team can't afford to debug Kubernetes instead of improving agents
Dragonfly chose Union.ai, the enterprise Flyte platform, to provide an end-to-end surface to orchestrate, deploy, and operate their agentic system.
“We wouldn't have been able to do what we wanted without Flyte and Union.ai.”

Sven Sabas
Cofounder, Dragonfly
Solution
Union.ai delivered Python-native dynamic orchestration with enterprise-grade durability.
One Hour to Production
When Union.ai and Dragonfly engineers met to discuss the challenge, Dragonfly had existing agentic code that worked but didn't scale. The teams set out to port it to Flyte 2 remote execution.
Within just an hour, they had wrapped the async Python in Flyte decorators and configured remote execution on GKE.
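The change is small in code terms. As a rough illustration, wrapping an existing async research function looks something like the sketch below, written against the open-source flytekit API; Dragonfly's production code targets Flyte 2 remote execution, so the decorator parameters and resource settings here are assumptions, not their configuration.

```python
import asyncio
from typing import Dict

from flytekit import Resources, task


async def run_research(product_name: str) -> Dict[str, str]:
    # Placeholder for the pre-existing async research loop.
    await asyncio.sleep(0)
    return {"product": product_name, "status": "researched"}


# Wrapping the existing function in a task decorator is most of the work;
# resource requests and retries are illustrative values only.
@task(requests=Resources(cpu="1", mem="2Gi"), retries=3)
def research_product(product_name: str) -> Dict[str, str]:
    return asyncio.run(run_research(product_name))
```

From there, a single `pyflyte run --remote ...` invocation submits the task to the remote cluster, in Dragonfly's case on GKE.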
The Architecture: Tiered Task Environments for Scale
To handle the complexity of their research workflows, Dragonfly built a tiered task environment architecture using reusable containers. The system has four specialized layers:
- Driver (4 replicas): Lightweight orchestration that coordinates the overall research workflow
- Coordinator (8 replicas): Decision-making layer that plans research strategies and determines next steps
- Research Assistant (12 replicas): Heavy LLM inference for synthesizing findings and generating insights
- Tool Layer (12 replicas): I/O-bound operations including API calls and web scraping
Why this pattern matters:
The Tool Layer throttles external API calls to stay within LLM rate limits, giving Dragonfly control over compute scheduling rather than being throttled by hyperscaler priority queues. Each tier has specialized dependencies and resource allocations, right-sized for its workload.
But the real win is reusable containers. Agentic tasks are often quick, making cold-start overhead devastating at scale. Keeping containers warm eliminates this waste across thousands of concurrent workflows.
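Dragonfly's tiers are defined as Flyte 2 task environments; the sketch below approximates the same layout with Union's ActorEnvironment, a related reusable-container feature, so treat the parameter names and resource sizes as illustrative assumptions rather than their actual configuration.

```python
from flytekit import Resources
from union.actor import ActorEnvironment

# One warm, reusable-container environment per tier; replica counts mirror
# the tiers described above, and the resource sizes are illustrative guesses.
driver = ActorEnvironment(name="driver", replica_count=4,
                          requests=Resources(cpu="1", mem="1Gi"))
coordinator = ActorEnvironment(name="coordinator", replica_count=8,
                               requests=Resources(cpu="2", mem="4Gi"))
research_assistant = ActorEnvironment(name="research-assistant", replica_count=12,
                                      requests=Resources(cpu="4", mem="8Gi"))
tools = ActorEnvironment(name="tools", replica_count=12,
                         requests=Resources(cpu="1", mem="2Gi"))


@tools.task
def fetch_page(url: str) -> str:
    # An I/O-bound tool call runs in an already-warm container, so the
    # thousands of short calls pay no per-task cold-start penalty.
    import urllib.request
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")
```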
Other game-changing features:
Cross-Run Caching
When parallel research runs both need the same documentation page, Flyte's caching ensures the first query fetches and pays for the API call while subsequent queries reuse the cached result.
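In flytekit terms, this behavior is close to the built-in task cache: mark a task as cacheable and identical inputs return the memoized result instead of re-running. A minimal sketch, where the fetch logic and cache-versioning policy are placeholders rather than Dragonfly's:

```python
from flytekit import task


# Two research runs that request the same documentation URL execute this task
# once; the second call with identical inputs returns the cached output.
@task(cache=True, cache_version="1.0")
def fetch_documentation(url: str) -> str:
    import urllib.request
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")
```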
Dragonfly built convergence detection on top of this: when parallel runs show 60%+ overlap in the information they've retrieved, their coordinator logic consolidates work streams, stopping redundant research before wasting money on duplicate LLM calls.
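The convergence check itself can be as simple as comparing the sets of sources two active runs have already retrieved. A hypothetical sketch of the 60% rule follows; Dragonfly's real coordinator logic is more involved, and the overlap metric here is an assumption.

```python
def overlap(sources_a: set[str], sources_b: set[str]) -> float:
    """Fraction of the smaller run's sources already covered by the other run."""
    if not sources_a or not sources_b:
        return 0.0
    smaller, larger = sorted((sources_a, sources_b), key=len)
    return len(smaller & larger) / len(smaller)


def should_consolidate(sources_a: set[str], sources_b: set[str],
                       threshold: float = 0.6) -> bool:
    # Consolidate two work streams once they are mostly reading the same sources.
    return overlap(sources_a, sources_b) >= threshold
```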
At $2-3 per research run × 250K products, the impact of this is enormous.
Checkpoint-Based Recovery
Research runs take tens of minutes. With Flyte's checkpointing, a Spot Instance interruption at step 187 of 190 means resuming from step 187, not restarting from scratch. This made aggressive Spot usage viable for cost optimization.
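Flytekit's intratask checkpoint interface makes this pattern compact: persist the step counter as you go, and a retried attempt reads it back and resumes. A minimal sketch, with the research step itself stubbed out and the Spot-friendly settings shown as illustrative values:

```python
from flytekit import current_context, task


def do_step(product: str, step: int) -> None:
    # Placeholder for one research action (a search, an LLM call, etc.).
    pass


@task(retries=3, interruptible=True)  # interruptible allows Spot/preemptible nodes
def run_research_steps(product: str, total_steps: int = 190) -> str:
    cp = current_context().checkpoint
    prev = cp.read()                        # bytes from the last attempt, or None
    start = int(prev.decode()) if prev else 0

    for step in range(start, total_steps):
        do_step(product, step)
        cp.write(str(step + 1).encode())    # persist progress; a retry resumes here
    return f"{product}: completed {total_steps} steps"
```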
Full Auditability
Every agent decision is traceable with Union.ai: "Why did it search for X instead of Y?" This turned debugging from guesswork into systematic improvement. When an agent takes a wrong turn, you need to understand why in order to iterate quickly.
Results
Dragonfly’s agentic system achieved production scale in months, not years.
In partnership with Union.ai, Dragonfly reached production scale significantly faster than it could have on its own, delivering a platform that "wouldn't have been possible" without Union.ai and Flyte.
- 1 hour: From prototype to production-ready remote workflows
- 2,000 concurrent workflows: Scaled from a single workflow to production-scale agentic research
- 50 coordinators × 1,000 actions validated: Testing confirms ability to scale to 250K products in continuous refresh
With that headroom validated, the next step is continuous refresh cycles across the full 250K-product catalog.
Dragonfly’s success demonstrates what’s possible when agentic systems are built on an AI development infrastructure platform like Union.ai. By leveraging Union.ai and Flyte’s dynamic, runtime orchestration, Dragonfly moved beyond static pipelines to build agents that can reason, adapt, and scale in production.


