AI Agents Fail in Production. Here's Why State Management Matters
Mark Fussell, Co-creator of Dapr, explains how Dapr Agents 1.0 solves Day 2 operational challenges for running AI agents at scale in production on Kubernetes.
Mark Fussell
CEO & Co-Founder
Most AI agent prototypes never make it to production. The reason? They fail spectacularly when networks drop, machines crash, or state gets lost mid-transaction. Imagine processing a Stripe payment: the system crashes partway through, the workflow restarts from the beginning, and the customer is charged twice. That's the reliability gap killing enterprise AI adoption today.
In this interview with Swapnil Bhartiya, Mark Fussell, Co-creator and Core Maintainer of Dapr, explains how Dapr Agents 1.0 solves the Day 2 operational nightmare of running AI agents at scale. Built on Dapr's durable workflow engine and battle-tested in Kubernetes environments, this CNCF graduated project provides the recovery guarantees that microservices-plus-LLM architectures desperately need.
Key topics covered:
- Durable execution patterns for stateful AI workflows with automatic crash recovery and checkpoint logging
- How Dapr's workflow engine prevents duplicate transactions and data loss during network failures in distributed agent systems
- Production deployment strategies for agentic applications on Kubernetes with vendor-neutral, multi-state store flexibility
- Real-world case study: Zeiss Vision Care using Dapr Agents for personalized prescription glass manufacturing workflows
- The evolution from microservices to agentic applications and why workflow reliability is the new competitive advantage
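To make the durable execution and checkpoint logging topics above concrete, here is a minimal sketch in plain Python. It is not Dapr's API. The names `CheckpointLog`, `run_step`, `order_workflow`, and `charge_card` are hypothetical, and a local JSON file stands in for a real state store. The idea it illustrates: each completed step is recorded, so a restarted workflow replays the recorded result instead of repeating the side effect.

```python
import json
import os
import tempfile


class CheckpointLog:
    """Hypothetical record of completed steps, persisted to a JSON file."""

    def __init__(self, path):
        self.path = path
        self.done = {}
        if os.path.exists(path):
            with open(path) as f:
                self.done = json.load(f)

    def run_step(self, name, fn):
        # Replay: a step already in the log returns its recorded result
        # instead of executing its side effect again.
        if name in self.done:
            return self.done[name]
        result = fn()
        self.done[name] = result
        # Write to a temp file, then rename, so a crash mid-write
        # cannot leave a half-written log behind.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.done, f)
        os.replace(tmp, self.path)
        return result


charges = []  # stands in for the payment provider's ledger


def charge_card():
    charges.append(42)
    return "charge-1"


def order_workflow(log, crash_before_email=False):
    charge_id = log.run_step("charge", charge_card)
    if crash_before_email:
        raise RuntimeError("simulated crash")
    log.run_step("email", lambda: "sent to customer")
    return charge_id


def demo():
    charges.clear()
    path = os.path.join(tempfile.mkdtemp(), "workflow.json")
    try:
        order_workflow(CheckpointLog(path), crash_before_email=True)
    except RuntimeError:
        pass  # the process "dies" after charging, before emailing
    # Restart: a fresh log object reloads the checkpoint from disk,
    # so the charge step is skipped on the second run.
    charge_id = order_workflow(CheckpointLog(path))
    return charge_id, len(charges)
```

Calling `demo()` runs the workflow, crashes it after the charge, and restarts it; the card is charged once. The sketch still leaves a window between the side effect and the log write, which in practice is usually closed by passing an idempotency key to the payment call.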
Read the full story and transcript at www.tfir.io


