Announcing Durable Workflow for Agents

Diagrid introduces durable workflow support for leading AI agent frameworks, allowing agents to automatically recover from failures and complete long-running workflows in production.

Mark Fussell

CEO & Co-Founder

Yaron Schneider

CTO & Co-Founder

March 12, 2026 · 4 min read

AI agents are great at starting work. Production systems need them to finish it.

Today, Diagrid is announcing Dapr durable workflow integrations for leading AI agent frameworks, giving developers a way to automatically recover long-running agent workflows after failures without giving up the agent framework they already use.

Instead of restarting from the beginning or writing complex recovery logic, teams can make their agents resume execution from the exact point of interruption and reliably complete workflows in production.

As organizations move AI agents from experimentation into real business systems, reliability becomes a fundamental requirement.

The Reliability Gap in AI Agent Frameworks

Frameworks like LangGraph, OpenAI Agents SDK, CrewAI, AWS Strands and Microsoft Agent Framework have made it significantly easier to build powerful agent workflows.

However, most frameworks focus primarily on orchestration and reasoning, not production reliability.

Real-world AI agent workflows often involve:

  • Multi-step reasoning and planning
  • Tool and API calls
  • Multi-agent collaboration
  • Long-running tasks
  • Integration with enterprise systems

These workflows frequently run for minutes or even hours.

When a failure occurs, such as a network interruption, infrastructure issue, or tool error, the workflow often stops completely. Developers must then restart the entire workflow or implement custom retry and recovery logic.

Many frameworks offer checkpointing, but checkpointing alone does not guarantee recovery. Saving some agent workflow state periodically is not the same as ensuring that an interrupted workflow automatically resumes and completes execution.
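To make the distinction concrete, here is a minimal sketch, not taken from any framework, of the piece that checkpointing alone leaves out: something has to notice that a workflow has stopped. The `Checkpoint` record, its fields, and the 30-second timeout are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Checkpoint:
    """Hypothetical checkpoint record; not any framework's real schema."""
    workflow_id: str
    next_step: int          # index of the first step that has not completed
    total_steps: int
    last_heartbeat: float   # seconds, on whatever clock the worker uses


def find_interrupted(checkpoints, now, timeout=30.0):
    """Return ids of unfinished workflows whose worker has gone quiet."""
    return [
        cp.workflow_id
        for cp in checkpoints
        if cp.next_step < cp.total_steps and now - cp.last_heartbeat > timeout
    ]


checkpoints = [
    Checkpoint("order-1", next_step=3, total_steps=3, last_heartbeat=100.0),  # finished
    Checkpoint("order-2", next_step=1, total_steps=3, last_heartbeat=195.0),  # still active
    Checkpoint("order-3", next_step=2, total_steps=3, last_heartbeat=120.0),  # stalled
]

print(find_interrupted(checkpoints, now=200.0))  # ['order-3']
```

Only `order-3` is reported: `order-1` has finished and `order-2` sent a heartbeat recently. With checkpointing alone, writing and operating this detection, and the resume step that follows it, is left to the developer.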

For production AI agents, durable workflow is essential.

Until now, developers had to choose between two options. They could adopt a workflow engine for reliability, which meant giving up their agent framework and reinventing reasoning, loops, and evaluation from scratch. Or they could rely on the agent framework's basic checkpoint mechanism, which left the hard work of failure detection and recovery at scale to them.

Durable Workflow for Leading AI Agent Frameworks

With this release, Diagrid introduces first-class durable workflow support across many of the most widely used AI agent frameworks, enabling developers to add reliability without changing their development stack.

Supported frameworks include:

  • LangGraph
  • Dapr Agents
  • Microsoft Agent Framework
  • AWS Strands
  • Google Agent Development Kit (ADK)
  • OpenAI Agents SDK
  • CrewAI
  • Pydantic AI Agents
  • LangChain4j (coming soon)

Developers can include the Diagrid package in their agent applications to enable automatic workflow recovery.

If a failure occurs during execution, Diagrid resumes the workflow from the exact step where it stopped, ensuring that agents continue running until the workflow completes.
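The general idea of resuming from the step where execution stopped can be sketched with a simple journal of completed steps. This is an illustrative toy, not the Diagrid package or the Dapr workflow API: `run_durable`, the JSON journal file, and the three step functions are hypothetical names chosen for the example.

```python
import json
import os
import tempfile


def run_durable(steps, journal_path):
    """Run named steps in order, skipping any already recorded in the journal."""
    journal = {}
    if os.path.exists(journal_path):
        with open(journal_path) as f:
            journal = json.load(f)

    for name, fn in steps:
        if name in journal:
            continue  # completed in an earlier run; reuse the saved result
        journal[name] = fn(journal)
        # Persist after every step so a crash loses at most the step in flight.
        with open(journal_path, "w") as f:
            json.dump(journal, f)
    return journal


calls = []                     # records which steps actually executed
fail_once = {"armed": True}    # simulates one transient tool failure


def plan(results):
    calls.append("plan")
    return "look up order 42"


def call_tool(results):
    calls.append("call_tool")
    if fail_once["armed"]:
        fail_once["armed"] = False
        raise ConnectionError("simulated network interruption")
    return {"order": 42, "status": "shipped"}


def summarize(results):
    calls.append("summarize")
    tool = results["call_tool"]
    return f"Order {tool['order']} is {tool['status']}"


steps = [("plan", plan), ("call_tool", call_tool), ("summarize", summarize)]
journal_path = os.path.join(tempfile.mkdtemp(), "journal.json")

try:
    run_durable(steps, journal_path)      # first attempt fails inside call_tool
except ConnectionError:
    pass

final = run_durable(steps, journal_path)  # second attempt skips the finished "plan" step
print(calls)               # ['plan', 'call_tool', 'call_tool', 'summarize']
print(final["summarize"])  # Order 42 is shipped
```

The `plan` step runs only once even though the workflow was started twice, because its result was already in the journal. In this toy the second call is made by hand; an engine that offers automatic recovery would trigger that restart itself.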

This dramatically reduces operational complexity while improving reliability for production AI systems.

The workflow engine is built on the battle-tested Dapr open-source project, used in production by companies including Grafana, NVIDIA, HSBC, HDFC and thousands of others.

Securing Agent Tools with MCP Identity

AI agents increasingly rely on Model Context Protocol (MCP) servers to access tools and external services.

Without proper security controls, agents may be able to call tools they should not access.

Diagrid introduces identity and access control for both agents and MCP servers, enabling:

  • mTLS-secured communication
  • Agent authentication
  • Fine-grained authorization policies

This enables a zero-trust security model for agent tools, ensuring only authorized agents can access specific MCP servers.
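A deny-by-default authorization check of this kind might look roughly like the following. The policy table, agent names, server names, and `is_allowed` function are hypothetical and do not reflect Diagrid's actual policy format; the sketch also takes the agent identity as a plain string, whereas in practice it would have to be established through authentication first.

```python
# Hypothetical policy table: agent identity -> MCP servers and the tools it may call.
POLICY = {
    "billing-agent": {"payments-mcp": {"get_invoice", "refund"}},
    "support-agent": {"payments-mcp": {"get_invoice"}, "tickets-mcp": {"*"}},
}


def is_allowed(agent_id, server, tool, policy=POLICY):
    """Deny by default; allow only when the policy names the agent, server, and tool."""
    allowed_tools = policy.get(agent_id, {}).get(server, set())
    return "*" in allowed_tools or tool in allowed_tools


print(is_allowed("support-agent", "payments-mcp", "get_invoice"))  # True
print(is_allowed("support-agent", "payments-mcp", "refund"))       # False
print(is_allowed("unknown-agent", "tickets-mcp", "close_ticket"))  # False
```

Anything the policy does not explicitly grant is refused, including every request from an agent the policy has never heard of.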

Enabling Production-Ready AI Agents

As agentic architectures become more common, organizations need infrastructure that supports reliable AI systems.

Diagrid provides the capabilities required to operate production AI agents, including:

  • Durable workflows for long-running agents
  • Observability into agent behavior and decisions
  • Secure tool access with MCP identity and authorization
  • Deployment inside customer infrastructure for governance and data locality

By bringing durable workflow to popular AI frameworks, Diagrid helps developers move from AI prototypes to production-grade agent systems.

Get Started

Agent durable workflow support and MCP server security are available in Diagrid Catalyst.

You can sign up for free and dive into our agent quickstarts with your favorite agent framework. Or check out how to launch and orchestrate multiple agents with different frameworks.

If you want to deploy Diagrid Catalyst Enterprise for a free trial in your own infrastructure, reach out to us.