Announcing Durable Workflow for Agents

AI agents are great at starting work. Production systems need them to finish it.

Today, Diagrid is announcing Dapr durable workflow integrations for leading AI agent frameworks, giving developers a way to automatically recover long-running agent workflows after failures without giving up the agent framework they already use.

Instead of restarting from the beginning or writing complex recovery logic, teams can make their agents resume execution from the exact point of interruption and reliably complete workflows in production.

As organizations move AI agents from experimentation into real business systems, reliability becomes a fundamental requirement.

The Reliability Gap in AI Agent Frameworks

Frameworks like LangGraph, OpenAI Agents SDK, CrewAI, AWS Strands and Microsoft Agent Framework have made it significantly easier to build powerful agent workflows.

However, most frameworks focus primarily on orchestration and reasoning, not production reliability.

Real-world AI agent workflows often involve:

Multi-step reasoning and planning
Tool and API calls
Multi-agent collaboration
Long-running tasks
Integration with enterprise systems

These workflows frequently run for minutes or even hours.

When a failure occurs, such as a network interruption, infrastructure issue, or tool error, the workflow often stops completely. Developers must then restart the entire workflow or implement custom retry and recovery logic.

Many frameworks offer checkpointing, but checkpointing alone does not guarantee recovery. Saving some agent workflow state periodically is not the same as ensuring that an interrupted workflow automatically resumes and completes execution.

For production AI agents, durable workflow is essential.

Until now, developers had to choose between two options. They could adopt a workflow engine for reliability, which meant giving up their agent framework and rebuilding reasoning, loops, and evaluation from scratch. Or they could rely on the agent framework's basic checkpoint mechanism, which left the hard work of failure detection and recovery at scale to them.

Durable Workflow for Leading AI Agent Frameworks

With this release, Diagrid introduces first-class durable workflow support across many of the most widely used AI agent frameworks, enabling developers to add reliability without changing their development stack.

Supported frameworks include:

LangGraph
Dapr Agents
Microsoft Agent Framework
AWS Strands
Google Agent Development Kit (ADK)
OpenAI Agents SDK
CrewAI
Pydantic AI Agents
LangChain4j - Coming soon

Developers can include the Diagrid package in their agent applications to enable automatic workflow recovery.

If a failure occurs during execution, Diagrid resumes the workflow from the exact step where it stopped, ensuring that agents continue running until the workflow completes.

This dramatically reduces operational complexity while improving reliability for production AI systems.

The workflow engine is built on the battle-tested Dapr open-source project, used in production by companies including Grafana, NVIDIA, HSBC, HDFC and thousands of others.
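To make "resumes from the exact step where it stopped" concrete, here is a minimal sketch in plain Python. It is not the Diagrid or Dapr API: `StepJournal`, `agent_workflow`, and the three step functions are hypothetical names invented for illustration, and a local JSON file stands in for a real state store. The idea it shows is that each completed step's result is persisted before the next step runs, so a rerun replays finished steps from the journal instead of executing them again.

```python
import json
import os
import tempfile


class StepJournal:
    """Hypothetical file-backed journal of completed step results."""

    def __init__(self, path):
        self.path = path
        self.results = {}
        if os.path.exists(path):
            with open(path) as f:
                self.results = json.load(f)

    def run_step(self, name, fn, *args):
        # A step that already completed is replayed from the journal,
        # not executed again.
        if name in self.results:
            return self.results[name]
        result = fn(*args)
        self.results[name] = result
        # Persist before moving on; write-then-rename keeps the file intact
        # even if the process dies mid-write.
        tmp_path = self.path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(self.results, f)
        os.replace(tmp_path, self.path)
        return result


calls = []  # records which steps actually executed, across both attempts


def plan(task):
    calls.append("plan")
    return f"plan for {task}"


def call_tool(plan_text, fail):
    calls.append("call_tool")
    if fail:
        raise ConnectionError("simulated network interruption")
    return f"tool output for {plan_text}"


def summarize(tool_output):
    calls.append("summarize")
    return f"summary of {tool_output}"


def agent_workflow(journal, task, fail_tool=False):
    plan_text = journal.run_step("plan", plan, task)
    tool_output = journal.run_step("call_tool", call_tool, plan_text, fail_tool)
    return journal.run_step("summarize", summarize, tool_output)


journal_path = os.path.join(tempfile.mkdtemp(), "journal.json")

# First attempt: the tool call fails after planning has completed.
try:
    agent_workflow(StepJournal(journal_path), "refund request", fail_tool=True)
except ConnectionError:
    pass

# Second attempt with a fresh journal object, simulating a process restart.
# "plan" is replayed from disk; only the failed step and later steps run.
final = agent_workflow(StepJournal(journal_path), "refund request")
```

In this sketch the caller notices the failure and reruns the workflow by hand. The point made above is that a durable workflow engine takes over that detection and restart, which is the part plain checkpointing leaves to the developer.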

Securing Agent Tools with MCP Identity

AI agents increasingly rely on Model Context Protocol (MCP) servers to access tools and external services.

Without proper security controls, agents may be able to call tools they should not access.

Diagrid introduces identity and access control for both agents and MCP servers, enabling:

mTLS-secured communication
Agent authentication
Fine-grained authorization policies

This enables a zero-trust security model for agent tools, ensuring only authorized agents can access specific MCP servers.
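The authorization half of that model can be sketched as a deny-by-default policy check. This is an illustrative assumption, not Diagrid's policy format or API: `ToolPolicy`, `PolicyStore`, and the agent, server, and tool names are all hypothetical, and the agent identity is assumed to have been verified already (for example through the mTLS handshake) before the check runs.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToolPolicy:
    """Hypothetical policy: the tools on one MCP server an agent may call."""
    server: str
    allowed_tools: frozenset


@dataclass
class PolicyStore:
    # Maps an already-authenticated agent identity to its policies.
    policies: dict = field(default_factory=dict)

    def allow(self, agent_id, server, tools):
        policy = ToolPolicy(server, frozenset(tools))
        self.policies.setdefault(agent_id, []).append(policy)

    def is_authorized(self, agent_id, server, tool):
        # Deny by default: unknown agents, servers, or tools are rejected.
        return any(
            p.server == server and tool in p.allowed_tools
            for p in self.policies.get(agent_id, [])
        )


store = PolicyStore()
store.allow("billing-agent", "payments-mcp", ["get_invoice", "list_invoices"])
store.allow("support-agent", "tickets-mcp", ["create_ticket"])

# An agent can call a tool it was granted on the server it was granted for.
billing_reads_invoice = store.is_authorized("billing-agent", "payments-mcp", "get_invoice")
# The same agent is refused a tool that was never granted.
billing_issues_refund = store.is_authorized("billing-agent", "payments-mcp", "issue_refund")
# An agent is refused a server that belongs to another agent's policy.
support_reads_invoice = store.is_authorized("support-agent", "payments-mcp", "get_invoice")
# An identity with no policies at all is refused everything.
unknown_agent = store.is_authorized("unknown-agent", "tickets-mcp", "create_ticket")
```

Keeping the default answer at "no" is what makes this zero-trust in spirit: access exists only where a policy explicitly grants it.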

Enabling Production-Ready AI Agents

As agentic architectures become more common, organizations need infrastructure that supports reliable AI systems.

Diagrid provides the capabilities required to operate production AI agents, including:

Durable workflows for long-running agents
Observability into agent behavior and decisions
Secure tool access with MCP identity and authorization
Deployment inside customer infrastructure for governance and data locality

By bringing durable workflow to popular AI frameworks, Diagrid helps developers move from AI prototypes to production-grade agent systems.

Get Started

Agent durable workflow support and MCP server security are available in Diagrid Catalyst.

You can sign up for free and dive into our agent quickstarts with your favorite agent framework. Or check out how to launch and orchestrate multiple agents with different frameworks.

If you want to deploy Diagrid Catalyst Enterprise for a free trial in your own infrastructure, reach out to us.

