There are many reasons why Kubernetes is a popular container orchestration platform for distributed applications. One of them is the portability and flexibility it gives IT architects. However, these benefits bring known challenges: service discovery, infrastructure reliability, and security. Challenges create opportunities, and many tools have emerged to mitigate the common problems faced by teams running containerized applications on Kubernetes.
A service mesh is a pattern that aims to mitigate some of these challenges when architecting an application on Kubernetes. By providing a dedicated layer that handles service discovery and how applications share information with each other, a service mesh provides security, tracing, monitoring, and traffic control.
Dapr, which stands for Distributed Application Runtime, is an open-source, portable, event-driven runtime designed to make it easier for developers to build resilient, stateless, and stateful microservice applications. Similar to service meshes, Dapr provides features such as discoverability, service-to-service secure communication, and distributed tracing.
These overlapping features often raise the question of whether you should choose Dapr or a service mesh when trying to make your distributed system architecture more robust. This is the first part of a multi-part article. It unpacks this decision, but instead of picking a single tool for the job, it dives deeper into the two technologies to understand where they overlap and, most importantly, where their strengths can be used together to achieve your microservices goals.
The TL;DR: Kubernetes is a great platform for distributed applications. The challenges that come with portability and flexibility can be addressed with service meshes and Dapr, which make your architecture robust, resilient, and secure. In some cases, where you require capabilities that are unique to each, you will find it useful to run Dapr alongside a service mesh. In others, you might find that the security, observability, and resiliency features of Dapr alone are enough.
Common Foundations: The Sidecar Pattern
Before understanding how Dapr and service meshes work in detail, we need to understand what a sidecar is, as it is a pattern leveraged by both technologies.
Sidecars are containers or processes that are deployed alongside an application to extend functionality and provide isolation. They are a completely independent piece of software that abstracts the responsibility of tasks like monitoring, logging, and configuring network services from the application code.
Each application replica has exactly one sidecar process or container running alongside it and, on Kubernetes, the sidecar is usually deployed within the same pod as the application. When using the sidecar pattern, applications do not communicate directly with each other, but instead through their corresponding sidecars.
Understanding Service Meshes
A service mesh is a dedicated layer within your application architecture that manages service discovery and communication in distributed applications. This layer provides features including monitoring, logging, tracing, and traffic control. Service meshes are focused on the networking layer and because they do not require programming skills or application code modifications, they are typically managed by system operators.
Alongside service discovery, service meshes provide networking features like load balancing, advanced traffic management, mTLS encryption, and security policies to control access to application endpoints. Service meshes also provide comprehensive auditing and debugging features through monitoring metrics, system performance analysis, error and latency calculations, and distributed tracing.
These features are provided through network proxies that are responsible for routing requests between applications. The proxies act as a gateway between the networking layer and the applications, forcing all traffic to be routed through the service mesh. As co-located processes or containers that run alongside applications, service meshes commonly follow the sidecar pattern.
Service meshes rely on two components: the control plane and the data plane. The control plane is where service endpoints, network settings, load balancing configurations, and routing rules are registered. This information is shared with the data plane, where the sidecar proxies are hosted alongside the applications. Popular service meshes include Linkerd, Istio and Cilium.
Dapr vs. Service Mesh: Overlapping Features
Dapr provides a set of APIs for developers to build distributed applications, abstracting the underlying infrastructure and leveraging industry best practices for observability, security, and resiliency. The APIs focus on facilitating microservice development by providing building blocks for service invocation, state management, pub/sub messaging, stateful workflows, actors, and more. Dapr also traditionally relies on the sidecar pattern to provide service communication with mTLS, metrics collection, distributed tracing, and resiliency.
Unlike service meshes, where operations teams are the primary audience, Dapr is developer-centric: developers build it into application code through native HTTP/gRPC clients or SDK integrations. Dapr and service meshes share many of the same features, but they commonly implement them at different layers of the system architecture. Here are some overlapping features:
Secure Service-to-Service Communication
Dapr provides end-to-end security with the service invocation API, with the ability to authenticate applications using token-based auth and restrict access using policies. These applications are typically scoped to namespaces for deployment, and traffic can be encrypted end-to-end using mTLS with no extra code or configuration.
Dapr provides service discovery and invocation by name through the concept of “Application IDs”, which is a developer-centric concern. Through Dapr’s service invocation API, developers call a method on an application using its App ID, which makes for easily readable code. Importantly, application IDs (or names) give each application an identity, so service discovery is independent of where the application is currently running, and security can be enforced between applications. This is not possible with a service mesh that operates only at the network level.
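As a rough illustration of calling an application by its App ID, the Python sketch below goes through the local sidecar's HTTP API rather than calling the target directly. It assumes the sidecar listens on Dapr's default HTTP port, 3500; the helper names `invoke_url` and `invoke` are hypothetical and not part of any Dapr SDK.

```python
import json
import urllib.request

# Assumption: the Dapr sidecar listens for HTTP on localhost:3500 (the default port).
DAPR_HTTP_PORT = 3500


def invoke_url(app_id: str, method: str, port: int = DAPR_HTTP_PORT) -> str:
    """Build the sidecar URL that calls `method` on the app identified by `app_id`."""
    return f"http://localhost:{port}/v1.0/invoke/{app_id}/method/{method}"


def invoke(app_id: str, method: str, payload: dict) -> bytes:
    """POST a JSON payload to another application through the local sidecar."""
    request = urllib.request.Request(
        invoke_url(app_id, method),
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # The caller only knows the App ID; the sidecar resolves where the app runs.
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.read()
```

With a sidecar running, a call might look like `invoke("orders", "checkout", {"item": "book"})`, where the App ID `orders` and the method `checkout` are made-up names. The calling code never mentions a host or IP address for the target application.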
Service meshes handle service-to-service communication by providing a dedicated infrastructure layer that manages and optimizes the interactions between microservices. This layer provides service discovery by managing service endpoints and makes apps more resilient, rerouting requests if they fail.
Service meshes also provide traffic management capabilities through load balancing and traffic splitting, encryption via mutual TLS (mTLS), and policy enforcement such as access control and rate limiting.
Observability
Both service meshes and Dapr provide observability into distributed systems. Service meshes operate at the network level and provide traces for the network calls between applications, along with metrics and network activity logs. Dapr also provides metrics, tracing, and logs for all of the APIs that are used within the system. Since Dapr sits between the infrastructure and the applications, it can provide insights at both the application and the infrastructure level, something that service meshes lack.
Observability with Dapr goes beyond service communication. For asynchronous messaging in particular, Dapr provides observability into pub/sub calls using trace IDs written into the CloudEvents envelope. This means that metrics and tracing with Dapr are more extensive than with a service mesh for applications that use both service-to-service invocation and pub/sub to communicate.
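To make the idea of a trace ID travelling inside the message envelope concrete, here is a minimal Python sketch. It assumes the envelope carries a `traceparent` field in the W3C Trace Context format (`version-traceid-parentid-flags`); the envelope contents are invented for illustration, and `parse_traceparent` is a hypothetical helper, not a Dapr API.

```python
from typing import NamedTuple


class TraceContext(NamedTuple):
    version: str
    trace_id: str
    parent_id: str
    flags: str


def parse_traceparent(value: str) -> TraceContext:
    """Split a W3C traceparent string into its four dash-separated fields."""
    parts = value.split("-")
    if len(parts) != 4:
        raise ValueError(f"expected 4 fields, got {len(parts)}")
    version, trace_id, parent_id, flags = parts
    if len(trace_id) != 32 or len(parent_id) != 16:
        raise ValueError("trace-id must be 32 hex chars and parent-id 16")
    return TraceContext(version, trace_id, parent_id, flags)


# Hypothetical envelope as a subscriber might receive it; all values are made up.
event = {
    "specversion": "1.0",
    "topic": "orders",
    "traceparent": "00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01",
    "data": {"orderId": 42},
}

context = parse_traceparent(event["traceparent"])
# A subscriber that logs context.trace_id can correlate its work with the publisher's span.
```

Because the identifier rides along with the message itself, the publisher and the subscriber can be tied to the same trace even though no direct network call connects them.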
Both service meshes and Dapr can leverage popular tracing solutions like Jaeger and Zipkin for distributed tracing, which helps in monitoring and troubleshooting microservices by visualizing the flow of requests through different applications.
Service meshes collect and visualize traces, which represent the path of a request through various microservices. These distributed tracing solutions are commonly included as part of the service mesh deployment, enabling automatic tracing of service-to-service communication. Options like Linkerd and Istio offer dashboards that give a visual representation of their metrics.
Dapr also supports multiple solutions for distributed tracing, sending trace data using the OpenTelemetry and Zipkin protocols. This allows developers to monitor and visualize the interactions between Dapr-enabled microservices and their associated infrastructure resources.
Developers that want to have a full understanding of their Dapr applications can leverage Diagrid Conductor. Conductor provides full control of your Dapr environment, allowing access to a comprehensive set of features that help developers and system administrators manage the current health of their workloads. It also has advisories that show how to operate Dapr from a well-architected lens.
Resiliency Through Retries
Both Dapr and service meshes handle resiliency through retries by implementing policies that automatically retry failed requests, ensuring that transient errors and network noise do not disrupt the system.
Dapr provides a robust mechanism for handling retries through its resiliency policies. Users can define retry policies in a configuration file, specifying parameters like the retry strategy (constant or exponential), duration between retries, maximum interval, and the maximum number of retries. The code snippet below contains a Dapr resiliency policy that configures retries.
Service meshes, like Istio, also handle resiliency through retries by providing built-in features for retrying failed requests. These retries can be configured with parameters such as the number of attempts, the interval between retries, and the conditions under which retries should be attempted.
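The retry parameters described above can be illustrated with a small Python sketch that computes the wait before each retry. This is not Dapr's or Istio's implementation or configuration format. The function `retry_delays` is hypothetical, and the exponential strategy here is assumed to be plain doubling capped at the maximum interval, without the jitter that real implementations typically add.

```python
from typing import List


def retry_delays(policy: str, duration: float, max_interval: float, max_retries: int) -> List[float]:
    """Return the delay in seconds before each retry attempt.

    policy:       "constant" or "exponential"
    duration:     base delay between retries
    max_interval: upper bound on any single delay (exponential only)
    max_retries:  number of retries to schedule
    """
    if policy not in ("constant", "exponential"):
        raise ValueError(f"unknown policy: {policy}")
    delays = []
    for attempt in range(max_retries):
        if policy == "constant":
            delay = duration
        else:
            # Assumption: double the delay on every attempt, capped at max_interval.
            delay = min(duration * 2 ** attempt, max_interval)
        delays.append(delay)
    return delays
```

For example, `retry_delays("exponential", 1.0, 5.0, 4)` yields waits of 1, 2, 4, and 5 seconds, while the constant strategy keeps every wait at the base duration. The cap keeps a long outage from producing unbounded waits.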
Dapr, Service Mesh or Both?
So, should you be using Dapr, a Service Mesh, or both? The answer here depends on your system requirements.
Applications with a Variety of Developer-Centric Needs
If you are building microservices and need a set of building blocks that handle developer-centric needs for state management, pub/sub messaging, service invocation, and workflows, Dapr is the best choice. In most cases like these, Dapr can also handle the network security and observability requirements, so a service mesh may not be required.
Polyglot Applications with Cloud Dependencies
Dapr is likely the best choice for microservices that are built in multiple programming languages and need to communicate with many backing cloud services. Its SDKs and APIs abstract how the underlying infrastructure resources work, so developers do not need to worry about, for example, the intricacies of using Amazon DynamoDB or Azure Cosmos DB for state management. Systems that are deployed on multiple clouds also benefit from the modularity and code abstraction that Dapr provides.
Architectures Where mTLS Needs to be Enforced for All Application Communication
If your solution requires complex network-level security with fine-grained control, network-level policies, and mTLS encryption for all applications (Dapr-enabled or not), you likely require a service mesh. Dapr provides access control and mTLS; however, it cannot provide these capabilities for apps that do not have Dapr sidecars.
The picture below demonstrates an architecture where Dapr is used for its developer-focused APIs while a service mesh is leveraged for mTLS communication between all applications.
Multi-Cluster Connectivity
Another common case is when your microservices are spread across multiple Kubernetes clusters. These architectures often require load balancing, advanced traffic routing, and traffic splitting, which makes a service mesh the best option.
The architecture below uses a service mesh for load balancing and Dapr for developer-focused APIs.
A common use case is where traffic splitting is needed for A/B testing scenarios. A service mesh is preferred in this case as Dapr does not provide this capability.
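As a sketch of what traffic splitting means in an A/B test, the Python snippet below sends a fixed percentage of requests to a second version. It is only an illustration of the idea, not how any particular service mesh implements it: the function `pick_version`, the version labels, and the choice to hash a request ID (so the same request always lands on the same version) are all assumptions made for this example.

```python
import hashlib


def pick_version(request_id: str, canary_percent: int) -> str:
    """Deterministically map a request to 'stable' or 'canary' by hashing its ID."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # Reduce the hash to a bucket in the range 0..99.
    bucket = int.from_bytes(digest[:8], "big") % 100
    return "canary" if bucket < canary_percent else "stable"


# Route 1,000 hypothetical request IDs with a 10% split and count the outcome.
counts = {"stable": 0, "canary": 0}
for i in range(1000):
    counts[pick_version(f"request-{i}", 10)] += 1
```

In a service mesh this decision is made by the proxies from routing rules, so the split can be changed without touching application code.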
Architectures that Mix Dapr-Enabled Applications with Regular Apps
Typically, you would use a service mesh with Dapr when a corporate policy requires network traffic to be encrypted for all applications. For example, you may be using Dapr in only part of your system, while other services and processes that do not use Dapr also need their traffic encrypted. In this scenario a service mesh is the better option, and you should most likely use mTLS and distributed tracing on the service mesh and disable them in Dapr.
Conclusion
Kubernetes is a great platform for distributed applications. The challenges that come with portability and flexibility can be addressed with service meshes and Dapr, which make your architecture robust, resilient, and secure. In some cases, where you require capabilities that are unique to each, you will find it useful to run Dapr alongside a service mesh. In others, you might find that the security, observability, and resiliency features of Dapr alone are enough.
Nothing prevents you from combining both tools, and it is common for Dapr to be deployed with service meshes like Istio, Linkerd, and others. The key to a successful deployment is understanding the overlapping features and making sure that they are not enabled in both technologies. You can learn more about best practices with Dapr by following the Diagrid blog, and take inspiration from what developers are achieving with Dapr by reading our case studies.
Part two of this blog series will provide a step-by-step solution covering both developer-centric needs and fine-grained network requirements, focusing on how to avoid common errors that can lead to an operational nightmare. See you there.