Resilient Systems with Polly.NET


Most of the websites and applications we use today are distributed. They run on highly complex infrastructure and rely on sophisticated cloud design patterns, and every added component introduces new ways to fail. Companies such as Amazon, Netflix, Hulu and Facebook have invested heavily in infrastructure and software that deal with failure. We should accept that systems will fail, embrace that fact and always be prepared for it.

Distributed Systems

Sooner or later, many systems must handle larger workloads with little or no impact on performance, at least nothing noticeable to users. This leads engineers to rethink their approach and architect systems that can scale properly. Distributed systems are designed with such scenarios in mind: engineers connect different components and establish protocols, workflows and policies for efficient and reliable communication between them. The trade-off is that this inevitably introduces more complexity, along with new risks and challenges.

Resiliency in Distributed Systems

Resiliency is about how a system copes with difficulties: it should be able to deal with failure and, ideally, recover from it. Several kinds of events introduce the risk of failure, such as:
  • Intermittent network connectivity or temporary failures, e.g. outages.
  • Spikes in load.
  • Latency.
There are also many scenarios in which systems depend on one another, for example:
  • Basic network dependencies such as database servers.
  • Systems integration and distributed processes.
  • SOA-based applications.
In the end, it is about embracing failure: being mindful of it when designing and developing solutions, and paying close attention to requirements, workflows and key aspects such as critical paths, integration points, cascading failures and back pressure.

Introducing Polly.NET

Polly is a library that enables you to write fault-tolerant, resilient .NET applications by applying well-known techniques and software design patterns through a fluent API. Some of its features:
  • Lightweight.
  • Zero dependencies: you only need the Polly package itself.
  • Fluent API that supports both synchronous and asynchronous operations.
  • Targets both .NET Framework and .NET Core runtimes.
The library lets developers express robust, mature software design patterns through policies. Using it usually involves the following steps:
  1. Identify the problem to handle; in .NET this usually translates to an exception type.
  2. Choose the policy or policies to apply.
  3. Execute the delegate through the policy or policy group.
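The three steps can be sketched as a small, self-contained program. It assumes the Polly 7.x NuGet package is referenced, and it uses a hypothetical FlakyOperation that fails twice before succeeding in place of a real network call. It also uses the synchronous API, to show that both styles are available.

```csharp
using System;
using System.Net.Http;
using Polly;

int calls = 0;

// Hypothetical stand-in for a network call: fails twice, then succeeds.
string FlakyOperation()
{
    calls++;
    if (calls < 3)
    {
        throw new HttpRequestException($"Simulated failure #{calls}");
    }
    return "ok";
}

// Step 1: identify the problem to handle (an exception type).
PolicyBuilder handled = Policy.Handle<HttpRequestException>();

// Step 2: choose the policy, here a plain Retry with up to three retries.
var retryPolicy = handled.Retry(3);

// Step 3: execute the delegate through the policy.
string result = retryPolicy.Execute(() => FlakyOperation());

Console.WriteLine($"{result} after {calls} calls");
```

The first two failures are absorbed by the policy, so the caller only sees the successful third call.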


To demonstrate some of the policies Polly offers out of the box, assume we need to call a Web API endpoint that returns a list of users. As mentioned before, policies can be used individually or combined. The first one we will look at is Retry, which lets us specify how many times to re-attempt an action that fails. The assumption is that the remote system has a short-lived issue, so we are willing to give it another chance, as in the example below:
// Simple Retry policy: retries up to three times on HttpRequestException (four attempts in total).
var retryPolicy = Policy
                   .Handle<HttpRequestException>()
                   .RetryAsync(3);
HttpResponseMessage response = await retryPolicy.ExecuteAsync(() => httpClient.GetAsync(uriToCheck));
The approach above is straightforward: a failure is any HttpRequestException raised by the call. When that happens, the policy retries up to the number of times passed to RetryAsync, three in this case. If the call still fails after the last retry, Polly gives up and the exception bubbles up to the caller. Note that HttpClient.GetAsync throws HttpRequestException for transport-level failures, not for error status codes such as 500, so those responses would not trigger this policy.
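To see this counting without a live endpoint, the sketch below runs the same kind of policy against an operation that always fails, and adds an onRetry callback to log each retry. It assumes the Polly 7.x package; the failing lambda is a hypothetical stand-in for httpClient.GetAsync.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;
using Polly;

int calls = 0;
int retriesLogged = 0;

// Retry up to three times, logging each retry through the onRetry callback.
var retryPolicy = Policy
    .Handle<HttpRequestException>()
    .RetryAsync(3, onRetry: (exception, retryCount) =>
    {
        retriesLogged++;
        Console.WriteLine($"Retry {retryCount} after: {exception.Message}");
    });

bool bubbledUp = false;
try
{
    // Hypothetical stand-in for a call to an endpoint that never recovers.
    await retryPolicy.ExecuteAsync(async () =>
    {
        calls++;
        await Task.Yield();
        throw new HttpRequestException("Simulated outage");
    });
}
catch (HttpRequestException)
{
    // After the last retry fails, Polly rethrows the final exception to the caller.
    bubbledUp = true;
}

Console.WriteLine($"calls={calls}, retries={retriesLogged}, bubbledUp={bubbledUp}");
```

One initial attempt plus three retries gives four calls in total, with onRetry invoked three times before the exception reaches the catch block.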
There are several variations of Retry. For instance, RetryForever keeps retrying indefinitely, although that is rarely the best choice. Another is Wait and Retry, which can be used to implement back-off strategies such as exponential back-off.
It behaves like the Retry policy, but spaces out the attempts with a given duration, as in the following example:
// Retry with a 'wait' period: the lambda passed as the second argument to WaitAndRetryAsync returns the TimeSpan to wait
var retryPolicy = Policy
                   .Handle<HttpRequestException>()
                   .WaitAndRetryAsync(3, (r) => TimeSpan.FromSeconds(r * 1.5f));

HttpResponseMessage response = await retryPolicy.ExecuteAsync(() => httpClient.GetAsync(uriToCheck));

In this version, an extra lambda returns a TimeSpan representing how long to wait before each retry. Dissecting that line:

  • The argument r passed to the lambda is the retry attempt number; here it takes the values 1, 2 and 3.
  • Multiplying the attempt number by a constant greater than 1 (1.5) makes the delay grow on each attempt: 1.5 s, 3 s and 4.5 s. Strictly speaking this growth is linear; for true exponential back-off the delay would be a power of the attempt number, e.g. Math.Pow(2, r) seconds.
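The schedule produced by that lambda can be checked in plain C#, without Polly. The sketch below evaluates the article's provider for attempts 1 to 3 next to a doubling provider, which is included only for comparison as a common exponential choice. Either function has the Func<int, TimeSpan> shape that WaitAndRetryAsync accepts as its second argument.

```csharp
using System;
using System.Linq;

// The provider used in the article: r * 1.5 seconds, growing linearly (1.5 s, 3 s, 4.5 s).
Func<int, TimeSpan> linearDelay = r => TimeSpan.FromSeconds(r * 1.5f);

// A doubling provider for comparison: 2^r seconds (2 s, 4 s, 8 s), i.e. exponential back-off.
Func<int, TimeSpan> exponentialDelay = r => TimeSpan.FromSeconds(Math.Pow(2, r));

// Polly passes the retry attempt number (1, 2, 3 for three retries) to the provider.
int[] attempts = { 1, 2, 3 };
TimeSpan[] linear = attempts.Select(linearDelay).ToArray();
TimeSpan[] exponential = attempts.Select(exponentialDelay).ToArray();

Console.WriteLine("linear:      " + string.Join(", ", linear.Select(t => t.TotalSeconds)));
Console.WriteLine("exponential: " + string.Join(", ", exponential.Select(t => t.TotalSeconds)));
```

The gap between the two schedules widens quickly, which is why exponential back-off is usually preferred when a dependency may need longer to recover.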

Applying a Strategy

As mentioned before, we can apply several policies as a group, for instance Retry combined with Fallback. This is done with Policy.Wrap (Policy.WrapAsync for asynchronous policies). By combining policies we can build strategies that can be applied and reused; the right combination depends on the case at hand, but there are certainly situations where a single policy is not enough.
The following example combines Wait and Retry with Fallback; the latter lets the call degrade gracefully when it ultimately fails.
// Increasing back-off between retries (1.5 s, 3 s, 4.5 s)
var retryPolicy = Policy<HttpStatusCode>
                    .Handle<HttpRequestException>()
                    .WaitAndRetryAsync(3, (r) => TimeSpan.FromSeconds(r * 1.5f));

// Degrade gracefully by returning Bad Gateway once the retries are exhausted
var fallbackPolicy = Policy<HttpStatusCode>
                       .Handle<HttpRequestException>()
                       .FallbackAsync((t) => Task.FromResult(HttpStatusCode.BadGateway));

// Combine them: the first argument is the outermost policy, the last is the innermost.
// The delegate runs inside the innermost policy and failures propagate outward, in this case:
// retry first, then fallback once all retries have failed.
var policyStrategy = Policy.WrapAsync(fallbackPolicy, retryPolicy);
HttpStatusCode resultCode = await policyStrategy.ExecuteAsync(async () => (await httpClient.GetAsync(uriToCheck)).StatusCode);
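The strategy can be exercised without a live endpoint. In this sketch, which assumes the Polly 7.x package, a hypothetical always-failing delegate replaces httpClient.GetAsync, and the waits are shortened to milliseconds so it finishes quickly. Both policies are typed as Policy<HttpStatusCode> so that Policy.WrapAsync can combine them into a single typed strategy.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;
using Polly;

int calls = 0;

// Inner policy: three retries with short, increasing waits (10 ms, 20 ms, 30 ms).
var retryPolicy = Policy<HttpStatusCode>
    .Handle<HttpRequestException>()
    .WaitAndRetryAsync(3, r => TimeSpan.FromMilliseconds(r * 10));

// Outer policy: once the retries are exhausted, return Bad Gateway instead of throwing.
var fallbackPolicy = Policy<HttpStatusCode>
    .Handle<HttpRequestException>()
    .FallbackAsync(ct => Task.FromResult(HttpStatusCode.BadGateway));

// First argument is outermost, last is innermost.
var policyStrategy = Policy.WrapAsync(fallbackPolicy, retryPolicy);

// Hypothetical stand-in for a call to an endpoint that is down.
HttpStatusCode resultCode = await policyStrategy.ExecuteAsync(async () =>
{
    calls++;
    await Task.Yield();
    throw new HttpRequestException("Simulated outage");
});

Console.WriteLine($"calls={calls}, result={resultCode}");
```

The delegate runs four times (one attempt plus three retries); only when the retry policy gives up does the fallback supply BadGateway, so no exception reaches the caller.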

We are dealing with status codes here to keep the example simple. A better fallback action would follow the Null Object design pattern and produce an empty list of users, although this depends on the requirements and structure of the application. The policies introduced in this article are only a few I picked to show how the library works and how these patterns can be applied. Polly offers more policies implementing a broad set of well-known resiliency patterns, such as:

  • Circuit Breaker.
  • Advanced Circuit Breaker.
  • Bulkhead Isolation.
  • Fallback.
  • Cache (which gives us the cache-aside / read-through pattern).
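Circuit Breaker, the first pattern in that list, stops calling a dependency after repeated failures so the system fails fast instead of piling up requests against it. A minimal synchronous sketch, assuming the Polly 7.x package and a hypothetical dependency that is always down:

```csharp
using System;
using System.Net.Http;
using Polly;
using Polly.CircuitBreaker;

int calls = 0;
int failedFast = 0;

// Break the circuit after 2 consecutive HttpRequestExceptions and keep it open for 30 seconds.
var breaker = Policy
    .Handle<HttpRequestException>()
    .CircuitBreaker(exceptionsAllowedBeforeBreaking: 2, durationOfBreak: TimeSpan.FromSeconds(30));

for (int i = 0; i < 5; i++)
{
    try
    {
        breaker.Execute(() =>
        {
            calls++;
            // Hypothetical dependency that is down.
            throw new HttpRequestException("Simulated outage");
        });
    }
    catch (HttpRequestException)
    {
        // While the circuit is closed, failures reach the dependency and are rethrown as-is.
    }
    catch (BrokenCircuitException)
    {
        // Once open, the breaker rejects calls immediately without invoking the delegate.
        failedFast++;
    }
}

Console.WriteLine($"calls={calls}, failedFast={failedFast}, state={breaker.CircuitState}");
```

With a threshold of two, only the first two calls reach the delegate; the remaining three are rejected with BrokenCircuitException while the circuit stays open for the 30-second break.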
It is up to us to identify and understand the interactions between an application's components, locate the fragile points where failure could occur, and define a solid strategy for dealing with those situations.


