How to record metrics to DataDog from ASP.NET Core – Recording metrics and events to DataDog in development and production in AWS ECS.

I’m currently architecting and building a new microservices based system at work. A priority of mine has been to learn from the experience of building our first microservices project by putting a greater focus on logging and monitoring. We freely acknowledge that we didn’t get this as good as we would have liked in our first implementation. Logging is hugely important in a microservices design. Tracking down failures and bugs can be very difficult without good logging and even more importantly, a way to search and view those logs.

But logging isn’t what I want to cover in this post. Alongside logging, something we felt we were missing in our previous project was detailed performance and usage metrics. We can gain some insight into the state of the system using the built-in AWS metrics for our ECS services and also by monitoring, again via CloudWatch, things such as SQS queue metrics. This is all reasonably useful, but at times it would be great to know a bit more about the services themselves.

With this new project, I’ve started instrumenting it with DataDog so that the application can record useful metrics that allow us to track its overall health, as well as understand how things are performing and what’s actually being used. The goal is to build up a much better profile of the application so that we can more easily see whether changes that we have deployed have improved the performance or whether they have degraded it. Having proper baselines of the application performance is key to being able to make such determinations.

The choice of DataDog was made simply because our systems team already use it for monitoring the overall AWS environment. After a few quick enquiries, I found that recording metrics from the application code would be possible thanks to the availability of a C# library from DataDog called “dogstatsd-csharp-client”. My only concern is that the library seems to have stalled a little, with no commits since Nov 30th 2017. There is also a third party library called DatadogSharp which I may investigate in the future. Its readme suggests it is more performant than DataDog’s own library. Putting that aside for now, I decided I’d go ahead and prototype based on the DataDog library and revisit these concerns, if they proved founded, once I was underway. I have built a small wrapper library over the static DogStatsd methods so I have a layer of abstraction if I decide to change the underlying client. It also allows me to centralise the use of some common tags which I wanted to include on all of my metrics.

In this post, I want to focus on the slightly higher level concepts by discussing how I got this working during development and more recently, deploying to a prototype in production running in AWS ECS.

Development

The DogStatsd client sends messages over UDP to an agent server which will collect these and eventually send them up to the DataDog service. You can read more about that flow on the DataDog DogStatsD page. Therefore, to test my code during development, I needed to have an agent server running somewhere.

In my case, I opted for a Docker image which I could run locally. After a bit of Googling, I ended up at the DogStatsD6 Docker Image GitHub repository. This image is hosted on the Docker Hub in the DataDog DogStatsD repository.

To begin working with this locally, I created a simple docker-compose file. I’ve not included some other elements I was running here for logging via the ELK stack as those are not necessary for this example. My docker-compose file looked like this:
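
Reduced to just the dogstatsd service, the relevant part was along these lines (the image tag and the way the API key is passed in from a host environment variable are illustrative):

version: '3'
services:
  dogstatsd:
    image: datadog/dogstatsd:latest
    ports:
      - "8125:8125/udp"
    environment:
      - DD_API_KEY=${DD_API_KEY}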

The above example specifies a dogstatsd service which will be started using the DogStatsD6 image. By default, this DogStatsD agent server will be listening on UDP port 8125 for data. I expose that port to port 8125 on my host (development) machine. For the DogStatsD server to be able to send its data to DataDog we need to provide an API key. This can be done by passing the key into the DD_API_KEY environment variable.

At this point, I could run a docker-compose up -d command to start an instance of the DogStatsD container.

Sending Metrics and Events from an ASP.NET Core based API

As I mentioned earlier, I’ve created a wrapper library for sending DataDog metrics and events via the methods from the dogstatsd-csharp-client library. I won’t be covering that here as I want to keep this post focused on the core principles. Instead, let’s look at how we can begin with recording metrics and events to DataDog.

The first step, before we can send data to DataDog, is to add the NuGet package for the dogstatsd-csharp-client library.

You can do this by searching for it via the NuGet package manager, or as I did, by editing the csproj file directly and adding the package reference.

<PackageReference Include="DogStatsD-CSharp-Client" Version="3.1.0" />

Once the library is added we must configure the client. All of the functionality for this client library is implemented as static methods. The first method we will call is the DogStatsd.Configure method. This takes a StatsdConfig object as a parameter. We’ll add a small static method in our Startup.cs class as follows:
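
Something like this (the prefix value is just an example; the StatsdConfig properties come from the client library):

// requires: using StatsdClient;
private static void ConfigureDataDog(string serverName)
{
    var dogstatsdConfig = new StatsdConfig
    {
        StatsdServerName = serverName, // hostname or IP of the DogStatsd server
        StatsdPort = 8125,             // the default DogStatsd UDP port
        Prefix = "myapi"               // prepended to all metric and event names
    };

    DogStatsd.Configure(dogstatsdConfig);
}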

We build up the configuration object using a server name string which we will pass into this helper method. This will be the hostname or IP of the server running the DogStatsd server. We also set the port it will be listening on. Finally, we can add a prefix which will be added to the front of any metric and event names. This is useful to make it easier to search for them in the DataDog application.

We then call the Configure method, passing in the config object.

Next, we need to call this method. We can do that from our Configure method in the Startup class. This is called once at application startup so is an appropriate place to do this. First, we’ll load the DogStatsd server name from ASP.NET Core configuration as this allows us to easily set it per environment. We’ll then call our ConfigureDataDog method, passing that server hostname through:
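
A sketch of that call (the configuration key name is an assumption I’ll carry through the rest of these examples):

public void Configure(IApplicationBuilder app, IHostingEnvironment env)
{
    // Configuration is the IConfiguration injected into the Startup constructor
    var statsdServerName = Configuration["DataDog:StatsdServerName"];

    ConfigureDataDog(statsdServerName);

    app.UseMvc();
}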

We’ll need to include a section for this configuration in our appsettings.json file:
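
Using the same (assumed) section and key name as above:

{
  "DataDog": {
    "StatsdServerName": "127.0.0.1"
  }
}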

During development, we can set this to the localhost IP address (127.0.0.1). Since we are running our DogStatsd server within a container and exposing its port through to the host, we can access it this way. In production, we can then set an appropriate production hostname for the DogStatsd server.

At this point, everything is ready to go. We can now use the static DogStatsd methods to send metrics and events from our code.

The client library readme includes examples of the methods we can call but as a basic example, here’s some code we can call from our application to record a metric whenever a profiles endpoint is called.

DogStatsd.Increment("all_profiles_endpoint_call");

At this stage, we have added basic metrics to our application and have a DogStatsd server running locally in a container to collect and forward those metrics to DataDog.

Production

NOTE: Since trialling this approach, we have learned that because each sidecar is considered a new host, this may have cost implications with the DataDog pricing model. We are moving away from this approach to daemon mode containers (1 per ECS host). I will update accordingly once we are happier with that approach. For now, this still works, but do consider how it affects your billing.

The final stage is to get this working in production. I felt there might be a few options here but the one I wanted to try was to run the DogStatsd server as a sidecar container. We use AWS Elastic Container Service (ECS) to run our production services as Docker containers.

To achieve this we will add a second container to our task definition for our ECS service. An ECS service is the logical structure that represents a unit of scale and deployment for containers in ECS. A service can run multiple containers, which is generally not something you need or want to do since it limits your scaling options. However, for this requirement, running a second supporting container is a reasonable use case. This container is directly used only by our main microservice and should scale with that service.

The final ECS task definition looks like this:
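
Here is an abbreviated sketch of the shape of that task definition (names, image URIs and resource values are placeholders, and the environment variable uses the ASP.NET Core double underscore convention to override the DataDog:StatsdServerName configuration key assumed earlier):

{
  "family": "profile-api",
  "containerDefinitions": [
    {
      "name": "profile-api",
      "image": "123456789012.dkr.ecr.eu-west-1.amazonaws.com/profile-api:latest",
      "essential": true,
      "memory": 512,
      "portMappings": [
        { "containerPort": 80, "hostPort": 0 }
      ],
      "links": [ "dogstatsd" ],
      "environment": [
        { "name": "DataDog__StatsdServerName", "value": "dogstatsd" }
      ]
    },
    {
      "name": "dogstatsd",
      "image": "datadog/dogstatsd",
      "essential": false,
      "memory": 128,
      "environment": [
        { "name": "DD_API_KEY", "value": "YOUR_DATADOG_API_KEY" }
      ]
    }
  ]
}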

The new section in my case is the dogstatsd container definition near the bottom. The task definition takes one or more container definitions which define the containers that will be started as part of the tasks for this service.

The dogstatsd one is quite simple. We tell it to use the same image as we used in development, “datadog/dogstatsd”, which will be pulled from the Docker Hub. We can pass in the API key via an environment variable. We can mark this container as non-essential since we don’t want its failure to cause our service to restart unexpectedly.

The next change is to add the links to the main API container definition. This ensures that it will have a network link with the dogstatsd container. Finally, we can add an environment variable for the API container which sets the hostname for the DogStatsD server. This will override the 127.0.0.1 setting from our appsettings.json file. Since we have linked the containers, we can reference the DogStatsd server by its container name.

With these changes made it was then a case of registering this new task definition and updating our ECS service. When the service starts tasks, it will start an instance of the DogStatsd container which our application containers can use to record their metrics and events.

Summary

It’s still early days, but this is working quite nicely in the production environment. I will be reviewing it over time as I learn more about the various elements at play. This is likely not the end of the story or our journey with DataDog. Hopefully, this is enough information to get you started if you are following a similar path. 

HttpClientFactory in ASP.NET Core 2.1 (Part 3) – Outgoing request middleware with handlers.

In my previous posts in this series (An Introduction to HttpClientFactory and Defining Named and Typed Clients) I introduced some core concepts and then showed some examples of using the new IHttpClientFactory feature in ASP.NET Core 2.1. It’s been a while since those first two posts but I’d like to continue this series by looking at the concept of outgoing request middleware with handlers.

IMPORTANT NOTE: the features shown here require the current preview build of the SDK and the .NET Core and ASP.NET Core libraries. I won’t cover how to get those in this post. At the time of writing we’re in preview 2 of .NET Core and ASP.NET Core 2.1. This preview should be reasonably feature complete but things may still change. If you want to try this out today you can get the preview 2 installers but I recommend waiting until at least the RC before producing any production code.

DelegatingHandlers

To be clear from the outset; many of the pieces involved in this part of the feature have existed for a long time. HttpClientFactory simply makes the consumption of these building blocks easier, through a more composable and clear API.

When making HTTP requests, there are often cross cutting concerns that you may want to apply to all requests through a given HttpClient. This includes things such as handling errors by retrying failed requests, logging diagnostic information or perhaps implementing a caching layer to reduce the number of HTTP calls on heavily used flows.

For those familiar with ASP.NET Core, you will also likely be familiar with the middleware concept. DelegatingHandlers offer an almost identical concept, but in reverse, applied to outgoing requests.

You can define a chain of handlers as a pipeline, which will all have the chance to process an outgoing HTTP request before it is sent. These handlers may choose to modify headers programmatically, inspect the body of the request or perhaps log some information about the request.

The HttpRequestMessage flows through each handler in turn until it reaches the final inner handler. This handler is what will actually dispatch the HTTP request across the wire. This inner handler will also be the first to receive the response. At this point the response passes back through the pipeline of handlers in the reverse order. Again, each handler can inspect, modify or use the response as necessary. Perhaps for certain request paths you want to apply caching of the returned data, for example.

 

[Diagram: IHttpClientFactory – DelegatingHandler outgoing middleware pipeline flow]

In the diagram above you can see this pipeline visualised.

Much like ASP.NET Core middleware, it is also possible for a handler to short-circuit the flow and return a response immediately. One example where this might be useful is to enforce certain rules you may have in place. For example, you could create a handler which checks if an API key header is present on outgoing requests. If this is missing, then it doesn’t pass the request along to the next handler (avoiding an actual HTTP call) and instead generates a failure response which it returns to the caller.

Before IHttpClientFactory and its extensions you would need to manually pass a handler instance (or chain of handlers) into the constructor for your HttpClient instance. That instance will then process any outgoing requests through the handlers it has been supplied.

With IHttpClientFactory we can more quickly apply one or more handlers by defining them when registering our named or typed clients. Now, any time we get an instance of that named or typed client from the HttpClientFactory, it will be configured with the required handlers. The easiest way to show this is with some code.

Creating a Handler

We’ll start by defining two handlers. In order to keep this code simple these aren’t going to be particularly realistic in terms of function. They will however show the key concepts. As we’ll see in future posts, there are ways to achieve similar results without having to write our own handlers.

To create a handler we can simply create a class which inherits from the DelegatingHandler abstract class. We can then override the SendAsync method to add our own functionality.

In our example this will be our outer handler. A Stopwatch will be started before calling and awaiting the base handler’s SendAsync method which will return a HttpResponseMessage. At this point the external request has completed. We can log out the total time taken for the request to flow through any other handlers, out to the endpoint over HTTP and for the response to be received.
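
A minimal sketch of such a handler (logging via an injected ILogger is my assumption here; all the example needs is for the elapsed time to be logged somewhere):

// requires: System.Diagnostics, System.Net.Http, System.Threading, System.Threading.Tasks, Microsoft.Extensions.Logging
public class TimingHandler : DelegatingHandler
{
    private readonly ILogger<TimingHandler> _logger;

    public TimingHandler(ILogger<TimingHandler> logger)
    {
        _logger = logger;
    }

    protected override async Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
    {
        var stopwatch = Stopwatch.StartNew();

        // pass the request on to the next handler in the pipeline
        var response = await base.SendAsync(request, cancellationToken);

        _logger.LogInformation("Total request time: {ElapsedMilliseconds}ms", stopwatch.ElapsedMilliseconds);

        return response;
    }
}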

Just to keep things interesting let’s create a second handler. This one will check for the existence of a header and if it is missing, will return an immediate response, short-circuiting the handler pipeline and avoiding an unnecessary HTTP call.
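
A sketch of that second handler (the header name being checked for is just an example):

// requires: System.Net, System.Net.Http, System.Threading, System.Threading.Tasks
public class ValidateHeaderHandler : DelegatingHandler
{
    protected override async Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
    {
        if (!request.Headers.Contains("X-API-KEY"))
        {
            // short-circuit: return a response without the request ever leaving this handler
            return new HttpResponseMessage(HttpStatusCode.BadRequest)
            {
                Content = new StringContent("The X-API-KEY header must be provided.")
            };
        }

        return await base.SendAsync(request, cancellationToken);
    }
}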

Registering Handlers

Now that we have created the handlers we wish to use, the final step is to register them with the dependency injection container and define a client. We perform this work in the ConfigureServices method of the Startup class.
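
Along these lines (the base address for the named client is illustrative):

public void ConfigureServices(IServiceCollection services)
{
    services.AddTransient<TimingHandler>();
    services.AddTransient<ValidateHeaderHandler>();

    services.AddHttpClient("github", c =>
    {
        c.BaseAddress = new Uri("https://api.github.com/");
    })
    .AddHttpMessageHandler<TimingHandler>()          // outer handler
    .AddHttpMessageHandler<ValidateHeaderHandler>(); // runs just before the inner HttpClientHandler

    services.AddMvc();
}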

The first two lines register each handler with the service collection which will be used to build the final service provider. These need to be transient so that a new instance is provided each time a new HttpClient is created.

Next, we define a client. In this example I’m using a named client for simplicity. Check out my previous post in this series for more detail about named and typed clients. The AddHttpClient method in this case returns an IHttpClientBuilder. We can call additional extension methods on this builder. Here we are calling the generic AddHttpMessageHandler method. This method takes the type for the handler as its generic parameter.

The order of registration matters here. We start by registering the outermost handler. This handler will be the first to inspect the request and the last to see the response. In this case we want our timing handler to record the complete time taken for the whole request flow, including time spent in any inner handlers, so we have added it first. We can call AddHttpMessageHandler again, this time with our ValidateHeaderHandler. This will be our final custom handler before the request is passed to the inner HttpClientHandler, which sends it over the network.

At this point we now have an outgoing middleware pipeline defined on our named ‘github’ client. When a request comes through this client it will first pass into the TimingHandler, then into the ValidateHeaderHandler. Assuming the header is found, the request will be passed on and sent out to the URI in the request. When the response comes back it first returns through the ValidateHeaderHandler, which does nothing with the response. It then passes on to the TimingHandler, where the total elapsed time is logged, and is then finally returned to the calling code.

Summary

While I have shown how easy it is to create a DelegatingHandler and then add it to your HttpClient outgoing pipeline using the new extensions, the team hope that for most cases you will not find yourself needing to craft your own handlers. Common concerns such as logging are taken care of for you within IHttpClientFactory (we’ll look at logging in a future post). For more complex but common requirements, such as retrying failed requests and caching responses, a much better option is to use a third party library called Polly. The team at Microsoft have made a great decision to integrate with Polly.

In my next post I’ll investigate the options for adding Polly based handlers with IHttpClientFactory. In the meantime I suggest you check out this post by Scott Hanselman where he covers the Polly extensions. You can also check out the Polly wiki for more information.

Other Posts in this Series

Part 1 – An introduction to HttpClientFactory
Part 2 – Defining Named and Typed Clients
Part 3 – This post
Part 4 – Integrating with Polly for transient fault handling
Part 5 – Exploring the default request and response logging

Updates to my ASP.NET Core Correlation ID Library – Supporting correlation IDs across ASP.NET Core microservices.

Back in May 2017 I blogged about creating a simple library which supports passing correlation IDs between ASP.NET Core microservices. The library came about because of a basic requirement we had at work to pass an identifier between related services to enable more useful error logging. By passing an identifier from the first service through to any further services it then calls, we can, if an exception occurs, quickly search for the entire history of that request across the distributed environment.

Since I released that first version to NuGet I have been staggered by the download stats. According to NuGet it now has nearly 27,000 downloads at the time of writing this post. I never really expected it to be used that heavily so this is a really pleasant surprise. I’m very pleased that something I’ve created is helping others with a similar requirement. It is a little daunting to think that so many people are dependent on that library in their code!

Three months ago I released version 2.0 of the library which added the concept of a CorrelationContext. This was something I’d been considering almost immediately after completing version 1.0. An issue with v1 was that I’d chosen to set the TraceIdentifier on the HttpContext to match the correlation ID being passed in via the request headers. In controllers, where the HttpContext is accessible, this was not a major issue since the value of TraceIdentifier could then be read and used in logging. However, to use the correlation ID elsewhere, the only way to access it was via the IHttpContextAccessor. This isn’t registered in ASP.NET Core by default, and so for some users of the library it meant they would have to register it to make full use of the correlation ID.

I based my version 2 changes on the HttpContext and HttpContextAccessor in ASP.NET Core and this seems to have worked quite nicely so far. This required a breaking change for the library since it needed to register some services to support the new CorrelationContext.

Today I released version 2.1 of the library. This version adds two new configuration options that can be set when registering the middleware. One of these options came as a result of a GitHub issue requesting that it be possible to disable updating the TraceIdentifier with the correlation ID. This is now possible since the ID is passed around in the CorrelationContext and I needn’t rely on the HttpContext. To avoid breaking changes I added the option with its default setting behaving as it did before. I may look to change this default in the next major release.

I took the opportunity to add another new option that determines if the correlation ID should be matched to the TraceIdentifier or whether it should be a GUID in situations where an ID is not present in the header. For some users I can see this being useful and at work I’m considering the move to a GUID for our correlation ID.
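
As a rough illustration of how these are applied (I’m writing the option names from memory here, so treat them as indicative rather than exact), registering the middleware with the two new options looks something like this:

app.UseCorrelationId(new CorrelationIdOptions
{
    // option names are indicative: don't overwrite the TraceIdentifier with the correlation ID (new in 2.1)
    UpdateTraceIdentifier = false,

    // option names are indicative: generate a GUID when no correlation ID header is received (new in 2.1)
    UseGuidForCorrelationId = true
});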

A final change I was able to incorporate came as a result of another feature request via the project’s GitHub issues. In this case it was to include the configured correlation ID header name on the CorrelationContext. This resulted in the first external PR on the project, which I was very happy to receive. Thanks to Julien for his contribution.

I hope that people using the library are happy with these changes. As ever I’m happy to take feedback and ideas via GitHub if there are use cases that it doesn’t currently support.

UPDATE: It seems I wasn’t as careful as I thought about breaking changes. One did slip through in this release as I updated the interface and implementation for the ICorrelationContextFactory to support the new property on the context. If you’re consuming the library this is not something you generally need to access, but if you’re mocking it for unit testing it’s possible this will break there. Apologies! It turns out it’s harder than you think to avoid breaking changes once a library is released publicly!

HttpClientFactory in ASP.NET Core 2.1 (Part 1) – An Introduction to HttpClientFactory

TL;DR;

A new HttpClientFactory feature is coming in ASP.NET Core 2.1 which helps to solve some common problems that developers may run into when using HttpClient instances to make external web requests from their applications.

Introduction

This blog post has been in the works since mid-October 2017, which was when I first noticed the new HttpClientFactory repository appear on GitHub. I was intrigued by its appearance and wondered what the ASP.NET team were up to, so I went diving into the available code that the repo contained at the time. I’ve then kept an eye on it ever since, watching as the team evolved the feature by reading the commits, issues and pull request discussions.

Recently the feature has started to be talked about more openly and was included in a recent talk by Damian Edwards and David Fowler at NDC London. In fact on the day of writing this introduction it’s been shown on both Jeff Fritz’s livestream show and the ASP.NET Community Standup. The opinion of Ryan Nowak, one of the main ASP.NET developers for the feature, is that it’s reasonably stable to begin writing about it now.

NOTE: Please bear in mind that this post is written prior to the official preview release of .NET Core 2.1 by using the nightly builds of ASP.NET Core 2.1 and the .NET Core SDK. Therefore, things may change before and during the public previews (hopefully we’ll get these within the next month) and also before the final release of 2.1 based on feedback received from those previews.

What is HttpClientFactory?

In the words of the ASP.NET Team it is “an opinionated factory for creating HttpClient instances” and is a new feature coming with the release of ASP.NET Core 2.1. Depending on your past experience using HttpClient, you may or may not be aware of some of the pitfalls that can be encountered, sometimes without even being aware that you have a problem.

The first issue is when you create too many HttpClients within your code which can in turn create two problems…

  1. It’s inefficient as each one will have its own connection pool for the remote server. This means you pay the cost of reconnecting to that remote server for every client you create.
  2. The bigger problem you can have if you create a lot of them is that you can run into socket exhaustion where you have basically used up too many sockets too fast. There is a limit on how many sockets you can have open at one time. When you dispose of the HttpClient, the connection it had open remains open for up to 240 seconds in a TIME_WAIT state (in case any packets from the remote server still come through).

HttpClient implements IDisposable and this often leads developers to follow the normal pattern when using an IDisposable object, creating it within a using block. This ensures that the object is properly disposed of once you’re done with it and it has gone out of scope. If you want to read more about this, it is well documented by the ASP.NET Monsters in their post “You’re using HttpClient wrong and it’s destabilizing your software”.
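
For example, code like the following is very common and looks entirely reasonable, but at scale it contributes to the socket exhaustion described above:

using (var client = new HttpClient())
{
    // each of these creates (and later disposes) its own handler and underlying connection
    var response = await client.GetStringAsync("http://example.com");
}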

A preferred approach therefore is to reuse HttpClient instances so that connections can also be reused. HttpClient is a mutable object but as long as you are not mutating it, it is actually thread safe and can be shared. A common approach is therefore to register it as a singleton with a DI framework or to create a wrapper around it which holds a static instance.

However, this creates a new problem. Using a single HttpClient in this way will keep connections open and not respect the DNS Time To Live (TTL) setting. Now the connections will never get DNS updates so the server you are talking to will never have its address updated. This is entirely possible in some situations where you are balancing over many hosts that may go away over time or perhaps rolling out new services using blue/green deployments. If the server is gone, the IP your connection is using may no longer respond to requests that you make through the single HttpClient. You can read more about this issue at “Singleton HttpClient? Beware of this serious behaviour and how to fix it” and “Singleton HttpClient doesn’t respect DNS changes”.

HttpClientFactory is designed to help start solving these problems and provides a new mechanism to create HttpClient instances that are properly managed for us behind the scenes. It will “do the right thing” for us and we can focus on other things! While the above problems are mentioned in reference to HttpClient, in fact the source of the issues actually occurs on the HttpClientHandler, which is used by HttpClient. The HttpClientFactory manages the lifetime of the handlers so that we have a pool of them which can be reused, while also rotating them so that DNS doesn’t get stale.

The expensive part of using HttpClient is actually creating the HttpClientHandler and the connection. Having these pooled in this manner means we can get more efficient use of the connections on our system. When you use the HttpClientFactory to request a HttpClient, you do in fact get a new instance each time, which means we don’t have to worry about mutating its state. This HttpClient may (or may not) use an existing HttpClientHandler from the pool and therefore use an existing open connection.

By default, each new HttpClientHandler (which derives from HttpMessageHandler) will be created with an active lifetime of 2 minutes. This can be controlled on a per named client basis when creating its handler chain. Once the lifetime is reached, the handler will not immediately be disposed of and will instead be placed into the expired pool. Any clients depending on the original handler chain can continue using it without any issues. There is a background job checking the expired pool to see if all references for the handler have gone out of scope, at which point it can then be disposed of. Any new requests for a client once the handler chain has expired will get a new handler chain.

This works reasonably well, but there are other things underway on the .NET Core side which might improve the situation further. The .NET Core team are working on a new ManagedHandler which should manage DNS more correctly and in principle can be kept around for longer, meaning connections can be shared even more efficiently. This new handler is also being designed to function more consistently across the different operating systems. Until that work is completed (which might be in the 2.1 time frame) the pooling of handlers above is a reasonable workaround.

How to use HttpClientFactory

IMPORTANT NOTE: The features and code samples shown here require the current nightly builds of the SDK and the .NET Core and ASP.NET Core libraries. I won’t cover how to get setup to use those in this post. Treat this as an early preview of how the feature will work so that you can begin planning why, where and how you will use it once 2.1 is publicly available. Unless you have an urgent need to try this out today, I’d recommend waiting until the 2.1 previews are released, hopefully within the next month or so.

In this post I’ll concentrate on one of the most basic ways to get started with the HttpClientFactory. For this example, we’ll start by creating a simple WebAPI project and then edit the csproj file to upgrade it to use the new .NET Core and ASP.NET Core 2.1 bits. First we need to set it to be based on netcoreapp2.1 (not yet in official preview) and then include two packages which we’ll need. For this post I’m pinning those to specific preview nightly build versions available on the ‘dev’ MyGet feeds. After doing this our project file looks like this:
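
I won’t reproduce the exact nightly version numbers here since they change daily, but the rough shape of the file was as follows (the package versions are placeholders for the pinned preview builds from the ‘dev’ MyGet feeds; Microsoft.Extensions.Http is where the IHttpClientFactory types live):

<Project Sdk="Microsoft.NET.Sdk.Web">

  <PropertyGroup>
    <TargetFramework>netcoreapp2.1</TargetFramework>
  </PropertyGroup>

  <ItemGroup>
    <!-- versions are placeholders for specific pinned nightly preview builds -->
    <PackageReference Include="Microsoft.AspNetCore.All" Version="2.1.0-preview1-*" />
    <PackageReference Include="Microsoft.Extensions.Http" Version="2.1.0-preview1-*" />
  </ItemGroup>

</Project>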

Next we need to head over to our Startup.cs file and register a service. The HttpClientFactory includes various ServiceCollection extensions. The one we’ll use for this example is:

services.AddHttpClient();

Behind the scenes this will register a few required services, one of which will be an implementation of IHttpClientFactory. Next we’ll update the default ValuesController to make use of this feature:
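
A minimal version of that controller looks something like this (the URL being requested is purely illustrative):

[Route("api/[controller]")]
public class ValuesController : Controller
{
    private readonly IHttpClientFactory _httpClientFactory;

    public ValuesController(IHttpClientFactory httpClientFactory)
    {
        _httpClientFactory = httpClientFactory;
    }

    [HttpGet]
    public async Task<IActionResult> Get()
    {
        // ask the factory for a client; handler lifetime is managed for us
        var client = _httpClientFactory.CreateClient();

        var result = await client.GetStringAsync("http://www.google.com");

        return Ok(result);
    }
}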

Here we are first adding a dependency on IHttpClientFactory which will be injected into our controller by the DI system. The IHttpClientFactory allows us to ask for and receive a HttpClient instance.

Within our Get action we are then using the HttpClientFactory to create a client. Behind the scenes the HttpClientFactory will create a new HttpClient for us. But wait, didn’t I say earlier that using a new HttpClient for each request is bad? Indeed I did; but in fact that was a little bit of misdirection. The HttpClient itself is not really the problem, it’s the HttpClientHandler which it uses to make the HTTP calls that is the actual issue. It’s this which opens the connections to the external services that will then remain open and block sockets, even in the main HttpClient is disposed of.

HttpClientFactory pools these HttpClientHandler instances and manages their lifetime in order to solve some of the issues I mentioned earlier. Each time we ask for a HttpClient, we get a new instance, which may (or may not) use an existing HttpClientHandler. The HttpClient itself it not too heavy to construct so this is okay.

Once created the HttpClientHandlers are pooled and held around for around 2 minutes by default. This means that any new requests for CreateClient may share a handler and therefore the connections also. While a HttpClient lives, it’s handler will remain available and again this will share the connection.

After the two minutes, each HttpClientHandler is marked as expired. The expired state simply marks them so that they are no longer used when creating any new HttpClient instances. They are not immediately disposed however, as other HttpClient instances may be using them. The HttpClientFactory uses a background service which monitors the expired handlers and once they are no longer referenced, can then dispose of them properly, allowing their connections to be closed also.

This pooling feature helps reduce the risk of socket exhaustion and the refreshing process helps solve the DNS update problem by ensuring we don’t have long lived instances of HttpClientHandlers and connections hanging around. It’s a reasonable compromise which is managed for us by making use of the HttpClientFactory feature.

Summary

I’ll leave it there for this introductory post. In future posts I’ll dive into some of the more advanced ways we can use HttpClientFactory as there’s some nice features so show off. We’ll look at how we can create named HttpClient instances with configuration and also creating our own typed clients. This is where the feature will really begin to shine. Hopefully you’ll have seen, even in this basic example, how it improves use cases where you have a requirement to make HTTP calls in the most correct and efficient way. We don’t need to think about how we manage the lifetime of the clients or worry about running into DNS issues. I’m looking forward to using this in production once ASP.NET Core 2.1 is released.

Other Posts in this Series

Part 1 – This post
Part 2 – Defining Named and Typed Clients
Part 3 – The Outgoing Request Middleware Pipeline with Handlers
Part 4 – Integrating with Polly for transient fault handling
Part 5 – Exploring the default request and response logging

Docker for .NET Developers Header

Docker for .NET Developers (Part 3) Why we started using Docker with ASP.NET Core

In the prior two posts in this series we took a look at the main Docker terminology and looked at creating a basic dockerfile to define a Docker image containing an ASP.NET Core API application. In this post I want to switch it up a bit and spend a little time sharing a specific practical reason that led our team to start using Docker in our development flow. This was our first step in the Docker journey that we now find ourselves on and expands on the summary I introduced in part 1.

Why we started using Docker

Warning! This post has no code examples. Instead what I’ll be describing is a real-world situation that led us to start using Docker for our new project. I want to highlight why Docker was a good choice given our requirements and hopefully you’ll find cases in your work where it may be relevant to consider Docker yourself. I think it’s useful to share this story, while there’s a lot of excitement about Docker and some clear benefits, it’s useful to reflect on a practical advantage it brought us with very little effort. This is simply the start of the journey and I will be expanding on how our use evolved as the series continues. I hope you find this part useful, but don’t worry, I won’t be offended if you want to skip onto the docker-compose examples in part 4! See, I even provided a link!

As I explained in the first post, our new greenfield project at work kicked off in 2016. We needed to build a system to provide data analytics over a very large data set of event based data. I won’t cover the architecture of the back end components at this stage. For this post it’s the front end where I want to demonstrate the first benefit we were able to gain from using Docker.

At work, our front end and back end teams are split and work on their areas of expertise accordingly. On our current platforms, we have an in-house MVC framework which uses a custom templating language for the front end pages. When our front end developers need to perform work on the UI, they must pull down and update the main platform code on their devices in order to spin up the site locally. Given that this is a large platform, the size of the changes they need to pull could be fairly large. In order to work with the site, they must compile the code and run it locally, which, as this platform is built on the full .NET framework, requires Windows and IIS. Therefore, each developer, most of whom use a Mac, must have a Windows VM installed, or access to a Remote VM in order to work with the code.

Over time this flow has evolved to make the process as efficient as possible but it’s still not an ideal solution. With this new project we had the opportunity to try a new more modern approach. Given that the UI for this analytics style application was very interactive, we decided that it made sense to develop a SPA style front end. This had the benefit that it enabled the front end developers to choose a technology that suited them. After a bit of research they landed on Vue.js.

In order to support the front end SPA, we agreed that we would provide a number of REST API services, each providing a specific, bounded functionality. We could have chosen to build one larger, all-encompassing API application, but decided that the smaller APIs made scaling each component much easier. Scaling would be dependent on their own unique load and allow us to better use the resources of the server. It also means we can keep the code separated and hopefully easier to maintain. We can change each component independently and work more rapidly.

We ended up with a design that included 3 web facing API services, each backed by its own database. We have a user API which handles authentication and authorisation for the application, as well as other user management services. We have a report API that enables the creation and running of reports and we have a schedule API used for creating scheduled reports and managing downloads. Finally, we have a back end API which is responsible for querying over ElasticSearch, processing the data and returning the resulting data. This query API is used by both the reporting API and our offline back end scheduled report service.

Here is a diagram showing the main components our of front end services architecture:

Docker microservice architecture

At the time we were creating our architecture, ASP.NET Core was in RC and we took a brave but in the long run, good decision to use the new framework to develop our APIs. Not only were we interested in the performance and framework improvements, but we also knew we could begin to take advantage of the cross platform nature of .NET Core to allow us to look at different hosting options. Having recently started moving our systems into AWS we initially were considering the possibility of using Linux VMs to host the APIs.

As we pursued this path, another benefit of this decision came to light. Now that we were cross platform and planning to target Linux for hosting, we realised we could start to look at Docker and containerisation as part of our workflow. This opened up some new possibilities and after talking through the concept we realised that we could make development for the front end team much easier by providing Docker images to run the API code. The big change this presented is that they would no longer need to run Windows in order to build the UI for the platform and they would no longer need Visual Studio in order to build the source code.

A downside of splitting things into smaller microservices is that in order for a system to function, you often need multiple things running together at the same time. In our case, each of the 4 main API services needed to be running for the front end to be able to fully function. Each of those also needed a database server, we had chosen Postgres, to support the data storage requirements. Spinning up all of these parts needed a little coordination.

When we started looking at Docker we realised that we could solve this problem using docker-compose files. Docker-compose provides a simple way to start multiple containers with a single command. In our case, each service is a separate Visual Studio solution and repository. This allows us to develop, build and deploy each part of the system individually. As long as the public facing API endpoints do not change then we can update the internals of each API with no impact on the other parts of the system. In each solution we include a dockerfile to describe a Docker image that that will run the API service. These dockerfiles initially started very much like the example I showed in part 2 of this series. We have since gone on to optimise the files and I plan to explore that in a future post.

In addition to the dockerfiles we also provide a docker-compose file which, with a single command, can be used to build all of the required images. With a second command we can start all of the containers needed to support the front end. Our first iteration of the front end workflow relied on the front end developers pulling the latest source from each repository. They could then use the docker-compose build command which triggered image creation. Since the build is happening inside the Docker containers, they do not need to run Windows or even have the .NET SDK installed on their Macs. As part of the solution, we also utilised a public Docker image for Postgres as one of the services defined within our docker-compose file. This means that the front end team do not need to install Postgres on their own device either. Other than Docker, there are no dependencies to run the back end services required to develop the front end website. With these steps we had removed a barrier for the front end team and very much simplified our development process.

In the second iteration that we are now working on for the front end workflow we are providing the front end team with a new docker-compose file which pulls images from a private container registry running in AWS ECR. With this change there is no need for the front end developers to ever pull or update the source for the components they need. Instead, the compose file will pull the latest available images from the registry and have them up and running extremely quickly. We’re still investigating and testing how we want to finalise this part of our development workflow so it’s something I’ll share in a future post.

The same process proved really useful for QA and testing as well. Anyone involved with testing the system could pull the components down and run the full system in isolation on their machine. The front end website is also containerised and in this case we even used an ElasticSearch Docker image to allow a full end to end system to be started and tested on a machine, with few external dependencies. The dependencies we could not containerise were things such as access to Amazon AWS SQS and S3 for example.

One by-product of leveraging Docker for the front and back end developer flow is that we can start to eliminate the “it worked on my machine” arguments. By loading everything inside Docker we get to a place, where everyone is working on an identical, repeatable environment. Because our image contains all of the dependencies, we can be sure we have matching versions and configuration every time it is run. We don’t have to ask developers to maintain correct versions of dependencies on their devices either.

Other Posts In This Series

Part 1 – Docker for .NET Developers Introduction
Part 2 – Working with Docker files
Part 3 – This post
Part 4 – Working with docker-compose and multiple ASP.NET Core microservices
Part 5 – Exploring ASP.NET Runtime Docker Images
Part 6 – Using Docker for Build and Continuous Deployment
Part 7 – Setting up Amazon EC2 Container Registry