Using HttpCompletionOption to Improve HttpClient Performance in .NET - Steve Gordon

In this blog post, I want to get back to an area I genuinely enjoy, exploring the use of HttpClient within your .NET applications. I’ll explain how you can optimise the performance of HttpClient when handling data such as JSON payloads on the HttpResponseMessage.

Making HTTP Requests Using HttpClient

By default, most HttpClient method overloads read the entire response body into a memory buffer before the method completes. This is the case for the GetAsync, PostAsync and SendAsync methods. At that point, the TCP connection used for the request goes idle and becomes available for re-use by another request.

This behaviour is true in the case where the response body uses either ‘Content-Length’ or chunked ‘Transfer-Encoding’ semantics. In those cases, the data received on the socket indicates the end of the response body.

This default behaviour is reasonable since it ensures that we don’t tie up sockets on the host for any longer than absolutely necessary. The downside is that it introduces some memory overhead. The response is buffered into a MemoryStream which is then available on the HttpResponseMessage. Depending upon the size of the response payload, this may mean that we buffer a rather large amount of data into memory.

What is HttpCompletionOption?

HttpCompletionOption is an enum with two possible values. It controls at what point operations on HttpClient should be considered completed.

The default value is ResponseContentRead which indicates that the operation should complete only after having read the entire response, including the content, from the socket. This aligns with the default behaviour I described above, with the content being buffered into a MemoryStream so that the connection can go idle.

The second possible value is ResponseHeadersRead which returns control to the caller earlier, as soon as the response headers have been fully read. The body of the response may not be fully received at this point.

Why Use ResponseHeadersRead?

You may now be wondering what benefit the ResponseHeadersRead option provides.

The main benefit is performance. When using this option, we avoid the intermediate MemoryStream buffer, instead getting the content directly from the stream exposed on the Socket. This avoids unnecessary allocations, which is a goal in highly optimised situations.

Another performance benefit is that we can begin working with the stream of data more quickly. In the default mode, when ResponseContentRead is used, first the content is buffered, then the method returns control to the calling method. With ResponseHeadersRead, we can begin reading the data from the stream, even while it is being sent over the network. This benefit requires that any processing can take advantage of partial response data, such as deserialising from streams with Json.NET or System.Text.Json.

How to Use ResponseHeadersRead

Overloads of specific HttpClient methods accept HttpCompletionOption as a parameter. The conventional methods are GetAsync and SendAsync, where overloads exist to accept your choice of HttpCompletionOption.

For example, we can provide it as the second argument to GetAsync as follows.

_httpClient.GetAsync("http://example.com", HttpCompletionOption.ResponseHeadersRead);

The first argument for GetAsync is the request URI as either a string or Uri instance. The overload above accepts the HttpCompletionOption to use as a second argument. A further overload exists which also supports asynchronous cancellation by accepting a CancellationToken as the third argument.
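As a rough sketch of the overloads just described, the following shows the three-argument GetAsync overload next to the equivalent SendAsync call. The class name, method names and the url parameter are illustrative assumptions and not part of any real API surface beyond HttpClient itself.

```csharp
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Illustrative helper; the names here are hypothetical.
public static class CompletionOptionOverloads
{
    // Uses GetAsync(requestUri, completionOption, cancellationToken).
    // The caller owns the returned response and must dispose of it.
    public static Task<HttpResponseMessage> GetHeadersFirstAsync(
        HttpClient httpClient, string url, CancellationToken cancellationToken)
    {
        return httpClient.GetAsync(
            url, HttpCompletionOption.ResponseHeadersRead, cancellationToken);
    }

    // Uses SendAsync(request, completionOption, cancellationToken),
    // which is useful when the request needs custom headers or a different verb.
    public static async Task<HttpResponseMessage> SendHeadersFirstAsync(
        HttpClient httpClient, string url, CancellationToken cancellationToken)
    {
        using var request = new HttpRequestMessage(HttpMethod.Get, url);

        return await httpClient.SendAsync(
            request, HttpCompletionOption.ResponseHeadersRead, cancellationToken);
    }
}
```

Both methods complete once the headers have arrived, so the body may still be in flight when the caller receives the HttpResponseMessage.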

Once we specify ResponseHeadersRead, the GetAsync method will return to us as soon as the headers have been fully read from the response. At this point, we can inspect those headers if we want to.

Further optimisations may exist for your scenarios, since you may be able to decide, purely on the headers whether you care about the content or not. I’ve seen systems which include some kind of domain-specific status header, which may indicate that while content exists, it’s not relevant to the calling code. We could, therefore, avoid processing the content entirely in that case.
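To make the header-based decision concrete, here is a minimal sketch of such a check. The header name X-Domain-Status and the value not-relevant are invented for this example; a real system would define its own convention.

```csharp
using System;
using System.Linq;
using System.Net.Http;

public static class ResponseHeaderChecks
{
    // Hypothetical header name and value, assumed purely for illustration.
    public const string StatusHeaderName = "X-Domain-Status";
    public const string NotRelevantValue = "not-relevant";

    // Returns false when the header indicates the body can be ignored.
    // When the header is absent, we assume the content should be read.
    public static bool ShouldReadContent(HttpResponseMessage response)
    {
        if (response.Headers.TryGetValues(StatusHeaderName, out var values))
        {
            return !values.Contains(NotRelevantValue, StringComparer.OrdinalIgnoreCase);
        }

        return true;
    }
}
```

With ResponseHeadersRead, a check like this can run as soon as GetAsync returns. If it returns false, the calling code can dispose of the response without ever reading the content stream.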

Typically though, you will want to parse the content in some form. Let’s look at some code to request some data from an API and deserialise the JSON response. To keep things uncomplicated, this code omits some defensive checks, such as validating that the content type which the server sent is actually JSON. It also does not include cancellation support for the async methods.
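A minimal sketch of that flow might look like the following. It is not necessarily the exact listing this post was written around: the BooksClient class, the url parameter and the Book property names are assumptions made for illustration.

```csharp
using System.Collections.Generic;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

// Hypothetical Book type with four string properties; the names are assumed.
public class Book
{
    public string Title { get; set; }
    public string Author { get; set; }
    public string Isbn { get; set; }
    public string Publisher { get; set; }
}

public static class BooksClient
{
    public static async Task<List<Book>> GetBooksAsync(HttpClient httpClient, string url)
    {
        // Return as soon as the headers have been read; the body is not pre-buffered.
        using var response = await httpClient.GetAsync(url, HttpCompletionOption.ResponseHeadersRead);

        // Throws HttpRequestException if the status code is not 2xx.
        response.EnsureSuccessStatusCode();

        if (response.Content is null)
        {
            return new List<Book>();
        }

        // Read from the response stream and deserialise as the data arrives.
        var stream = await response.Content.ReadAsStreamAsync();

        return await JsonSerializer.DeserializeAsync<List<Book>>(stream);
    }
}
```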

The preceding code uses the GetAsync method on the HttpClient. First, it provides the URL of an API endpoint to get data from. We have also passed in the ResponseHeadersRead HttpCompletionOption value, indicating that we want this method to return as soon as the headers have been read.

We then check to ensure that the request succeeded and has a 2xx status code. If so, we check that there is content available on the response. We can now access the stream from the response content using ReadAsStreamAsync.

In this example, we are using the JsonSerializer from System.Text.Json to deserialise the payload. This accepts a stream and will attempt to deserialise the data into a List of Book objects. My Book type is a basic class with four string properties. It’s not really important what the book data looks like.

There’s a crucial topic hiding in the code above.

Disposal of HttpResponseMessage

You may have noticed that HttpResponseMessage implements IDisposable since it’s possible that it can hold onto OS resources. This is true only in the scenario where we choose the ResponseHeadersRead option.

When using this option, we accept more responsibility around system resources, since the connection to the remote server is tied up until we decide that we’re done with the content. The way we signal that is by disposing of the HttpResponseMessage, which then frees up the connection to be used for other requests.

Therefore, we must remember to dispose of the response. If we fail to do so, the finalisers should eventually run and release the connection, but it won’t necessarily be very timely. In the preceding code, I’ve used a using declaration to achieve that.

using var response = ...

With this shorthand notation, the HttpResponseMessage will be disposed of at the end of this method. If you want to know more about the using declaration syntax, I have blogged about that previously.

The above code is reasonable in this case and keeps things pretty concise. If you have a lot of code after deserialising the content, this may not have the desired effect. The response is only disposed of when the current scope (the method, in this case) ends, so long-running work before the method completes could tie up the connection for longer than we would like.

In this case, a using statement may be better as you can control its scope to dispose of the response and release the connection as soon as the content has been deserialised. Alternatively, a try/finally block can be used.
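As a sketch of the scoped approach, the following disposes of the response before any later work begins. The class and method names are hypothetical, JsonElement is used only to keep the example free of custom types, and Task.Delay stands in for whatever long-running work follows in real code.

```csharp
using System.Collections.Generic;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

public static class ScopedResponseExample
{
    public static async Task<int> CountItemsThenWorkAsync(HttpClient httpClient, string url)
    {
        List<JsonElement> items;

        // The using statement limits the lifetime of the response to this block.
        using (var response = await httpClient.GetAsync(url, HttpCompletionOption.ResponseHeadersRead))
        {
            response.EnsureSuccessStatusCode();

            var stream = await response.Content.ReadAsStreamAsync();

            items = await JsonSerializer.DeserializeAsync<List<JsonElement>>(stream)
                    ?? new List<JsonElement>();
        } // The response is disposed of here, which releases the connection.

        // Stand-in for long-running work that no longer holds the connection.
        await Task.Delay(10);

        return items.Count;
    }
}
```

The same effect can be had with try/finally by calling response.Dispose() in the finally block; the using statement is simply the more compact form.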

It’s useful to know that the GetByteArrayAsync and GetStringAsync methods on HttpClient are optimised internally and use the ResponseHeadersRead option, disposing of the response correctly.

Comparing Performance

The goal of using HttpCompletionOption.ResponseHeadersRead is to achieve a performance optimisation. So let’s take a look at what gains we can accomplish.

My test code is available up on GitHub if you want to play around with it. Please be aware that these are rough benchmarks for basic comparison and they may result in some multi-modal data.

The benchmarks project includes two methods which make a GET request for some data from an example API. The API returns an array of data representing books. One of the benchmarks uses HttpCompletionOption.ResponseHeadersRead and the other uses the default, HttpCompletionOption.ResponseContentRead.

I ran variations of the code for my tests and also tested with a relatively small payload of 5 books in the array as well as a much larger 250 book payload.

Results

The first test case is where the response content is read from the stream and deserialised. The results are as follows:

With the smaller 5 book payload, there is a modest improvement in the allocations, but it’s not hugely significant. However, in the 250 book payload scenario, we can see close to a 75% reduction in the allocated bytes.

The fact that we avoid buffering everything into the intermediate MemoryStream has saved us a decent amount of allocations and potential GC pressure.

The execution time data for these tests is not entirely accurate as I was calling a “real” API. Even though it was local to my benchmarks, I take the data with a pinch of salt. Different benchmark code would be needed to be more accurate and to take the API out of the equation.

Still, we can see a slight reduction in the execution time for the larger payload. This makes sense since we’ve avoided some data copying in the execution path.

Next, let’s rerun the benchmarks, this time without deserialising the JSON data. We will simply get a stream to read from and end the benchmark there. These results take the allocation overhead of deserialising lots of book objects out of the mix so we can focus more on the HttpClient overhead.

Again, for the small payload, the difference is negligible. Unless you are focused on every ounce of performance, when payloads are small, you won’t see a massive gain from manually specifying the ResponseHeadersRead option.

When we look at the large payload benchmark, the reduction in allocations is even more significant. We have decreased allocations by around 93%. Most of the remaining allocations are the creation of the HttpRequestMessage and HttpResponseMessage objects for each request being sent.

Summary

In this post, I have hopefully demonstrated how the HttpCompletionOption enum can be used to produce more performance optimised code. It requires a little more consideration than the default code since we are now more responsible for ensuring that the connection is released promptly to avoid more sockets being required.

The code complexity is not substantially increased though, and when we expect large payloads, we can reduce the allocations considerably for our application by avoiding the pre-buffering of the response content. We are also able to access the content stream more quickly, which can help with reducing the execution time of working with responses. In high-use services, this small latency reduction could help with overall throughput.

As ever, with performance optimisations, ensure you take measurements and have suitable production metrics in place to monitor any changes you make.


Steve Gordon

Steve Gordon is a Pluralsight author, 6x Microsoft MVP, and a .NET engineer at Elastic where he maintains the .NET APM agent and related libraries. Steve is passionate about community and all things .NET related, having worked with ASP.NET for over 21 years. Steve enjoys sharing his knowledge through his blog, in videos and by presenting talks at user groups and conferences. Steve is excited to participate in the active .NET community and founded .NET South East, a .NET Meetup group based in Brighton. He enjoys contributing to and maintaining OSS projects. You can find Steve on most social media platforms as @stevejgordon