Readers who have followed me for some time will know that I have developed a bit of a passion for performance improvements and avoiding allocations in critical code paths. Previous blog posts have touched on examples of using Span<T> as one mechanism to prevent allocations when parsing data and using ArrayPool to avoid array allocations for temporary buffers. Such changes, while good for performance, can make the new version of the code harder to maintain.
In this post, I want to show how performance optimisations do not always require extensive and complex code modifications. Sometimes, there is low hanging fruit that we can tackle for some quick performance wins. Let’s look at one such example.
Identifying Optimisations
I was recently poking around in the Elasticsearch.NET client code base. I became curious about performance on some of the hot paths within the library.
For those new to profiling applications, a hot path is a sequence of methods which are called often within a code base under typical usage. For example, in a web application, you may have one endpoint, which is called extremely often in production when compared to all other endpoints. The code from the corresponding action method will likely be the start of a hot path in your application. Any methods it calls, in turn, may be on the hot path depending on any conditional execution flows. Another less obvious example is code within loops, which may generate a lot of calls to other methods if the loop executes many hundreds or thousands of times.
When optimising the performance of applications, you generally want to focus on hot paths first, since changes and improvements there will have the most significant effect due to their high call rate. Optimising code which is called only 10% of the time may yield much smaller gains.
There are two related Elasticsearch clients for .NET. NEST is a high-level client supporting strongly typed querying. It sits on top of Elasticsearch.NET, the low-level client.
Inside the NEST namespace, there is an abstract RequestBase class, from which the strongly typed request types are derived. A strongly typed request class is generated for each of the Elasticsearch HTTP API endpoints which may be called. A primary feature of a request is that it contains the URL or URLs for the API endpoint(s) to which it relates.
The reason that multiple URLs may be defined is that many Elasticsearch API endpoints may be called with a base path or with a path containing an identifier for a particular resource. For example, Elasticsearch includes endpoints to query the health of a cluster. This can be the general health for the whole cluster using the URL “_cluster/health”, or the request can be limited to specific indices by including the index name in the path, “_cluster/health/{index}”.
These are logically handled by the same request class within the library. When creating the request, the consumer may provide an optional request value to specify a particular index. In this case, the URL must be built at runtime, replacing the {index} portion of the URL pattern with the user-provided index name. When no index name is provided, the shorter “_cluster/health” URL is used.
At the time a request is sent, the final URL must therefore be determined and built. The URL pattern to use is first matched from the list of potential URLs, based on the number of request values which have been specified on the strongly typed request object. Once a URL pattern has been matched, the final URL can be generated. A tokenised version of the URL pattern is used to create the final URL string, replacing any tokens with the route values provided by the consuming code.
The core of this URL building takes place in a UrlLookup class which includes a ToUrl method as follows:
The above code starts by creating a StringBuilder instance. It then loops through each string from the tokenised URL. The tokenised elements of the URL path are stored in the string array field “_tokenized”. On each iteration, if the tokenised value begins with an ‘@’ character, this identifies that it needs to be replaced with a corresponding consumer-provided route value. The route values are searched for a match to the current token name, held within the “_parts” array. If a match is found, the value is URI escaped and then appended to the URL StringBuilder.
Path parts which do not require replacement from the route values are appended directly onto the string builder without modification.
Once all tokenised values have been appended and replaced where necessary, the final string is returned by calling ToString on the builder. This code will be called for each request made from the client, so it’s on a pretty hot path within the library.
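The walkthrough above can be sketched roughly as follows. This is a minimal illustration written from the description, not the library's actual source: the class name UrlLookupSketch, the constructor that tokenises a "{name}" pattern, the use of a plain dictionary for route values and the exception messages are all assumptions. Only the “_tokenized” and “_parts” field names and the ‘@’ marker come from the text.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical stand-in for the UrlLookup type described in the post.
public sealed class UrlLookupSketch
{
    private readonly string _route;
    private readonly string[] _tokenized; // literal text, or "@name" placeholders
    private readonly string[] _parts;     // placeholder names, in order of appearance

    // Assumed tokeniser: splits "a/{x}/b" into ["a/", "@x", "/b"].
    public UrlLookupSketch(string route)
    {
        _route = route;
        var tokens = new List<string>();
        var parts = new List<string>();
        var position = 0;
        while (position < route.Length)
        {
            var open = route.IndexOf('{', position);
            if (open < 0)
            {
                tokens.Add(route.Substring(position)); // trailing literal text
                break;
            }
            var close = route.IndexOf('}', open);
            if (close < 0) throw new ArgumentException("Unclosed '{' in route: " + route);
            if (open > position) tokens.Add(route.Substring(position, open - position));
            var name = route.Substring(open + 1, close - open - 1);
            tokens.Add("@" + name);
            parts.Add(name);
            position = close + 1;
        }
        _tokenized = tokens.ToArray();
        _parts = parts.ToArray();
    }

    public string ToUrl(IReadOnlyDictionary<string, string> values)
    {
        var sb = new StringBuilder(_route.Length);
        var partIndex = 0;
        foreach (var token in _tokenized)
        {
            if (token[0] == '@')
            {
                // Placeholder: look up the route value and escape it before appending.
                var name = _parts[partIndex++];
                if (!values.TryGetValue(name, out var value) || string.IsNullOrEmpty(value))
                    throw new ArgumentException($"No value provided for '{name}' on url: {_route}");
                sb.Append(Uri.EscapeDataString(value));
            }
            else
            {
                // Literal path text is appended unchanged.
                sb.Append(token);
            }
        }
        return sb.ToString();
    }
}
```

With this sketch, a lookup created for “_cluster/health/{index}” and given an "index" route value of "my index" would append the escaped form of that value after the literal “_cluster/health/” segment, while a lookup for “_cluster/health” simply copies its single token through the builder.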
How could we consider optimising this so that it performs faster and allocates less?
Right now, the code is using a StringBuilder, which is a good practice to avoid string allocations when concatenating an unbounded number of strings together. There are some options here to use a Span<T> based approach to build the string which could certainly reduce allocations. However, adding Span<T> and other techniques such as using ArrayPools to provide a zero-allocation buffer will add complexity to the code. Since this is a library used by many consumers, such work could indeed be a worthwhile trade-off.
In much of your day-to-day code, such an optimisation would likely be overkill, unless your service is under extreme load. Once you know high-performance tricks such as Span<T>, it can be tempting to jump straight to the most optimised solution, targeting zero allocations. Such thoughts can blind you to the low-hanging fruit which you should consider first.
When I was reviewing the ToUrl method and thinking through the logical flow, one thing came to mind. Two additional lines should be able to provide a simple but effective performance gain for some scenarios. Take another look at the code above and see if you can spot any opportunities for a simple improvement. Hint: The new lines belong right at the start of the method.
Let’s consider again the cluster health example where there are two potential URL patterns, “_cluster/health” and “_cluster/health/{index}”.
The latter requires the last part of the path to be replaced by a user-provided index name. The former, though, requires no replacement at all. This is true for the vast majority of endpoints, where only some calls require path parts to be replaced with route values from the consumer. Are you starting to see where I’m going here?
My theory was that the ToUrl method could, in some cases, avoid the need to build a new URL at all. This removes the need to use (and allocate) the StringBuilder instance or generate a new URL string. Where there are no parts in the URL to replace, the tokenized collection will contain a single item, the full, original URL path string. So why not just return that?
Optimising the Code
Before taking on any optimisations for code, there are two things I like to do. First, I want to check there are sufficient unit tests of the existing code. Just as with any refactoring, it’s possible to break the current behaviour. When no tests are present, I start by creating some which exercise the existing behaviour. After completing any optimisations, if the tests still pass, then nothing has been broken. For brevity in this post, I won’t show unit tests since they are a familiar concept to many developers.
The second pre-optimisation step is to create a benchmark of the existing code so that we can later confirm that our changes have made things better and measure the improvement. Assumptions about performance are dangerous, and it’s safest to take a scientific approach. Establish your theory, measure the existing behaviour, perform your experiment (code optimisation) and finally, measure again to validate the hypothesis. Writing benchmarks may be something you’re less familiar with. As a primer, you can view my post about BenchmarkDotNet.
In this ToUrl example, the benchmark was reasonably straightforward.
There are some static fields used to set up the types we are benchmarking and any inputs we require. We don’t want to measure their overhead in the benchmarks. I then included two benchmarks, one for each URL pattern. We expect to optimise the pattern which does not require a replacement from the route values, but it’s worth testing the alternative case too. We don’t want to improve one path, but negatively impact another.
The results from the initial run, before any code changes, were as follows:
This gives us a baseline to compare against once we finish our work.
In the ToUrl method, we want to short circuit and avoid the URL building for paths where there are no parts which we need to replace from the route values. We can achieve that with the promised two lines of additional code.
Adding these two lines (well, four if you prefer braces around the return statement) to the beginning of the method is all we need here. This code performs three logic checks. If they all return true, then we know that we have a single URL token which requires no replacements, so we can return it directly. The first check ensures we have no route values from the user. If we have route values, then we should assume there is some replacement to do. We also check that we have a single item in the tokenized array and that the first character of that item is not the reserved ‘@’ character.
In the case of a standard cluster health request where no index name is provided, the conditions would be met and the original string containing “_cluster/health” can be returned directly from index zero of the tokenized array.
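The three checks described above can be sketched like this. It is a hedged illustration rather than the library's exact code: the class name ShortCircuitUrlLookup, the constructor taking pre-tokenised arrays and the dictionary type used for route values are assumptions made to keep the example self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical stand-in for UrlLookup; the "_tokenized" and "_parts" names follow the post.
public sealed class ShortCircuitUrlLookup
{
    private readonly string[] _tokenized; // literal text, or "@name" placeholders
    private readonly string[] _parts;     // placeholder names, in order of appearance

    public ShortCircuitUrlLookup(string[] tokenized, string[] parts)
    {
        _tokenized = tokenized;
        _parts = parts;
    }

    public string ToUrl(IReadOnlyDictionary<string, string> values)
    {
        // Fast path: no route values, a single token, and that token is not a
        // placeholder. The stored string is returned as-is, with no allocation.
        if (values.Count == 0 && _tokenized.Length == 1 && _tokenized[0][0] != '@')
            return _tokenized[0];

        // Slow path: build the URL token by token, as before.
        var sb = new StringBuilder();
        var partIndex = 0;
        foreach (var token in _tokenized)
        {
            if (token[0] != '@')
            {
                sb.Append(token);
                continue;
            }
            var name = _parts[partIndex++];
            if (!values.TryGetValue(name, out var value) || string.IsNullOrEmpty(value))
                throw new ArgumentException($"No value provided for '{name}'");
            sb.Append(Uri.EscapeDataString(value));
        }
        return sb.ToString();
    }
}
```

On the fast path the caller receives the very same string instance held in the tokenised array, which is why no new string or StringBuilder is needed. A reference-equality check between the returned value and the stored token is one simple way to assert that in a unit test.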
I don’t consider these extra lines to be a complex code change. Most developers will be able to read this and understand what it’s doing. For completeness, we could consider refactoring the conditionals into a small method or local function so that we can name it, to help the code be more self-documenting. I haven’t done that here.
Now that we’ve made the change, and ensured that the unit tests still pass, we can re-run the benchmarks to compare the results.
The second benchmark “HealthIndex” is unchanged since part of the URL had to be replaced, and so the full method was executed as before. However, the more straightforward case in the first benchmark “Health”, is much improved. There are no longer any allocations on that code path, a 100% reduction! Instead of allocating the StringBuilder and creating a new string, we return the original string, in this case, already allocated when the application starts.
A saving of 160 bytes might not sound that exciting, but when we consider that this occurs for every request sent by the client, it soon adds up. For just 10 requests (where no route value needs to be replaced) we save 1,600 bytes, over 1 KB of unnecessary allocations. In consumers which use Elasticsearch extensively, this will quickly become a worthwhile improvement.
There is also an 87% reduction in execution time since the only code which has to execute in this case is the conditional check and method return. These improvements are a great win on this hot path and benefit any consumers calling the method. Since this is a client library, consumers see the benefit simply by using the latest release of the client which includes the optimisation.
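For a rough sanity check of the allocation difference without a benchmarking library, something like the following could be used. This is a swapped-in technique, not the benchmark from this post: it relies on GC.GetAllocatedBytesForCurrentThread, and the AllocationProbe and UrlCandidates names are hypothetical. It gives no timing information, so a proper benchmark remains the better tool for real measurements.

```csharp
using System;
using System.Text;

// Hypothetical helper: counts bytes allocated on the current thread by repeated calls.
public static class AllocationProbe
{
    public static long Measure(Func<string> action, int iterations)
    {
        action(); // warm up so first-call work is not counted
        var before = GC.GetAllocatedBytesForCurrentThread();
        for (var i = 0; i < iterations; i++)
            action();
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }
}

// Two simplified candidates mirroring the "before" and "after" behaviour
// for a URL with no replaceable parts.
public static class UrlCandidates
{
    private static readonly string[] Tokenized = { "_cluster/health" };

    // Before: always copies the tokens through a StringBuilder.
    public static string BuildWithStringBuilder()
    {
        var sb = new StringBuilder();
        foreach (var token in Tokenized)
            sb.Append(token);
        return sb.ToString();
    }

    // After: returns the already-allocated string directly.
    public static string ReturnDirectly() => Tokenized[0];
}
```

Calling AllocationProbe.Measure(UrlCandidates.BuildWithStringBuilder, 1000) and then the same for UrlCandidates.ReturnDirectly should show the first allocating on every call while the second allocates far less, if anything. The exact byte counts depend on the runtime, so treat them as indicative only.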
Summary
In this post, we introduced the idea that not all performance optimisations need to be complex to implement. In this example, we optimised the ToUrl method of the NEST library by conditionally avoiding code we know would cause allocations. While we could theorise about more extensive optimisations using Span<T>, we focused first on a quick win which didn’t introduce complicated, hard-to-maintain code. To ensure our change was indeed an improvement, we used a benchmark to measure the performance before and after the change. Whilst not shown in the example, unit tests should be in place to avoid introducing regressions to the behaviour of the method.
Hopefully, this example was useful to identify where quick wins may exist in your own code. When looking to optimise your code base, focus on hot paths, start simple, and try to address quick wins before jumping to more complex optimisations. Some changes, such as the one shown in this post, should be reasonable in most code bases, while more advanced optimisations may make the maintenance burden too high. As with this example, some optimisations can be as simple as avoiding the execution of existing code with a conditional check.
Happy optimising!