Creating Strings with No Allocation Overhead Using String.Create
Writing High-Performance C# and .NET Code: Part 4

In this post, I’ll continue my series about writing high-performance C# and .NET code. This time, I will focus on a new(ish) method available on the String type – String.Create. First introduced in .NET Core 2.1, this method is currently planned for inclusion as part of .NET Standard 2.1 once that is released.

What Does String.Create Do?

The String.Create method supports the efficient creation of strings that need to be built or computed at runtime. Before I expand on this, let’s take a moment to cover some facts about strings.

  • In .NET, strings are a prevalent type, used to represent text data.
  • Strings are reference types and their data is stored on the managed heap.
  • By design, strings are immutable, which means that once created, their data cannot be modified.

The combination of these facts leads to a problem with strings from a high-performance perspective. At a high level, our goal when writing high-performance code is often to reduce the execution time of that code and to remove memory allocations. Because strings are immutable, operating on them often results in excess allocations. If we want to extract part of a string, a new string is created and the relevant characters are copied from the memory occupied by the old string into the memory of the new one. If we want to convert a string to uppercase, that too results in a new string being allocated on the heap.
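To make the allocation behaviour concrete, here is a small sketch (the `ImmutabilityDemo` class and `Run` method are illustrative names only) showing that `Substring` and `ToUpperInvariant` each hand back a new string while the original is never modified.

```csharp
using System;

public static class ImmutabilityDemo
{
    // Returns the results of two common string operations so the caller can
    // inspect them; the original string is never modified by either call.
    public static (string Original, string Slice, string Upper) Run()
    {
        string original = "high-performance";

        // For a partial slice like this, Substring allocates a new string
        // and copies the selected characters into it.
        string slice = original.Substring(0, 4);

        // Because characters change, ToUpperInvariant returns a new string
        // allocated on the heap; 'original' still holds the lowercase text.
        string upper = original.ToUpperInvariant();

        return (original, slice, upper);
    }
}
```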

Creating a string programmatically, using data available only at runtime, presents a similar problem. Concatenating strings also causes allocations and copies. For long strings, particularly those composed of many component parts, this cost can add up significantly.

This doesn’t mean that strings shouldn’t be used when it’s appropriate to do so, but it becomes a concern when writing highly optimised code. A standard solution when constructing strings at runtime is to use a StringBuilder, which appends characters to an internal buffer. Calling ToString on the StringBuilder causes the final string allocation. When concatenating more than a few elements, StringBuilder will usually be more efficient than plain concatenation (always benchmark to validate this in your scenarios). StringBuilder still requires an intermediate buffer for the characters, so there’s a heap allocation there, plus a copy when building the string from the buffer. StringBuilder itself is a class, so using one also involves an allocation. In ASP.NET Core, the team have worked around this allocation cost on hot paths by pooling and sharing instances of StringBuilder where it makes sense to do so, for example in middleware.
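A minimal StringBuilder sketch of the pattern described above follows; the `StringBuilderDemo.Join` helper is hypothetical. Its comments mark where the allocations come from: the builder object, its internal buffer, and the final string created by `ToString`.

```csharp
using System.Text;

public static class StringBuilderDemo
{
    // Joins the parts with a single space.
    // Allocations: the StringBuilder object, its internal character buffer
    // (which may need further allocations as it grows), and the final string.
    public static string Join(string[] parts)
    {
        var sb = new StringBuilder();

        for (int i = 0; i < parts.Length; i++)
        {
            if (i > 0) sb.Append(' ');
            sb.Append(parts[i]);
        }

        // ToString copies the buffered characters into a newly allocated string.
        return sb.ToString();
    }
}
```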

When Would I Use String.Create?

String.Create is not something that you will need during everyday development. It has a specific purpose, which is to create a string programmatically from some existing data, or potentially just via an algorithm, in a highly optimised way. The primary optimisation, in this case, is that it helps us avoid unnecessary allocations and copying of data. We’ll look at a worked example in a few minutes, but before that, let’s consider some more general use cases. Inside the Kestrel web server for ASP.NET Core, unique IDs are created per request. In that case, the requirement is to build a string of a known length and format, which will uniquely identify the request. Since this may be done many thousands of times per second, making this perform well is crucial. String.Create allows the string to be constructed efficiently in that scenario.

How Does String.Create Work?

String.Create provides a very short window in which we are essentially allowed to break the immutability rule of strings. That may sound scary to some, but it’s not as bad as I make it sound. Mutation can occur only before the first reference to the string is returned. There’s no possibility to modify the data of an existing string after this brief window.

Internally, String.Create allocates a string of the requested length on the heap, reserving the memory that will hold its characters. For this to work, the method takes as its first parameter the length required for the string. This is an important limitation: you must know, or be able to calculate up front, the exact character length of the string that you wish to create.

Here’s the signature for the Create method:

public static string Create<TState> (int length, TState state, System.Buffers.SpanAction<char,TState> action);

The method takes a second parameter, which is the generic state needed to construct the string. We’ll come back to the state in a few moments.

Finally, the Create method accepts a delegate that is expected to operate on the allocated heap memory to set the final string data. In this case, the parameter is a SpanAction, which is defined in System.Buffers. Since a Span<T> type cannot be used as a generic type argument, the standard Action delegate cannot be used. Instead, SpanAction<T, TArg> declares a Span<T> parameter directly, where T is the element type of the span and TArg is the type of the state. In this case, T is char.

The SpanAction delegate is where the power lies. After the memory needed for the string is allocated, the delegate we pass is used to populate the characters within it. Once the delegate completes, the string, now holding the characters we wrote, is returned.
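As a minimal sketch of these mechanics (the `MinimalCreate.Alphabet` helper is hypothetical), the delegate below receives a `Span<char>` covering exactly `length` characters of the new string, plus the state value, and fills every position in place. It assumes `start` is a lowercase ASCII letter.

```csharp
using System;

public static class MinimalCreate
{
    // Builds a string of 'length' characters cycling through 'a'..'z',
    // beginning at 'start'. The start letter is passed as state, so the
    // lambda captures nothing from the enclosing method.
    public static string Alphabet(int length, char start)
    {
        return string.Create(length, start, (span, first) =>
        {
            // 'span' is a view over the new string's own memory.
            for (int i = 0; i < span.Length; i++)
            {
                span[i] = (char)('a' + ((first - 'a' + i) % 26));
            }
        });
    }
}
```

For example, `MinimalCreate.Alphabet(4, 'y')` wraps around the alphabet and returns `"yzab"`.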

Let’s consider for a moment one of the lowest-allocation ways we could build up the string without this method. We could use a temporary char array as a buffer to build up the data for the string, then pass that array to the string constructor. That is essentially what StringBuilder does for us. This approach results in two allocations, one for the buffer and one for the string, and the characters must also be copied from the buffer into the string.
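The buffer-based approach might look something like the following sketch, which assumes a simple "first last" name as the string being built (`CharArrayBuffer.FullName` is an illustrative name, not the original sample code).

```csharp
using System;

public static class CharArrayBuffer
{
    // Builds "first last" via a temporary heap-allocated char[] buffer.
    // Allocations: the buffer, then the final string. The string constructor
    // copies the buffer's contents into the string's own memory.
    public static string FullName(string first, string last)
    {
        var buffer = new char[first.Length + 1 + last.Length];

        first.CopyTo(0, buffer, 0, first.Length);
        buffer[first.Length] = ' ';
        last.CopyTo(0, buffer, first.Length + 1, last.Length);

        return new string(buffer);
    }
}
```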

Another option would be to use unsafe code, or in .NET Core 2.1 and later, we can use the Span<T> support to safely use a small stack allocated buffer instead of the heap allocated array. As long as the size of the buffer is not too large, this would be a good option and we’d be down to one heap allocation, just for the final string. There would, however, be a copy needed to get the data from the stack memory and into the string heap memory. That has a small execution time cost.
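Applied to a hypothetical full-name example ("first last"), a stack-allocated buffer might look like this sketch. It assumes the combined length is small; large or unbounded lengths should not be stack-allocated.

```csharp
using System;

public static class StackBuffer
{
    // Builds "first last" using a stack-allocated buffer. Only the final
    // string is heap-allocated, but ToString() still copies the characters
    // from the stack buffer into that string.
    public static string FullName(string first, string last)
    {
        int length = first.Length + 1 + last.Length;

        // Assumes 'length' is small: a large stackalloc risks a stack overflow.
        Span<char> buffer = stackalloc char[length];

        first.AsSpan().CopyTo(buffer);
        buffer[first.Length] = ' ';
        last.AsSpan().CopyTo(buffer.Slice(first.Length + 1));

        return buffer.ToString();
    }
}
```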

Getting back to String.Create, we can now understand how it gives us the best possible performance. Because we don’t need to pre-buffer our characters, even on the stack, the logic that constructs the string acts directly on the final region of memory that the string will reference. Done correctly, we can build strings programmatically, with no intermediate allocations and very high performance.

Within the SpanAction, we have access to the Span<char> over the memory which the string occupies. We can modify that memory via the Span, slicing into the appropriate positions and writing characters into the underlying array. The state that we have passed in will allow us to use existing data to build up our string. There’s an important point here which you may already be wondering about. Why does the state get passed into the Create method directly? Why can’t we just reference the data we need from the delegate code?

The reason is that referencing an outside variable from the delegate captures it, forming a closure. The compiler must then generate a class to hold the captured state, and creating an instance of it is a heap allocation we want to avoid here. Also, a closure prevents the delegate from being cached, which is itself a performance hit we cannot afford. By accepting the state as a parameter, the Create method removes the need for the delegate to form a closure.
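The difference is easiest to see side by side. In this sketch (the `ClosureVsState` methods are illustrative), both methods produce the same string, but only the second keeps the delegate free of captured variables.

```csharp
using System;

public static class ClosureVsState
{
    // Captures 'name' from the enclosing scope: the compiler generates a
    // closure class (a heap allocation per call) and a new delegate instance
    // each time, because a capturing lambda cannot be cached.
    public static string GreetCapturing(string name)
    {
        return string.Create(6 + name.Length, 0 /* dummy state */, (span, unused) =>
        {
            "Hello ".AsSpan().CopyTo(span);
            name.AsSpan().CopyTo(span.Slice(6));
        });
    }

    // Passes 'name' as the state instead: the lambda captures nothing, so the
    // compiler can cache a single delegate instance and no closure is needed.
    public static string GreetWithState(string name)
    {
        return string.Create(6 + name.Length, name, (span, n) =>
        {
            "Hello ".AsSpan().CopyTo(span);
            n.AsSpan().CopyTo(span.Slice(6));
        });
    }
}
```

On C# 9 and later, you can also mark the lambda `static`, which makes the compiler reject any accidental capture.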

This is a little complex to explain, but the takeaway is to make sure any objects you need to access in order to create your string are included in the state. If you have more than one object to pass, the recommended pattern is to use a ValueTuple. Since it is a struct, combining the values into it does not cause a heap allocation, and inside the delegate you can deconstruct it to get the constituent parts.
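Here is a sketch of the ValueTuple pattern, assuming a hypothetical `Repeat` helper that needs three differently typed values (a word, a separator and a count) to build its string.

```csharp
using System;

public static class TupleState
{
    // Repeats 'word' 'count' times, separated by 'separator'.
    // The three values are bundled into a ValueTuple (a struct) and
    // deconstructed inside the delegate, so the lambda captures nothing.
    public static string Repeat(string word, char separator, int count)
    {
        if (count <= 0) return string.Empty;

        // Exact length: every copy of the word plus the separators between them.
        int length = count * word.Length + (count - 1);

        return string.Create(length, (word, separator, count), (span, state) =>
        {
            var (w, sep, n) = state; // deconstruct the ValueTuple
            int position = 0;

            for (int i = 0; i < n; i++)
            {
                if (i > 0) span[position++] = sep;
                w.AsSpan().CopyTo(span.Slice(position));
                position += w.Length;
            }
        });
    }
}
```

For example, `TupleState.Repeat("ab", '-', 3)` returns `"ab-ab-ab"`.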

A Quick Example of Using String.Create

Before we dive into a real-world example, let’s quickly look at how to use String.Create.

In my quick example, a ContextData object contains three strings we want to use to build a final string. First, we calculate the length needed for the final string, which includes the component parts and the spaces between them. We pass the length to string.Create, along with the context as the state argument. Finally, we define the code for the SpanAction delegate, which slices into the underlying Span<char> to copy the component parts into the correct locations within the final string. All of this occurs with a single heap allocation, for the memory needed by the string.
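The steps just described might be sketched as follows. The `ContextData` shape (three string properties) and the `ContextFormatter.Combine` name are assumptions for illustration, not the original sample code.

```csharp
using System;

// Hypothetical state type: three strings to be joined with single spaces.
public sealed class ContextData
{
    public string First { get; }
    public string Second { get; }
    public string Third { get; }

    public ContextData(string first, string second, string third)
    {
        First = first;
        Second = second;
        Third = third;
    }
}

public static class ContextFormatter
{
    public static string Combine(ContextData context)
    {
        // 1. Calculate the exact final length: the three parts plus two spaces.
        int length = context.First.Length + context.Second.Length + context.Third.Length + 2;

        // 2. Pass the length and the context (as the state) to string.Create.
        return string.Create(length, context, (span, ctx) =>
        {
            // 3. Copy each part into its slot, writing a space between parts.
            ctx.First.AsSpan().CopyTo(span);
            int position = ctx.First.Length;
            span[position++] = ' ';

            ctx.Second.AsSpan().CopyTo(span.Slice(position));
            position += ctx.Second.Length;
            span[position++] = ' ';

            ctx.Third.AsSpan().CopyTo(span.Slice(position));
        });
    }
}
```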

How To Use String.Create – A Real World Example

Let’s now look at a worked example based on a real situation I faced. Note that this is still demo code. It’s based on a production requirement I had but I have simplified it so that we can focus on specific techniques. I’m reasonably sure it can be further optimised!

In my talk, ‘Turbocharged: Writing High-Performance C# and .NET Code’, I discuss an example of a service where after reading a message from an AWS SQS queue, I need to store the message body into an S3 bucket. When storing a blob into S3 we must provide a unique key for the object. As a result, this service must compute the key to pass into the AWS SDK when uploading the object. In our case, this occurs 18 million times per day so even a small gain in performance can have a significant effect at scale.

The key is formed of eight elements from the incoming message. Only lowercase letters, numbers and underscores are allowed in the final key and any spaces should be converted to underscores. The first approach to building the string used an array to hold the constituent parts and then joined the pieces together to form the final string. I won’t show all of the code in this post, but you can check out an example in my GitHub repo.
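A rough sketch of that first, array-and-join approach follows. The normalisation rules (letters lowercased, spaces turned into underscores), the use of `_` to separate the elements, and the assumption that inputs contain only letters, digits, spaces and underscores are all hypothetical stand-ins for the production rules.

```csharp
using System;

public static class KeyV1
{
    // First approach (hypothetical rules): normalise each element into a new
    // string, hold the results in an array, then join them with '_'.
    // Allocations: the array, up to two strings per element, and the result.
    public static string Build(string[] parts)
    {
        var normalised = new string[parts.Length];

        for (int i = 0; i < parts.Length; i++)
        {
            // Assumes ASCII input: lowercase letters, spaces become underscores.
            normalised[i] = parts[i].ToLowerInvariant().Replace(' ', '_');
        }

        return string.Join("_", normalised);
    }
}
```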

The second iteration used a stack allocated array of chars as a buffer to form the final data for the string. Using a Span<char> over that memory, I was then able to copy the various elements into the stack allocated buffer. Calling ToString on the Span<char> resulted in the creation of the final string for the object key. Again, I won’t show that code here since it’s quite lengthy. That is also available in my repo if you want to check it out.
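A sketch of this second iteration, under assumed rules (letters lowercased, spaces replaced by underscores, elements separated by `_`), might look like this. It also assumes the key is short enough to stackalloc safely.

```csharp
using System;

public static class KeyV2
{
    // Hypothetical normalisation rule for a single character.
    private static char Normalise(char c) => c == ' ' ? '_' : char.ToLowerInvariant(c);

    // Writes normalised characters into a stack-allocated buffer, then calls
    // ToString() to copy them into the final heap-allocated string.
    public static string Build(string[] parts)
    {
        if (parts.Length == 0) return string.Empty;

        int length = parts.Length - 1; // one '_' between each pair of elements
        foreach (var part in parts) length += part.Length;

        Span<char> buffer = stackalloc char[length];
        int position = 0;

        for (int i = 0; i < parts.Length; i++)
        {
            if (i > 0) buffer[position++] = '_';
            foreach (char c in parts[i]) buffer[position++] = Normalise(c);
        }

        return buffer.ToString();
    }
}
```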

In the final iteration, I utilised String.Create which meant I could avoid the memory copy from the stack allocated buffer into the heap memory for the string. If you’d like to explore that code, that too is in my GitHub repo.
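A sketch of the String.Create iteration is shown below, again under assumed rules (letters lowercased, spaces replaced by underscores, elements separated by `_`). The string array is passed as the state, so the delegate captures nothing, and characters are normalised straight into the new string's memory with no intermediate buffer.

```csharp
using System;

public static class KeyV3
{
    // Hypothetical normalisation rule for a single character.
    private static char Normalise(char c) => c == ' ' ? '_' : char.ToLowerInvariant(c);

    public static string Build(string[] parts)
    {
        if (parts.Length == 0) return string.Empty;

        // The exact length must be known before calling string.Create.
        int length = parts.Length - 1;
        foreach (var part in parts) length += part.Length;

        return string.Create(length, parts, (span, state) =>
        {
            int position = 0;

            for (int i = 0; i < state.Length; i++)
            {
                if (i > 0) span[position++] = '_';
                // Calling a static method is not a capture, so the delegate can be cached.
                foreach (char c in state[i]) span[position++] = Normalise(c);
            }
        });
    }
}
```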

Do bear in mind that these samples are not fully optimised and are designed to demonstrate some specific techniques rather than complete optimisations. In my case, the String.Create version is only marginally quicker in the benchmarks that I’ve run, and I’ll be exploring that more deeply in the future.

In most benchmark runs, the String.Create approach is a few nanoseconds quicker, but in some it comes out a few nanoseconds slower. There may be further optimisations I could apply to the transformation logic which would account for this. Logically, avoiding the copy from stack memory into the string’s heap memory should be more efficient, but it’s always worth testing your actual scenario.

To investigate this, I benchmarked String.Create against stackalloc creation in isolation. For shorter strings, stackalloc seemed to be only marginally quicker. In one benchmark, I combined short strings of 10 characters using both approaches, where the count is the number of strings combined in each test. At only five items there’s not much in it at all, but by the time it gets to combining 100 strings, the performance improvement from using String.Create is more apparent.

If you are interested in another example use case for String.Create, I’ve identified a place in the ASP.NET Core codebase where String.Create should improve performance. I’ve raised a GitHub issue demonstrating this and hope to be involved in creating a PR to propose the final optimisation.

String.Create Best Practices

There has been quite a lot of information in this post to explain a single method. Let’s conclude by reviewing the most important points.

  • String.Create provides a high-performance, low allocation approach to programmatically creating a string.
  • As with all performance optimisations, benchmark your original solution and ensure that the changes have a positive effect.
  • Avoid closures and make sure that you don’t capture external variables in your SpanAction delegate.
  • Use ValueTuples to pass multiple objects for the state.

Limitations of String.Create

Using String.Create is more involved than some of the other ways to create a new string that you may be familiar with. I don’t recommend using it everywhere, but in hot paths of performance-critical applications, it may provide some worthwhile gains. The biggest limitation you may hit is that you must know in advance (or be able to calculate) the exact length of the string you require. You may need to access the lengths of all of the constituent state objects in order to compute the length of the final string. In some cases, where building a string involves a lot of conditional logic, just knowing the lengths of the parts might not be enough.
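To illustrate the conditional-logic concern, here is a sketch with a hypothetical name formatter whose length depends on whether an optional middle name is present. The length calculation and the code that writes the characters must agree on every condition.

```csharp
using System;

public static class ConditionalLength
{
    // Formats "Last, First" or "Last, First M." when a middle name is present.
    // If the length calculation and the write logic disagree, the string ends
    // with unwritten '\0' characters, or writing past the span's end throws.
    public static string Format(string first, string middle, string last)
    {
        bool hasMiddle = !string.IsNullOrEmpty(middle);

        // ", " is 2 characters; " M." adds 3 more when a middle name exists.
        int length = last.Length + 2 + first.Length + (hasMiddle ? 3 : 0);

        return string.Create(length, (first, middle, last, hasMiddle), (span, s) =>
        {
            s.last.AsSpan().CopyTo(span);
            int pos = s.last.Length;
            span[pos++] = ',';
            span[pos++] = ' ';

            s.first.AsSpan().CopyTo(span.Slice(pos));
            pos += s.first.Length;

            // Same condition as the length calculation above.
            if (s.hasMiddle)
            {
                span[pos++] = ' ';
                span[pos++] = s.middle[0];
                span[pos++] = '.';
            }
        });
    }
}
```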

Summary

String.Create is useful in high-performance scenarios. It’s relatively straightforward to use once you understand the rules about how it operates. It’s therefore a tool worth remembering if you are optimising hot paths in your applications, and it may yield significant gains in applications that parse and produce strings frequently as part of their primary function.