How Does the StringBuilder Work in .NET? (Part 1)

Part 1: Why do we need a StringBuilder and when should we use one?

After becoming proficient in .NET and C#, developers are likely to learn that they should use a StringBuilder to optimise string manipulation and concatenation. This is not a hard and fast rule for all situations but is generally good advice if your code combines or modifies strings repeatedly, particularly if the number of modifications is unbounded and not known until runtime.

In this post, I want to begin a series of posts that I hope will be useful to developers looking to understand why this advice exists and how the StringBuilder is implemented to support more efficient string manipulation. Today, we will focus on understanding the problem that the StringBuilder class is designed to solve and when it makes sense to use it in our code.

I’ve covered string manipulation in detail in my recent Pluralsight course, “String Manipulation in C#: Best Practices”. If you have a subscription, please add the course to your playlist to learn in depth how strings work in .NET and the best practices you should apply to work with them effectively!

Why can String Manipulation be Inefficient?

Each modification or concatenation of a string causes an allocation. This is because strings are immutable. Anything that appears to modify an existing string is, in fact, allocating a new string with the changes applied.

Take the following console application code:

var stringA = Console.ReadLine();
var stringB = Console.ReadLine();
stringA = stringA + stringB;

The preceding code accepts two strings from the user and then concatenates them using the plus operator, assigning the result to stringA. You can easily be forgiven for assuming that perhaps we’re mutating the first string in this code. In fact, since stringA and stringB are both immutable, a new string must be created to hold the combined string. Behind the scenes, the + operator calls the static Concat method on the string type, allocating a brand new string on the heap. The assignment to stringA purely updates the reference which that local variable points to, allowing us to access the new string.
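To make the immutability point concrete, here is a minimal sketch that keeps a second reference to the original string before concatenating. The literal values stand in for the console input and the `original` variable is purely illustrative; neither is part of the code above.

```csharp
using System;

// Illustrative values standing in for the two lines read from the console.
var stringA = "hello";
var stringB = "world";

// Keep a second reference to the first string so we can inspect it afterwards.
var original = stringA;

// The + operator results in a call to string.Concat, which returns a new string.
stringA = stringA + stringB;

Console.WriteLine(original);                                  // hello
Console.WriteLine(stringA);                                   // helloworld
Console.WriteLine(object.ReferenceEquals(original, stringA)); // False
```

The original string is untouched. Only the reference held by stringA has changed.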

A brief summary of string implementation details

The string type is a class and is therefore allocated on the heap. All classes have some overhead, and then, of course, we need to store the characters of the string. Internally, a Char buffer is used to store the characters of the string. Each Char in .NET represents a single UTF-16 code unit, and UTF-16 is a variable-length encoding in which some characters need two code units. Skipping over the complexities of Unicode encoding, we can for now understand that the standard English alphabet characters require two bytes per letter. Finally, some padding may need to occur to align the boundary of the object to 8 bytes (for x64).

Let’s assume that the user provides the word “hello” as the first input and the word “world” as the second. Each string requires 32 bytes on the heap. After the concatenation, we have a third string 42 bytes in size. Notice that we don’t simply add the sizes of the two strings together to calculate the size of the final concatenated string. Each of the original strings has its own object overhead, and we only incur that overhead once in the final string. The exact mechanics of this are not crucial to understand but are still kind of interesting.
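The sizes quoted here can be reproduced with some simple arithmetic. The sketch below assumes a 64-bit layout of an 8-byte object header, an 8-byte method table pointer, a 4-byte length field and a 2-byte null terminator, giving 22 fixed bytes before any characters and before any alignment padding. The `EstimateStringBytes` helper and the constant are hypothetical names introduced only for illustration.

```csharp
using System;

// Assumption: 8 (object header) + 8 (method table pointer) + 4 (length) + 2 (null terminator).
const int assumedFixedBytes = 22;

// Hypothetical helper: estimated unpadded size of a string holding charCount chars.
int EstimateStringBytes(int charCount) => assumedFixedBytes + (2 * charCount);

Console.WriteLine(EstimateStringBytes("hello".Length));      // 32
Console.WriteLine(EstimateStringBytes("world".Length));      // 32
Console.WriteLine(EstimateStringBytes("helloworld".Length)); // 42
```

Under these assumptions, the fixed 22 bytes are paid once per string, which is why the combined string is 42 bytes rather than 64.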

Introducing a StringBuilder to Optimise Allocations

The previous code concatenates just two strings and is actually about as efficient as you can get for that scenario. If you have similar code in your applications and are advised to switch to a StringBuilder, that is probably bad advice.

While a StringBuilder can avoid string allocations using its own internal buffer of Chars to allow sequences of characters to be manipulated efficiently, it has some overhead. A StringBuilder is a class, and creating a new instance will allocate 48 bytes on a 64-bit machine before you even begin using it. It also causes a Char array to be allocated as the initial buffer. By default that will occupy a further 56 bytes. If we were to use a StringBuilder to join the two user-provided strings in the previous code block, it would still have to allocate a final string when we call its ToString method, so its overhead would, in fact, make using it less efficient. We’d still have the same three string allocations, and now the allocation for the StringBuilder and its array buffer, so that’s two extra objects compared to the original code.
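As a rough sketch of that comparison, the two approaches can be put side by side. The literal values again stand in for the console input, and the variable names `direct`, `builder` and `viaBuilder` are illustrative only.

```csharp
using System;
using System.Text;

// Illustrative values standing in for the two lines read from the console.
var stringA = "hello";
var stringB = "world";

// Direct concatenation: one new string is allocated for the result.
var direct = stringA + stringB;

// StringBuilder version: allocates the builder and its initial Char buffer,
// and ToString() still allocates the result string.
var builder = new StringBuilder();
builder.Append(stringA);
builder.Append(stringB);
var viaBuilder = builder.ToString();

Console.WriteLine(direct == viaBuilder); // True
```

Both paths produce the same text, but the second one pays for two additional objects to get there.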

Let’s switch focus to some different code:

const string testString = "test string";
 
var output = string.Empty;
var iterations = int.Parse(Console.ReadLine() ?? "0");
for (var i = 0; i < iterations; i++)
{
    output += testString;
}

You’re unlikely to see precisely this code in an actual application, but the situation it represents is not uncommon in some form or another. It accepts user input which dictates how many times it will concatenate a string. It starts with an empty string, and then, on each iteration, it concatenates the testString onto the end of it, growing the output string each time.

The critical consideration here is that the number of iterations is unbounded, and we cannot predict during development how many iterations a user will choose. It may be two, but it could also be two thousand. This situation can occur in various forms when performing functions based on user input or perhaps data loaded from a file or over the network.

Let’s assume the user selects 100 iterations when running the previous block of code. After concatenating the testString 100 times, the final string requires 2,222 bytes of memory on the heap. Since we want this final string, that allocation is unavoidable and not a problem. However, if we profile the application and capture the memory traffic during the string concatenation, it reveals something crucial. Another 98 strings are allocated during the for loop, each growing in size as the testString is concatenated to the end of the previous string. (The first iteration allocates nothing new, because concatenating onto an empty string simply returns testString.) A memory profiler reveals that 111,034 bytes are allocated for these strings while executing the for loop, all of which are temporary and not required after the next iteration. They will occupy memory in generation 0 of the heap until the next garbage collection kicks in.
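The profiler figures can be cross-checked with a little arithmetic. This sketch assumes each string costs 22 fixed bytes plus 2 bytes per Char (an assumed 64-bit layout, ignoring padding), and that the first concatenation onto string.Empty returns testString without allocating. The variable names are illustrative.

```csharp
using System;

const string testString = "test string"; // 11 chars
const int iterations = 100;
const int assumedFixedBytes = 22;        // assumption: per-string overhead on 64-bit

long temporaryBytes = 0;
var temporaryCount = 0;

// Iterations 2 to 99 each allocate a string that the next iteration discards.
for (var i = 2; i < iterations; i++)
{
    temporaryBytes += assumedFixedBytes + (2 * testString.Length * i);
    temporaryCount++;
}

// The string produced by the final iteration is the one we actually keep.
var finalBytes = assumedFixedBytes + (2 * testString.Length * iterations);

Console.WriteLine(temporaryCount); // 98
Console.WriteLine(temporaryBytes); // 111034
Console.WriteLine(finalBytes);     // 2222
```

The temporary allocations grow quadratically with the iteration count, because every iteration copies everything that came before it.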

You may not worry about 111 KB of memory which will quickly be reclaimed, and in some applications, this could be acceptable. For example, if this code runs once when an application starts, we may write off concerns about the efficiency of this code. Imagine, though, that such code runs inside an action method of an ASP.NET Core application. This could now be on a hot path as it will cause each HTTP request to an endpoint of our application to incur unnecessary memory allocations. At scale this could easily cause more GC pauses than are really necessary.

Concatenating Efficiently with a StringBuilder

A StringBuilder is advised as an alternative in such situations because it supports modification and concatenation in a far more optimal way, allowing the characters to be manipulated with fewer allocations. We’ll learn about the implementation details that result in more efficient memory usage, starting in part two of this series. For now, let’s close out this part by comparing the difference when we use a StringBuilder for the concatenation.

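using System;
using System.Text; // StringBuilder lives in the System.Text namespace

const string testString = "test string";
var iterations = int.Parse(Console.ReadLine() ?? "0");

// Append into the builder's internal buffer instead of allocating a new string per iteration.
var str = new StringBuilder();
for (var i = 0; i < iterations; i++)
{
    str.Append(testString);
}

// ToString allocates the single final string.
var output = str.ToString();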

This code is still pretty easy to read and understand. That’s important, as some optimisations incur a readability penalty, which is one reason developers are often encouraged not to optimise prematurely. In this case, we don’t negatively impact how easy this code will be to maintain, so deciding to optimise should be an easier decision to make. We should still profile the application to ensure that the optimisation actually improves the performance in the way that we expect.

The difference here is that we are appending the testString by calling the Append method on the StringBuilder. Crucially, this is not causing a new string to be allocated on each iteration. Instead, an internal buffer holds the Chars and can “expand” as more characters are appended to the end of the existing data. Note that I’ve used the term expand a little casually here. As we’ll see when we dig into the internals, the way a StringBuilder grows to accommodate an ever-increasing number of characters is slightly more involved. For now, we don’t need to worry about how it works, so we’ll focus on the effect in terms of memory allocations that occur when the code runs.

I captured the memory traffic using JetBrains dotMemory, and the relevant allocations for 100 iterations are as follows:

Type	Allocated Bytes	Allocated Objects
StringBuilder	384	8
String	2,222	1
Char[]	4,288	8
RuntimeType	40	1

In total, 18 objects are allocated here, including the final string we’re after. Those require, in total, 6,934 bytes on the heap. 2,222 of those bytes are the final string we need, so the overhead of the concatenation process is just 4.7 KB. Remember that when we concatenated manually without using the StringBuilder the cost was 111 KB. That’s a substantial saving for a trivial code change.
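If you don’t have a memory profiler to hand, a rough comparison is possible in code. The sketch below uses GC.GetAllocatedBytesForCurrentThread to measure each approach. The `MeasureAllocatedBytes` helper and the two delegates are hypothetical names for illustration, and the exact byte counts will vary by runtime version and platform, so treat the output as indicative rather than a replacement for profiling.

```csharp
using System;
using System.Text;

const string testString = "test string";
const int iterations = 100;

// Hypothetical helper: bytes allocated on the current thread while the action runs.
static long MeasureAllocatedBytes(Action action)
{
    var before = GC.GetAllocatedBytesForCurrentThread();
    action();
    return GC.GetAllocatedBytesForCurrentThread() - before;
}

Action concatWithPlus = () =>
{
    var output = string.Empty;
    for (var i = 0; i < iterations; i++)
    {
        output += testString;
    }
    GC.KeepAlive(output);
};

Action concatWithBuilder = () =>
{
    var str = new StringBuilder();
    for (var i = 0; i < iterations; i++)
    {
        str.Append(testString);
    }
    GC.KeepAlive(str.ToString());
};

// Run both paths once first so one-off JIT and type-loading costs are not measured.
concatWithPlus();
concatWithBuilder();

var plusBytes = MeasureAllocatedBytes(concatWithPlus);
var builderBytes = MeasureAllocatedBytes(concatWithBuilder);

Console.WriteLine($"+= operator:   {plusBytes:N0} bytes");
Console.WriteLine($"StringBuilder: {builderBytes:N0} bytes");
```

Measuring per thread keeps allocations made by other threads out of the numbers, which makes the comparison a little more repeatable.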

You may already be wondering why there are eight StringBuilder instances in the above table, which is a really great question. I promise we’ll get to that in a future blog post as it requires a deeper dive into the internals of the StringBuilder implementation.

Summary

In this blog post, we have learned about and observed the effect of using a StringBuilder when concatenating a large number of strings. The code samples are deliberately simplified to keep the analysis straightforward. Still, hopefully, you can appreciate the distinction between concatenating a small number of strings vs. concatenating many strings, particularly in situations when the number of concatenations is unknown until runtime.

When concatenating 100 strings inside a tight loop, we saw that we could avoid over 95% of the unnecessary allocations. Such allocations can add sufficient memory pressure to trigger a garbage collection in your application.

Join me in the next part of this series to learn more about how the StringBuilder works. And remember: if you want to learn more about using strings in C# and .NET applications, please check out my course on Pluralsight.

Steve Gordon

Steve Gordon is a Pluralsight author, 6x Microsoft MVP, and a .NET engineer at Elastic where he maintains the .NET APM agent and related libraries. Steve is passionate about community and all things .NET related, having worked with ASP.NET for over 21 years. Steve enjoys sharing his knowledge through his blog, in videos and by presenting talks at user groups and conferences. Steve is excited to participate in the active .NET community and founded .NET South East, a .NET Meetup group based in Brighton. He enjoys contributing to and maintaining OSS projects. You can find Steve on most social media platforms as @stevejgordon
