Part Two: Understanding the Overhead of a StringBuilder
To continue exploring how the StringBuilder works, we’ll shift focus and study its logical design. Today, we’ll start by looking at how the type is designed and the overhead involved with creating and using StringBuilder instances. In part one of this series, I explained why you may decide to use StringBuilders in your application code for more efficient string manipulation. Don’t worry if you missed it; I’ll wait while you check that out first!
We’re starting to get into internal implementation details, so please remember that these details may change in future versions of .NET. I have used the current code from .NET 6 while researching this blog post. The design of StringBuilder has changed little in past versions of .NET, so I expect these details to remain broadly applicable to earlier .NET versions.
StringBuilder Memory Layout
In the previous post, we witnessed a reduction in allocations inside a string concatenation loop (with 100 iterations selected at runtime) when using a StringBuilder. We also learned that when concatenating a small, bounded number of strings, the StringBuilder may be less efficient. At the time, I mentioned that creating a StringBuilder introduces some additional overhead that should be considered when using one in our code. To understand that better, let’s dive into the side effects of the following line of code:
var sb = new StringBuilder();
We are creating an instance of StringBuilder using the parameterless constructor and are ready to use its capabilities to manipulate string data.
First, we must appreciate that a StringBuilder is a class, which means that memory for each instance is allocated on the heap. All classes in .NET have some overhead required for their object header and method table reference. I won’t go into the fine detail of how this works as I want to focus on the specifics of the StringBuilder. For the purpose of this post, it’s enough to know that the overhead is 8 bytes on 32-bit systems and 16 bytes on 64-bit systems, although the minimum object size is 12 bytes and 24 bytes respectively.
I’ll assume we’re on x64 for the remainder of this post. Here is a diagram to help illustrate this information.
Next, the StringBuilder type has some internal fields which also contribute to its final size in bytes. Let’s discuss each of these in turn.
ChunkChars
internal char[] m_ChunkChars;
You’ll immediately notice that the fields defined within StringBuilder use Hungarian notation (the m_ prefix) for their naming. This is likely a historic decision and is not recommended when defining your own types.
The StringBuilder works by maintaining a buffer of characters (Char) that will form the final string. Characters can be appended, removed and manipulated via the StringBuilder, with the modifications being reflected by updating the character buffer accordingly. An array is used for this character buffer. Since arrays in .NET are also reference types, they are heap allocated, introducing a second object allocation when creating a StringBuilder instance.
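To make the idea of a mutable character buffer concrete, here is a minimal sketch using only public StringBuilder members. The variable names and sample text are my own, chosen purely for illustration.

```csharp
using System;
using System.Text;

// Build up some text, then edit the buffered characters in place.
var sb = new StringBuilder();
sb.Append("Hello World");   // buffer now holds "Hello World"
sb.Remove(5, 6);            // drop " World", leaving "Hello"
sb.Insert(0, "Oh, ");       // prepend, giving "Oh, Hello"
sb[0] = 'o';                // overwrite a single character via the indexer

// ToString() materialises the final string from the buffered characters.
string result = sb.ToString();
Console.WriteLine(result);  // prints "oh, Hello"
```

Each of these calls works against the builder’s internal buffer rather than producing a new string, which is where the savings over repeated concatenation come from.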
The m_ChunkChars field holds a reference to a char[] on the heap. This is assigned in the constructor of the StringBuilder, which we can observe in the following code:
public StringBuilder()
{
    m_MaxCapacity = int.MaxValue;
    m_ChunkChars = new char[DefaultCapacity];
}
In this code, we can see that an array is initialized with a default capacity. What is that capacity, you may rightly be wondering?
internal const int DefaultCapacity = 16;
A constant defines that unless specified in the constructor arguments, the capacity of new StringBuilder instances will start at 16 characters. In the next blog post, we’ll learn how the StringBuilder can “expand” to support longer strings.
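We can observe the starting capacity from the outside through the public Capacity property. The following is a small sketch; the value 64 passed to the second constructor is an arbitrary number I picked for illustration.

```csharp
using System;
using System.Text;

// The parameterless constructor uses the default capacity.
var defaultBuilder = new StringBuilder();

// The capacity can also be requested up front via the constructor.
var sizedBuilder = new StringBuilder(64);

Console.WriteLine(defaultBuilder.Capacity); // 16
Console.WriteLine(sizedBuilder.Capacity);   // 64
Console.WriteLine(defaultBuilder.Length);   // 0, nothing has been appended yet
```

If you already know roughly how long the final string will be, requesting that capacity in the constructor is one way to size the initial buffer appropriately.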
Let’s update our diagram with the information we have so far:
We have now included the array instance, which for 16 chars occupies 56 bytes on the heap. The StringBuilder field requires 8 bytes for its reference pointer to the array. Let’s move on to the next field.
ChunkPrevious
internal StringBuilder? m_ChunkPrevious;
This field is one I’ll be diving into more deeply in the next blog post (coming soon), as it will make more sense when we talk about expanding the capacity of the StringBuilder. For now, it’s helpful to understand that in some cases, rather than growing the array buffer to accommodate longer string lengths, the StringBuilder may form into a linked list of StringBuilder instances.
Each instance holds part of the final string data and is considered a chunk of the final characters. For this mechanism to function, a StringBuilder may include a reference back to the previous StringBuilder instance, the previous chunk of characters.
This field may hold that reference if the StringBuilder has “grown”. Like m_ChunkChars, it is a reference (pointer), in this case to another StringBuilder instance on the heap, and therefore requires a further 8 bytes to store.
For our unused StringBuilder, the m_ChunkPrevious field is null.
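Although m_ChunkPrevious is internal, the chunks can be observed through the public GetChunks method, which I’m assuming is available to you (it was added in .NET Core 3.0). This rough sketch appends more characters than the initial buffer can hold and then counts the chunks; the length of 100 is an arbitrary choice, and the exact chunk sizes are an implementation detail that may vary between runtime versions.

```csharp
using System;
using System.Text;

var sb = new StringBuilder();       // starts with the default buffer
sb.Append(new string('x', 100));    // more than the initial buffer can hold

int chunkCount = 0;
int totalChars = 0;

// Each chunk exposes the used portion of one character buffer.
foreach (ReadOnlyMemory<char> chunk in sb.GetChunks())
{
    chunkCount++;
    totalChars += chunk.Length;
    Console.WriteLine($"Chunk {chunkCount}: {chunk.Length} chars");
}

Console.WriteLine($"Chunks: {chunkCount}, total chars: {totalChars}");
```

Whatever the number of chunks turns out to be, their lengths add up to the Length of the StringBuilder.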
Final Fields
internal int m_ChunkLength;
internal int m_ChunkOffset;
internal int m_MaxCapacity;
The last three fields of the StringBuilder are all integer values used to manage information about the current chunk of characters and the overall maximum capacity that the StringBuilder may support. We’ll explore these in greater detail in future posts. The default maximum capacity is set to int.MaxValue, so a new StringBuilder can support up to 2,147,483,647 characters.
Since integers are structs, the data is stored directly inside the StringBuilder, with each field requiring 4 bytes to hold the 32-bit integer.
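The maximum capacity is visible through the public MaxCapacity property. As a quick sketch, the code below checks the default and then uses the (capacity, maxCapacity) constructor overload with deliberately tiny values, chosen only for illustration, to show what happens when the limit is exceeded.

```csharp
using System;
using System.Text;

var sb = new StringBuilder();
Console.WriteLine(sb.MaxCapacity == int.MaxValue); // True
Console.WriteLine(sizeof(int));                    // 4

// A deliberately tiny limit: capacity 4, maximum capacity 8.
var limited = new StringBuilder(4, 8);
bool threw = false;
try
{
    limited.Append("123456789"); // 9 chars exceeds the maximum of 8
}
catch (ArgumentOutOfRangeException)
{
    threw = true;
}

Console.WriteLine(threw); // True
```

In practice you will rarely set a maximum capacity yourself, but it helps explain why the field exists on every instance.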
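Finally, on x64 architectures, 4 bytes of padding are added to the end of the type for proper memory alignment. The 16 bytes of object overhead, two 8-byte references and three 4-byte integers come to 44 bytes, which the padding rounds up to 48. Here’s the final diagram: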
We are now able to understand the initial memory overhead of creating a new StringBuilder for use by our application code. Two objects are allocated in total. Each StringBuilder instance requires 48 bytes on the managed heap. An array for the Chars is also allocated with a capacity of 16, requiring 56 bytes on the heap. This gives us a total overhead of 104 bytes for these two objects.
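If you’d like to check these numbers yourself, the sketch below adds up the figures from this post and then takes a rough measurement using GC.GetAllocatedBytesForCurrentThread. Treat the measured value as approximate: I’m assuming a 64-bit process, and the result can differ with the runtime version, JIT optimisations and any one-time runtime work, which is why the code warms up first.

```csharp
using System;
using System.Text;

// Figures from this post, assuming a 64-bit process.
const int objectOverhead = 16; // object header + method table reference
const int referenceSize = 8;   // m_ChunkChars, m_ChunkPrevious
const int intSize = 4;         // m_ChunkLength, m_ChunkOffset, m_MaxCapacity
const int padding = 4;         // alignment padding at the end of the type

const int builderSize = objectOverhead + 2 * referenceSize + 3 * intSize + padding;
const int charArraySize = 56;  // the 16-char array, as stated in this post
const int expectedTotal = builderSize + charArraySize;

// Warm up so one-time runtime work is less likely to skew the measurement.
GC.KeepAlive(new StringBuilder());

long before = GC.GetAllocatedBytesForCurrentThread();
var sb = new StringBuilder();
long after = GC.GetAllocatedBytesForCurrentThread();
GC.KeepAlive(sb);

long measured = after - before;
Console.WriteLine($"Expected: {expectedTotal} bytes, measured: {measured} bytes");
```

For more trustworthy numbers, a benchmarking tool with memory diagnostics is a better choice than a hand-rolled measurement like this one.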
As we saw in the previous post, this overhead is more than worth it once we start concatenating more than two or three string instances together. Doing so can result in significant savings by avoiding intermediate string allocations. The cost of using a StringBuilder can increase as we expand it to accommodate longer sequences of characters, which we will come to later in this series.
That completes our dive into the fields used inside the StringBuilder, helping us appreciate how this class functions and what memory each instance occupies. Join me in part three, where we will learn how chunks are added to expand the StringBuilder when data is appended. And remember, if you want to learn more about using strings in C# .NET applications, please check out my course on Pluralsight.
Other posts in this series:
- Part One – Why do we need a StringBuilder and when should we use one?
- Part Two – Understanding the Overhead of a StringBuilder (this post!)
- Part Three – Coming soon