Accessing State in System.Text.Json Custom Converters - Steve Gordon

In this post, I describe several techniques that can provide additional state to custom JsonConverters when using System.Text.Json.

While building the new .NET client for Elasticsearch, one of the key objectives I gave myself was to move away from the internal Utf8Json-based serializer used in v7. The obvious choice was to look at redesigning the serialization using System.Text.Json.

System.Text.Json was introduced as part of the .NET Core 3.0 release. Since that release, it ships “in the box” as part of the base class libraries. It is also available as a .NET Standard compatible NuGet package, which can also be consumed from .NET Framework projects. This namespace and its types support (de)serialization of JSON to/from .NET types. It was designed to provide a modern JSON library as part of the BCL, focusing on high performance.

System.Text.Json

System.Text.Json is a good choice for the v8 Elasticsearch client for .NET for several reasons:

It has shipped as part of the BCL for several .NET releases, so it requires fewer additional dependencies.
Its high-performance design is suited to our JSON-heavy workloads.
Microsoft fully supports it.
We can remove complex and difficult-to-maintain code from our client assembly.

Some types used in the client are quite complex to model and present extra challenges for serialization. In particular, Elasticsearch uses requests and responses that can include polymorphic data. For example, one of many possible queries may be sent when performing a search. Each query may have different properties. Similarly, search responses may include a variety of different aggregations.

The solution for these more complex types in the Elasticsearch .NET client is to leverage custom converters. With custom converters, we have complete control over reading and writing the JSON to serialize it to and from our objects. The v8 Elasticsearch .NET client includes many custom converters, some manually crafted and some code-generated.

In some situations, these converters require access to extra state to serialize the types correctly. One typical example is when we have types which include field names. In the client, we support a concept of inference which is where properties on a type can be used to infer the name of things, such as fields. This reduces the use of magic strings which can be quite brittle. We use information from the ElasticsearchClientSettings instance to correctly infer field names based on any configuration the user may have provided.

The upshot is that we need access to the ElasticsearchClientSettings instance inside many custom converters. Fortunately, there are a few ways to achieve this. We’ll begin with the technically correct way to handle this before learning about some of the downsides and limitations it imposes.

Registering Custom Converters on JsonSerializerOptions

Custom converters can be registered for types in several ways. We can add an attribute to types or properties specifying the specific converter that should be used. In this scenario, an instance of the converter is created on-demand the first time a type is (de)serialized and cached for reuse across subsequent serialization operations. When using this approach, converters must include a parameterless default constructor so that an instance can be created when needed. This requirement makes it difficult to supply additional state for the converter.
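As a minimal sketch of the attribute-based approach, the example below uses a hypothetical Temperature type and TemperatureConverter (neither is part of the Elasticsearch client). Because System.Text.Json creates the converter instance itself, the converter relies on its implicit parameterless constructor and has no way to receive extra state.

```csharp
using System;
using System.Globalization;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical type for illustration only.
[JsonConverter(typeof(TemperatureConverter))]
public sealed class Temperature
{
	public Temperature(double celsius) => Celsius = celsius;

	public double Celsius { get; }
}

// Registered through the attribute above, so the serializer instantiates it
// on demand using the (implicit) public parameterless constructor.
public sealed class TemperatureConverter : JsonConverter<Temperature>
{
	public override Temperature Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options)
	{
		// This sketch expects a JSON string such as "21.5C".
		var text = reader.GetString() ?? throw new JsonException("Expected a string value.");
		return new Temperature(double.Parse(text.TrimEnd('C'), CultureInfo.InvariantCulture));
	}

	public override void Write(Utf8JsonWriter writer, Temperature value, JsonSerializerOptions options) =>
		writer.WriteStringValue(value.Celsius.ToString(CultureInfo.InvariantCulture) + "C");
}

public static class Program
{
	public static void Main()
	{
		// No options are needed; the attribute tells the serializer which converter to use.
		var json = JsonSerializer.Serialize(new Temperature(21.5));
		Console.WriteLine(json); // "21.5C"

		var roundTripped = JsonSerializer.Deserialize<Temperature>(json);
		Console.WriteLine(roundTripped?.Celsius); // 21.5
	}
}
```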

The solution is to register an instance of the converter with the JsonSerializerOptions instance. Using this mechanism, we can control the creation of the converter instances and call other constructors, passing additional state into them via their arguments.

Take, for example, this simplified FieldConverter:

internal sealed class FieldConverter : JsonConverter<Field>
{
	private readonly IElasticsearchClientSettings _settings;

	public FieldConverter(IElasticsearchClientSettings settings) => _settings = settings;

	public override Field? Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options)
	{
		// Omitted for brevity
	}

	public override void Write(Utf8JsonWriter writer, Field value, JsonSerializerOptions options)
	{
		if (value is null)
		{
			writer.WriteNullValue();
			return;
		}

		var fieldName = _settings.Inferrer.Field(value);

		if (string.IsNullOrEmpty(value.Format))
		{
			writer.WriteStringValue(fieldName);
		}
		else
		{
			writer.WriteStartObject();
			writer.WritePropertyName("field");
			writer.WriteStringValue(fieldName);
			writer.WritePropertyName("format");
			writer.WriteStringValue(value.Format);
			writer.WriteEndObject();
		}
	}
}

The constructor accepts and stores an instance of IElasticsearchClientSettings. The Write method accesses the settings instance to infer the field name for the Field instance being serialized. Consequently, I cannot assign this converter to the Field type using an attribute.

Instead, I must register the instance with the JsonSerializerOptions for the request/response serializer.

public DefaultRequestResponseSerializer(IElasticsearchClientSettings settings)
{
	Options = new JsonSerializerOptions
	{	
		DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull,
		IncludeFields = true,
		Converters =
			{
				new FieldConverter(settings),
				// Many other converters
			},
		PropertyNamingPolicy = JsonNamingPolicy.CamelCase
	};

	_settings = settings;
}

This isn’t too painful when the number of custom converters remains small, but it becomes more problematic when there are many. One concern is that, with this approach, we create an instance of each converter upfront, before it’s ever used. Since we cannot know in advance which features of the library consumers may use, we may allocate converters that are never required. But that’s not the main problem with this approach.

Generated Custom Converters

The Field type that uses the FieldConverter shown above is a manually created class in the library. Registering the custom converter is simple because I can also manually ensure it’s added to the options. However, we now generate most of the Elasticsearch .NET client library types from a specification. This means that any generated types requiring a converter that uses the IElasticsearchClientSettings would also need to be registered with the options instance. I had a few ideas for solving this, but none were particularly pleasing. I was also conscious of trying to avoid an explosion of converter instances on the options which may never be needed. I landed on a reasonably clever hack that worked quite well.

Misusing GetConverter

My approach was relatively simple. Define a custom converter that does no actual conversion but can be retrieved when required to grab the IElasticsearchClientSettings.

internal sealed class ExtraSerializationData : JsonConverter<ExtraSerializationData>
{
	public ExtraSerializationData(IElasticsearchClientSettings settings) => Settings = settings;

	public IElasticsearchClientSettings Settings { get; }

	public override ExtraSerializationData? Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options) => throw new NotImplementedException();
	public override void Write(Utf8JsonWriter writer, ExtraSerializationData value, JsonSerializerOptions options) => throw new NotImplementedException();
}

This converter is defined as a converter for its own type. This is weird but perfectly fine for how I intended to use it. Its Read and Write methods are not implemented. It accepts an IElasticsearchClientSettings instance in its constructor and exposes it through a read-only property.

This can then be registered with the JsonSerializerOptions, just as before.

public DefaultRequestResponseSerializer(IElasticsearchClientSettings settings)
{
	Options = new JsonSerializerOptions
	{	
		DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull,
		IncludeFields = true,
		Converters =
			{
				new FieldConverter(settings),
				new ExtraSerializationData(settings),
				// Many other converters
			},
		PropertyNamingPolicy = JsonNamingPolicy.CamelCase
	};

	_settings = settings;
}

Now the clever bit! I can retrieve my ExtraSerializationData converter from the options in code-generated converters that require access to settings.

internal sealed class SortOptionsConverter : JsonConverter<SortOptions>
{
	public override SortOptions Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options)
	{
		// Omitted for brevity
	}

	public override void Write(Utf8JsonWriter writer, SortOptions value, JsonSerializerOptions options)
	{
		writer.WriteStartObject();
		if (value.AdditionalPropertyName is IUrlParameter urlParameter)
		{
			var extraData = options.GetConverter(typeof(ExtraSerializationData)) as ExtraSerializationData;
			var propertyName = urlParameter.GetString(extraData.Settings);
			writer.WritePropertyName(propertyName);
		}
		else
		{
			writer.WritePropertyName(value.VariantName);
		}

		// Omitted for brevity

		writer.WriteEndObject();
	}
}

[JsonConverter(typeof(SortOptionsConverter))]
public sealed partial class SortOptions
{
	// Omitted for brevity
}

The JsonSerializerOptions type exposes a GetConverter method, which accepts a Type as its argument and resolves an instance of a JsonConverter for that Type. The Write method calls GetConverter to get the converter for the ExtraSerializationData type (which is, in fact, the ExtraSerializationData instance itself). As the return type of the GetConverter method is JsonConverter, the result must be cast to ExtraSerializationData.

With the converter retrieved, the code can use the Settings property to access the IElasticsearchClientSettings instance required to perform its conversion. As I described on Twitter, this is a technique I was equally proud and ashamed of!
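To make the trick easier to try in isolation, here is a self-contained sketch that does not depend on the Elasticsearch client. DemoSettings, StateCarrier, Tag and TagConverter are hypothetical names introduced purely for illustration. Unlike the snippet above, it uses a pattern match rather than an `as` cast, so a missing registration produces a clear exception instead of a null reference.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical stand-in for the client settings.
public sealed class DemoSettings
{
	public string FieldPrefix { get; set; } = "";
}

// A "converter" for its own type that exists only to carry state on the options.
public sealed class StateCarrier : JsonConverter<StateCarrier>
{
	public StateCarrier(DemoSettings settings) => Settings = settings;

	public DemoSettings Settings { get; }

	public override StateCarrier Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options) =>
		throw new NotSupportedException();

	public override void Write(Utf8JsonWriter writer, StateCarrier value, JsonSerializerOptions options) =>
		throw new NotSupportedException();
}

[JsonConverter(typeof(TagConverter))]
public sealed class Tag
{
	public string Name { get; set; } = "";
}

// Created by the serializer via the attribute, so it cannot take constructor arguments.
public sealed class TagConverter : JsonConverter<Tag>
{
	public override Tag Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options) =>
		new Tag { Name = reader.GetString() ?? "" };

	public override void Write(Utf8JsonWriter writer, Tag value, JsonSerializerOptions options)
	{
		// Ask the options for the converter registered for StateCarrier, which is the carrier itself.
		if (options.GetConverter(typeof(StateCarrier)) is not StateCarrier carrier)
			throw new JsonException("StateCarrier is not registered on these options.");

		writer.WriteStringValue(carrier.Settings.FieldPrefix + value.Name);
	}
}

public static class Program
{
	public static void Main()
	{
		var options = new JsonSerializerOptions
		{
			Converters = { new StateCarrier(new DemoSettings { FieldPrefix = "doc." }) }
		};

		Console.WriteLine(JsonSerializer.Serialize(new Tag { Name = "title" }, options)); // "doc.title"
	}
}
```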

API Proposal

After this experience and some responses to my tweet, it was clear that I wasn’t the only person with this requirement. I decided to open an issue on the .NET runtime repository to propose solving this in the public API.

Two solutions came to mind. The first was to unseal the JsonSerializerOptions type, which consumers such as myself could then extend with additional properties. We could then cast the JsonSerializerOptions to our derived type inside converters to access the extra state. A second option (which I made the primary proposal) was to support a property bag on the existing JsonSerializerOptions type. This would allow consumers to store objects in a dictionary for later retrieval.

At the time of writing, this proposal has been closed in favour of another GitHub issue tracking this requirement. Fortunately, Eirik Tsarpalis, one of the Microsoft developers involved in System.Text.Json, provided another solution.

Update!! My proposal issue has now been reopened as the alternative proposal did not solve the same requirement identified in this post. Hopefully both proposals may make it into a future release.

Leveraging a ConditionalWeakTable

Eirik asked if I had considered using a ConditionalWeakTable to dynamically attach data to options instances. I replied that I hadn’t, as I was unaware of that type’s existence! I quickly referred to the documentation to learn about the type.

The documentation only includes the API definition with a single sentence summarising the purpose of this type.

Enables compilers to dynamically attach object fields to managed objects.

There is a little more detail in the examples and remarks. In essence, this type provides a way to link two runtime object instances but in such a way that those instances are not rooted to the GC forever. The key is a managed object to which we want to attach additional properties at runtime.

The references used inside the implementation are weak, so an entry in the table does not keep its key alive: once no references to the key object remain outside the table, the entry becomes eligible for collection. In my library, this only matters should a consumer release all references to the ElasticsearchClient in their code. If a regular dictionary were used instead of the ConditionalWeakTable, the JsonSerializerOptions might never be garbage collected in such a case. In reality, this is unlikely, as consumers should generally hold a singleton instance of ElasticsearchClient for the life of their application, but that’s not guaranteed.
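The basic usage of ConditionalWeakTable can be sketched as follows. Owner and Attached are hypothetical types used only for this illustration; both the key and the value must be reference types, and keys are matched by reference identity. The sketch deliberately does not try to demonstrate collection of entries, since garbage collection timing is not deterministic.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical key type: the object we want to attach extra data to.
public sealed class Owner { }

// Hypothetical value type: the data being attached.
public sealed class Attached
{
	public string Note { get; set; } = "";
}

public static class Program
{
	// A single shared table; its entries do not keep their keys alive.
	private static readonly ConditionalWeakTable<Owner, Attached> Table = new();

	public static void Main()
	{
		var owner = new Owner();
		Table.Add(owner, new Attached { Note = "extra state" });

		// The lookup succeeds only for the exact instance that was added.
		if (Table.TryGetValue(owner, out var attached))
			Console.WriteLine(attached.Note); // extra state

		// A different instance of the same type has nothing attached.
		Console.WriteLine(Table.TryGetValue(new Owner(), out _)); // False
	}
}
```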

It sounded promising as a cleaner solution for my work on the .NET client, so I proceeded to apply it to my code. I would first need a singleton instance of a ConditionalWeakTable to hold my settings. I added a static property to the ElasticsearchClient type that would allow my internal code to retrieve the IElasticsearchClientSettings corresponding to an instance of JsonSerializerOptions used by the default request/response serializer for the client. I named this property SettingsTable.

public sealed partial class ElasticsearchClient
{
	internal static ConditionalWeakTable<JsonSerializerOptions, IElasticsearchClientSettings> SettingsTable { get; } = new();
	
	// Omitted for brevity
}

After creating the JsonSerializerOptions for the serializer, I also make sure to add an entry to the SettingsTable, weakly tying the options to the IElasticsearchClientSettings for the client.

public DefaultRequestResponseSerializer(IElasticsearchClientSettings settings)
{
	Options = new JsonSerializerOptions
	{	
		DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull,
		IncludeFields = true,
		Converters =
			{
				new FieldConverter(settings),
				// Many other converters
			},
		PropertyNamingPolicy = JsonNamingPolicy.CamelCase
	};

	ElasticsearchClient.SettingsTable.Add(Options, settings);

	_settings = settings;
}

Next, I needed a way to access the settings on-demand inside converters. As this task would occur in several places, I decided to introduce an extension method for this:

internal static class JsonSerializerOptionsExtensions
{
	public static bool TryGetClientSettings(this JsonSerializerOptions options, out IElasticsearchClientSettings settings) =>
		ElasticsearchClient.SettingsTable.TryGetValue(options, out settings);
}

This extension method is defined for the JsonSerializerOptions type, and it uses the SettingsTable to attempt to look up the settings instance which relates to the JsonSerializerOptions. Converters, including code-generated ones, can easily access the settings they need.

internal sealed class WildcardQueryConverter : JsonConverter<WildcardQuery>
{
	public override WildcardQuery Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options)
	{
		// Omitted for brevity
	}

	public override void Write(Utf8JsonWriter writer, WildcardQuery value, JsonSerializerOptions options)
	{
		if (value.Field is null)
			throw new JsonException("Unable to serialize WildcardQuery because the `Field` property is not set. Field name queries must include a valid field name.");
		if (options.TryGetClientSettings(out var settings))
		{
			writer.WriteStartObject();
			writer.WritePropertyName(settings.Inferrer.Field(value.Field));
			writer.WriteStartObject();
			// Omitted for brevity
			writer.WriteEndObject();
			writer.WriteEndObject();
			return;
		}

		throw new JsonException("Unable to retrieve client settings required to infer field.");
	}
}

The above example, taken from a code-generated converter, calls TryGetClientSettings on the JsonSerializerOptions passed into the Write method. We always expect the settings to be accessible via the ConditionalWeakTable but will throw an exception should that not be the case for some reason. Once the converter method has access to the settings, it can complete its serialization work.

Since this converter no longer requires a constructor to accept the IElasticsearchClientSettings, it doesn’t need to be added directly to the collection of converters registered with the JsonSerializerOptions and can be created on-demand by the System.Text.Json library if any types require it.

[JsonConverter(typeof(WildcardQueryConverter))]
public sealed partial class WildcardQuery : Query
{
	// Omitted for brevity
}

The WildcardQuery is attributed with the JsonConverterAttribute to define its converter. The converter instance is only created if the consuming application defines a WildcardQuery, which needs to be serialized as part of the search request.
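For readers who want to experiment with the pattern outside the client, the pieces above can be condensed into a small, self-contained sketch. DemoSettings, SettingsRegistry, Term and TermConverter are hypothetical names, not types from the Elasticsearch client, and the Read method is left unimplemented to keep the example short.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical stand-in for the client settings.
public sealed class DemoSettings
{
	public string FieldPrefix { get; set; } = "";
}

public static class SettingsRegistry
{
	private static readonly ConditionalWeakTable<JsonSerializerOptions, DemoSettings> Table = new();

	// Creates options and weakly associates the settings with them.
	public static JsonSerializerOptions CreateOptions(DemoSettings settings)
	{
		var options = new JsonSerializerOptions();
		Table.Add(options, settings);
		return options;
	}

	public static bool TryGetDemoSettings(this JsonSerializerOptions options, out DemoSettings settings) =>
		Table.TryGetValue(options, out settings);
}

[JsonConverter(typeof(TermConverter))]
public sealed class Term
{
	public string Field { get; set; } = "";
	public string Value { get; set; } = "";
}

// No constructor arguments, so the attribute-based registration works.
public sealed class TermConverter : JsonConverter<Term>
{
	public override Term Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options) =>
		throw new NotSupportedException("Reading is omitted from this sketch.");

	public override void Write(Utf8JsonWriter writer, Term value, JsonSerializerOptions options)
	{
		// Look up the state attached to the options passed into this call.
		if (!options.TryGetDemoSettings(out var settings))
			throw new JsonException("No settings are attached to these options.");

		writer.WriteStartObject();
		writer.WriteString(settings.FieldPrefix + value.Field, value.Value);
		writer.WriteEndObject();
	}
}

public static class Program
{
	public static void Main()
	{
		var options = SettingsRegistry.CreateOptions(new DemoSettings { FieldPrefix = "doc." });
		var json = JsonSerializer.Serialize(new Term { Field = "title", Value = "json" }, options);
		Console.WriteLine(json); // {"doc.title":"json"}
	}
}
```

Serializing a Term with options that were not created through CreateOptions throws the JsonException, which mirrors the defensive check in the WildcardQueryConverter above.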

NOTE: Eirik from Microsoft highlighted that, for extra efficiency, a JsonConverterFactory could be used to avoid the ConditionalWeakTable lookup on every serialization operation by performing that work once per converter. It’s a good suggestion which I’ll review, measure and likely implement at some point. Once I do, I’ll try to follow up with a new post on that improvement.
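One way that suggestion might look is sketched below. This is an assumption about the shape of such a solution, not the implementation used in the client, and DemoSettings, SettingsRegistry, Term, TermConverter and TermConverterFactory are all hypothetical names. The factory is applied through the attribute; when the serializer asks it to create the converter, it performs the ConditionalWeakTable lookup and passes the settings into the converter's constructor, so Write no longer needs a lookup of its own.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical stand-in for the client settings.
public sealed class DemoSettings
{
	public string FieldPrefix { get; set; } = "";
}

public static class SettingsRegistry
{
	public static ConditionalWeakTable<JsonSerializerOptions, DemoSettings> Table { get; } = new();
}

// The attribute points at the factory rather than at the converter.
[JsonConverter(typeof(TermConverterFactory))]
public sealed class Term
{
	public string Field { get; set; } = "";
	public string Value { get; set; } = "";
}

public sealed class TermConverterFactory : JsonConverterFactory
{
	public override bool CanConvert(Type typeToConvert) => typeToConvert == typeof(Term);

	public override JsonConverter CreateConverter(Type typeToConvert, JsonSerializerOptions options)
	{
		// The lookup happens here, when the converter is created for these options.
		if (!SettingsRegistry.Table.TryGetValue(options, out var settings))
			throw new InvalidOperationException("No settings are attached to these options.");

		return new TermConverter(settings);
	}
}

// The converter receives its state up front and keeps it in a field.
public sealed class TermConverter : JsonConverter<Term>
{
	private readonly DemoSettings _settings;

	public TermConverter(DemoSettings settings) => _settings = settings;

	public override Term Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options) =>
		throw new NotSupportedException("Reading is omitted from this sketch.");

	public override void Write(Utf8JsonWriter writer, Term value, JsonSerializerOptions options)
	{
		writer.WriteStartObject();
		writer.WriteString(_settings.FieldPrefix + value.Field, value.Value);
		writer.WriteEndObject();
	}
}

public static class Program
{
	public static void Main()
	{
		var options = new JsonSerializerOptions();
		SettingsRegistry.Table.Add(options, new DemoSettings { FieldPrefix = "doc." });

		var json = JsonSerializer.Serialize(new Term { Field = "title", Value = "json" }, options);
		Console.WriteLine(json); // {"doc.title":"json"}
	}
}
```

Whether this is measurably faster than a lookup per operation would need benchmarking in the real client.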

Summary

I’m pretty pleased with the end result that the ConditionalWeakTable approach enables. It solves my two main challenges, allowing me to avoid creating potentially unused converter instances purely to register them with the JsonSerializerOptions. It simplifies the generated code for converters which can leverage my extension method to access settings if they require them for inference.

The ConditionalWeakTable class was unknown to me and not an obvious choice. I still think that the System.Text.Json library can and should solve this in a more discoverable way for consumers. While it’s a more advanced requirement, I’m sure others may require additional static state for their custom converters. Until this .NET runtime issue is resolved and .NET ships with an in-the-box solution, perhaps consider using a ConditionalWeakTable for the job.

Steve Gordon

Steve Gordon is a Pluralsight author, 6x Microsoft MVP, and a .NET engineer at Elastic where he maintains the .NET APM agent and related libraries. Steve is passionate about community and all things .NET related, having worked with ASP.NET for over 21 years. Steve enjoys sharing his knowledge through his blog, in videos and by presenting talks at user groups and conferences. Steve is excited to participate in the active .NET community and founded .NET South East, a .NET Meetup group based in Brighton. He enjoys contributing to and maintaining OSS projects. You can find Steve on most social media platforms as @stevejgordon