For the last few weeks, I’ve been working on designing and developing a C# code generator. In this post, I want to explain some of the core concepts that I’ve learned so far and describe how you, too, can get started with the Roslyn APIs.
NOTE: The Roslyn APIs are a niche feature of .NET, and unfortunately, the documentation is pretty bare. I used a combination of intuition, reading the API docs and Googling for examples to get started. It’s entirely possible that the code shown in this post can be improved. The sample code is not necessarily the optimal approach, and I welcome ideas for improvements.
I would like to call out the helpful Roslyn Quoter site, created by Kirill Osenkov from Microsoft. This useful tool can be used to figure out how to represent C# code using an AST and the Roslyn API syntax. It tends to produce overly verbose code for some scenarios, but it’s a great way to get started.
Code Generation Requirements
The background for my requirement is that I now maintain the .NET client for Elasticsearch. While we already generate much of the core boilerplate code for the low-level client, our code generation for the NEST high-level client has been more limited. Generating the high-level client requires more detail around the endpoints exposed by Elasticsearch and details of the types used by the request and response data. As each minor release of Elasticsearch nears, I must manually implement the new APIs, aggregations, query DSL, and request/response models. Often, this process involves reading the documentation and exploring the Elasticsearch source code to glean enough information to then implement the feature in the client.
The language clients team are developing a type generator that takes in several build artefacts from Elasticsearch and uses them to produce a rich schema describing everything we need to know about the Elasticsearch APIs. It includes detailed descriptions of the types representing the requests, responses and the types used for their properties. This will serve many purposes internally, but one significant advantage is that we can use this rich schema to generate far more of the high-level client code automatically. This will free developer time to add more value by focusing on higher-level abstractions and improvements.
We are at the stage where we have a relatively mature version of this schema which uses an extensive integration test suite to validate its accuracy against actual requests and responses. The next stage in this journey is to explore how the various clients can take this schema and turn it into code through a code generation process. I trialled a few ideas for the initial proof of concept phase and settled on a C# application, which will eventually be a dotnet tool.
I’d also like to clarify that at this stage, I’m building a dotnet command-line tool that runs, parses the schema and produces physical C# files which can be included in the NEST codebase. These files then get checked in and live with the manually created source in the repository. I have considered using a new C# feature called source generators, which supports compile-time code-generation, using the C# Roslyn compiler.
I may return to that approach, but a disadvantage is that the code is generated at compile-time rather than being a physical, checked-in asset. This approach is excellent for some things, and I’m looking at it to potentially generate compile time JSON readers and writers for our types that can be optimised to avoid allocations during (de)serialisation.
For now, though, we’ll concentrate on using the Roslyn APIs from a console application to define our syntax tree and use that to produce physical C# files as output.
Getting Started with Roslyn Code Generation
Because this is intended as an introductory post, I’m going to use a reasonably simplified example of generating code. Real-world code-gen examples will grow more complex than this. There are many ways to work with complete solutions and projects through workspaces. In this post, I’m going to avoid those and concentrate on a simpler example.
The sample code from this post can be found on GitHub.
{
  "types": [
    {
      "typeName": "FirstClass",
      "properties": []
    },
    {
      "typeName": "SecondClass",
      "properties": []
    }
  ]
}
We’ll start with this simplified JSON schema that defines an array of types. Each object has data relating to that type, including its name and an array of properties. For this post, I will leave the properties array empty, and we’ll focus on how to create stub classes from this schema.
The next step is to deserialise the schema, for which we’ll need some classes to represent the schema information.
public class Schema
{
    public IReadOnlyCollection<SchemaTypes> Types { get; init; } = Array.Empty<SchemaTypes>();
}

public class SchemaTypes
{
    public string TypeName { get; init; } = string.Empty;
    public IReadOnlyCollection<string> Properties { get; init; } = Array.Empty<string>();
}
The above code defines two simple POCO types used when deserialising the schema from the JSON file. Schema holds a collection of SchemaTypes. Each SchemaTypes instance has a property for the type name and an array of strings for the properties.
You may wonder about the use of the init keyword in the properties. Init-only setters were introduced in C# 9. They support properties that may be publicly set, but specifically only during the object’s initialisation. This assists in creating immutable types, while avoiding the need for complex constructors with potentially several overloads. They are a nice fit for this case since System.Text.Json can initialise them during deserialisation, but once initialised, we don’t expect them to be changed.
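To make the behaviour of init-only setters concrete, here is a minimal sketch. The SchemaTypeExample class is a hypothetical stand-in for the SchemaTypes class above, and the sketch assumes a C# 9 (or later) project with top-level statements.

```csharp
using System;

// Object initialisers are allowed to assign init-only properties.
var type = new SchemaTypeExample { TypeName = "FirstClass" };
Console.WriteLine(type.TypeName); // prints "FirstClass"

// Assigning after initialisation is rejected by the compiler,
// so the next line is left commented out:
// type.TypeName = "Renamed";

// Hypothetical stand-in for the post's SchemaTypes class.
public class SchemaTypeExample
{
    public string TypeName { get; init; } = string.Empty;
}
```

The default value keeps the property non-null even when the initialiser, or the JSON being deserialised, omits it.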
We’ll read our schema file from disk and use the System.Text.Json serialiser to generate our object instances.
var path = Directory.GetCurrentDirectory();
await using var fileStream = File.OpenRead(Path.Combine(path, "schema.json"));
var schema = await JsonSerializer.DeserializeAsync<Schema>(fileStream, new JsonSerializerOptions
{
    PropertyNameCaseInsensitive = true
});
The preceding code attempts to read a schema.json file from the current directory. My project copies this alongside the compiled application.
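If you want to try the deserialisation step without a file on disk, the following sketch does the same thing against an inline copy of the schema. It assumes .NET 5 or later (for init-only property support in System.Text.Json) and swaps the asynchronous stream overload for the synchronous string overload purely for brevity.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Inline copy of the schema so the sketch runs without schema.json.
const string json = @"{
  ""types"": [
    { ""typeName"": ""FirstClass"", ""properties"": [] },
    { ""typeName"": ""SecondClass"", ""properties"": [] }
  ]
}";

// The JSON uses camelCase names while the C# properties are PascalCase,
// so case-insensitive matching is needed for the properties to bind.
var options = new JsonSerializerOptions { PropertyNameCaseInsensitive = true };
var schema = JsonSerializer.Deserialize<Schema>(json, options);

foreach (var type in schema?.Types ?? Array.Empty<SchemaTypes>())
{
    Console.WriteLine(type.TypeName); // prints "FirstClass", then "SecondClass"
}

// Same shape as the schema classes shown earlier in the post.
public class Schema
{
    public IReadOnlyCollection<SchemaTypes> Types { get; init; } = Array.Empty<SchemaTypes>();
}

public class SchemaTypes
{
    public string TypeName { get; init; } = string.Empty;
    public IReadOnlyCollection<string> Properties { get; init; } = Array.Empty<string>();
}
```

With the default, case-sensitive options, "types" would not match the Types property and the collection would stay empty, which is why the option is set here.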
After reading the file and deserialising, we should now have an in-memory collection of types that we will use during code generation. We’re now ready to use the Roslyn APIs to build up a simple syntax tree representing our source code.
First, we need to include a NuGet package that includes the Roslyn APIs. We’ll use the Microsoft.CodeAnalysis package for this. To add this, we can modify the project file to reference the package.
<ItemGroup>
  <PackageReference Include="Microsoft.CodeAnalysis" Version="3.9.0" />
</ItemGroup>
We’re ready to generate C# code. Here’s the complete code that we’re going to use for the remainder of this example. Don’t worry if it’s not clear what this does right now; we’ll step through it together.
var members = schema?.Types.Select(t => CreateClass(t.TypeName)).ToArray()
    ?? Array.Empty<MemberDeclarationSyntax>();

var ns = NamespaceDeclaration(ParseName("CodeGen")).AddMembers(members);

await using var streamWriter = new StreamWriter(@"c:\code-gen\generated.cs", false);
ns.NormalizeWhitespace().WriteTo(streamWriter);

static ClassDeclarationSyntax CreateClass(string name) =>
    ClassDeclaration(Identifier(name))
        .AddModifiers(Token(SyntaxKind.PublicKeyword));
We’ll begin at the bottom, where I’ve included a simple expression-bodied local function called CreateClass. This accepts a name for the class, which we assume is correctly Pascal cased. We’ll be returning a ClassDeclarationSyntax which represents a class node in our syntax tree.
To create this, we’ll use the SyntaxFactory provided as part of the Microsoft.CodeAnalysis.CSharp namespace. Since we tend to need this static factory quite often, I prefer to import this using a static directive to avoid retyping it throughout the codebase.
using static Microsoft.CodeAnalysis.CSharp.SyntaxFactory;
We can now call the ClassDeclaration method to create a class declaration. This requires an identifier for the class. Calling Identifier will create an identifier token using the name parameter for this function. I want to generate public classes, so I must add a modifier to the class declaration using AddModifiers. This accepts a token for the modifier. We can use the public keyword syntax kind for this. That’s all we need to define the syntax of an empty class.
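As a quick way to see what these calls produce, the sketch below builds a single class declaration and inspects it in isolation. It assumes the Microsoft.CodeAnalysis package is referenced; the variable name classNode is only illustrative.

```csharp
using System;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using static Microsoft.CodeAnalysis.CSharp.SyntaxFactory;

// Identifier creates the name token; AddModifiers appends the 'public' keyword token.
var classNode = ClassDeclaration(Identifier("FirstClass"))
    .AddModifiers(Token(SyntaxKind.PublicKeyword));

Console.WriteLine(classNode.Kind());          // ClassDeclaration
Console.WriteLine(classNode.Identifier.Text); // FirstClass

// Render the node as C# source text, with standard whitespace applied.
Console.WriteLine(classNode.NormalizeWhitespace().ToFullString());
```

AddModifiers takes a params array of tokens, so further modifiers could be passed in the same call if a generated class needs them.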
We use this local function inside a LINQ expression in our main method. As a reminder, we’re now talking about this code:
var members = schema?.Types.Select(t => CreateClass(t.TypeName)).ToArray()
    ?? Array.Empty<MemberDeclarationSyntax>();
As long as the schema is not null, we use the LINQ Select method to access each type defined in it. We then call our CreateClass local function, passing the type name from the type. We call ToArray to force immediate evaluation, producing an array of ClassDeclarationSyntax.
In cases where the schema is null, we will use an empty array. Although our CreateClass returns a ClassDeclarationSyntax, we can also treat this as MemberDeclarationSyntax from which it derives.
Our classes should live inside a namespace which we achieve with this line of code:
var ns = NamespaceDeclaration(ParseName("CodeGen")).AddMembers(members);
We call NamespaceDeclaration to create a namespace syntax node. This also needs a name which we’ll parse from a string literal for now. We can call the AddMembers method, which accepts params MemberDeclarationSyntax[], so we can pass in our array.
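To check that the members really end up inside the namespace node, here is a small self-contained sketch. The hard-coded typeNames array is an assumption standing in for the deserialised schema.

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;
using static Microsoft.CodeAnalysis.CSharp.SyntaxFactory;

// Stand-in for the type names read from the schema.
string[] typeNames = { "FirstClass", "SecondClass" };

// Array covariance lets a ClassDeclarationSyntax[] be used
// where a MemberDeclarationSyntax[] is expected.
MemberDeclarationSyntax[] members = typeNames
    .Select(name => ClassDeclaration(Identifier(name))
        .AddModifiers(Token(SyntaxKind.PublicKeyword)))
    .ToArray();

var ns = NamespaceDeclaration(ParseName("CodeGen")).AddMembers(members);

Console.WriteLine(ns.Name);          // CodeGen
Console.WriteLine(ns.Members.Count); // 2
```

Syntax nodes are immutable, so AddMembers returns a new namespace node rather than modifying the one it is called on.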
This is actually all we need for a basic syntax tree. Our final lines of code use this to write out the generated C# code to a file.
await using var streamWriter = new StreamWriter(@"c:\code-gen\generated.cs", false);
ns.NormalizeWhitespace().WriteTo(streamWriter);
First, we open a stream writer to the file we wish to generate. We pass false for the append argument since we want to overwrite the file if it already exists.
We call NormalizeWhitespace on the namespace syntax node, ensuring the generated code will include the standard whitespace. Without this, the code would be generated on a single line.
We call WriteTo, passing the StreamWriter to write out the full text represented by the syntax tree.
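The effect of NormalizeWhitespace is easiest to see side by side. The sketch below writes to an in-memory StringWriter rather than a file, which is an assumption made only so the output can be printed to the console; WriteTo accepts any TextWriter.

```csharp
using System;
using System.IO;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using static Microsoft.CodeAnalysis.CSharp.SyntaxFactory;

var ns = NamespaceDeclaration(ParseName("CodeGen"))
    .AddMembers(ClassDeclaration(Identifier("FirstClass"))
        .AddModifiers(Token(SyntaxKind.PublicKeyword)));

// Without normalisation, the factory-built tokens carry no whitespace,
// so everything is emitted on a single line.
Console.WriteLine(ns.ToFullString());

// With normalisation, standard spacing, line breaks and indentation are added.
using var writer = new StringWriter();
ns.NormalizeWhitespace().WriteTo(writer);
Console.WriteLine(writer.ToString());
```

Writing to a StringWriter can also be handy in unit tests for a generator, since the result can be compared against expected source text without touching the file system.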
Believe it or not, that’s all we need for this very simplified example. Running the application on my PC results in the following contents for the generated.cs file.
namespace CodeGen
{
    public class FirstClass
    {
    }

    public class SecondClass
    {
    }
}
You’ll agree it’s pretty basic, but we have well-formatted C# representing two classes that we produced from a JSON schema file. Things get more complex from here because we also need to include nodes for fields, properties, and methods. We’ll tackle that another day!
Summary
In this post, we have learned about generating code using the Roslyn APIs. We loaded a simple schema from a JSON file, and based on that schema, we generated a syntax tree representing the structure of our C# code. We then wrote the syntax tree to a file.
Hopefully, this post is helpful as a getting started guide. Roslyn’s learning curve is a bit steep since the official documentation is limited to the basic API docs. There are few examples available showing how to actually combine these APIs together. Defining the syntax tree can get quite complex, and there are often multiple ways to achieve the same result, some more verbose than others.
Roslyn is a compelling way to generate valid C# code. It’s proving quite a productive way to implement a reasonably complex code generator for the Elasticsearch NEST library. The expectation is that we’ll be generating far more of the codebase by the 8.0 timeframe.