Dotnet Source Generators in 2024 Part 1: Getting Started

2024-10-2 02:10:32 Author: securityboulevard.com

Introduction

In this blog post, we will cover the basics of a source generator, the major types involved, some common issues you might encounter, how to properly log those issues, and how to fix them.

Source Generators have existed since .NET 5 was first introduced in late 2020. They have seen numerous improvements since that initial release, including the creation of newer Incremental Source Generators.

TLDR: Source generators in .NET enable you to inspect user code and generate additional code on the fly based on that analysis. The example in this blog post may seem a bit redundant, but as you use more advanced patterns, you can generate hundreds of lines of code, helping to reduce boilerplate and repetitive code across your projects. Source generators are also great for lowering runtime reflection use, which can be expensive and slow down your applications.

While developing a C# library to perform messaging between various process components or between processes, I encountered an issue where client programs using this new messaging library would need to add a list of all the “Messenger types.” I had heard of source generators and experimented with them a small amount before encountering this problem, so I figured I could dive in and devise a working solution to handle this and inject the list of “Messenger types” as a property automagically.

I have also been learning various programming paradigms and interesting practices. Vertical Slice architecture and aspect-oriented programming (AOP) are the two relevant to this blog. Vertical slices focus on grouping things that will change together, typically by the feature they represent, regardless of the layer they belong to. The goal of the slices is to minimize coupling between slices and maximize coupling in a slice (i.e., things in a feature depend on each other while trying not to rely on other slice features). This keeps the code base modular and makes it easy to update, remove, or add new slices, as the changes shouldn’t directly affect existing slices. [You can read more on vertical slices here]

AOP is a programming paradigm that aims to increase modularity by allowing the separation of cross-cutting concerns. Typically, in C#, this is implemented by creating attributes that are then placed on classes, methods, etc., to introduce or modify the decorated code. So, with these things in mind, I wanted to look at creating a feature that worked with vertical slices using AOP, and given my newfound challenge of automatically injecting a list of objects into my messenger app at build time, I had just the target goal in mind to combine all of it together.

With that brief overview of why I started messing with source generators, let’s take a quick step back and cover the basics of what a source generator is, what it lets you do, and what it doesn’t.

What is a Source Generator?

According to Microsoft: “Source generators aim to enable compile-time metaprogramming, that is, code that can be created at compile time and added to the compilation. Source generators will be able to read the contents of the compilation before running, as well as access any additional files, enabling generators to introspect both user C# code and generator-specific files. Generators create a pipeline, starting from base input sources and mapping them to the output they wish to produce. The more exposed, properly equatable states exist, the earlier the compiler will be able to cut off changes and reuse the same output.”

Figure 1 — Source Generator dataflow, Microsoft

Simply put, source generators in .NET are library projects that you can add to a solution or include in existing NuGet packages. They are meant to be utilized only during the build process to add new code to a project.

What Are Source Generators Not Meant to Do

Microsoft calls out two main concepts as areas where generators are not meant to be used. The first area is adding language features. Microsoft states: “Source generators are not designed to replace new language features: for instance, one could imagine records being implemented as a source generator that converts the specified syntax to a compilable C# representation. We explicitly consider this to be an anti-pattern; the language will continue to evolve and add new features, and we don’t expect source generators to be a way to enable this. Doing so would create new ‘dialects’ of C# that are incompatible with the compiler without generators.”

In this regard, I agree with the team: allowing any .NET developer to start adding new features to the language opens up the possibility of competing features, confusing requirements, and incompatibility with the .NET compiler, all of which would only confuse developers and push them away from source generators altogether.

The second is in code modification; the Microsoft documentation also states, “There are many post-processing tasks that users perform on their assemblies today, which here we define broadly as ‘code rewriting’. These include, but are not limited to:

Optimization
Logging injection
IL Weaving
Call site re-writing

While these techniques have many valuable use cases, they do not fit into the idea of source generation. They are, by definition, code altering operations which are explicitly ruled out by the source generator’s proposal.”

While technically accurate, this feels more like a semantic line in the sand: the team does not want something called a “generator” to perform replacement, rather than code rewriting being a language-breaking capability that must stay off limits. With that said, I will show a workaround for code rewriting that I’ve used recently if that is part of your goal for using a source generator.

A source generator is also not an Analyzer. While often used together and sharing many of the exact same requirements to utilize one in a project, a generator’s job is to produce code and an analyzer’s job is to produce warnings or errors based on various rules such as code formatting or, as we will see in source generators, to block access to specific functions/code bases that the analyzer’s author deemed unwelcome.

The Primary Type of Source Generator in Modern .NET

At the time of writing this (September 2024), the .NET team has decided to deprecate source generators implementing “ISourceGenerator” in favor of incremental generators. This change will be enforced, seemingly blocking access to older “ISourceGenerator” APIs with versions of the Roslyn API after version 4.10.0 / .NET 9. (Old Generator Deprecated). In light of that, this blog post series will only cover “IIncrementalGenerator” usage.

What Is an Incremental Source Generator?

An incremental generator is a source generator that performs its evaluation and execution on items only after they pass some filtering requirements, significantly increasing performance.

Typically, source generators try to execute during design time and compile time. While nice, this incurs an execution of any classes marked as a source generator every time something changes in the project (i.e., delete a line of code, add a line of code, make a new file, etc.). As you can imagine, having something running every time you type is not ideal for performance; thus, Microsoft created these incremental generators to help solve that performance problem.

Adding an Incremental Source Generator to Your Project

Source generators must target .NET Standard 2.0. This allows them to be used in .NET Framework 4.6.1+ projects, .NET Core 2.0+ and .NET 5+ projects, or other .NET Standard 2.0+ projects. By the end of this section, we will have a solution containing three projects:

A .NET Standard 2.0 library (Source Generator)
A .NET Standard 2.0 library (A shared library for the generator and consumers)
A .NET 8 web API project (Main project)

You can use the dotnet command in a terminal or your IDE of choice to create these projects. I will use the dotnet tool since it is IDE/platform agnostic. The following commands will produce the required projects.

dotnet new sln -n IncrementalSourceGenPractice
dotnet new webapi -n WebProject --project .\IncrementalSourceGenPractice.sln
dotnet new classlib -f netstandard2.0 --langVersion 12 -n SourceGenerator --project .\IncrementalSourceGenPractice.sln
dotnet new classlib -f netstandard2.0 --langVersion 12 -n SourceGenerator.SharedLibrary --project .\IncrementalSourceGenPractice.sln
dotnet sln .\IncrementalSourceGenPractice.sln add .\SourceGenerator.SharedLibrary\SourceGenerator.SharedLibrary.csproj .\SourceGenerator\SourceGenerator.csproj .\WebProject\WebProject.csproj

Before creating the source generator, a few NuGet packages and some changes to the .csproj files are required. Open the SourceGenerator.csproj file and ensure it matches the following content.

The version numbers of the package references will likely be different, which is fine as long as they are valid for the .NET standard 2.0 target.

The three configuration settings added are the following

1. <EnforceExtendedAnalyzerRules>true</EnforceExtendedAnalyzerRules>

Ensures the generator is using the recommended rules created by the .NET team

2. <IsRoslynComponent>true</IsRoslynComponent>

Enables the project to act as a generator and work with the Roslyn compiler, making debugging of the generator possible

3. <IncludeBuildOutput>false</IncludeBuildOutput>

This prevents the project build from being included in output, which is ideal since the generator is meant to be compile-time only

The other odd configuration in this .csproj file is the OutputItemType="Analyzer" added to the project reference for the shared library. Even though the shared library is not an analyzer, this is required so the generator can access it during generation.
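Pulling together the settings described above, a SourceGenerator.csproj could look roughly like the sketch below. This is a reconstruction under assumptions, not the original file: the two package versions are illustrative only (any versions valid for the .NET Standard 2.0 target should do), and the Nullable setting is an assumption added because later snippets use nullable annotations.

```xml
<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <TargetFramework>netstandard2.0</TargetFramework>
    <LangVersion>12</LangVersion>
    <!-- Assumption: nullable annotations are enabled for the generator project -->
    <Nullable>enable</Nullable>
    <!-- The three settings discussed above -->
    <EnforceExtendedAnalyzerRules>true</EnforceExtendedAnalyzerRules>
    <IsRoslynComponent>true</IsRoslynComponent>
    <IncludeBuildOutput>false</IncludeBuildOutput>
  </PropertyGroup>

  <ItemGroup>
    <!-- Versions are assumptions for illustration; yours will likely differ -->
    <PackageReference Include="Microsoft.CodeAnalysis.Analyzers" Version="3.3.4" PrivateAssets="all" />
    <PackageReference Include="Microsoft.CodeAnalysis.CSharp" Version="4.8.0" PrivateAssets="all" />
  </ItemGroup>

  <ItemGroup>
    <!-- OutputItemType="Analyzer" lets the generator load the shared library during generation -->
    <ProjectReference Include="..\SourceGenerator.SharedLibrary\SourceGenerator.SharedLibrary.csproj" OutputItemType="Analyzer" />
  </ItemGroup>

</Project>
```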

The final bit of configuration required is for the WebProject.csproj file.

Add the following lines to the project.

<EmitCompilerGeneratedFiles>true</EmitCompilerGeneratedFiles>
<CompilerGeneratedFilesOutputPath>.\GeneratedFiles</CompilerGeneratedFilesOutputPath>

These two options allow the source generator files to be written to the filesystem and set a custom path to write them to vs. using the default path.

Lastly, add the following item group to the WebProject.csproj file as well.

<ItemGroup>
<ProjectReference Include="..\SourceGenerator.SharedLibrary\SourceGenerator.SharedLibrary.csproj" />
<ProjectReference Include="..\SourceGenerator\SourceGenerator.csproj" ReferenceOutputAssembly="false" OutputItemType="Analyzer" />
</ItemGroup>

When referencing the source generator, we do not need the output assembly again because it is not required after compile time.

Adding a Relevant Target for Generation

In this first part, we will generate something relatively simple; however, later posts go deeper into using source generators and we will create a small AOP framework to achieve the goal outlined at the start of this blog.

Open the WebProject and add a new class called Calculator.cs with the following source.

We will then generate functions for this class to add, subtract, multiply, and divide. We must mark the class as partial to stick with the intended functionality of only adding content to existing classes. This indicates that more of the class’s source code may be in a different file.
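As a minimal sketch of what Calculator.cs might contain: the two property names below are assumptions for illustration (the later log output only tells us the class has two members), and the file-scoped namespace is likewise assumed.

```csharp
namespace WebProject;

// "partial" tells the compiler that more of this class may live in another file,
// which is where the source generator will add its methods.
public partial class Calculator
{
    // Hypothetical members; the names are not taken from the original post.
    public int num1 { get; set; }
    public int num2 { get; set; }
}
```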

Starting on the Incremental Source Generator

Congratulations; we made it through the required setup.

With that configuration done, we can start writing the source generator. In the SourceGenerator project, add a class named CalculatorGenerator with the following content.

This gives the bare-bones starting point. To be a valid incremental source generator, the class must implement the `IIncrementalGenerator` interface and be decorated with the [Generator] attribute. The interface requires our generator to implement only the Initialize method.
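A bare-bones generator of that shape might look like the following sketch. The namespace is an assumption based on the project name; the interface, attribute, and method signature come from the Roslyn API.

```csharp
using Microsoft.CodeAnalysis;

namespace SourceGenerator;

// [Generator] marks the class so the compiler discovers and runs it.
[Generator]
public class CalculatorGenerator : IIncrementalGenerator
{
    // The only member IIncrementalGenerator requires.
    public void Initialize(IncrementalGeneratorInitializationContext context)
    {
        // The generation pipeline is registered here.
    }
}
```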

The Providers

The IncrementalGeneratorInitializationContext argument it provides in the Initialize method will give access to the underlying source.

The context object does this via several different “providers.”

Different Providers for Accessing Various Contexts

CompilationProvider -> Can access data relevant to the entire compilation (assemblies, all source files, various solution-wide options, and configs)
SyntaxProvider -> Access to syntax trees to analyze, transform, and select nodes for future work (Most commonly accessed)
ParseOptionsProvider -> Gives access to various bits of info about the code being parsed, such as language, whether it’s regular code files, script files, custom preprocessor names, etc.
AdditionalTextsProvider -> Additional texts are any non-source files you might want to access, such as a JSON file with various user-defined properties
MetadataReferencesProvider -> Allows getting references to various things like assemblies without getting the whole assembly item directly
AnalyzerConfigOptionsProvider -> If a source file has additional analyzer rules applied to it, this can access them

The ones we care about are CompilationProvider and SyntaxProvider.
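To make the idea of a provider a little more concrete, here is a small sketch that reads one value from the CompilationProvider and emits it into a generated file. The class name ProviderTourGenerator and the hint name are hypothetical and exist only for this illustration; it is not part of the calculator generator built in this post.

```csharp
using Microsoft.CodeAnalysis;

[Generator]
public class ProviderTourGenerator : IIncrementalGenerator
{
    public void Initialize(IncrementalGeneratorInitializationContext context)
    {
        // CompilationProvider yields a single value (the whole compilation).
        // Select projects it down to just the piece we need.
        IncrementalValueProvider<string?> assemblyName =
            context.CompilationProvider.Select((compilation, _) => compilation.AssemblyName);

        // Emit a file containing only a comment, to show the value arrived.
        context.RegisterSourceOutput(assemblyName, (sourceContext, name) =>
            sourceContext.AddSource("AssemblyNameComment.g.cs", $"// Generated for assembly: {name}"));
    }
}
```

Projecting down to a small, equatable value such as a string early in the pipeline is what lets the compiler reuse cached output when that value has not changed.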

Access the context.SyntaxProvider.CreateSyntaxProvider() method call.

This method takes two arguments. The first is called the predicate, a super lightweight filter that reduces everything in the codebase to only the items we care about. The second is called the transform, which ingests what we care about from the filter and makes any additional changes, property access, additional filtering, etc., as desired before returning the objects we want to work with later.

An example of using this syntaxProvider method is as follows.

The names of the arguments (predicate, transform) do not have to be supplied. I included them to make it easier to understand which is which.

The Predicate

The predicate code’s first argument is a SyntaxNode, and the second is a CancellationToken. The token allows any asynchronous tasking performed in this method to be gracefully stopped if needed. In this example, it is unnecessary, so we will focus on the SyntaxNode.

// Lambda parameters must be either all explicitly typed or all implicitly typed.
(SyntaxNode node, CancellationToken _) =>
{
  return node is ClassDeclarationSyntax classDeclaration &&
  classDeclaration.Identifier.ToString() == "Calculator";
}

The preceding code can seem daunting initially, as you are quickly bombarded with terms not typically seen by C# developers (e.g., SyntaxNode, ClassDeclarationSyntax, identifier, etc.). If you are anything like me, you are wondering what they mean, what they are used for, and why you need to use them.

Source generators work alongside / utilize the Roslyn compiler. Roslyn is a set of compilers and code analysis APIs. Roslyn understands your code and projects by breaking down almost everything into SyntaxNodes and SyntaxTokens.

Some examples of syntax tokens include access modifiers like public or private; modifiers like static, abstract, and partial; and the names of items like a class name, namespace, method name, etc.

Tokens also include grammar items in the language, like semicolons, brackets, etc.

Examples of syntax nodes include class declarations, method declarations, bodies of methods, and individual lines of code, including assignments, expressions, and using statements.

Example of a Syntax Node: in this case, a Class Declaration Syntax Node

This programmatic code breakdown is then used to analyze code, write new classes, modify methods, etc. While this can feel daunting, something to remember is that syntax is ultimately still text, and syntax objects can be converted into a string if required. This is precisely what we do in the predicate: we call .ToString() on the identifier SyntaxToken to turn it into a string and compare it to our target name.
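To see nodes and tokens outside of a generator, the following sketch parses a small string of C# and prints the class declaration node and its direct child tokens. It assumes a regular console project that references the Microsoft.CodeAnalysis.CSharp package; SyntaxDemo is a hypothetical name for this illustration.

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

public static class SyntaxDemo
{
    public static void Main()
    {
        const string source = "namespace WebProject;\npublic partial class Calculator { }";

        // Parse the text into a syntax tree and grab its root node.
        SyntaxTree tree = CSharpSyntaxTree.ParseText(source);
        SyntaxNode root = tree.GetRoot();

        // A class declaration is a syntax node.
        ClassDeclarationSyntax classNode =
            root.DescendantNodes().OfType<ClassDeclarationSyntax>().First();
        Console.WriteLine($"Node kind: {classNode.Kind()}");

        // Its modifiers, keyword, identifier, and braces are syntax tokens.
        foreach (SyntaxToken token in classNode.ChildTokens())
        {
            Console.WriteLine($"Token {token.Kind()}: {token.ToString()}");
        }
    }
}
```

Running it should list the public and partial modifiers, the class keyword, the Calculator identifier, and the two braces, each as its own token, which mirrors the node/token split described above.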

There are various syntax nodes and token types, which I plan to break down and provide examples of in later posts in the series.

In summary, the predicate says if this piece of code represents declaring a class like public partial class Calculator, then check if its identifier (i.e., class name) is “Calculator,” and if so, pass it to the transform. This way, when the generator sees a node like public static void Main(), it knows to skip it.

The Transform

transform: (GeneratorSyntaxContext ctx, CancellationToken _) =>
{
   var classDeclaration = (ClassDeclarationSyntax)ctx.Node;
   return classDeclaration;
}

The transform takes in the item that passed the filter and a cancellation token again to help cancel it if needed. The GeneratorSyntaxContext item is basically the node and some extra metadata. We then cast the context node item to a ClassDeclarationSyntax. This is required because even though the filter only passed us nodes of that type, the SyntaxContext does not understand that; however, we can cast it and safely know we are getting what we expect.

The transform is where we could extract members of the class, bodies of methods, etc.; whatever item we want to work on. In this example, we want to work on a class, so we get the item as a ClassDeclarationSyntax.

Finally, we add a where statement to filter out any null items that may have made it through. This is optional, but ensuring we aren’t getting some weird invalid item does not hurt.

The CreateSyntaxProvider returns an IncrementalValuesProvider<T> where T is whatever item type we are trying to return from the method call.

An `IncrementalValuesProvider` is a fancy word for the object holding our returned items. There is also an IncrementalValueProvider<T>, which is similar but is meant to have one object instead.

For example, in our code the IncrementalValuesProvider holds ClassDeclarationSyntax items, one for each class declaration that passed the predicate.

This then leaves us with an initialization method like this:

The last central part of using a source generator is telling it what to do with the items we got back. Go ahead and add the context.RegisterSourceOutput() line to your project. This tells the generator what to do with the returned items. Next, we will go over the content of this Execute method.
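Put together, the predicate, transform, null filter, and RegisterSourceOutput call could be wired up as in the sketch below. It is one plausible arrangement rather than the original listing; the variable names are assumptions. Note that RegisterSourceOutput hands the SourceProductionContext to its callback first and the item second, so the lambda swaps them to match the Execute signature used in this post.

```csharp
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp.Syntax;

namespace SourceGenerator;

[Generator]
public class CalculatorGenerator : IIncrementalGenerator
{
    public void Initialize(IncrementalGeneratorInitializationContext context)
    {
        IncrementalValuesProvider<ClassDeclarationSyntax> calculatorClasses =
            context.SyntaxProvider.CreateSyntaxProvider(
                    // Cheap filter: only class declarations named "Calculator".
                    predicate: (node, _) =>
                        node is ClassDeclarationSyntax classDeclaration &&
                        classDeclaration.Identifier.ToString() == "Calculator",
                    // Cast the context node to the type we filtered for.
                    transform: (ctx, _) => (ClassDeclarationSyntax)ctx.Node)
                // Optional guard against null items.
                .Where(classDeclaration => classDeclaration is not null);

        // Callback order is (SourceProductionContext, item).
        context.RegisterSourceOutput(
            calculatorClasses,
            (sourceProductionContext, calculatorClass) =>
                Execute(calculatorClass, sourceProductionContext));
    }

    public void Execute(ClassDeclarationSyntax calculatorClass, SourceProductionContext context)
    {
        // Code to perform work on the calculator class
    }
}
```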

The Execute Method

Alright, so we have our target type; we are filtering out everything we don’t care about, so let’s send it to the execute method and generate our source code.

The execute method is typically defined as follows:

public void Execute(ClassDeclarationSyntax calculatorClass, SourceProductionContext context)
{
  //Code to perform work on the calculator class
}

The first argument will vary depending on the work you are trying to perform and the method can be modified to take extra arguments as needed. The SourceProductionContext object gives us essential information about the project/solution and enables us to add code to the compilation to include it in the final build.

Since our goal is to generate some simple calculator functions, we will first check all the members of the class we are working on to see if they already have a method with the same name so we don’t accidentally override an existing version. Next, we will gather some metadata, like the namespace, any modifiers, and any using statements, to ensure the code compiles correctly. Lastly, we will insert the code we want into the source file and save it to the compilation.

So the final generator code should look like the following:

Alright, all the pieces are in place. Let’s build the solution and check out the generated code.

Example Error Messages When Source Generation Fails

Well, that’s not what we hoped for; however, as with many development projects, errors are bound to happen. Don’t panic yet; this failure is intentional and shows off some important things about working with source generators. First, source generators will only present a warning when they fail to generate code, so watch for warning messages like this when compiling the code.

Warning CS8785 : Generator 'CalculatorGenerator' failed to generate source. It will not contribute to the output and compilation errors may occur as a result. Exception was of type 'NullReferenceException' with message 'Object reference not set to an instance of an object.'.

Second, source generators execute at compile time, which makes it challenging to capture extra context from an exception the way you typically would, with a try-catch that prints info to the console.

If you try something like the following, you will notice no additional information is sent to the console.

public void Execute(ClassDeclarationSyntax calculatorClass, SourceProductionContext context)
{
  try
  {
     // code from before
  }
  catch(Exception ex)
  {
     Console.WriteLine(ex);
  }
}

OK, no problem. Instead, let’s save the message to a file in the catch statement.

OK, maybe not. If we can’t log to a file in the generator and we can’t log to the console, how will we get the details we need to figure out what is going wrong?

Logging in Source Generators

This brings us to logging in source generators, which I wanted to include in this first part because it is by far the most accessible means of troubleshooting issues when using source generators.

To enable logging in the source generator, open the shared library we made at the start. It should have a single class named Class1. Rename that to GeneratorLogging. While the File API is blocked inside the source generator itself, adding that functionality to a secondary library and having it write the content to a file for you is possible.

A simple logging class would be something like the following

There are a few key parts I will quickly explain.

The lock object -> This ensures that only one log message call runs at a time. This way, the source generators do not step on each other while trying to access the same file. Even with just one source generator, this can still happen because it checks multiple classes simultaneously.
The log message method -> This method will create the file at the provided path if it does not exist. Then, so long as the log level is equal to or higher than the set level, it will log the message to the file. It will also add a small header and footer to messages set higher than info to better showcase errors.
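A logging class along the lines just described could be sketched as follows. This is a reconstruction under assumptions, not the original code: the LoggingLevel members, the default temp-directory path, and the SetLogFilePath and SetLoggingLevel helpers are hypothetical, chosen only so the LogMessage calls shown elsewhere in this post have something to bind to.

```csharp
using System;
using System.IO;

namespace SourceGenerator.SharedLibrary;

// Hypothetical level set; only Info and Error appear in this post.
public enum LoggingLevel { Trace, Debug, Info, Warning, Error, Fatal }

public static class GeneratorLogging
{
    // Ensures only one thread writes to the log file at a time.
    private static readonly object FileLock = new object();

    // Assumed default location; change it with SetLogFilePath.
    private static string _logFilePath =
        Path.Combine(Path.GetTempPath(), "SourceGeneratorLog.txt");

    private static LoggingLevel _minimumLevel = LoggingLevel.Info;

    public static void SetLogFilePath(string path)
    {
        lock (FileLock) { _logFilePath = path; }
    }

    public static void SetLoggingLevel(LoggingLevel level)
    {
        lock (FileLock) { _minimumLevel = level; }
    }

    public static void LogMessage(string message, LoggingLevel level = LoggingLevel.Info)
    {
        lock (FileLock)
        {
            // Skip messages below the configured level.
            if (level < _minimumLevel) return;

            // Create the file (and its directory) with a header if it does not exist yet.
            if (!File.Exists(_logFilePath))
            {
                string directory = Path.GetDirectoryName(_logFilePath);
                if (!string.IsNullOrEmpty(directory)) Directory.CreateDirectory(directory);
                File.WriteAllText(_logFilePath,
                    "[+] Generated Log File" + Environment.NewLine +
                    "[+] This file contains log messages from the source generator" + Environment.NewLine +
                    $"Logging started at {DateTime.Now:yyyy-MM-dd HH:mm:ss.fff}" + Environment.NewLine);
            }

            // Wrap anything above Info in a header and footer so it stands out.
            string entry = level > LoggingLevel.Info
                ? $"[{level} start]{Environment.NewLine}{message}{Environment.NewLine}[{level} end]"
                : message;

            File.AppendAllText(_logFilePath, entry + Environment.NewLine);
        }
    }
}
```

Because this class lives in the shared library rather than in the generator project, the extended analyzer rules that ban file access inside the generator do not apply to it.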

Fixing the Example Code

Using the logging class is very straightforward; if you haven’t already, ensure the shared library is added as a dependency of the generator project so it can access it. So, let’s add some logging calls to our current code and see what the log shows.

If needed, perform dotnet clean to clean up any previous logs or generated files. Then, build the solution and check the log file.

The log then will contain output like the following:

[+] Generated Log File
[+] This file contains log messages from the source generator
Logging started at 2024-09-14 19:42:12.287
[+] Found 2 members in the Calculator class
[+] Checked if methods exist in Calculator class
[+] Added using statements to generated class
[Error start]
[-] Exception occurred in generator: System.NullReferenceException: Object reference not set to an instance of an object.
 at SourceGenerator.CalculatorGenerator.Execute(ClassDeclarationSyntax calculatorClass, SourceProductionContext context)
[Error end]

From this log output, we can see the generator is running into this Null Reference Exception right after the using statement code, so let’s take a more in-depth look at that.

GeneratorLogging.LogMessage("[+] Added using statements to generated class");
calcGeneratedClassBuilder.AppendLine();
SyntaxNode calcClassNamespace = calculatorClass.Parent;
// If no parent is a NamespaceDeclarationSyntax, Parent eventually becomes null
// and the next access throws the NullReferenceException seen in the log.
while (calcClassNamespace is not NamespaceDeclarationSyntax)
{
  calcClassNamespace = calcClassNamespace.Parent;
}
GeneratorLogging.LogMessage($"[+] Found namespace for Calculator class {((NamespaceDeclarationSyntax)calcClassNamespace).Name}", LoggingLevel.Info);

Here, we see the code walks up through the parents of the class object until it finds a namespace declaration. However, we did not add any null checks to ensure we had a namespace before continuing. Let’s modify this section of code to handle the nulls and perform a check against the node’s ancestors as well.

GeneratorLogging.LogMessage("[+] Added using statements to generated class");
calcGeneratedClassBuilder.AppendLine();
BaseNamespaceDeclarationSyntax? calcClassNamespace =
  calculatorClass.DescendantNodes().OfType<NamespaceDeclarationSyntax>().FirstOrDefault() ??
  (BaseNamespaceDeclarationSyntax?)calculatorClass.DescendantNodes().OfType<FileScopedNamespaceDeclarationSyntax>().FirstOrDefault();
calcClassNamespace ??= calculatorClass.Ancestors().OfType<NamespaceDeclarationSyntax>().FirstOrDefault();
calcClassNamespace ??= calculatorClass.Ancestors().OfType<FileScopedNamespaceDeclarationSyntax>().FirstOrDefault();
if (calcClassNamespace is null)
{
  GeneratorLogging.LogMessage("[-] Could not find namespace for Calculator class", LoggingLevel.Error);
}
else
{
  GeneratorLogging.LogMessage($"[+] Found namespace for Calculator class {calcClassNamespace.Name}");
}

This updated version searches the child nodes first and then, if the result is still null, the ancestor nodes, looking for either a block-scoped or a file-scoped namespace declaration. If no namespace is found, we get a log entry so we can keep troubleshooting.

This then gives us a final working source generator of:
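As a condensed sketch of a working generator that follows the steps described earlier (check existing members, gather usings, namespace, and modifiers, then add the source), consider the version below. It is not the original listing: the logging calls are left out to keep it self-contained, the namespace lookup is simplified to a single Ancestors() search on the shared base type, and the generated method signatures (two int parameters each) are assumptions made for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;
using Microsoft.CodeAnalysis.Text;

namespace SourceGenerator;

[Generator]
public class CalculatorGenerator : IIncrementalGenerator
{
    public void Initialize(IncrementalGeneratorInitializationContext context)
    {
        IncrementalValuesProvider<ClassDeclarationSyntax> calculatorClasses =
            context.SyntaxProvider.CreateSyntaxProvider(
                predicate: (node, _) =>
                    node is ClassDeclarationSyntax c && c.Identifier.ValueText == "Calculator",
                transform: (ctx, _) => (ClassDeclarationSyntax)ctx.Node);

        context.RegisterSourceOutput(calculatorClasses,
            (sourceContext, calculatorClass) => Execute(calculatorClass, sourceContext));
    }

    public void Execute(ClassDeclarationSyntax calculatorClass, SourceProductionContext context)
    {
        // 1. Names of methods the user already wrote, so we do not duplicate them.
        var existingMethods = new HashSet<string>(
            calculatorClass.Members.OfType<MethodDeclarationSyntax>()
                .Select(method => method.Identifier.ValueText));

        // 2. Hypothetical methods to generate (signatures are assumptions).
        var candidates = new (string Name, string Source)[]
        {
            ("Add", "public int Add(int a, int b) => a + b;"),
            ("Subtract", "public int Subtract(int a, int b) => a - b;"),
            ("Multiply", "public int Multiply(int a, int b) => a * b;"),
            ("Divide", "public double Divide(int a, int b) => (double)a / b;"),
        };

        var builder = new StringBuilder();

        // 3. Copy the file-level using directives from the original file.
        foreach (UsingDirectiveSyntax usingDirective in
                 calculatorClass.SyntaxTree.GetCompilationUnitRoot().Usings)
        {
            builder.AppendLine(usingDirective.ToString());
        }

        // 4. Both block-scoped and file-scoped namespaces derive from this base type.
        //    Simplification: only the innermost namespace declaration is used.
        BaseNamespaceDeclarationSyntax? classNamespace = calculatorClass.Ancestors()
            .OfType<BaseNamespaceDeclarationSyntax>().FirstOrDefault();
        if (classNamespace is not null)
        {
            builder.AppendLine($"namespace {classNamespace.Name};");
        }

        // 5. Re-declare the partial class with the same modifiers and add the methods.
        builder.AppendLine($"{calculatorClass.Modifiers} class {calculatorClass.Identifier.ValueText}");
        builder.AppendLine("{");
        foreach (var candidate in candidates)
        {
            if (!existingMethods.Contains(candidate.Name))
            {
                builder.AppendLine("    " + candidate.Source);
            }
        }
        builder.AppendLine("}");

        // 6. Add the generated file to the compilation.
        //    Simplification: assumes Calculator is declared in exactly one file.
        context.AddSource("Calculator.g.cs", SourceText.From(builder.ToString(), Encoding.UTF8));
    }
}
```

Searching the ancestors for BaseNamespaceDeclarationSyntax covers both namespace styles in one pass, which avoids the null parent that the original while loop could run into.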

We can test this by modifying the WebProject web API project we created at the start.

Open the WebProject Program.cs file and modify it to look like this:

When we run this project and send a GET request to the / URL, we will get back a message with the results from the source-generated methods.

API Call Showing the Generated Code Compiled and Executed Correctly

Conclusion for Part 1

I would like to cover many other capabilities of source generators in future parts that help showcase the real power behind them. So, if you enjoyed this first installment, stick around for more in-depth looks at C# source generators.

Dotnet Source Generators in 2024 Part 1: Getting Started was originally published in Posts By SpecterOps Team Members on Medium, where people are continuing the conversation by highlighting and responding to this story.

*** This is a Security Bloggers Network syndicated blog from Posts By SpecterOps Team Members - Medium authored by Jonathan Owens. Read the original post at: https://posts.specterops.io/dotnet-source-generators-in-2024-part-1-getting-started-76d619b633f5?source=rss----f05f8696e3cc---4

Article source: https://securityboulevard.com/2024/10/dotnet-source-generators-in-2024-part-1-getting-started/