Creating Code with Code - Introduction to Roslyn

The black box

Compilers used to be black boxes: we had no access to the information inside them. That came with drawbacks. Companies providing tools had to create their own version of the compiler, simulating the exact behavior of the actual one, to offer IDE capabilities such as colorization, go to definition, and refactoring. Furthermore, developers creating frameworks or tools had no proper way, or at least no easy one, to extend the compiler so that it provides meaningful diagnostics or code suggestions to the users of those frameworks or libraries (that is, other developers).

To compile our source code, the compiler runs a pipeline consisting of several phases, each of which provides output for the next. The Parser tokenizes the source code and parses it into a syntax tree that follows the grammar of the language. The Declaration phase analyzes the source and imported metadata to form named symbols. The Binder matches identifiers in the code to symbols. Last but not least, the Emitter emits an assembly with all the information accumulated by the compiler in the previous phases.

Compiler Pipeline

Roslyn opened the box by providing several APIs (Compiler API, Diagnostic API, Scripting API and Workspace API). They let us not only access the information collected in each phase, but also enrich each phase based on our own requirements and analysis: we can report specific diagnostics, whether warnings or errors, offer code suggestions, and even generate source code during a compilation and add it to the source of the program. The following figure shows the relation between the Compiler API, one of Roslyn's four APIs, and each phase of the compiler pipeline.

Compiler Pipeline

The Syntax Tree API provides access to the syntax tree, the Symbol API exposes a hierarchical symbol table, the Binding and Flow Analysis APIs expose the result of the compiler's semantic analysis, and finally the Emit API produces IL bytecode. Next, we will use these APIs to create a program that accepts text as input and compiles it to a .NET assembly.
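To make the layers above more concrete, here is a minimal sketch that touches the syntax tree first and then asks the semantic model for a symbol. The `Greeter` class and the assembly name `Sketch` are made up for this illustration, and the sketch assumes a modern .NET runtime where `typeof(object).Assembly` points to System.Private.CoreLib.

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical sample input, used only for this illustration.
const string code = """
    class Greeter
    {
        string Greet(string name) => "Hello " + name;
    }
    """;

// Syntax Tree API: purely structural, no knowledge of types yet.
var tree = CSharpSyntaxTree.ParseText(code);
var method = tree.GetRoot()
    .DescendantNodes()
    .OfType<MethodDeclarationSyntax>()
    .Single();
Console.WriteLine(method.Identifier.Text); // prints "Greet"

// Symbol and binding APIs: they need a compilation and a semantic model.
var compilation = CSharpCompilation.Create(
    "Sketch",
    syntaxTrees: [tree],
    references: [MetadataReference.CreateFromFile(typeof(object).Assembly.Location)]);
var model = compilation.GetSemanticModel(tree);
var symbol = model.GetDeclaredSymbol(method);

// The symbol knows things the syntax node does not, such as the resolved return type.
Console.WriteLine(symbol?.ReturnType.ToDisplayString());
```

The syntax node only knows that the text `string` appears in front of the method name; the symbol is what ties it to the actual type defined in the referenced assembly.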

Build Code with Code

The first step is to add a reference to Microsoft.CodeAnalysis.CSharp, the NuGet package that gives us access to the Compiler API:

<PackageReference Include="Microsoft.CodeAnalysis.CSharp" Version="4.12.0"/>

The next step is to create a syntax tree. Assuming a string variable holds our code, we can pass it to SyntaxFactory.ParseSyntaxTree; an alternative is CSharpSyntaxTree.ParseText:

const string sourceCode = """
                          using System;

                          namespace BuildingCode;

                          public class Program
                          {
                            static void Main(string[] args)
                            {
                                Console.WriteLine("Hello Sir Ta Piaz!");
                            }
                          }
                          """;

// 1. Create a Syntax Tree
var syntaxTree = SyntaxFactory.ParseSyntaxTree(sourceCode);
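As a small aside, parsing does not throw on invalid code; syntax errors are reported as diagnostics on the tree itself. The sketch below parses a deliberately broken snippet (a made-up example with a missing semicolon) through both entry points mentioned above and prints what the parser has to say.

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

// Hypothetical input: the statement inside M() lacks its semicolon.
const string broken = "class C { void M() { int x = 1 } }";

// Both entry points accept the same text.
var viaFactory = SyntaxFactory.ParseSyntaxTree(broken);
var viaTree = CSharpSyntaxTree.ParseText(broken);

// Compare the two trees structurally.
Console.WriteLine(viaFactory.IsEquivalentTo(viaTree));

// Parsing succeeded in producing a tree; the problems are attached as diagnostics.
foreach (var diagnostic in viaTree.GetDiagnostics())
    Console.WriteLine(diagnostic);

// Syntax trees keep every character of the input, so the text round-trips.
Console.WriteLine(viaTree.GetRoot().ToFullString() == broken); // prints "True"
```

Checking the tree's diagnostics before building a compilation can be a cheap way to fail early on input that is not even syntactically valid.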

After creating a syntax tree, we can create a compilation. Keep in mind that a compilation very likely needs to reference other assemblies to succeed; in our case System.Private.CoreLib, System.Console and System.Runtime are required.

var coreLib = typeof(object).Assembly;
var console = typeof(Console).Assembly;
// Match the simple name exactly; Contains("System.Runtime") could also pick up e.g. System.Runtime.Loader.
var systemRuntime = AppDomain.CurrentDomain.GetAssemblies().First(a => a.GetName().Name == "System.Runtime");

// 2. Create a Compilation
var compilation = CSharpCompilation.Create(
    assemblyName: "BuildingCode",
    options: new CSharpCompilationOptions(OutputKind.ConsoleApplication),
    syntaxTrees: [syntaxTree],
    references:
    [
        MetadataReference.CreateFromFile(coreLib.Location),
        MetadataReference.CreateFromFile(console.Location),
        MetadataReference.CreateFromFile(systemRuntime.Location)
    ]
);

Now that we have formed the compilation, it is time to tell the last phase to emit the assembly. The output usually goes to a file (FileStream), but for the sake of simplicity our example keeps it in memory, using a MemoryStream:

// 3. Emit the Assembly
using var ms = new MemoryStream();
var result = compilation.Emit(ms);

If the whole pipeline succeeds, we have our assembly! To verify that our program compiled successfully and produced an executable, let's load the compiled assembly and invoke its entry point (since it is a console application):

// ToArray copies only the bytes written; GetBuffer would also return unused buffer capacity.
var assembly = Assembly.Load(ms.ToArray());
assembly.EntryPoint?.Invoke(null, BindingFlags.Static | BindingFlags.Public, null, [null], null);
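Loading the stream only makes sense when the emit step actually succeeded, so it may be worth guarding on `result.Success` first. The following sketch shows one possible shape of that guard. It compiles a tiny library instead of a console application; the `Calc` class and the assembly name `CalcSketch` are invented for this illustration, and the single CoreLib reference is assumed to be enough because the sample only uses `int`.

```csharp
using System;
using System.IO;
using System.Reflection;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

// Hypothetical library source, used only for this illustration.
const string library = """
    public static class Calc
    {
        public static int Add(int a, int b) => a + b;
    }
    """;

var compilation = CSharpCompilation.Create(
    assemblyName: "CalcSketch",
    syntaxTrees: [CSharpSyntaxTree.ParseText(library)],
    references: [MetadataReference.CreateFromFile(typeof(object).Assembly.Location)],
    options: new CSharpCompilationOptions(OutputKind.DynamicallyLinkedLibrary));

using var ms = new MemoryStream();
var result = compilation.Emit(ms);

// Stop early and report the diagnostics if the compiler produced errors.
if (!result.Success)
{
    foreach (var diagnostic in result.Diagnostics)
        Console.Error.WriteLine(diagnostic);
    return;
}

// Only now is the stream known to contain a complete assembly.
var assembly = Assembly.Load(ms.ToArray());
var add = assembly.GetType("Calc")!.GetMethod("Add")!;
var sum = (int)add.Invoke(null, new object[] { 2, 3 })!;
Console.WriteLine(sum); // prints "5"
```

A library has no entry point, so the sketch looks up a public static method by name instead of using `EntryPoint`.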

Diagnostic API

The Diagnostic API is another API of the platform and works hand in hand with the Compiler API. It focuses on warnings, errors, and code suggestions: we can use it to contribute our own, for example to ensure that code adheres to the coding standards and other rules defined by the development team, and we can use it to access the diagnostics produced by the compiler. The following code snippet shows how to access that information:

foreach (var diagnostic in result.Diagnostics)
{
    Console.ForegroundColor = diagnostic.Severity == DiagnosticSeverity.Error
        ? ConsoleColor.Red
        : ConsoleColor.Yellow;
    Console.Error.WriteLine(diagnostic);
    Console.ResetColor();
}

Hint: Remove one of the referenced assemblies from the compilation and run the application again. You should see errors such as:

(9,17): error CS0012: The type 'Object' is defined in an assembly that is not referenced. You must add a reference to assembly 'System.Runtime, Version=9.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a'.
(9,9): error CS0012: The type 'Decimal' is defined in an assembly that is not referenced. You must add a reference to assembly 'System.Runtime, Version=9.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a'.
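Diagnostics can also be inspected without emitting anything, by asking the compilation directly. The sketch below reproduces the missing-reference situation on purpose and keeps only the errors. The class `C` and the assembly name `DiagnosticsSketch` are made up, and the sketch assumes a modern .NET runtime where `Console` does not live in the core library, so leaving out System.Console is expected to produce at least one error.

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

// Hypothetical input that needs System.Console, which we deliberately do not reference.
const string code = """
    class C
    {
        void M() { System.Console.WriteLine(1); }
    }
    """;

var compilation = CSharpCompilation.Create(
    "DiagnosticsSketch",
    syntaxTrees: [CSharpSyntaxTree.ParseText(code)],
    references: [MetadataReference.CreateFromFile(typeof(object).Assembly.Location)],
    options: new CSharpCompilationOptions(OutputKind.DynamicallyLinkedLibrary));

// GetDiagnostics runs the analysis phases without producing an assembly.
var errors = compilation.GetDiagnostics()
    .Where(d => d.Severity == DiagnosticSeverity.Error)
    .ToList();

foreach (var error in errors)
{
    // Line positions are zero-based, so add one for display.
    var position = error.Location.GetLineSpan().StartLinePosition;
    Console.WriteLine($"{error.Id} at line {position.Line + 1}: {error.GetMessage()}");
}
```

Working with `Id`, `Severity` and `Location` separately, rather than printing the whole diagnostic, makes it easier to filter or group the results in a custom tool.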

Conclusion

Roslyn provides a set of APIs (Compiler API, Diagnostic API, Scripting API and Workspace API) that let us interact more meaningfully with the compiler and use the same APIs the compiler itself uses to create our own tools and analyzers.

Thanks for reading so far, enjoy coding and Dametoon Garm [1] 😊

Resources


Create Syntax Trees using Roslyn APIs

Sometimes when we want to generate code, whether for source generators or for the code fixes of a code analyzer, we need to know how to create a syntax tree using the Roslyn Compiler API. There are two ways, which we discuss in this post.

Write your own Code Analyzer with Roslyn

We are all familiar with the diagnostics the compiler provides when we develop an application; they can take the form of warnings, errors or code suggestions. A diagnostic or code analyzer inspects our open files for various metrics, like style, maintainability, design, etc. However, sometimes we need to write a tailor-made analysis for our specific situation, tool, or project.

Changing Syntax Tree using Roslyn API

A syntax tree has knowledge about the textual content of what it represents, and that content relates to C#. There are scenarios in which we need to change the code, such as code fixes, refactorings, or advanced source generation, and this can be done by changing the syntax tree, since the relationship is bidirectional. There are two approaches: replacement methods and SyntaxRewriter classes.
