Introduction
Working with Oracle recently, I found myself wanting an equivalent of SQL Server’s
Table-Valued Parameters - a way to pass structured records, or tables of such records,
into stored procedures. The Oracle equivalent is the
User-Defined Data Type,
but using these types requires quite a bit of work. I’ve created
BassUtils.Oracle
which contains helpers for creating the correct OracleCommand
parameters, described
in a previous blog post.
But that still leaves you having to manually code the C# class that Oracle requires
to map them to and from the Oracle SQL types. And it’s far from a POCO -
example here -
and has some strange, hard-to-discover rules, such as that all the Oracle field names
have to be in UPPERCASE or it just won’t work.
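To give a flavour of what has to be written by hand, here is a minimal sketch of the shape ODP.NET expects for a single record type (the field names and types here are illustrative, not the generator's actual output):

using Oracle.ManagedDataAccess.Client;
using Oracle.ManagedDataAccess.Types;

public class PersonRecord : IOracleCustomType
{
    // The mapped Oracle field names must be UPPERCASE.
    [OracleObjectMapping("AGE")]
    public int Age { get; set; }

    [OracleObjectMapping("FIRSTNAME")]
    public string FirstName { get; set; }

    public void FromCustomObject(OracleConnection con, object pUdt)
    {
        OracleUdt.SetValue(con, pUdt, "AGE", Age);
        OracleUdt.SetValue(con, pUdt, "FIRSTNAME", FirstName);
    }

    public void ToCustomObject(OracleConnection con, object pUdt)
    {
        Age = (int)OracleUdt.GetValue(con, pUdt, "AGE");
        FirstName = (string)OracleUdt.GetValue(con, pUdt, "FIRSTNAME");
    }
}

// Plus a factory class, registered against the Oracle type name.
[OracleCustomTypeMapping("MYSCHEMA.OBJPERSON")]
public class PersonRecordFactory : IOracleCustomTypeFactory
{
    public IOracleCustomType CreateObject() => new PersonRecord();
}

And if you want collections as well, there is a similar amount of boilerplate again for the array type and its factory.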
Given that these classes all follow the same format, it sounded like the perfect opportunity to create a source generator and to use a DSL (Domain-Specific Language) to simplify the definition of the classes. While the classes themselves are voluminous, there’s not actually a lot of information required. It’s possible to capture it all in a significantly shorter form, an example of which is:
class PersonRecord MYSCHEMA.objPerson
Collection PersonArray MYSCHEMA.tblPerson
Namespace People
Fields [
int Age,
FirstName,
LastName
]
To go from this to a C# class the steps are:
- Parse this text into a specification, held in a C# class.
- Use the specs to generate C# source code using a Source Generator.
- Package the source generator as a NuGet package.
This post describes how I performed each of these steps. If you find yourself wanting to write a DSL-based Source Generator you can use this as a template. The finished generator is available on NuGet as OracleUdtClassGenerator and the source code is available on GitHub. A list of current C# Source Generators exists which may provide help and ideas for your own implementation.
Parsing
My first thought was “this DSL is so simple I could parse it with a regex”. But the regex got bigger and nastier and I soon abandoned that idea. So what other approaches are there? Obviously I could write a custom parser all by myself, and while that is the approach taken by Luca on the Microsoft team, it’s a little more work than I was hoping for. A full-blown parser generator such as Antlr seemed like overkill, and installing Java is not something I wanted to do for this task.
Now there is a thing called parser combinators, which I first came across in the Rust world. I never had the need to use them and found the intros confusing, but the idea stuck in my mind as ‘should be simple if I ever need to write a parser’. Briefly, parser combinators are an approach whereby you write lots of little parsers to recognise small pieces of text - like the word ‘class’ or ‘Fields’ or a newline - and then combine them with combinators such as AND, OR and sequencing to make new, larger parsers. Continue until you have a parser that can recognise your entire input.
This approach has two great advantages - you don’t need to learn a new language to write a grammar file, and it lends itself to stepwise refinement, so you can start simple and test your parsers as you go along, gradually building them up to something more complex.
Write unit tests covering all possible input variants; it is a lot faster and easier than testing the parser inside a NuGet package.
A little Googling brought me to Sprache, a parser combinator library for C# (Sprache is German for ‘language’). There are good introductory articles by Nicholas Blumhardt and Thomas Levesque, and Justin Pealing has produced a whole list of reference articles on different Sprache techniques.
The grammar for my DSL is specified as a series of parsers using LINQ expressions. You can
probably figure out what is happening just by looking at the code (TargetClassSpecification
is
the DTO that contains all the parsed information):
static readonly Parser<TargetClassSpecification> TargetClassSpecificationParser =
from classNames in ClassNameParser
from collectionNames in CollectionNameParser.Optional()
from namespaceName in NamespaceParser.Optional()
from filename in FilenameParser.Optional()
from ddspec in DebugParser.Optional()
from tsspec in ToStringParser.Optional()
from fields in FieldListParser
select new TargetClassSpecification
{
FileName = filename.GetOrDefault(),
Namespace = namespaceName.GetOrDefault(),
ClassName = classNames.Item1,
CollectionName = collectionNames.IsDefined ? collectionNames.Get().Item1 : null,
OracleRecordTypeName = classNames.Item2,
OracleCollectionTypeName = collectionNames.IsDefined ? collectionNames.Get().Item2 : null,
DebuggerDisplayFormat = ddspec.GetOrDefault(),
ToStringFormat = tsspec.GetOrDefault(),
Fields = fields
};
This declares one parser, which is composed from a sequence of sub-parsers such as ClassNameParser
and
CollectionNameParser
(which is optional). If the entire parse succeeds then a new TargetClassSpecification
object is created. At this level, most of the interesting code is tucked away in the sub-parsers, most of which
return single strings, but ClassNameParser
and CollectionNameParser
return tuples consisting of two strings.
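For reference, TargetClassSpecification is a plain DTO whose properties mirror the select clause above. Sketched here from that clause (the real class is in the repository):

public class TargetClassSpecification
{
    public string FileName { get; set; }
    public string Namespace { get; set; }
    public string ClassName { get; set; }
    public string CollectionName { get; set; }
    public string OracleRecordTypeName { get; set; }
    public string OracleCollectionTypeName { get; set; }
    public string DebuggerDisplayFormat { get; set; }
    public string ToStringFormat { get; set; }
    public List<FieldSpecification> Fields { get; set; }
}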
I really admire the way that Sprache leverages LINQ to express sequencing.
If we dig into ClassNameParser
we can see some of the lower-level techniques.
static readonly Parser<(string, string)> ClassNameParser =
from _ in Parse.IgnoreCase("CLASS").TokenOnLine()
from csharpClassName in Parse.Identifier(Parse.Letter, TokenChar).TokenOnLine().Text()
from oracleObjectTypeName in Parse.Identifier(Parse.Letter, TokenChar).TokenOnLine().Text()
from _2 in OptionalCommaParser
select (csharpClassName, oracleObjectTypeName);
This is designed to parse the first line of this text:
class PersonRecord MYSCHEMA.objPerson
Collection PersonArray MYSCHEMA.tblPerson
...
First we look for the word “CLASS” to start off the entire specification. We use Parse.IgnoreCase
in preference
to Parse.String
here so the user can write in any case they want. TokenOnLine()
eats whitespace around
tokens but doesn’t consume newlines. (I had to write this specially because I used newline as a separator rather
than requiring a comma. This was probably a mistake; I should have used a comma and the built-in
Token parser, which does eat newlines). Text()
turns a list of parsed chars into a string. Finally, if the parse succeeds
the from _
means “I don’t care what you parsed, throw it away”. Since the word “class” is just a keyword and
carries no information we need we can just dispose of it.
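TokenOnLine isn’t shown above, but a reconstruction would look something like this - the same shape as Sprache’s Token, restricted to spaces and tabs:

using Sprache;

public static class ParserExtensions
{
    // Like Token(), but only eats spaces and tabs around the parser,
    // leaving newlines to act as separators.
    public static Parser<T> TokenOnLine<T>(this Parser<T> parser)
    {
        return from leading in Parse.Chars(' ', '\t').Many()
               from item in parser
               from trailing in Parse.Chars(' ', '\t').Many()
               select item;
    }
}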
We do want to get the C# class name and the Oracle Object Type Name because we need them both for
the code generation step. All my sub-parsers leverage the Parse.Identifier(first_char, subsequent_chars)
built-in parser. I’ve configured TokenChar
so that this accepts letters, numbers, and dots, so that it can parse
“MYSCHEMA.objPerson” as well as “Person.g.cs” and “PersonRecord”.
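That configuration amounts to something like:

static readonly Parser<char> TokenChar =
    Parse.LetterOrDigit.Or(Parse.Char('.'));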
OptionalCommaParser
is a ‘tidy-up’ operation. It greedily consumes input consisting of either a comma or
whitespace and then stops. This means that the overall parser will be in a position to start the next parser; in
this case we will be at the ‘C’ in “Collection”.
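A sketch of that parser:

static readonly Parser<IEnumerable<char>> OptionalCommaParser =
    Parse.Char(',').Or(Parse.WhiteSpace).Many();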
The other individual parsers for the namespace, debug display attribute etc. are all similar, and all optional.
Finally FieldListParser
returns a list of FieldSpecification
DTOs.
Notice how sequencing is handled simply by adding a new LINQ from
expression. It’s very easy to build
up from simple parsers to the final one.
The TargetClassSpecificationParser
isn’t actually the final word: the parsing is done by
this higher-level parser:
static readonly Parser<List<TargetClassSpecification>> TargetClassSpecificationListParser =
from t in TargetClassSpecificationParser.Many()
select t.ToList();
This really demonstrates the power of the combinator approach: the .Many() call is all that is
required to say “parse a list of these things from the input” rather than just one.
Drill-down into FieldListParser
FieldListParser
shows a couple of interesting techniques. It is defined as:
static readonly Parser<List<FieldSpecification>> FieldListParser =
from open in FieldKeywordParser
from elements in FieldParser.DelimitedBy(FieldSeparatorParser)
from _ws1 in Parse.WhiteSpace.Many()
from close in Parse.Char(']')
from _ws2 in Parse.WhiteSpace.Many()
select elements.ToList();
static readonly Parser<string> FieldSeparatorParser =
from _ in Parse.LineEnd.Or(Parse.String(","))
select "";
// Most consuming parser first.
static readonly Parser<FieldSpecification> FieldParser =
ThreePartFieldParser
.Or(TwoPartFieldParser)
.Or(OnePartFieldParser);
FieldKeywordParser
just eats the “Fields [” text, then we use FieldParser
to parse an individual field, which may have one, two or three parts. Since the
first parser to match terminates the parsing, we use the most specific three-part
matcher as the first, then try for a two-part match, then finally fall back
to the one-part matcher. We combine FieldParser
using the Sprache built-in
DelimitedBy
which makes it parse a list of fields, each field being separated by
FieldSeparatorParser
- which may be a newline or a comma. This works because we
don’t allow comma as a character in a token. You’ll notice we explicitly eat
whitespace before we get to the closing ‘]’ token. This is something of a theme:
you have to consume whitespace yourself, either like this or by using Token
or related methods.
The three field parsers are all similar; here is the three-part one:
static readonly Parser<FieldSpecification> ThreePartFieldParser =
from typeName in Parse.Identifier(Parse.Letter, TokenChar).TokenOnLine().Text()
from propertyName in Parse.Identifier(Parse.Letter, TokenChar).TokenOnLine().Text()
from oracleName in Parse.Identifier(Parse.Letter, TokenChar).TokenOnLine().Text()
select new FieldSpecification
{
DotNetDataTypeName = typeName,
PropertyName = propertyName,
OracleFieldName = oracleName,
};
Notice that at this level none of the properties are optional. The two-part parser is then
defined to omit oracleName:
static readonly Parser<FieldSpecification> TwoPartFieldParser =
from typeName in Parse.Identifier(Parse.Letter, TokenChar).TokenOnLine().Text()
from propertyName in Parse.Identifier(Parse.Letter, TokenChar).TokenOnLine().Text()
select new FieldSpecification
{
DotNetDataTypeName = typeName,
PropertyName = propertyName,
};
An alternative implementation could use a single FieldParser which gathers one, two or three strings and then assigns them to the appropriate properties of the
FieldSpecification
depending on how many strings were present.
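Such a parser might look like this (a sketch assuming Sprache’s Repeat(minimum, maximum) overload; MakeFieldSpecification is a hypothetical helper):

static readonly Parser<FieldSpecification> AlternativeFieldParser =
    from parts in Parse.Identifier(Parse.Letter, TokenChar).TokenOnLine().Text().Repeat(1, 3)
    select MakeFieldSpecification(parts.ToList());

static FieldSpecification MakeFieldSpecification(List<string> parts)
{
    // One string is just a property name, two strings add the .NET
    // type, three strings add the Oracle field name.
    switch (parts.Count)
    {
        case 1: return new FieldSpecification { PropertyName = parts[0] };
        case 2: return new FieldSpecification { DotNetDataTypeName = parts[0], PropertyName = parts[1] };
        default: return new FieldSpecification { DotNetDataTypeName = parts[0], PropertyName = parts[1], OracleFieldName = parts[2] };
    }
}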
You can see the whole set of parsers in the GitHub repository, but those are the main ones. The stepwise refinement continues down to lower and lower levels until you reach parsers for individual words and characters.
The top-level method that drives the parsing is defined like this:
public static List<TargetClassSpecification> ParseTargetSpecs(string input)
{
input = input.Trim();
var specs = TargetClassSpecificationListParser.Parse(input);
return specs;
}
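This method is also a convenient seam for the unit testing mentioned earlier. A test might look like this (xUnit, with assertions based on the DSL example at the top of this post):

using Xunit;

public class GrammarTests
{
    [Fact]
    public void Can_parse_spec_with_optional_parts_omitted()
    {
        var input = @"
class PersonRecord MYSCHEMA.objPerson
Fields [
    int Age,
    FirstName
]";

        var specs = Grammar.ParseTargetSpecs(input);

        Assert.Single(specs);
        Assert.Equal("PersonRecord", specs[0].ClassName);
        Assert.Equal("MYSCHEMA.objPerson", specs[0].OracleRecordTypeName);
        Assert.Equal(2, specs[0].Fields.Count);
    }
}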
Generating the UDT
With parsing the input taken care of, we can move on to the second part: creating a C# source generator. First you’ll need to install the VS Extension Development Pack using the Visual Studio Installer.
A generator is quite easy to write - but debugging it is a different matter, more on that later. To start, you have to implement the ISourceGenerator interface. Create a class like this:
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.Text;
[Generator]
public class OracleUdtGenerator : ISourceGenerator
{
public void Initialize(GeneratorInitializationContext context)
{
}
public void Execute(GeneratorExecutionContext context)
{
}
}
The body of your generator goes in the Execute()
method. For this source generator
I don’t need to do anything in Initialize()
, but if necessary this is where
I would create a SyntaxReceiver, for example if I wanted to generate code based on the C# syntax tree. But
my use case is much simpler: OracleUdtClassGenerator
is used by adding files ending with .oraudt
to a project as additional files.
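In the consuming project this just means an AdditionalFiles item in the csproj, something like this (the wildcard pattern is illustrative):

<ItemGroup>
  <AdditionalFiles Include="**\*.oraudt" />
</ItemGroup>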
So all we need to do during generation is find all these files, extract the text, parse it, and use the parsed specifications to generate C#:
public void Execute(GeneratorExecutionContext context)
{
try
{
// Generating as we find each file produces better logs.
foreach (var file in context.AdditionalFiles)
{
if (Path.GetExtension(file.Path).Equals(".oraudt", StringComparison.OrdinalIgnoreCase))
{
Logger.Log($"Found file {file.Path}");
ProcessAdditionalFile(context, file);
}
}
}
catch (Exception ex)
{
Logger.Log(ex.ToString());
}
Logger.WriteLogsToFile(context);
}
private void ProcessAdditionalFile(GeneratorExecutionContext context, AdditionalText file)
{
try
{
var text = file.GetText()?.ToString();
if (!string.IsNullOrEmpty(text))
{
ProcessAdditionalFileContents(context, file, text);
}
}
catch (Exception ex)
{
Logger.Log(ex.ToString());
}
}
private void ProcessAdditionalFileContents(GeneratorExecutionContext context, AdditionalText file, string text)
{
try
{
var targetSpecs = Grammar.ParseTargetSpecs(text);
foreach (var spec in targetSpecs)
{
Logger.Log($" Found spec for {spec.ClassName} with {spec.Fields.Count} fields");
CreateSourceFile(context, file, spec);
}
}
catch (Exception ex)
{
Logger.Log(ex.ToString());
}
}
Note the call to Grammar.ParseTargetSpecs(text)
- this is where we pass the contents of the
.oraudt
file into the Sprache grammar parser and get back a list of class specifications which
will drive the C# source code generation in CreateSourceFile
.
A note on the logging and exception trapping in these methods. During development, debugging source generators is something of a pain. You can attach a debugger to the running generator using:
if (!Debugger.IsAttached)
{
Debugger.Launch();
}
but I found it preferable to generate a log as a list of strings as the generator runs and then write it out to a file as the final step. The exception blocks ensure that if something blows up on one file, the generator continues and tries to complete as much work as possible. The results of running the generator are visible under the ‘Analyzers’ node in Visual Studio. (Visual Studio can sometimes be a bit reluctant to show these files; restarting VS will make them visible). The ‘_Logs.cs’ file contains the log messages inside a C# comment, looking something like this:
/*
DateTime.UtcNow: 2022-01-01T22:33:03.5397481Z
Found file C:\repos\OracleUdtClassGenerator\OracleUdtClassGenerator.ConsoleTestHarness\ArticleMaster.oraudt
Found spec for ArticleMasterRecord with 17 fields
Generated file ArticleMasterRecord.g.cs in namespace OracleUdtClassGenerator.ConsoleTestHarness
Found file C:\repos\OracleUdtClassGenerator\OracleUdtClassGenerator.ConsoleTestHarness\MyClass.oraudt
Found spec for MyClass with 5 fields
Generated file MyClass.g.cs in namespace MyNamespace
Found file C:\repos\OracleUdtClassGenerator\OracleUdtClassGenerator.ConsoleTestHarness\SubFolder\ArticleMaster2.oraudt
Found spec for ArticleMasterRecord2 with 17 fields
Generated file ArticleMasterRecord2.g.cs in namespace OracleUdtClassGenerator.ConsoleTestHarness.SubFolder
Found file C:\repos\OracleUdtClassGenerator\OracleUdtClassGenerator.ConsoleTestHarness\Person.oraudt
Found spec for PersonRecord with 3 fields
Generated file PersonRecord.g.cs in namespace People
*/
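The Logger itself can be very simple. A minimal sketch of the approach (the real one is in the repository):

using System;
using System.Collections.Generic;
using System.Text;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.Text;

internal static class Logger
{
    static readonly List<string> messages = new List<string>();

    public static void Log(string message) => messages.Add(message);

    public static void WriteLogsToFile(GeneratorExecutionContext context)
    {
        // Emit the accumulated messages as a comment in a generated
        // file so they show up under the 'Analyzers' node.
        var sb = new StringBuilder();
        sb.AppendLine("/*");
        sb.AppendLine($"DateTime.UtcNow: {DateTime.UtcNow:o}");
        foreach (var message in messages)
            sb.AppendLine(message);
        sb.AppendLine("*/");
        context.AddSource("_Logs.cs", SourceText.From(sb.ToString(), Encoding.UTF8));
    }
}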
If the generator is not working correctly, you will probably get the dreaded ‘not generating files’ error.
Catching all exceptions and logging them prevents this and also gives a clue as to what is going wrong.
The CreateSourceFile method
This bit of code, which does the actual work of creating the C# classes, is the longest
but least interesting part of the project. It basically uses the TargetClassSpecification
DTO to drive the output of the C# code into a StringBuilder
, and when done it adds
the contents of the builder to the generation context:
var source = GenerateSourceText(context, spec, ns);
var filename = $"{spec.ClassName}.g.cs";
Logger.Log($" Generated file {filename} in namespace {ns}");
context.AddSource(filename, SourceText.From(source, Encoding.UTF8));
I made use of a utility class I have written called
IndentingStringBuilder
to simplify the C# generation. It indents when you call BeginCodeBlock
and outdents
automatically on Dispose
, so you can do
using (sb.BeginCodeBlock())
{
sb.AppendLine("public IOracleCustomType CreateObject()");
using (sb.BeginCodeBlock())
{
sb.AppendLine($"return new {spec.ClassName}();");
}
}
// results in
{
public IOracleCustomType CreateObject()
{
return new PersonRecord();
}
}
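A minimal sketch of how such a builder can work, assuming four-space indents (the BassUtils version has more options):

using System;
using System.Text;

public class IndentingStringBuilder
{
    readonly StringBuilder sb = new StringBuilder();
    int indentLevel;

    public void AppendLine(string line) =>
        sb.AppendLine(new string(' ', indentLevel * 4) + line);

    public IDisposable BeginCodeBlock()
    {
        AppendLine("{");
        indentLevel++;
        return new Outdenter(this);
    }

    public override string ToString() => sb.ToString();

    // Outdents and writes the closing brace when disposed.
    class Outdenter : IDisposable
    {
        readonly IndentingStringBuilder owner;
        public Outdenter(IndentingStringBuilder owner) => this.owner = owner;

        public void Dispose()
        {
            owner.indentLevel--;
            owner.AppendLine("}");
        }
    }
}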
This class lives in my BassUtils NuGet package, but I copied and pasted it into the generator to avoid having another NuGet dependency to deal with (see below for the problems they cause). This thinking also meant I decided not to use a templating library to assist with the code generation.
Packaging as a NuGet Package
When packaging the source generator as a NuGet package, we want it to run at compile time only and not become a runtime dependency of the project that consumes it. I followed the official recipe to do this:
<PropertyGroup>
<GeneratePackageOnBuild>true</GeneratePackageOnBuild>
<!-- The generator lib is used only at compile time, clients do not need it at run time as it
contains nothing they use. This stops NuGet from packing anything from this project, so we
will include dependencies manually below. -->
<IncludeBuildOutput>false</IncludeBuildOutput>
<!-- Change the default output folder for the build -->
<BuildOutputTargetFolder>analyzers</BuildOutputTargetFolder>
</PropertyGroup>
Now that everything is being packed into an ‘analyzers’ folder, you will start getting this warning from NuGet:
warning NU5128: Some target frameworks declared in the dependencies group of the nuspec and the lib/ref folder do not have exact matches in the other location. Consult the list of actions below:
warning NU5128: - Add lib or ref assemblies for the netstandard2.0 target framework
You can suppress this in the project file:
<PropertyGroup>
<NoWarn>NU5128</NoWarn>
</PropertyGroup>
We have to ensure that the NuGet package includes the dependencies that we used while
writing it. If we don’t do this, users of the NuGet package will get a runtime error along the lines
of ‘cannot find file or assembly…'. For our OracleUdtClassGenerator
, it’s just Sprache:
<ItemGroup>
<!-- Generator dependencies -->
<PackageReference Include="Sprache" Version="2.3.1" PrivateAssets="all" GeneratePathProperty="true" />
</ItemGroup>
<ItemGroup>
<!-- Package the generator in the analyzer directory of the nuget package -->
<None Include="$(OutputPath)\$(AssemblyName).dll" Pack="true" PackagePath="analyzers/dotnet/cs" Visible="false" />
<!-- Package our dependencies alongside the generator assembly -->
<None Include="$(PkgSprache)\lib\netstandard2.0\Sprache*.dll" Pack="true" PackagePath="analyzers/dotnet/cs" Visible="false" />
</ItemGroup>
Notice how everything is packaged into the ‘analyzers’ folder; you can examine the final folder structure using the NuGet Package Explorer.
The package itself should have no dependencies.
And that completes the NuGet package.
One last thought: I found it easier to test the generator as a NuGet package within my solution, rather than adding a Project Reference. As we have seen, there are significant complexities in creating the NuGet package, and using a Project Reference doesn’t exercise those scenarios at all.