NUnit to xUnit automatic test conversion: pattern match

In the previous post I described how to use the Roslyn API to find code patterns in the C# AST and how to change the AST to rewrite the original code into something else. The goal was to automate the conversion of NUnit tests to xUnit. The approach I used was quite tedious, as I had to write a very long chain of ifs and typecasts to get the job done. Let’s try to do better this time. Let’s start with just the search part of our search-and-replace tool.

What would be great is to be able to specify structural patterns like this:

Assert.That(_, Is.EqualTo(_))
Assert.That(_, Is.EqualTo(true))
Assert.That(_, Throws.TypeOf<_>())

And they would match the actual code:

// Matched by 'Assert.That(_, Is.EqualTo(_))'
Assert.That(account.Id, Is.EqualTo(id));
Assert.That("".ToBytes(), Is.EqualTo(new byte[] {}));

// Matched by 'Assert.That(_, Is.EqualTo(true))'
Assert.That(info.IsMd5, Is.EqualTo(true));
Assert.That(token.BoolAt(path, true), Is.EqualTo(true));

// Matched by 'Assert.That(_, Throws.TypeOf<_>())'
Assert.That(() => Quad[-1], Throws.TypeOf<ArgumentOutOfRangeException>());
Assert.That(() => access(token, path), Throws.TypeOf<JTokenAccessException>());

At first this looks like quite a difficult task. But as it turns out, in its simplest form it’s not even that hard. I first got the idea when I was generating code for AST replacement with Roslyn Quoter. Looking at its source code, I discovered a bunch of Parse* methods in the SyntaxFactory class.

So basically one function call will parse the snippet and return an AST for the given pattern:

var patternAst = SyntaxFactory.ParseExpression("Assert.That(_, Is.EqualTo(_))");

The one line above is equivalent to a wall of code like this:

var patternAst =
    InvocationExpression(
        MemberAccessExpression(
            SyntaxKind.SimpleMemberAccessExpression,
            IdentifierName("Assert"),
            IdentifierName("That")))
    .WithArgumentList(
        ArgumentList(
            SeparatedList<ArgumentSyntax>(
                new SyntaxNodeOrToken[]{
                    Argument(
                        IdentifierName("_")),
                    Token(SyntaxKind.CommaToken),
                    Argument(
                        InvocationExpression(
                            MemberAccessExpression(
                                SyntaxKind.SimpleMemberAccessExpression,
                                IdentifierName("Is"),
                                IdentifierName("EqualTo")))
                        .WithArgumentList(
                            ArgumentList(
                                SingletonSeparatedList<ArgumentSyntax>(
                                    Argument(
                                        IdentifierName("_"))))))})));
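To make the equivalence a bit more concrete, here is a minimal sketch that compares a parsed snippet with a hand-built tree for a shorter expression. The class and method names are made up for illustration, and it assumes the Microsoft.CodeAnalysis.CSharp package is referenced. SyntaxFactory.AreEquivalent compares structure and token text while ignoring trivia such as whitespace, so it should report the two trees as equivalent:

```csharp
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;
using static Microsoft.CodeAnalysis.CSharp.SyntaxFactory;

// Hypothetical helper, not part of the original tool.
public static class PatternEquivalence
{
    public static bool ParsedEqualsBuilt()
    {
        // One call to the parser...
        var parsed = ParseExpression("Is.EqualTo(_)");

        // ...versus building the same tree by hand.
        var built =
            InvocationExpression(
                MemberAccessExpression(
                    SyntaxKind.SimpleMemberAccessExpression,
                    IdentifierName("Is"),
                    IdentifierName("EqualTo")))
            .WithArgumentList(
                ArgumentList(
                    SingletonSeparatedList<ArgumentSyntax>(
                        Argument(IdentifierName("_")))));

        // topLevel: false asks for a full comparison of all nodes and tokens.
        return AreEquivalent(parsed, built, topLevel: false);
    }
}
```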

It feels like a total win already and we have not even done anything useful yet. But let’s find this pattern in a source AST. First, we need to parse the file we’re searching in:

var sourceAst = CSharpSyntaxTree.ParseText(File.ReadAllText(filename));

This gives us the list of all expression nodes in the AST:

var nodes = sourceAst.GetRoot().DescendantNodes().OfType<ExpressionSyntax>();
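One thing worth knowing about this query is that it returns nested expressions as well as top-level ones, so an inner call like Is.EqualTo(1) is a match candidate on its own. Type names also derive from ExpressionSyntax in Roslyn, so they show up in the list too. A small sketch to see what comes back for a tiny source string (the ExpressionListing class is a hypothetical helper, not part of the tool):

```csharp
using System.Linq;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical helper for exploring what DescendantNodes yields.
public static class ExpressionListing
{
    // Returns the source text of every expression node in the given code.
    public static string[] Expressions(string source)
    {
        var tree = CSharpSyntaxTree.ParseText(source);
        return tree.GetRoot()
            .DescendantNodes()
            .OfType<ExpressionSyntax>()
            .Select(e => e.ToString())
            .ToArray();
    }
}
```

Calling ExpressionListing.Expressions("class T { void M() { Assert.That(x, Is.EqualTo(1)); } }") returns both the whole Assert.That(...) call and the inner Is.EqualTo(1) call, among others.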

And now we find the nodes that match:

foreach (var e in nodes)
{
    if (Ast.Match(e, patternAst))
    {
        // StartLinePosition.Line is zero-based, so add 1 for a human-readable line number
        var line = e.GetLocation().GetLineSpan().StartLinePosition.Line + 1;
        var code = e.NormalizeWhitespace();
        Console.WriteLine($"  {line}: {code}");
    }
}

The Ast.Match function is the tricky part, though not as tricky as it sounds. We recursively traverse both ASTs in parallel and see if they match:

public static bool Match(SyntaxNode code, SyntaxNode pattern)
{
    // A placeholder matches anything
    if (IsPlaceholder(pattern))
        return true;

    // Node types don't match. Clearly not a match.
    if (code.GetType() != pattern.GetType())
        return false;

    switch (code)
    {
    case ArgumentSyntax c:
        {
            var p = (ArgumentSyntax)pattern;
            return Match(c.Expression, p.Expression);
        }
    case ArgumentListSyntax c:
        {
            var p = (ArgumentListSyntax)pattern;
            return Match(c.OpenParenToken, p.OpenParenToken)
                && Match(c.Arguments, p.Arguments)
                && Match(c.CloseParenToken, p.CloseParenToken);
        }
    case IdentifierNameSyntax c:
        {
            var p = (IdentifierNameSyntax)pattern;
            return Match(c.Identifier, p.Identifier);
        }
    case InvocationExpressionSyntax c:
        {
            var p = (InvocationExpressionSyntax)pattern;
            return Match(c.Expression, p.Expression)
                && Match(c.ArgumentList, p.ArgumentList);
        }
    case LiteralExpressionSyntax c:
        {
            var p = (LiteralExpressionSyntax)pattern;
            return Match(c.Token, p.Token);
        }
    case MemberAccessExpressionSyntax c:
        {
            var p = (MemberAccessExpressionSyntax)pattern;
            return Match(c.Expression, p.Expression)
                && Match(c.Name, p.Name);
        }
    case GenericNameSyntax c:
        {
            var p = (GenericNameSyntax)pattern;
            return Match(c.Identifier, p.Identifier)
                && Match(c.TypeArgumentList, p.TypeArgumentList);
        }
    case TypeArgumentListSyntax c:
        {
            var p = (TypeArgumentListSyntax)pattern;
            return Match(c.LessThanToken, p.LessThanToken)
                && Match(c.Arguments, p.Arguments)
                && Match(c.GreaterThanToken, p.GreaterThanToken);
        }
    default:
        return false;
    }
}
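The switch above relies on a few helpers that are not shown: IsPlaceholder and the Match overloads for tokens and for argument lists. Here is one way they could look. This is a sketch under my own assumptions, not necessarily the code the tool uses: a placeholder is taken to be a bare identifier spelled "_", and tokens are compared by kind and source text. To keep the sketch compilable on its own, it includes a reduced three-case stand-in for the node-level Match and a hypothetical ArgumentsOf helper for trying it out:

```csharp
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical class name; the helpers would normally live next to the big switch.
public static class AstHelpersSketch
{
    // Assumption: a placeholder is a bare identifier spelled "_".
    public static bool IsPlaceholder(SyntaxNode pattern) =>
        pattern is IdentifierNameSyntax name && name.Identifier.Text == "_";

    // Tokens match when both their kind and their source text agree.
    public static bool Match(SyntaxToken code, SyntaxToken pattern) =>
        code.RawKind == pattern.RawKind && code.Text == pattern.Text;

    // Lists match when they have equal length and match element by element.
    public static bool Match<T>(SeparatedSyntaxList<T> code, SeparatedSyntaxList<T> pattern)
        where T : SyntaxNode
    {
        if (code.Count != pattern.Count)
            return false;
        for (var i = 0; i < code.Count; i++)
        {
            if (!Match(code[i], pattern[i]))
                return false;
        }
        return true;
    }

    // Reduced stand-in for the node-level switch, kept to three cases
    // so that this sketch compiles without the rest of the matcher.
    public static bool Match(SyntaxNode code, SyntaxNode pattern)
    {
        if (IsPlaceholder(pattern))
            return true;
        if (code.GetType() != pattern.GetType())
            return false;
        switch (code)
        {
            case ArgumentSyntax c:
                return Match(c.Expression, ((ArgumentSyntax)pattern).Expression);
            case IdentifierNameSyntax c:
                return Match(c.Identifier, ((IdentifierNameSyntax)pattern).Identifier);
            case LiteralExpressionSyntax c:
                return Match(c.Token, ((LiteralExpressionSyntax)pattern).Token);
            default:
                return false;
        }
    }

    // Usage helper: parses a call expression and returns its arguments.
    public static SeparatedSyntaxList<ArgumentSyntax> ArgumentsOf(string call) =>
        ((InvocationExpressionSyntax)SyntaxFactory.ParseExpression(call)).ArgumentList.Arguments;
}
```

With these in place, matching the arguments of f(x, true) against those of f(_, true) succeeds, while f(x, false) or f(x) against the same pattern fails, because of the differing literal token and the differing list length respectively.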

So it’s basically a giant switch with a case for every node type. Not every type is covered here by far, just those I needed to get my examples to work. I imagine that to cover most of C# syntax I’d have to tediously write a couple of thousand lines of repetitive code. I’m not going to do all of that any time soon, just the parts I need for my use cases.
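A possible way around the repetitive code, which is a different technique from the switch above and not something I have tested beyond toy inputs, is to compare the children of both nodes generically with ChildNodesAndTokens. The sketch below assumes the same "_" placeholder convention. Note that it compares syntax kinds rather than node types, and it treats every token (including optional ones) as significant, so it is likely stricter than a hand-written case in some situations. The GenericAst name is made up:

```csharp
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical alternative to the per-type switch.
public static class GenericAst
{
    // Assumption: a placeholder is a bare identifier spelled "_".
    static bool IsPlaceholder(SyntaxNode pattern) =>
        pattern is IdentifierNameSyntax name && name.Identifier.Text == "_";

    public static bool Match(SyntaxNode code, SyntaxNode pattern)
    {
        if (IsPlaceholder(pattern))
            return true;
        if (code.RawKind != pattern.RawKind)
            return false;

        // Walk the direct children (nodes and tokens) of both trees in parallel.
        var codeChildren = code.ChildNodesAndTokens();
        var patternChildren = pattern.ChildNodesAndTokens();
        if (codeChildren.Count != patternChildren.Count)
            return false;

        for (var i = 0; i < codeChildren.Count; i++)
        {
            var c = codeChildren[i];
            var p = patternChildren[i];
            if (c.IsNode != p.IsNode)
                return false;

            // Nodes recurse; tokens are compared by kind and source text.
            var same = c.IsNode
                ? Match(c.AsNode(), p.AsNode())
                : c.RawKind == p.RawKind && c.AsToken().Text == p.AsToken().Text;
            if (!same)
                return false;
        }
        return true;
    }

    // Convenience overload for experimenting with snippets.
    public static bool Match(string code, string pattern) =>
        Match(SyntaxFactory.ParseExpression(code), SyntaxFactory.ParseExpression(pattern));
}
```

For example, GenericAst.Match("Assert.That(account.Id, Is.EqualTo(id))", "Assert.That(_, Is.EqualTo(_))") matches, while a pattern ending in Is.EqualTo(true) does not match code that says Is.EqualTo(false).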

With a few more lines of code this already becomes a useful tool for searching for code patterns in a codebase. Next time we’ll see how to implement the replace part. The goal was to refactor, not just to search, wasn’t it? I have some ideas on how it could be done. See you next time.

Conclusion

Thanks to Roslyn’s awesome API, with just 172 lines of code we have a pretty advanced code grep. Sure, it’s just a toy and a proof of concept at the moment, and it would take serious effort to make it something more than that. But I’m happy with how much is possible with so little effort. Amazing.

Also published on DEV and Medium