Dapper multi-mapping relationships and value equality

.NET


The other day I was deep in the weeds of some data access code that leans heavily on Dapper for talking to a Postgres database (where most curmudgeonly developers like myself tend to find themselves when fine-tuning an application), and I landed on a solution to a particular problem in the best form possible: removing code entirely.

Value equality for free

As .NET devs, we've probably all been using IEqualityComparer<T> or IEquatable<T> since well before record types were a thing, in essence creating our own value objects to use for value-based equality. In other terms: asserting object A is equivalent to object B so long as their property values are the same. Using a sample .NET 8 console app as a sandbox, this is easy to play around with:

var instanceA = new SomeClass(42069);
var instanceB = new SomeClass(42069);

Console.WriteLine(instanceA.GetHashCode());
Console.WriteLine(instanceB.GetHashCode());

// Will print to the console:
//
// 53036123
// 7563067

// Fails! A and B are distinct references, so default reference equality kicks in (note the different hash codes)
Debug.Assert(instanceA == instanceB);

public class SomeClass(int SomeValue);

C# 9.0 record types changed the game quite a bit for us:

var instanceA = new SomeClass(42069);
var instanceB = new SomeClass(42069);

Console.WriteLine(instanceA.GetHashCode());
Console.WriteLine(instanceB.GetHashCode());

// Will print to the console:
//
// -1903833624
// -1903833624

// Passes! Records compare by value, and the hash codes match to boot
Debug.Assert(instanceA == instanceB);

public record SomeClass(int SomeValue);

Record types have been out for nearly half a decade at this point, and they're usually the first type of object definition I reach for, since my code mostly leans on anemic domain models of sorts. While not always the correct choice (complex service objects are better suited to classes, for example), they're perfect for modeling value objects.

For those of us that have been .NET'ing for quite some time now, this isn't exactly breaking news. Unfortunately, a lot of the .NET world runs on legacy code where record types do not exist. To get the same value-based equality for class-based objects, we'd implement an IEqualityComparer<T> or IEquatable<T> contract for the class we're using as a value object. Using the same example from above (and yes, I'm mixing modern C# features to demo a legacy pattern):

var instanceA = new SomeClass(42069);
var instanceB = new SomeClass(42069);

Console.WriteLine(instanceA.GetHashCode());
Console.WriteLine(instanceB.GetHashCode());

// Will print to the console:
//
// 42069
// 42069

// True! Instance A and B have the same hash codes after implementing overrides
Debug.Assert(instanceA == instanceB);

public class SomeClass(int someValue) : IEquatable<SomeClass>
{
    public int SomeValue { get; } = someValue;

    public bool Equals(SomeClass? other)
    {
        if (other is null)
        {
            return false;
        }

        // We'd probably want to assert things like reference and
        // type equality, but we'll save that for a rainy day

        return SomeValue == other.SomeValue;
    }

    public override bool Equals(object? obj) => Equals(obj as SomeClass);

    public static bool operator ==(SomeClass? left, SomeClass? right)
    {
        if (left is null)
        {
            return right is null;
        }

        return left.Equals(right);
    }

    public static bool operator !=(SomeClass? left, SomeClass? right) => !(left == right);

    public override int GetHashCode() => SomeValue.GetHashCode();
}

Okay, great. Now our objects are equivalent by value... but that's a s@$t ton of boilerplate code that we need to sprinkle across our code base for something as simple as making sure two objects are considered equal based solely on their property values.
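As an aside, the IEqualityComparer<T> flavor mentioned earlier keeps the comparison logic in a separate class, which is handy when you can't (or don't want to) touch the type itself. A minimal sketch, reusing the same hypothetical SomeClass:

```csharp
using System.Diagnostics;

// Hand the comparer to the collection; SomeClass itself stays untouched
var set = new HashSet<SomeClass>(new SomeClassComparer());

Debug.Assert(set.Add(new SomeClass(42069)));  // first insert succeeds
Debug.Assert(!set.Add(new SomeClass(42069))); // value-equal instance is rejected
Debug.Assert(set.Count == 1);

public class SomeClass(int someValue)
{
    public int SomeValue { get; } = someValue;
}

// The equality logic lives outside the class, so any code that needs
// reference semantics for SomeClass is unaffected
public sealed class SomeClassComparer : IEqualityComparer<SomeClass>
{
    public bool Equals(SomeClass? x, SomeClass? y)
    {
        if (ReferenceEquals(x, y))
        {
            return true;
        }

        if (x is null || y is null)
        {
            return false;
        }

        return x.SomeValue == y.SomeValue;
    }

    public int GetHashCode(SomeClass obj) => obj.SomeValue.GetHashCode();
}
```

The trade-off versus IEquatable<T> is that every hash-based collection has to be handed the comparer explicitly.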

So in summary, we use record types when we need them, and implement equality contracts when we can't use them. This concludes my TED talk.

Dapper, relationships, and multi-mapping

Okay, I lied, there's more to my TED talk. I've been blindly using record types for a while now and recently came across a scenario where I really appreciated their power and flexibility while using Dapper to query for data that had a few one-to-many relationships going on. The root of the problem was not Dapper nor the relationships themselves, but rather Dapper's multi-mapping over queries that have two or more one-to-many relationships where the result set might be jagged, i.e. the parent model on the query has two one-to-many relationships where each JOIN contributes a differing number of rows relating back to the parent model.

Okay, that's a bunch of jargon, so let's break it down using something I think we all can understand: beer.

Suppose we're modeling the relationships within a brewery CRM SaaS app called HopTracker that manages a brewery's beer inventory and employees working for said brewery. We can think of this in terms of a data model where a brewery:

  • has many beers
  • has many employees

Though the beers and employees associated with the brewery don't necessarily have a direct link to each other, if we think in terms of a SQL query that gets us an aggregate view of all of a brewery's beers and employees, it might look something like:

SELECT br.id,
       br.brewery_name AS name,
       b.id            AS beer_id,
       b.beer_name,
       b.beer_type,
       e.id            AS employee_id,
       e.first_name,
       e.last_name
FROM breweries br
         JOIN beers b ON br.id = b.brewery_id
         JOIN employees e ON br.id = e.brewery_id
WHERE br.id = @BreweryId

We might get something like:

Brewery ID | Name      | Beer ID | Beer Name | Beer Type | Employee ID | First Name | Last Name
1          | Brewery A | 101     | Beer A    | Ale       | 201         | John       | Doe
1          | Brewery A | 102     | Beer B    | Lager     | 201         | John       | Doe
2          | Brewery B | 103     | Beer X    | Stout     | 202         | Jane       | Smith
3          | Brewery C | 104     | Beer Y    | IPA       | 203         | Michael    | Johnson

If we're using EF, a lot of the behind-the-scenes mapping happens within the internals of the .Include() relation mappings we might have on a Brewery entity model. There's a good chance a brewery will have many beers and many employees, and our example result set shows we'll get back duplicate rows whenever the parent record joins to differing numbers of related entities. To capture all the data from this query within a model, I might define an aggregate model to pick up all the data I'm interested in:

public class BreweryAggregate
{
    public required int Id { get; init; }

    public required string Name { get; init; }

    public ICollection<Beer> Beers { get; } = [];

    public ICollection<Employee> Employees { get; } = [];

    public override string ToString()
    {
        return $"""
            Id: {Id}
            Name: {Name}
            Beers: [{string.Join(", ", Beers.Select(b => b.Name))}]
            Employees: [{string.Join(", ", Employees.Select(e => $"{e.FirstName} {e.LastName}"))}]
            """;
    }
}

Where I might have some accompanying Beer and Employee entity models (with a base model to capture common fields):

public class Beer : EntityBase
{
    public required string Name { get; init; }

    public required string Type { get; init; }

    public required int BreweryId { get; init; }
}

public class Brewery : EntityBase
{
    public required string Name { get; init; }
}

public class Employee : EntityBase
{
    public required string FirstName { get; init; }

    public required string LastName { get; init; }

    public required int BreweryId { get; init; }
}

public abstract class EntityBase
{
    public required int Id { get; init; }

    public required DateTime CreatedAt { get; init; }

    public required DateTime UpdatedAt { get; init; }
}

For the purpose of this example I've spun up a Postgres database with the help of Docker volume mapping and initialization. To model this scenario, our SQL will look something like:

-- The database initialization script is used for defining your local schema as well as postgres
-- running within a docker container, where we'll copy this file over and run on startup

-- CREATE DATABASE can't run inside a function or transaction block, so we use
-- psql's \gexec to create the database only when it doesn't exist yet
SELECT 'CREATE DATABASE hop_tracker'
WHERE NOT EXISTS (SELECT 1 FROM pg_database WHERE datname = 'hop_tracker')
\gexec

\c hop_tracker

DROP TABLE IF EXISTS breweries CASCADE;
CREATE TABLE breweries
(
    id           SERIAL PRIMARY KEY,
    brewery_name VARCHAR(255) NOT NULL,
    created_at   TIMESTAMP    NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at   TIMESTAMP    NOT NULL DEFAULT CURRENT_TIMESTAMP
);

DROP TABLE IF EXISTS beers CASCADE;
DROP TYPE IF EXISTS beer_type;

CREATE TYPE beer_type AS ENUM ('ipa', 'double_ipa');
CREATE TABLE beers
(
    id         SERIAL PRIMARY KEY,
    brewery_id INTEGER      NOT NULL,
    beer_name  VARCHAR(255) NOT NULL,
    beer_type beer_type NOT NULL,
    created_at TIMESTAMP    NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP    NOT NULL DEFAULT CURRENT_TIMESTAMP,
    CONSTRAINT fk_brewery_id FOREIGN KEY (brewery_id) REFERENCES breweries (id)
);

DROP TABLE IF EXISTS employees CASCADE;
CREATE TABLE employees
(
    id         SERIAL PRIMARY KEY,
    first_name VARCHAR(255) NOT NULL,
    last_name  VARCHAR(255) NOT NULL,
    brewery_id INTEGER      NOT NULL,
    created_at TIMESTAMP    NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP    NOT NULL DEFAULT CURRENT_TIMESTAMP,
    CONSTRAINT fk_brewery_id FOREIGN KEY (brewery_id) REFERENCES breweries (id)
);

-- Seed a brewery into the database that we can attach employees and beers to
INSERT INTO breweries (brewery_name)
VALUES ('Fall River Brewery');

-- Next, seed some tasty beers
INSERT INTO beers (brewery_id, beer_name, beer_type)
VALUES ((SELECT id
         FROM breweries
         WHERE brewery_name = 'Fall River Brewery'), 'Hexagenia', 'ipa'),
       ((SELECT id
         FROM breweries
         WHERE brewery_name = 'Fall River Brewery'), 'Widowmaker', 'double_ipa');

-- Then, seed some loyal employees
INSERT INTO employees (first_name, last_name, brewery_id)
VALUES ('Sam', 'Adams', (SELECT id
                         FROM breweries
                         WHERE brewery_name = 'Fall River Brewery'));
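For reference, wiring that init script into the container can look something like this hypothetical compose file (the image tag, credentials, and file paths are assumptions for illustration, not taken from the sample repo):

```yaml
services:
  postgres:
    image: postgres:16
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
    ports:
      - "5432:5432"
    volumes:
      # Anything mapped into /docker-entrypoint-initdb.d is executed once when
      # the container initializes a fresh data directory
      - ./scripts/init.sql:/docker-entrypoint-initdb.d/init.sql
```

Because the entrypoint runs .sql files through psql, meta-commands like \c and \gexec in the script work as expected.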

Now within some data access code, I might use the aggregate model to retrieve the data and map it out:

public async Task<BreweryAggregate?> GetBreweryAsync(int id, CancellationToken cancellationToken)
{
    const string sql = """
                       SELECT br.id,
                              br.brewery_name AS name,
                              br.created_at,
                              br.updated_at,
                              b.id AS beer_id,
                              b.beer_name AS name, -- alias so Dapper maps this column to Beer.Name
                              b.beer_type AS type, -- alias so Dapper maps this column to Beer.Type
                              b.created_at,
                              b.updated_at,
                              e.id AS employee_id,
                              e.first_name,
                              e.last_name,
                              e.created_at,
                              e.updated_at
                       FROM breweries br
                       JOIN beers b ON br.id = b.brewery_id
                       JOIN employees e ON br.id = e.brewery_id
                       WHERE br.id = @BreweryId
                       """;

    BreweryAggregate? breweryAggregate = null;

    await _connection.QueryAsync<BreweryAggregate, Beer, Employee, BreweryAggregate>(
        sql,
        (brewery, beer, employee) =>
        {
            // On first pass, we'll assign the brewery based on the first row returned
            breweryAggregate ??= brewery;

            // Next, we'll add the relationships
            breweryAggregate.Beers.Add(beer);
            breweryAggregate.Employees.Add(employee);

            return breweryAggregate;
        },
        new { BreweryId = id },
        splitOn: "beer_id, employee_id"
    );

    return breweryAggregate;
}

At a quick glance, the code above:

  • Defines a query to pull all the data we need for a brewery with all the included relations to beers and employees
  • Defines some column breaks, or splitOn in Dapper terms, to tell Dapper what section of the resulting columns are associated to the models we want mapped
  • Defines a closure for Dapper to call back to for each record we get back in the result
    • We'll map the aggregate model on the first pass, where subsequent iterations won't reassign it since we're using the ??= null-coalescing assignment operator
    • On each pass, we'll also add the associated beer/employee that Dapper mapped out for us to the brewery aggregate's list relationships (foreshadowing intensifies)
  • Lastly, we pass the query function an anonymous object to use in order to parameterize the query so our security officers don't slap us with an 8-hour OWASP training on SQL injection

To use this code, I've spun up a simple console app that uses Npgsql as a Postgres driver. Using a few helper .NET libraries for configuration, I can run some code to pull from the database and pretty-print to the console a friendly version of our BreweryAggregate model:

using System.Diagnostics;
using Dapper;
using HopTracker;
using HopTracker.Repositories;
using Microsoft.Extensions.Configuration;

var config = new ConfigurationBuilder().AddUserSecrets<Configuration>().Build();
using var breweryRepository = new BreweryRepository(config.GetConnectionString("Postgres"));
using var tokenSource = new CancellationTokenSource(TimeSpan.FromSeconds(1));

// Sticking with the Postgres convention of snake_case columns, we'll tell Dapper to map them to their PascalCase equivalents by default
DefaultTypeMap.MatchNamesWithUnderscores = true;

var cancellationToken = tokenSource.Token;
var brewery = await breweryRepository.GetBreweryAsync(1, cancellationToken);

Console.WriteLine(brewery);

Running this code, we'll see something interesting in the console:

$ dotnet run

Id: 1
Name: Fall River Brewery
Beers: [Hexagenia, Widowmaker]
Employees: [Sam Adams, Sam Adams] // Oh no, duplicate employees!

So we've got ourselves a problem. Our multi-mapping closure pushed in the same employee twice: the JOIN on the beers table returned two rows for the brewery (one per beer), while the brewery only has one employee, so that employee appears on both rows (talk about bootstrapping your business).

We have a few options to combat this scenario and guarantee we only ever map unique beers and employees back to the brewery aggregate model:

  • Check if the employee/beer already exists in the current list of beers/employees that are mapped to the brewery aggregate
  • Use record types

Let's pretend it's 2017 and we don't have record types just yet. If we take option 1, we'll probably want an equality contract in place on our Employee model so we can use .Contains() with the comparer. While we could just use a string comparison on the names and brewery ID directly within the multi-mapping action, equality contracts are a bit more robust, allowing us to reuse the value equality logic elsewhere in our applications. Implementing the IEquatable<Employee> contract would have our Employee looking something like:

namespace HopTracker.Entities;

public class Employee : EntityBase, IEquatable<Employee>
{
    public required string FirstName { get; init; }

    public required string LastName { get; init; }

    public required int BreweryId { get; init; }

    public bool Equals(Employee? other)
    {
        if (other is null)
        {
            return false;
        }

        // Again, we *might* want to check for reference and type equality... but I'm too lazy to do that right now

        return string.Equals(FirstName, other.FirstName, StringComparison.CurrentCultureIgnoreCase)
            && string.Equals(LastName, other.LastName, StringComparison.CurrentCultureIgnoreCase)
            && BreweryId == other.BreweryId;
    }

    public override bool Equals(object? obj) => Equals(obj as Employee);

    // Keep the hash consistent with the case-insensitive Equals above so that
    // hash-based collections agree with it
    public override int GetHashCode() =>
        HashCode.Combine(
            StringComparer.CurrentCultureIgnoreCase.GetHashCode(FirstName),
            StringComparer.CurrentCultureIgnoreCase.GetHashCode(LastName),
            BreweryId);

    public static bool operator ==(Employee? left, Employee? right)
    {
        if (left is null)
        {
            return right is null;
        }

        return left.Equals(right);
    }

    public static bool operator !=(Employee? left, Employee? right) => !(left == right);
}

And then, we can update our model mapping query:

public async Task<BreweryAggregate?> GetBreweryAsync(int id, CancellationToken cancellationToken)
{
    // ...other stuff

    await _connection.QueryAsync<BreweryAggregate, Beer, Employee, BreweryAggregate>(
        sql,
        (brewery, beer, employee) =>
        {
            // On first pass, we'll assign the brewery based on the first row returned
            breweryAggregate ??= brewery;

            // Next, we'll add the relationships
            breweryAggregate.Beers.Add(beer);

            // Only add the employee *if* we haven't done so already, where
            // `.Contains()` will implicitly use the equality contract we've defined
            if (!breweryAggregate.Employees.Contains(employee))
            {
                breweryAggregate.Employees.Add(employee);
            }

            return breweryAggregate;
        },
        new { BreweryId = id },
        splitOn: "beer_id, employee_id"
    );

    return breweryAggregate;
}

Now if we run our code:

$ dotnet run

Id: 1
Name: Fall River Brewery
Beers: [Hexagenia, Widowmaker]
Employees: [Sam Adams] // Nice, only 1 employee as expected!

Great, we've got our employee that we've expected to be there without any duplicates. But... we have the technology to make this even easier. Let's move all our entity models to record types, and we can get rid of the equality contract altogether:


public record Beer : EntityBase
{
    public required string Name { get; init; }

    public required string Type { get; init; }

    public required int BreweryId { get; init; }
}

public record Brewery : EntityBase
{
    public required string Name { get; init; }
}

public record Employee : EntityBase
{
    public required string FirstName { get; init; }

    public required string LastName { get; init; }

    public required int BreweryId { get; init; }
}

public abstract record EntityBase
{
    public required int Id { get; init; }

    public required DateTime CreatedAt { get; init; }

    public required DateTime UpdatedAt { get; init; }
}

And if we run our code yet again:

$ dotnet run

Id: 1
Name: Fall River Brewery
Beers: [Hexagenia, Widowmaker]
Employees: [Sam Adams] // Still no duplicates!

We're still checking if our brewery aggregate's employees list .Contains() the employee, which will use the generated equality contract behind the scenes (take a look at sharplab and play around with class/record types). What if we could make this code even leaner? Turns out we can, by using a HashSet. Let's update our BreweryAggregate to use sets instead:

using HopTracker.Entities;

namespace HopTracker.Aggregates;

public record BreweryAggregate
{
    // ...other stuff
    
    public ISet<Beer> Beers { get; } = new HashSet<Beer>();

    public ISet<Employee> Employees { get; } = new HashSet<Employee>();

    // ...other stuff
}
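This works because HashSet<T> uses the type's Equals/GetHashCode for membership checks, and for records the compiler generates both from the property values. A minimal standalone sketch (using a simplified, hypothetical positional record rather than the entity model above):

```csharp
using System.Diagnostics;

var employees = new HashSet<Employee>();

// Two distinct instances with identical property values: the set only keeps one
Debug.Assert(employees.Add(new Employee("Sam", "Adams", 1)));  // added
Debug.Assert(!employees.Add(new Employee("Sam", "Adams", 1))); // rejected as a value-equal duplicate
Debug.Assert(employees.Count == 1);

// Stripped-down stand-in for the Employee entity, for illustration only
public record Employee(string FirstName, string LastName, int BreweryId);
```

Note that the generated string comparisons are case-sensitive and ordinal, unlike the culture-aware IEquatable<Employee> implementation we hand-rolled earlier.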

And now our data access method can be updated to remove the call to .Contains() within our multi-mapping action:

public async Task<BreweryAggregate?> GetBreweryAsync(int id, CancellationToken cancellationToken)
{
    // ...other stuff

    await _connection.QueryAsync<BreweryAggregate, Beer, Employee, BreweryAggregate>(
        sql,
        (brewery, beer, employee) =>
        {
            // On first pass, we'll assign the brewery based on the first row returned
            breweryAggregate ??= brewery;

            // Next, we'll add the relationships
            breweryAggregate.Beers.Add(beer);

            // No `.Contains()` check needed, we get value equality for free and the set will check it for us
            breweryAggregate.Employees.Add(employee);

            return breweryAggregate;
        },
        new { BreweryId = id },
        splitOn: "beer_id, employee_id"
    );

    return breweryAggregate;
}

And again, if we run our code:

$ dotnet run

Id: 1
Name: Fall River Brewery
Beers: [Hexagenia, Widowmaker]
Employees: [Sam Adams] // *Still* no duplicates!

As we expect, no duplicate employees, and we got to use even less code. I don't know about you, but my favorite code to write is the code I don't have to write at all (trademarking that).

TL;DR

Model relationships with Dapper can be tricky at first, as they require a bit more elbow grease to get the niceties EF provides out of the box, but they're easily achievable thanks to record types and sets. For those interested, all the sample code for this post can be found here on my GitHub. As always, I hope someone out there finds this useful.

Until next time, friends!