Dapper multi-mapping relationships and value equality
The other day I was deep in the weeds of some data access code that relies heavily on Dapper for talking to a Postgres database, where most curmudgeonly developers like myself tend to find themselves when fine tuning an application, and landed on a solution to a particular problem I was having in the best form possible: removing code entirely.
It's free value equality
As .NET devs, we've all probably been using IEqualityComparer<T>
or IEquatable<T>
for some time well before record
types we're a thing,
in essence creating our own value objects to use for value object equality. In other terms, asserting object A is
equivalent
to object B so long as their property values are same. Using a sample .NET 8 console app as a sandbox, this is easy to
play around with:
var instanceA = new SomeClass(42069);
var instanceB = new SomeClass(42069);
Console.WriteLine(instanceA.GetHashCode());
Console.WriteLine(instanceB.GetHashCode());
// Will print to the console:
//
// 53036123
// 7563067
// False, instance A and B have different hash codes!
Debug.Assert(instanceA == instanceB);
public class SomeClass(int SomeValue);
With C# 8.0 record
types, this changed the game quite a bit for us:
var instanceA = new SomeClass(42069);
var instanceB = new SomeClass(42069);
Console.WriteLine(instanceA.GetHashCode());
Console.WriteLine(instanceB.GetHashCode());
// Will print to the console:
//
// -1903833624
// -1903833624
// True, hash codes are the same!
Debug.Assert(instanceA == instanceB);
public record SomeClass(int SomeValue);
Record types have been out for nearly half a decade at this point, and are usually the first type
of object definition I reach for when writing my code that mostly uses anemic domain models of sorts.
While not always the correct choice (complex service objects are better suited for class
objects, as an example),
they're perfect for modeling value objects.
For those of us that have been .NET'ing for quite some time now, this isn't exactly breaking news. Unfortunately, a
lot of the .NET world runs on legacy code where record
types do not exist. To get
the same value-based equality for class
-based objects, we'd implement an IEqualityComparer<T>
or
IEquatable<T>
contract for the class we're using as a value object. Using the same example from above (and yes, I'm
mixing modern C# features to demo a legacy pattern):
var instanceA = new SomeClass(42069);
var instanceB = new SomeClass(42069);
Console.WriteLine(instanceA.GetHashCode());
Console.WriteLine(instanceB.GetHashCode());
// Will print to the console:
//
// 42069
// 42069
// True! Instance A and B have the same hash codes after implementing overrides
Debug.Assert(instanceA == instanceB);
public class SomeClass(int someValue) : IEquatable<SomeClass>
{
public int SomeValue { get; } = someValue;
public bool Equals(SomeClass? other)
{
if (other is null)
{
return false;
}
// We'd probably want to assert things like reference and
// type equality, but we'll save that for a rainy day
return SomeValue == other.SomeValue;
}
public static bool operator ==(SomeClass? left, SomeClass? right)
{
if (left is null && right is null)
{
return true;
}
if (left is null && right is not null)
{
return false;
}
if (left is not null && right is null)
{
return false;
}
return left!.Equals(right);
}
public static bool operator !=(SomeClass? left, SomeClass? right) => !(left == right);
public override int GetHashCode() => SomeValue.GetHashCode();
}
Okay, great. Now our objects are equivalent by value... but that's a s@$t ton of boiler plater code that we need to sprinkle across our code base for something as simple as making sure that two objects are considered equal if we're only looking at their property values.
So in summary, we use record
types when we need them, and implement equality contracts when we can't use them. This
concludes my TED talk.
Dapper, relationships, and multi-mapping
Okay, I lied, there's more to my TED talk. So I've been blindly using record
types now for awhile and
recently came across a scenario where I really appreciated their power and flexibility while using Dapper to query for
data that had a few one-to-many relationships going on. The root of the problem was not Dapper nor the relationships
themselves, but rather Dapper's multi-mapping over model queries that have two or more one-to-many relationships where
the result set might by jagged, i.e. the parent model on the query has two one-to-many relationships where each JOIN
has a differing number of rows that'll relate to the parent model.
Okay, that's a bunch of jargon, so let's break it down using something I think we all can understand: beer.
Suppose we're modeling the relationships within a brewery CRM SaaS app of sorts that manages a breweries beer inventory and employees working for said breweries. We can think of this in terms of a data model where a brewery:
- has many beers
- has many employees
Though beers and employees associated to the brewery don't necessarily have a direct link to each, if we think in terms of a SQL query that's gets us an aggregate view of all of a brewery's beers and employees, it might looks something like:
SELECT br.id,
br.brewery_name AS name,
b.id AS beer_id,
b.beer_name,
b.beer_type,
e.id AS employee_id,
e.first_name,
e.last_name
FROM breweries br
JOIN beers b ON br.id = b.brewery_id
JOIN employees e ON br.id = e.brewery_id
WHERE br.id = @BreweryId
We might get something like:
Brewery ID | Name | Beer ID | Beer Name | Beer Type | Employee ID | First Name | Last Name |
---|---|---|---|---|---|---|---|
1 | Brewery A | 101 | Beer A | Ale | 201 | John | Doe |
1 | Brewery A | 102 | Beer B | Lager | 201 | John | Doe |
2 | Brewery B | 103 | Beer X | Stout | 202 | Jane | Smith |
3 | Brewery C | 104 | Beer Y | IPA | 203 | Michael | Johnson |
If we're using EF, a lot of the behind the scenes mapping will happen behind the scenes of our .Include()
relation
mappings we might have on a Brewery
entity model. There's a good chance a brewery will have many beers and many
employees, and we see from our example result set we'll get back some duplicate rows in the case we have different
numbers of related entities. To capture all the data from this query within a model, I might define an aggregate
model to pick up all the data I'm interested in:
public class BreweryAggregate
{
public required int Id { get; init; }
public required string Name { get; init; }
public ICollection<Beer> Beers { get; } = [];
public ICollection<Employee> Employees { get; } = [];
public override string ToString()
{
return $"""
Id: {Id}
Name: {Name}
Beers: [{string.Join(", ", Beers.Select(b => b.Name))}]
Employees: [{string.Join(", ", Employees.Select(e => $"{e.FirstName} {e.LastName}"))}]
""";
}
}
Where I might have some accompanying Beer
and Employee
entity models (with an accompanying model to capture common
fields):
public class Beer : EntityBase
{
public required string Name { get; init; }
public required string Type { get; init; }
public required int BreweryId { get; init; }
}
public class Brewery : EntityBase
{
public required string Name { get; init; }
}
public class Employee : EntityBase
{
public required string FirstName { get; init; }
public required string LastName { get; init; }
public required int BreweryId { get; init; }
}
public abstract class EntityBase
{
public required int Id { get; init; }
public required DateTime CreatedAt { get; init; }
public required DateTime UpdatedAt { get; init; }
}
Now within some data access code, I might use this aggregate model to retrieve the data and map it out:
public async Task<BreweryAggregate?> GetBreweryAsync(int id, CancellationToken cancellationToken)
{
const string sql = """
SELECT br.id,
br.brewery_name AS name,
br.created_at,
br.updated_at,
b.id AS beer_id,
b.beer_name AS name, // Column alias here to avoid clashing with psql's `name` keyword
b.beer_type AS type, // Column alias here to avoid clashing with psql's `type` keyword
b.created_at,
b.updated_at,
e.id AS employee_id,
e.first_name,
e.last_name,
e.created_at,
e.updated_at
FROM breweries br
JOIN beers b ON br.id = b.brewery_id
JOIN employees e ON br.id = e.brewery_id
WHERE br.id = @BreweryId
""";
BreweryAggregate? breweryAggregate = null;
await _connection.QueryAsync<BreweryAggregate, Beer, Employee, BreweryAggregate>(
sql,
(brewery, beer, employee) =>
{
// On first pass, we'll assign the brewery based on the first row returned
breweryAggregate ??= brewery;
// Next, we'll add the relationships
breweryAggregate.Beers.Add(beer);
breweryAggregate.Employees.Add(employee);
return breweryAggregate;
},
new { BreweryId = id },
splitOn: "beer_id, employee_id"
);
return breweryAggregate;
}
At a quick glance, the code above:
- Defines a query to pull all the data we need for a brewery with all the included relations to beers and employees
- Defines some column breaks, or
splitOn
in Dapper terms, to tell Dapper what columns are associated to the models we want mapped - Defines a closure for Dapper to callback to for each record we get back in the result
- We'll map the aggregate model on first pass, where subsequent iterations won't be assigned since we're using
the
??=
null-coalescing assignment operator - On each pass, we'll also add the associated beer/employee that Dapper mapped out for us in the brewery aggregates list relationships (foreshadowing intensifies)
- We'll map the aggregate model on first pass, where subsequent iterations won't be assigned since we're using
the
- Lastly, we pass the query function an anonymous object to use in order to parameterize the query so our security officers don't slap us with an 8 hour OWASP training on SQL injection
To use this code, I've spun up a simple console app that uses Npgsql as a Postgres driver.
Using a few helper .NET libraries for configuration, I can run some code to pull from the database and pretty-print to
the console a friendly version of our BreweryAggregate
model:
using System.Diagnostics;
using Dapper;
using HopTracker;
using HopTracker.Repositories;
using Microsoft.Extensions.Configuration;
var config = new ConfigurationBuilder().AddUserSecrets<Configuration>().Build();
using var breweryRepository = new BreweryRepository(config.GetConnectionString("Postgres"));
using var tokenSource = new CancellationTokenSource(TimeSpan.FromSeconds(1));
// Sticking with Postgres convention of snake_case columns and rows, We'll tell Dapper to map those out to their PascalCase equivalents by default
DefaultTypeMap.MatchNamesWithUnderscores = true;
var cancellationToken = tokenSource.Token;
var brewery = await breweryRepository.GetBreweryAsync(1, cancellationToken);
Console.WriteLine(brewery);
Running this code, we'll see something interesting in the console:
$ dotnet run
Id: 1
Name: Fall River Brewery
Beers: [Hexagenia, Widowmaker]
Employees: [Sam Adams, Sam Adams] // Oh no, duplicate employees!
So we've gotta problem. Our multi-mapping closure pushed in the same employee twice, because we had two rows returned
as we JOIN
ed on the beers table that had multiple beers associated to the brewery with only one employee (talk about
bootstrapping your business).
We've gotta few options here to combat this scenario to guarantee that we're only ever returned unique beers and employees mapped back to the brewery aggregate model:
- Check if the employee/beer already exists in the current list of beers/employees that are mapped to the brewery aggregate
- Use
record
types
Let's pretend it 2017 and we don't have record types just yet. If we take option 1, we'll probably want an equality
contract in place on our Employee
model so we can use a .Contains()
with the comparer. Implementing
the IEquatable<Employee>
contract would have our Employee
now looking something like:
namespace HopTracker.Entities;
public class Employee : EntityBase, IEquatable<Employee>
{
public required string FirstName { get; init; }
public required string LastName { get; init; }
public required int BreweryId { get; init; }
public bool Equals(Employee? other)
{
if (other is null)
{
return false;
}
// Again, we *might* want to check for reference and type equality... but I'm too lazy to do that right now
return string.Equals(FirstName, other.FirstName, StringComparison.CurrentCultureIgnoreCase)
&& string.Equals(LastName, other.LastName, StringComparison.CurrentCultureIgnoreCase)
&& BreweryId == other.BreweryId;
}
public static bool operator ==(Employee? left, Employee? right)
{
if (left is null && right is null)
{
return true;
}
return left?.Equals(right) ?? false;
}
public static bool operator !=(Employee? left, Employee? right) => !(left == right);
}
And then, we can update our model mapping query:
public async Task<BreweryAggregate?> GetBreweryAsync(int id, CancellationToken cancellationToken)
{
// ...other stuff
await _connection.QueryAsync<BreweryAggregate, Beer, Employee, BreweryAggregate>(
sql,
(brewery, beer, employee) =>
{
// On first pass, we'll assign the brewery based on the first row returned
breweryAggregate ??= brewery;
// Next, we'll add the relationships
breweryAggregate.Beers.Add(beer);
// Only add the employee *if* we haven't done so already, where
// `.Contains()` will implicitly use the equality contract we've defined
if (!breweryAggregate.Employees.Contains(employee))
{
breweryAggregate.Employees.Add(employee);
}
return breweryAggregate;
},
new { BreweryId = id },
splitOn: "beer_id, employee_id"
);
return breweryAggregate;
Now if we run our code:
$ dotnet run
Id: 1
Name: Fall River Brewery
Beers: [Hexagenia, Widowmaker]
Employees: [Sam Adams] // Nice, only 1 employee as expected!
Great, we've got our employee that we've expected to be there without any duplicates. But... we have the technology to make this even easier. Let's move all our entity models to record types, and we can get rid of the equality contract altogether:
public record Beer : EntityBase
{
public required string Name { get; init; }
public required string Type { get; init; }
public required int BreweryId { get; init; }
}
public record Brewery : EntityBase
{
public required string Name { get; init; }
}
public record Employee : EntityBase
{
public required string FirstName { get; init; }
public required string LastName { get; init; }
public required int BreweryId { get; init; }
}
public abstract record EntityBase
{
public required int Id { get; init; }
public required DateTime CreatedAt { get; init; }
public required DateTime UpdatedAt { get; init; }
}
And if we run our code yet again:
$ dotnet run
Id: 1
Name: Fall River Brewery
Beers: [Hexagenia, Widowmaker]
Employees: [Sam Adams] // Still no duplicates!
We're still checking if our brewery aggregate employees list .Contains()
the employee, which will use the generated
equality contract behind the scenes (take a look at sharplab and play around
with class
/record
types). What if we could make this code even leaner? Turns out, we can by using HashSet
s.
Let's update our BreweryAggregate
to utilize those instead:
public async Task<BreweryAggregate?> GetBreweryAsync(int id, CancellationToken cancellationToken)
{
// ...other stuff
await _connection.QueryAsync<BreweryAggregate, Beer, Employee, BreweryAggregate>(
sql,
(brewery, beer, employee) =>
{
// On first pass, we'll assign the brewery based on the first row returned
breweryAggregate ??= brewery;
// Next, we'll add the relationships
breweryAggregate.Beers.Add(beer);
// No `.Contains()` check needed, we get value equality for free and the set will check it for us
breweryAggregate.Employees.Add(employee);
return breweryAggregate;
},
new { BreweryId = id },
splitOn: "beer_id, employee_id"
);
return breweryAggregate;
And again, if we run our code:
$ dotnet run
Id: 1
Name: Fall River Brewery
Beers: [Hexagenia, Widowmaker]
Employees: [Sam Adams] // *Still* no duplicates!
As we expect, no duplicate employees and we got to use even less code. My favorite code to write is the code I don't have to write (trademarking that).
TL;DR
Model relationships and Dapper can be tricky at first, as it requires a bit more elbow grease to get the same niceties of EF out-of-the-box. I've always been a fan of slim ORMs and firmly believe in the power of raw SQL. All of the sample code for this post can be found here on my GitHub. As always, I hope someone out there finds this useful.
Until next time, friends!