A conditional sandwich example by Mark Seemann
An F# example of reducing a workflow to an impureim sandwich.
The most common reaction to the impureim sandwich architecture (also known as functional core, imperative shell) is one of incredulity. How does this way of organising code generalise to arbitrary complexity?
The short answer is that it doesn't. Given sufficient complexity, you may not be able to 1. gather all data with impure queries, 2. call a pure function, and 3. apply the return value via impure actions. The question is: How much complexity is required before you have to give up on the impureim sandwich?
There's probably a fuzzy transition zone where the sandwich may still apply, but where it begins to be questionable whether it's beneficial. In my experience, this transition seems to lie further toward the complex end of the spectrum than most people think.
Once you have to give up on the impureim sandwich, in functional programming you may resort to using free monads. In object-oriented programming, you may use Dependency Injection. Depending on language and paradigm, still more options may be available.
My experience is mostly with web-based systems, but in that context, I find that a surprisingly large proportion of problems can be rephrased and organised in such a way that the impureim sandwich architecture applies. Actually, I surmise that most problems can be addressed in that way.
I am, however, often looking for good examples. As I wrote in a comment to Dependency rejection:
"I'd welcome a simplified, but still concrete example where the impure/pure/impure sandwich described here isn't going to be possible."
Such examples are, unfortunately, rare. While real production code may seem like an endless supply of examples, production code often contains irrelevant details that obscure the essence of the case. Additionally, production code is often proprietary, so I can't share it.
In 2019 Christer van der Meeren kindly supplied an example problem that I could refactor. Since then, there's been a dearth of more examples. Until now.
I recently ran into another fine example of a decision flow that at first glance seemed a poor fit for the functional core, imperative shell architecture. What follows is, actually, production code, here reproduced with the kind permission of Criipto.
Create a user if it doesn't exist #
As I've previously touched on, I'm helping Criipto integrate with the Fusebit API. This API has a user model where you can create users in the Fusebit services. Once you've created a user, a client can log on as that user and access the resources available to her, him, or it. There's an underlying security model that controls all of that.
Criipto's users may not all be provisioned as users in the Fusebit API. If they need to use the Fusebit system, we'll provision them just in time. On the other hand, there's no reason to create the user if it already exists.
But it gets more complicated than that. To fit our requirements, the user must have an associated issuer. This is another Fusebit resource that we may have to provision if it doesn't already exist.
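The desired logic may be easier to follow if visualised as a flowchart. It starts by asking whether the user exists; if it does, there's nothing more to do. If it doesn't, it asks whether the issuer exists, creates the issuer if necessary, and finally creates the user.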
The user must have an issuer, but an appropriate issuer may already exist. If it doesn't, we must create the issuer before we create the user.
At first blush this seems like a workflow that doesn't fit the impureim sandwich architecture. After all, you should only check for the existence of the issuer if you find that the user doesn't exist. There's a decision between the first and second impure query.
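To make that dependency concrete, here's a minimal F# sketch of the flowchart's logic. Everything in it is hypothetical: the in-memory sets stand in for the Fusebit API, and userExists, issuerExists and ensureUserProcedurally are illustrative names, not functions from the actual code base. The counter shows that the number of queries depends on what the first query returns.

```fsharp
open System.Collections.Generic

// Hypothetical in-memory stand-ins for the Fusebit API, not its real client.
let users = HashSet<string> ()
let issuers = HashSet<string> ()
let mutable queryCount = 0

let userExists (userId : string) =
    queryCount <- queryCount + 1
    users.Contains userId

let issuerExists (issuerId : string) =
    queryCount <- queryCount + 1
    issuers.Contains issuerId

// Mirrors the flowchart: whether the issuer query runs at all depends on
// the answer to the user query.
let ensureUserProcedurally (issuerId : string) (userId : string) =
    if not (userExists userId) then
        if not (issuerExists issuerId) then
            issuers.Add issuerId |> ignore
        users.Add userId |> ignore

ensureUserProcedurally "issuer-1" "user-1"   // two queries, then two creations
ensureUserProcedurally "issuer-1" "user-1"   // one query, nothing created
```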
Can we resolve this problem and implement the functionality as an impureim sandwich?
Speculative prefetching #
When looking for ways to apply the functional core, imperative shell architecture, it often pays to take a step back and look at a slightly larger picture. Another way to put it is that you should think less procedurally, and more declaratively. A flowchart like the above is essentially procedural. It may prevent you from seeing other opportunities.
One of the reasons I like functional programming is that it forces me to think in a more declarative way. This helps me identify better abstractions than I might otherwise be able to think of.
The above flowchart is representative of the most common counterargument I hear: The impureim sandwich doesn't work if the code has to make a decision about a secondary query based on the result of an initial query. This is also what's at stake here. The result of the user exists query determines whether the program should query about the issuer.
The assumption is that since the user is supposed to have an issuer, if the user exists, the issuer must also exist.
Even so, would it hurt so much to query the Fusebit API up front about the issuer?
Perhaps you react to such a suggestion with distaste. After all, it seems wasteful. Why query a web service if you don't need the result? And what about performance?
Whether or not this is wasteful depends on what kind of waste you measure. If you measure bits transmitted over the network, then yes, you may see this measure increase.
It may not be as bad as you think, though. Perhaps the HTTP GET request you're about to make has a cacheable result. Perhaps the result is already waiting in your proxy server's RAM.
Neither the Fusebit HTTP API's user resources nor its issuer resources, however, come with cache headers, so this last argument doesn't apply here. I still included it above because it's worth taking into account.
Another typical performance consideration is that this kind of potentially redundant traffic will degrade performance. Perhaps. As usual, if that's a concern: measure.
Querying the API whether a user exists is independent of the query to check if an issuer exists. This means that you could perform the two queries in parallel. Depending on the total load on the system, the difference between one HTTP request and two concurrent requests may be negligible. (It could still impact overall system performance if the system is already running close to capacity, so this isn't always a good strategy. Often, however, it's not really an issue.)
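As a rough illustration of running the two independent queries concurrently, the following sketch uses F#'s Async.Parallel. The query functions are hypothetical placeholders with an artificial delay standing in for network latency; they don't call the real Fusebit API.

```fsharp
// Hypothetical queries; the delay stands in for HTTP latency.
let queryUserExists (userId : string) =
    async {
        do! Async.Sleep 200
        return userId = "known-user"
    }

let queryIssuerExists (issuerId : string) =
    async {
        do! Async.Sleep 200
        return issuerId = "known-issuer"
    }

// Both queries run concurrently, so the total wait is roughly one delay
// rather than two.
let results =
    [ queryUserExists "new-user"; queryIssuerExists "known-issuer" ]
    |> Async.Parallel
    |> Async.RunSynchronously

let userFound, issuerFound = results.[0], results.[1]
```

Async.Parallel requires all the computations to return the same type, which happens to be convenient here since both queries return a Boolean.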
A third consideration is the statistical distribution of pathways through the system. If you consider the flowchart above, it indicates a cyclomatic complexity of 3; there are three distinct pathways.
If, however, it turns out that the user doesn't exist in 95 percent of cases, you'll have to perform the second query (for the issuer) almost every time anyway, so the difference between prefetching and conditional querying is minimal.
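The arithmetic behind that argument can be sketched as follows, assuming that you count only the two existence queries and that pMissing (a hypothetical parameter) is the probability that the user doesn't exist. Conditional querying always asks about the user and asks about the issuer only when the user is missing, so it costs 1 + pMissing queries on average; prefetching always costs 2.

```fsharp
// Expected number of existence queries per request, counting only the
// user and issuer lookups (creations are ignored).
// pMissing: assumed probability that the user doesn't exist yet.
let conditionalQueries (pMissing : float) = 1.0 + pMissing
let prefetchQueries (_ : float) = 2.0

let extraQueries pMissing =
    prefetchQueries pMissing - conditionalQueries pMissing

let atNinetyFive = extraQueries 0.95   // about 0.05 extra queries per request
let atHalf = extraQueries 0.5          // 0.5 extra queries per request
```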
While some would consider this 'cheating', when aiming for the impureim sandwich architecture, these are all relevant questions to ponder. It often turns out that you can fetch all the data before passing them to a pure function. It may entail 'wasting' some electrons on queries that turn out to be unnecessary, but it may still be worth doing.
There's another kind of waste worth considering. This is the waste in developer hours if you write code that's harder to maintain than it has to be. As I recently described in an article titled Favor real dependencies for unit testing, the more you use functional programming, the less test maintenance you'll have.
Keep in mind that the scenario in this article is a server-to-server interaction. How much would a bit of extra bandwidth cost, versus wasted programmer hours?
If you can substantially simplify the code at the cost of a few dollars of hardware or network infrastructure, it's often a good trade-off.
Referential integrity #
The above flowchart implies a more subtle assumption that turns out to not hold in practice. The assumption is that all users in the system have been created the same: that all users are associated with an issuer. Thus, according to this assumption, if the user exists, then so must the issuer.
This turns out to be a false assumption. The Fusebit HTTP API doesn't enforce referential integrity. You can create a user with an issuer that doesn't exist. When creating a user, you supply only the issuer ID (a string), but the API doesn't check that an issuer with that ID exists.
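A small in-memory model can illustrate what missing referential integrity means in practice. FakeApi and User are hypothetical types for this sketch; the point is only that CreateUser stores the issuer ID without checking it, as described above.

```fsharp
open System.Collections.Generic

// Hypothetical user record: only the issuer ID is stored.
type User = { Id : string; IssuerId : string }

// Hypothetical in-memory model of an API without referential integrity.
type FakeApi () =
    let issuers = HashSet<string> ()
    let users = Dictionary<string, User> ()
    member _.CreateIssuer (issuerId : string) = issuers.Add issuerId |> ignore
    // No check that user.IssuerId refers to an existing issuer.
    member _.CreateUser (user : User) = users.[user.Id] <- user
    member _.UserExists (userId : string) = users.ContainsKey userId
    member _.IssuerExists (issuerId : string) = issuers.Contains issuerId

let api = FakeApi ()
api.CreateUser { Id = "user-1"; IssuerId = "issuer-that-was-never-created" }

let userFound = api.UserExists "user-1"
let issuerFound = api.IssuerExists "issuer-that-was-never-created"
```

With such an API, finding the user tells you nothing about whether its issuer exists.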
Thus, just because a user exists you can't be sure that its associated issuer exists. To be sure, you'd have to check.
But this means that you'll have to perform two queries after all. The angst from the previous section turns out to be irrelevant. The flowchart is wrong.
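Instead, you have two independent, but potentially parallelisable, processes: one that checks whether the issuer exists and creates it if it doesn't, and another that does the same for the user.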
You can't always be that lucky when you consider how to make requirements fit the impureim sandwich mould, but this is beginning to look really promising. Each of these two processes is near-trivial.
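Each process boils down to 'check, then create only if missing'. A minimal, dependency-free sketch of that idea might look like this; ensureExists, ensureIssuer and ensureUser are hypothetical names, and the in-memory sets stand in for the API.

```fsharp
open System.Collections.Generic

// Hypothetical generic helper: check, then create only if missing.
let ensureExists (exists : 'a -> bool) (create : 'a -> unit) (x : 'a) =
    if not (exists x) then create x

let issuerStore = HashSet<string> ()
let userStore = HashSet<string> ()

let ensureIssuer =
    ensureExists (fun i -> issuerStore.Contains i) (fun i -> issuerStore.Add i |> ignore)
let ensureUser =
    ensureExists (fun u -> userStore.Contains u) (fun u -> userStore.Add u |> ignore)

// The two processes don't depend on each other, so their order doesn't matter.
ensureUser "user-1"
ensureIssuer "issuer-1"
ensureIssuer "issuer-1"   // already there; nothing happens
```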
Idempotence #
Really, what's actually required is an idempotent create action. As the RESTful Web Services Cookbook describes, all HTTP verbs except POST should be regarded as idempotent in a well-designed API. Alas, creating users and issuers is (naturally) done with POST requests, so these operations aren't naturally idempotent.
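To illustrate the distinction, here's a hedged in-memory sketch that isn't tied to any real HTTP library. post and put are hypothetical functions modelling the usual semantics of the two verbs: repeating the PUT-like call leaves the state unchanged, whereas repeating the POST-like call doesn't.

```fsharp
// Hypothetical in-memory models of two ways to "create" a resource.
// POST-like: every call adds a new resource, even with identical data.
let post (store : string list) (name : string) = name :: store

// PUT-like: the resource is stored under a key, so repeating the call
// leaves the store in the same state.
let put (store : Map<string, string>) (key : string) (name : string) =
    Map.add key name store

let postedTwice = post (post [] "alice") "alice"
let putOnce = put Map.empty "alice" "alice"
let putTwice = put putOnce "alice" "alice"
```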
In functional programming you often decouple decisions from effects. In order to be able to do that here, I created this discriminated union:
type Idempotent<'a> = UpToDate | Update of 'a
This type is isomorphic to option, but I found it worthwhile to introduce a distinct type for this particular purpose. Usually, if a query returns, say, UserData option, you'd interpret the Some case as indicating that the user exists, and the None case as indicating that the user doesn't exist.
Here, I wanted the 'populated' case to indicate that an Update action is required. If I'd used option, I would have had to map the user-doesn't-exist case to a Some value, and the user-exists case to a None value. I thought that this might confuse other programmers, since it'd go against the usual idiomatic use of the type.
That's the reason I created a custom type.
The UpToDate case indicates that the value exists and is up to date. The Update case is worded in the imperative to indicate that the value (of type 'a) should be updated.
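The isomorphism with option can be made explicit with a pair of conversion functions. toOption and ofOption are hypothetical helpers written for this sketch; they also show why using option directly would read oddly, since Some would then mean 'needs creating'.

```fsharp
type Idempotent<'a> = UpToDate | Update of 'a

// Hypothetical conversions. A resource that doesn't exist must be created,
// so it maps to Update (Some); an existing one maps to UpToDate (None),
// the opposite of how a lookup returning option is usually read.
let toOption = function
    | UpToDate -> None
    | Update x -> Some x

let ofOption = function
    | None -> UpToDate
    | Some x -> Update x

// Converting back and forth loses nothing.
let roundTripped = [ UpToDate; Update 42 ] |> List.map (toOption >> ofOption)
```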
Establish #
The purpose of this entire exercise is to establish that a user (and issuer) exists. It's okay if the user already exists, but if it doesn't, we should create it.
I mulled over the terminology and liked the verb establish, to the consternation of many Twitter users.
CreateUserIfNotExists is a crude name 🤢
How about EstablishUser instead?
"Establish" can both mean to "set up on a firm or permanent basis" and "show (something) to be true or certain by determining the facts". That seems to say the same in a more succinct way 👌
Just read the comments to see how divisive that little idea is. Still, I decided to define a function called establish to convert a Boolean and an 'a value to an Idempotent<'a> value.
Don't forget the purpose of this entire exercise. The benefit that the impureim sandwich architecture can bring is that it enables you to drain the impure parts of the sandwich of logic. Pure functions are intrinsically testable, so the more you define decisions and algorithms as pure functions, the more testable the code will be.
It's even better when you can make the testable functions generic, because reusable functions have the potential to reduce cognitive load. Once a reader learns and understands an abstraction, it stops being much of a cognitive load.
The functions to create and manipulate Idempotent<'a> values should be covered by automated tests. The behaviour is quite trivial, though, so to increase coverage we can write the tests as properties. The code base in question already uses FsCheck, so I just went with that:
```fsharp
[<Property(QuietOnSuccess = true)>]
let ``Idempotent.establish returns UpToDate`` (x : int) =
    let actual = Idempotent.establish x true
    UpToDate =! actual

[<Property(QuietOnSuccess = true)>]
let ``Idempotent.establish returns Update`` (x : string) =
    let actual = Idempotent.establish x false
    Update x =! actual
```
These two properties also use Unquote for assertions. The =! operator means "should equal", so you can read an expression like UpToDate =! actual as "UpToDate should equal actual".
This describes the entire behaviour of the establish function, which is implemented this way:
```fsharp
// 'a -> bool -> Idempotent<'a>
let establish x isUpToDate =
    if isUpToDate
    then UpToDate
    else Update x
```
About as trivial as it can be. Unsurprising code is good.
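If you want to try the two properties without FsCheck and Unquote, a dependency-free approximation might look like the following. The hand-rolled =! operator is a stand-in written for this sketch (Unquote's version produces much richer failure messages), and the fixed input lists replace FsCheck's generated values.

```fsharp
type Idempotent<'a> = UpToDate | Update of 'a

module Idempotent =
    // 'a -> bool -> Idempotent<'a>
    let establish x isUpToDate =
        if isUpToDate
        then UpToDate
        else Update x

// Hypothetical stand-in for Unquote's =! operator: it throws when the
// values differ, but without Unquote's detailed failure report.
let (=!) expected actual =
    if expected <> actual then
        failwithf "Expected %A but got %A" expected actual

// Hand-picked inputs instead of FsCheck-generated ones.
for x in [ -1; 0; 42 ] do
    UpToDate =! Idempotent.establish x true
for x in [ ""; "foo"; "bar baz" ] do
    Update x =! Idempotent.establish x false
```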
Fold #
The establish function affords a way to create Idempotent values. It'd also be useful to have a function that gets the value out of the container, so to speak. While you can always pattern match on an Idempotent value, that'd introduce decision logic into the code that does that.
The goal is to cover as much decision logic as possible by tests so that we can leave the overall impureim sandwich as an untested declarative composition - a Humble Object, if you will. It'd be appropriate to introduce a reusable function (covered by tests) that can fulfil that role.
We need the so-called case analysis of Idempotent<'a>. In other terminology, this is also known as the catamorphism. Since Idempotent<'a> is isomorphic to option (also known as Maybe), the catamorphism is also isomorphic to the Maybe catamorphism. While we expect no surprises, we can still cover the function with automated tests:
```fsharp
[<Property(QuietOnSuccess = true)>]
let ``Idempotent.fold when up-to-date`` (expected : DateTimeOffset) =
    let actual =
        Idempotent.fold (fun _ -> DateTimeOffset.MinValue) expected UpToDate
    expected =! actual

[<Property(QuietOnSuccess = true)>]
let ``Idempotent.fold when update required`` (x : TimeSpan) =
    let f (ts : TimeSpan) = ts.TotalHours + float ts.Minutes
    let actual = Update x |> Idempotent.fold f 1.1
    f x =! actual
```
The most common catamorphisms are idiomatically called fold in F#, so that's what I called it as well.
The first property states that when the Idempotent value is already UpToDate, fold simply returns the 'fallback value' (here called expected) and the function doesn't run.
When the Idempotent is an Update value, the function f runs over the contained value x.
The implementation hardly comes as a surprise:
```fsharp
// ('a -> 'b) -> 'b -> Idempotent<'a> -> 'b
let fold f onUpToDate = function
    | UpToDate -> onUpToDate
    | Update x -> f x
```
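A short usage sketch may clarify how fold is meant to be called. describe and rebuild are hypothetical examples; the second one illustrates a general property of catamorphisms: folding with the type's own case constructors gives you back the original value.

```fsharp
type Idempotent<'a> = UpToDate | Update of 'a

module Idempotent =
    // ('a -> 'b) -> 'b -> Idempotent<'a> -> 'b
    let fold f onUpToDate = function
        | UpToDate -> onUpToDate
        | Update x -> f x

// Hypothetical example: describe the action, if any, that a value calls for.
let describe (i : Idempotent<string>) =
    Idempotent.fold (sprintf "Create %s") "Nothing to do" i

// Folding with the two case constructors rebuilds the original value.
let rebuild (i : Idempotent<int>) = Idempotent.fold Update UpToDate i
```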
Both establish and fold are general-purpose functions. I needed one more specialised function before I could compose a workflow to create a Fusebit user if it doesn't exist.
Checking whether an issuer exists #
As I've previously mentioned, I'd already developed a set of modules to interact with the Fusebit API. One of these was a function to read an issuer. This Issuer.get action returns a Task<Result<IssuerData, HttpResponseMessage>>.
The Result value will only be an Ok value if the issuer exists, but we can't conclude that any Error value indicates a missing resource. An Error may also indicate a genuine HTTP error.
A function to translate a Result<IssuerData, HttpResponseMessage> value to a Result<bool, HttpResponseMessage> by examining the HttpResponseMessage is just complex enough (cyclomatic complexity 3) to warrant unit test coverage. Here I just went with some parametrised tests rather than FsCheck properties.
The first test asserts that when the result is Ok, it translates to Ok true:
```fsharp
[<Theory>]
[<InlineData ("https://example.com", "DN", "https://example.net")>]
[<InlineData ("https://example.org/id", "lga", "https://example.gov/jwks")>]
[<InlineData ("https://example.com/id", null, "https://example.org/.jwks")>]
let ``Issuer exists`` iid dn jwks =
    let issuer =
        {
            Id = Uri iid
            DisplayName = dn |> Option.ofObj
            PKA = JwksEndpoint (Uri jwks)
        }
    let result = Ok issuer

    let actual = Fusebit.issuerExists result

    Ok true =! actual
```
All tests here are structured according to the AAA formatting heuristic. This particular test may seem so obvious that you may wonder how there's actually any logic to test. Perhaps the next test throws a little more light on that question:
```fsharp
[<Fact>]
let ``Issuer doesn't exist`` () =
    use resp = new HttpResponseMessage (HttpStatusCode.NotFound)
    let result = Error resp

    let actual = Fusebit.issuerExists result

    Ok false =! actual
```
How do we know that the requested issuer doesn't exist? It's not just any Error result that indicates that, but a particular 404 Not Found result. Notice that this particular Error result translates to an Ok result: Ok false.
All other kinds of Error results, on the other hand, should remain Error values:
```fsharp
[<Theory>]
[<InlineData (HttpStatusCode.BadRequest)>]
[<InlineData (HttpStatusCode.Unauthorized)>]
[<InlineData (HttpStatusCode.Forbidden)>]
[<InlineData (HttpStatusCode.InternalServerError)>]
let ``Issuer error`` statusCode =
    use resp = new HttpResponseMessage (statusCode)
    let expected = Error resp

    let actual = Fusebit.issuerExists expected

    expected =! actual
```
All together, these tests indicate an implementation like this:
```fsharp
// Result<'a, HttpResponseMessage> -> Result<bool, HttpResponseMessage>
let issuerExists = function
    | Ok _ -> Ok true
    | Error (resp : HttpResponseMessage) ->
        if resp.StatusCode = HttpStatusCode.NotFound
        then Ok false
        else Error resp
```
Once again, I've managed to write a function more generic than its name implies. This seems to happen to me a lot.
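To illustrate that genericity, the following sketch applies issuerExists, copied from above, to payloads that have nothing to do with issuers. The sample values are arbitrary and only serve this illustration.

```fsharp
open System.Net
open System.Net.Http

// Result<'a, HttpResponseMessage> -> Result<bool, HttpResponseMessage>
let issuerExists = function
    | Ok _ -> Ok true
    | Error (resp : HttpResponseMessage) ->
        if resp.StatusCode = HttpStatusCode.NotFound
        then Ok false
        else Error resp

// Nothing in the function refers to issuers, so any payload type works.
let fromString = issuerExists (Ok "any payload")
let fromList = issuerExists (Ok [ 1; 2; 3 ])

let notFound = new HttpResponseMessage (HttpStatusCode.NotFound)
let missing = issuerExists (Error notFound : Result<string, HttpResponseMessage>)
```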
In this context, what matters more is that this is another pure function - which also explains why it was so easy to unit test.
Composition #
It turned out that I'd now managed to extract all complexity to pure, testable functions. What remained was composing them together.
First, a couple of private helper functions:
```fsharp
// Task<Result<'a, 'b>> -> Task<Result<unit, 'b>>
let ignoreOk x = TaskResult.map (fun _ -> ()) x

// ('a -> Task<Result<'b, 'c>>) -> Idempotent<'a> -> Task<Result<unit, 'c>>
let whenMissing f = Idempotent.fold (f >> ignoreOk) (task { return Ok () })
```
These only exist to make the ensuing composition more readable. Since they both have a cyclomatic complexity of 1, I found that it was okay to skip unit testing.
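The helpers above use TaskResult.map, which isn't part of FSharp.Core (libraries such as FsToolkit.ErrorHandling provide a TaskResult module). If you'd like to experiment without that dependency, a sketch using only the task expression built into F# 6 might look like this; the create function is hypothetical.

```fsharp
open System.Threading.Tasks

type Idempotent<'a> = UpToDate | Update of 'a

module Idempotent =
    let fold f onUpToDate = function
        | UpToDate -> onUpToDate
        | Update x -> f x

// Stand-in for TaskResult.map (fun _ -> ()), using the built-in task expression.
let ignoreOk (x : Task<Result<'a, 'b>>) : Task<Result<unit, 'b>> =
    task {
        let! r = x
        return Result.map ignore r
    }

let whenMissing f = Idempotent.fold (f >> ignoreOk) (task { return Ok () })

// Hypothetical create action that records what it was asked to create.
let created = ResizeArray<string> ()
let create (name : string) : Task<Result<int, string>> =
    task {
        created.Add name
        return Ok name.Length
    }

let r1 = (Update "new-user" |> whenMissing create).Result   // runs create
let r2 = (UpToDate |> whenMissing create).Result            // skips create
```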
The same is true for the final composition:
```fsharp
let! comp = taskResult {
    let (issuer, identity, user) = gatherData dto

    let! issuerExists =
        Issuer.get client issuer.Id |> Task.map Fusebit.issuerExists
    let! userExists =
        User.find client (IdentityCriterion identity)
        |> TaskResult.map (not << List.isEmpty)

    do! Idempotent.establish issuer issuerExists
        |> whenMissing (Issuer.create client)
    do! Idempotent.establish user userExists
        |> whenMissing (User.create client) }
```
The comp composition starts by gathering data from an incoming dto value. This code snippet is part of a slightly larger Controller Action that I'm not showing here. The rest of the surrounding method is irrelevant to the present example, since it only deals with translating the input Data Transfer Object, and with translating comp back to an IHttpActionResult object.
After a little pure hors d'œuvre, the sandwich arrives with the first impure actions: retrieving the issuerExists and userExists values from the Fusebit API. After that, the sandwich does fall apart a bit, I admit. Perhaps it's more like a piece of smørrebrød...
I could have written this composition with a more explicit sandwich structure, starting by exclusively calling Issuer.get and User.find. That would have been the first impure layer of the sandwich.
As the pure centre, I could then have composed a pure function from Fusebit.issuerExists, not << List.isEmpty and Idempotent.establish.
Finally, I could have completed the sandwich with the second impure layer that'd call whenMissing.
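A synchronous, in-memory sketch of that more explicit structure might look like this. It's a simplification under stated assumptions: plain Boolean lookups replace the HTTP queries (so Fusebit.issuerExists and the user search aren't needed), and decide and ensure are hypothetical names. The usage also exercises the case from the Referential integrity section, where a user exists but its issuer doesn't.

```fsharp
open System.Collections.Generic

type Idempotent<'a> = UpToDate | Update of 'a

module Idempotent =
    let establish x isUpToDate = if isUpToDate then UpToDate else Update x
    let fold f onUpToDate = function
        | UpToDate -> onUpToDate
        | Update x -> f x

// Hypothetical in-memory stand-ins for the issuer and user resources.
let issuers = HashSet<string> ()
let users = HashSet<string> ()

// Pure centre: turn the query results into decisions.
let decide issuer user issuerFound userFound =
    Idempotent.establish issuer issuerFound, Idempotent.establish user userFound

let ensure (issuer : string) (user : string) =
    // First impure layer: queries only.
    let issuerFound = issuers.Contains issuer
    let userFound = users.Contains user
    // Pure centre.
    let issuerAction, userAction = decide issuer user issuerFound userFound
    // Second impure layer: act on the decisions.
    issuerAction |> Idempotent.fold (fun i -> issuers.Add i |> ignore) ()
    userAction |> Idempotent.fold (fun u -> users.Add u |> ignore) ()

users.Add "user-1" |> ignore   // a user whose issuer was never created
ensure "issuer-1" "user-1"
```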
I admit that I didn't actually structure the code exactly like that. I mixed some of the pure functions (Fusebit.issuerExists and not << List.isEmpty) with the initial queries by adding them as continuations with Task.map and TaskResult.map. Likewise, I decided to immediately pipe the results of Idempotent.establish to whenMissing. My motivation was that this made the code more readable, since I wanted to highlight the symmetry of the two actions. That mattered more to me, as a writer, than highlighting any sandwich structure.
I'm not insisting I was right in making that choice. Perhaps I was; perhaps not. I'm only reporting what motivated me.
Could the code be further improved? I wouldn't be surprised, but at this time I felt that it was good enough to submit to a code review, which it survived.
One possible improvement, however, might be to parallelise the two actions, so that they could execute concurrently. I'm not sure it's worth the (small?) effort, though.
Conclusion #
I'm always keen on examples that challenge the notion of the impureim sandwich architecture. Usually, I find that by taking a slightly more holistic view of what has to happen, I can make most problems fit the pattern.
The most common counterargument is that subsequent impure queries may depend on decisions taken earlier. Thus, the argument goes, you can't gather all impure data up front.
I'm sure that such situations genuinely exist, but I also think that they are rarer than most people think. In most cases I've experienced, even when I initially think that I've encountered such a situation, after a bit of reflection I find that I can structure the code to fit the functional core, imperative shell architecture. Not only that, but the code becomes simpler in the process.
This happened in the example I've covered in this article. Initially, I thought that ensuring that a Fusebit user exists would involve a process as illustrated in the first of the above flowcharts. Then, after thinking it over, I realised that defining a simple discriminated union would simplify matters and make the code testable.
I thought it was worthwhile sharing that journey of discovery with others.