REPL-able tests in Clojure
July 3, 2024
Tests play a central role in how I interact with Clojure code. And while I've slowly embraced a more REPL-driven development style over the years, the structure of my tests didn't really adapt to become more amenable to REPL interactions.
At Nextjournal I went about setting up the testing patterns for our main consulting project, feeling pretty content with the coverage and stability they brought. My colleagues, on the other hand, were slow to embrace the testing setup. Whittling away at what was blocking them, I got rid of a DSL layer and removed mocks that added more complexity than they were worth.
And most recently my colleague Martin Kavalar pointed out how difficult it was to interact with the tests via the REPL, encouraging me to explore a few things that might remedy this.
The changes Martin suggested were simple but upon realizing them I noticed a significant shift in how I interacted with test failures. Enough so that I wanted to write them up here.
The core idea: to view the REPL as a debugger.
As simple as that.
For me this definitely wasn't obvious; my daily REPL experience doesn't look like my days of using a step-debugger with Java code. So what are the implications of this "REPL as a debugger" idea, and what changes do we need to make to enable it?
General structure of tests
First I want to describe the general shape of our tests.
For server-side code we mostly rely on whole-system tests, which I generally call integration tests, a bit out of laziness, because they don't really test the integration between things. Compared to unit tests, which involve little to no state setup, these whole-system tests provide access to the entire server-side application. You can make HTTP requests to it, query or mutate the database, and make other such effectful calls. Through these calls you build up state much as a front-end client would by making requests.
This is done by first structuring your code such that all state is encapsulated into a system of components, like a database component, a file store component, an email sending component, etc. When the application starts, it starts up these components, and you can swap in specific versions of components to adapt to whether you are in production, local dev, or test environments. For instance, in local dev and test environments you'll want to replace the email sending component with a stub so emails don't get sent during development or test execution.
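To make the shape of this concrete, here is a minimal sketch; the component names and constructors are hypothetical stand-ins, not our actual setup, and a real project would likely manage lifecycles with a library such as Component or Integrant:

```clojure
;; A minimal sketch: a system is a plain map of components, and dev/test
;; environments swap in a stub email sender so nothing real gets sent.
(defn stub-email-sender []
  {:sent (atom [])})               ; records "sent" emails instead of using SMTP

(defn smtp-email-sender [config]
  {:config config})

(defn new-system [env]
  {:db         {:env env}          ; stands in for a real database component
   :file-store (atom {})           ; in-memory stand-in for a real file store
   :email      (case env
                 :prod        (smtp-email-sender {:host "smtp.example.com"})
                 (:dev :test) (stub-email-sender))})
```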
Lastly, each whole-system test spins up a fresh system on start-up so that the test can interact with it to exercise the flow of a complete feature. On failure or completion the system is torn down, ensuring the next test gets a fresh system to interact with rather than one left in a random state. This is achieved via a with-system macro that does the system startup and binds it to a variable for use in the with-system body.
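A stripped-down version of such a macro might look roughly like this, assuming the new-system sketch above plus placeholder start-system/stop-system lifecycle functions; the real macro differs in detail, but the shape is: start a fresh system, bind it, run the body, always tear it down.

```clojure
(def ^:dynamic *system* nil)

(defn start-system [sys] sys)   ; placeholder: would start each component in order
(defn stop-system  [sys] nil)   ; placeholder: would stop each component

(defmacro with-system [& body]
  `(let [sys# (start-system (new-system :test))]
     (binding [*system* sys#]
       (try
         ~@body
         (finally
           (stop-system sys#))))))
```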
Debugging with the REPL
The REPL as a code-writing tool is a pretty common concept in the Clojure community, and it is the framing through which I came to think of the REPL.
But what about the REPL as a debugging tool? In particular, as a tool for debugging test failures.
A debugger generally allows interaction at a point frozen in time. For tests, the point of most interest is probably the moment of a failure.
Teardown for what?
Above I mentioned that each test starts a fresh system and tears it down on completion or failure. This means at the point of failure all you get is a pile of assertion failure messages and a shut-down system. In the past I would sprinkle in some print statements and rerun the test to see if something would become clearer.
Martin was frustrated that in this setup one couldn't poke around with the REPL when there was a failure. He asked if it was always necessary to have a fresh system, and if it was, then why not leave it running until the next test starts?
I've always started a fresh system per test because it carries a lower mental overhead than writing tests that are agnostic to state manipulations from other tests. As for leaving a system running after a test finishes, the idea had never occurred to me, given I was so much in the mental box of "start a system, run the test, stop the system".
The idea of another system quietly lurking around initially weirded me out. But since so many things at Nextjournal that ran contrary to my past conditioning turned out well, I decided to see what a lurking test system would bring.
Let's check out a concrete example of a system, the adapted with-system macro, and a test:
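Roughly, and continuing with the hypothetical names from the sketches above (put-file, contents, and list-files stand in for the real file-store API), the adapted macro and a test might look like this:

```clojure
(require '[clojure.test :refer [deftest is]])

;; The adapted with-system: instead of tearing down in a finally block, it
;; stops whatever system the previous test left behind, starts a fresh one,
;; and leaves *system* bound (via alter-var-root) after the body has run.
(defmacro with-system [& body]
  `(do
     (when *system*
       (stop-system *system*))
     (alter-var-root #'*system* (constantly (start-system (new-system :test))))
     ~@body))

;; Hypothetical file-store API working against the in-memory component above.
(defn put-file [system filename data]
  (swap! (:file-store system) assoc filename data))

(defn contents [system filename]
  (get @(:file-store system) filename))

(defn list-files [system]
  (keys @(:file-store system)))

;; A def rather than a let-binding, so forms referencing it stay REPL-able
;; (more on this below).
(def cat-filename "cat.png")

(deftest simple-filestore-test
  (with-system
    (put-file *system* cat-filename (.getBytes "meow"))
    (is (= [cat-filename] (list-files *system*)))
    (is (some? (contents *system* cat-filename)))))
```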
The simple-filestore-test is run and with-system starts a new system and makes it available at *system* in its body. After the test finishes, with-system keeps *system* bound and leaves it running in the background, regardless of the test outcome.
This means you can run queries and execute logic using the state of the system as the test left it!
So when a test does fail, you can for example dig in further by putting your cursor at (contents *system* cat-filename) and evaluating the form via the REPL. Or check other things by typing new forms, like (list-files *system*) for example.
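For instance, with the system still lurking after a failed run of simple-filestore-test, a REPL session might look something like this (same hypothetical names as above):

```clojure
;; Evaluated one form at a time against the system the failed test left running.
(contents *system* cat-filename)   ; re-run just the call that failed
(list-files *system*)              ; inspect the surrounding file-store state
(:email *system*)                  ; or poke at any other component directly
```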
This is all done in a manner that doesn't get in the way of starting a fresh system for each test, so we still get good test-level isolation.
I've been using this approach for several months now and it is a joy. I rerun tests less and have largely replaced print statements with REPL interactions.
Additional implications
With this setup it is often helpful to use defs in the place of lets to ensure that when you evaluate a form via the REPL, all the vars are in scope. For instance, in the example above I would normally let-bind cat-filename within the deftest to show the reader that it is only relevant within the scope of the test. Now I use a def to ensure that (contents *system* cat-filename) is REPL-able.
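Sketched out with the same hypothetical names, the difference looks like this:

```clojure
;; let-bound: cat-filename only exists inside the deftest, so evaluating
;; (contents *system* cat-filename) on its own at the REPL fails.
(deftest filestore-let-version-test
  (with-system
    (let [cat-filename "cat.png"]
      (put-file *system* cat-filename (.getBytes "meow"))
      (is (some? (contents *system* cat-filename))))))

;; def-bound (as in the example above): the var is visible from the REPL,
;; so the inner forms can be put under the cursor and evaluated directly
;; after a failure.
(def cat-filename "cat.png")
```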
I write a bit more about inline defs and other REPL ergonomics in this post.
I did find one place where things can go wrong in this setup: changing the tear-down code while there is a system still running means that when the next test tries to stop the old system, it might fail to do so. Additionally, this idea doesn't fit well with running tests in parallel.
Conclusion
In this post I share the simple idea of leaving a system running to allow for interacting with the state of the application at the point of a test failure. That, combined with other REPL ergonomics, ends up going a long way towards extending the reach of the REPL into the world of tests.
From a personal perspective, this journey was especially enjoyable because it came from revisiting previously unchallenged assumptions of mine, and it ultimately led to a deepening of how I interact with my tools. Such experiences are among the great humbling joys of programming, and I'm especially grateful to my colleagues at Nextjournal who enabled this.
As mentioned, these explorations came out of suggestions from Martin Kavalar, as well as some critical perspective from Jack Rusher.