CassandraUnit (not) Reloaded: making Cassandra testing faster

A few weeks ago I wrote a post about using CassandraUnit to help you write integration tests against a conveniently embedded Cassandra instance.

I also expressed my preference for exposing a CassandraCQLUnit to the JUnit tests using the @Rule annotation, which allowed me to remain in control of the Cassandra test instance.

But then experience brought new insights into the equation…

It all started with a test… or 10!

I started using CassandraUnit for a real, medium-sized project. Little by little, I reached a certain number of integration tests needing a test Cassandra server: 10 tests to be precise.

And I got a “too many open files” error!

A quick search on the net provided me with an explanation. Apparently the OS limits the number of files that can be open simultaneously (I was working on OS X Yosemite that day).
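To see how close a test run gets to that limit, the JVM itself can report its file descriptor usage. What follows is a minimal sketch and was not part of my original setup: the class name FileDescriptorProbe is made up for illustration, and it relies on com.sun.management.UnixOperatingSystemMXBean, which is only exposed on Unix-like JVMs (OS X, Linux). On other platforms the sketch simply returns -1 for both values.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

/** Hypothetical helper: reports how many file descriptors this JVM has open. */
final class FileDescriptorProbe {

    /** Returns {open, max} descriptor counts, or {-1, -1} when the JVM does not expose them. */
    static long[] openAndMax() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(),
                unix.getMaxFileDescriptorCount()
            };
        }
        // Not a Unix-like JVM: the counts are not available through this bean.
        return new long[] { -1L, -1L };
    }
}
```

Printing both numbers before and after a test class should show whether descriptors are piling up from one test to the next.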

To that was added an exception thrown by dependencies on Guava versions higher than 15.

Apparently the exception was handled in an unfortunate way, leaving Cassandra files open instead of closing them after the test. Hence the “too many open files” message…

An alternative solution

Knowing that, I was about to move back to Guava 15 and have a look at DataStax’s recommended settings, when suddenly something hit me: should I really accept opening and closing a Cassandra data file for each test?

Instead, how about having my embedded test Cassandra run and build my keyspace once and for all, cleaning it after each test?

I decided it was worth a try…

Making the embedded Cassandra run once for all tests

I started by making sure that the cluster and session objects from the embedded Cassandra instance were being created only once, and made them accessible to all tests.

For that purpose, I decided to use a parent IntegrationTests class from which all my integration tests would inherit.

Every test class would extend this parent class and inherit not only the configuration-related annotations, but also all its accessible methods.

Those cluster and session objects were filled at the IntegrationTests class’s instantiation.

Part of this code was extracted from the load() method in the CassandraCQLUnit class (from the CassandraUnit distro). The CQLDataLoader is put to good use to set up our keyspace and column families.
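The shape of that parent class can be sketched without any library on the classpath. This is only an illustration of the start-once idea, under a few assumptions of mine: the shared state lives in a static initializer, TestSession is a made-up stand-in for the driver's session, the two subclass names are hypothetical, and the counter exists only to make the behaviour visible. In a real project the placeholder method would start the embedded server, build the cluster, open the session and let the CQLDataLoader load the schema.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the driver's session; not a CassandraUnit or DataStax type. */
interface TestSession {
    void execute(String cql);
}

/** Parent class: its static initializer runs once per JVM, however many subclasses exist. */
abstract class IntegrationTests {

    // Counts how often the expensive startup ran; only here to make the behaviour observable.
    static final AtomicInteger STARTUPS = new AtomicInteger();

    // Shared by every test class that extends IntegrationTests.
    protected static final TestSession session;

    static {
        session = startEmbeddedServerAndLoadSchema();
    }

    // Placeholder for: start embedded Cassandra, build the cluster, connect, load the CQL data set.
    private static TestSession startEmbeddedServerAndLoadSchema() {
        STARTUPS.incrementAndGet();
        return cql -> { /* a real session would send the statement to Cassandra */ };
    }
}

// Hypothetical test classes: they inherit the shared session without starting anything themselves.
class UserRepositoryIT extends IntegrationTests { }

class OrderRepositoryIT extends IntegrationTests { }
```

However many test classes are instantiated, the startup counter stays at 1, which is exactly the saving I was after. Note that "once" here means once per JVM: if the build tool forks a new JVM per test class, the server starts again in each fork.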

Clean up when you’re done

Every test is in charge of inserting the data it needs. Most of the time I did this by using JUnit’s @Before annotation right inside each test class. Then, after each test has run, the data is removed, leaving the keyspace and the column families intact.

The advantage of this approach is that I only insert the data my test suite needs to run. The drawback is that the data is destroyed and re-created for every test. In my experience however, that is not as costly as having to create a new data file for every test!

So I made sure that the DB is cleaned after each test with a small method that truncates all tables known by the keyspace. I called it after each test, inside an @After-annotated method in each test class.
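The core of such a cleanup method is small enough to sketch. The version below is deliberately library-free and rests on assumptions: KeyspaceCleaner and truncateAll are names I made up for illustration, the table names are passed in rather than read from the cluster, and the executor is any callback that accepts a CQL string. In the real test class the names would come from the driver's keyspace metadata and the executor would be the shared session.

```java
import java.util.function.Consumer;

/** Hypothetical helper: empties every table of a keyspace while keeping the schema intact. */
final class KeyspaceCleaner {

    /** Builds one TRUNCATE statement per table and hands each one to the executor. */
    static void truncateAll(String keyspace, Iterable<String> tableNames, Consumer<String> executor) {
        for (String table : tableNames) {
            // TRUNCATE removes the rows but leaves the table definition in place.
            executor.accept("TRUNCATE " + keyspace + "." + table);
        }
    }
}
```

Called from an @After-annotated method, this leaves the keyspace and its column families ready for the next test's @Before to fill again.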

And so…

And so after these modifications I ran the tests and, boy, they executed just fine and at lightning speed!

Of course I will stick to Guava 15 for the moment, if only to prevent that error message from appearing every time I run a test. But all things considered, I think I’ll stick with this approach for running the embedded Cassandra test server.

Although I believe they are needed, integration tests cost more than unit tests in terms of execution time. So I consider that being able to keep their execution time down is worth the extra effort, don’t you think?