Chaos in your APIs: defect injection to test your tests

(aka the lost art of "Bedbugging")


"Quis custodiet ipsos custodes?" - Juvenal

Let's get meta: how to test the tests?

One of the most irksome questions I've come across in QA (and in automation generally) is figuring out whether you're actually doing what you expect. In the QA realm, it boils down to: will my tests (or manual testers) catch the bugs I expect them to catch?

Colleagues and friends tell tales of the time they ripped out huge swathes of test code, only to discover that it didn't matter. Bugs slipped right through the cracks anyway, defeating the entire purpose, because the tests weren't really testing anything but patience. It's like cargo cult science - write automated tests and bugs just disappear, right?

So. Wrong. Never mind that automation code isn't special; it rots just like any other code, calling forth the supervillain of the software world: maintenance. The friction it adds to the average developer's workflow alone results in negative behaviours, such as skipping tests when no one is watching.

Beyond happenstance and heroism, I've wondered how one could programmatically address this problem (and slim down test suites in the process). I took inspiration from Chaos Engineering, à la Netflix's Chaos Monkey, to demonstrate one method: defect injection, aka bedbugging.

Defect Injection demo - the setup

Let's start with the Conduit application, a Medium clone, and a suite of Cypress E2E tests (link to code is coming) that cover some basic functionality. To create defects, we're going to mangle the responses and response codes from the APIs we hit during the Cypress tests. The expectation is that our tests should catch the errors we've created, or that we see a graceful partial failure in line with expectations. We'll use MITMProxy to make this a reality.

Since we only care about the APIs called during a test run, I used the "automocker" plugin (or, better yet, the "autorecord" plugin) to identify the endpoints we care about in each case. Note that there are plenty of other ways to do this using traffic capture or proxies. [Screenshot: example of captured requests]

Here's a graphic of the APIs called in each functional test: [Graphic: APIs we hit]

Now, we'll fire up MITMProxy in its standard HTTP mode. Getting this working was a bit fiddly for me; let me know if posting my setup notes would help. Let's set up a filter to make it easier to see what we're doing:
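As an assumed example (Conduit's endpoints live under /api/; adjust the pattern to your own paths), a view filter plus an intercept pattern like these keep the noise down. In the mitmproxy console, press f to set the view filter and i to set the intercept pattern:

```
# view filter (press `f`): show only API traffic
~u /api/

# intercept pattern (press `i`): pause matching responses for manual editing
~u /api/ & ~s
```

The `~u` expression matches on URL and `~s` restricts interception to responses, so requests still flow through untouched.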


MITMProxy can be scripted to do some pretty fascinating stuff automatically. We're going to keep it simple and just intercept requests:
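To give a flavour of that scripting facility, here's a sketch of a mitmproxy addon (run with `mitmdump -s inject.py`) that randomly breaks matching responses. The endpoint list and the 50/50 split between error codes and mangled bodies are assumptions for illustration; point it at whatever endpoints your own capture identified:

```python
import json
import random

# Hypothetical target list - substitute the endpoints your own
# capture (automocker/autorecord) identified for each test.
TARGET_PATHS = ("/api/tags", "/api/articles")


def empty_lists(obj):
    """Recursively replace every JSON array with an empty one."""
    if isinstance(obj, dict):
        return {key: empty_lists(value) for key, value in obj.items()}
    if isinstance(obj, list):
        return []
    return obj


def mangle_body(body: bytes) -> bytes:
    """Corrupt a JSON response body; pass non-JSON bodies through untouched."""
    try:
        data = json.loads(body)
    except ValueError:
        return body
    return json.dumps(empty_lists(data)).encode()


def response(flow):
    """mitmproxy hook: called once for every server response."""
    if any(path in flow.request.path for path in TARGET_PATHS):
        if random.random() < 0.5:
            flow.response.status_code = 500  # simulate a server-side failure
        else:
            flow.response.content = mangle_body(flow.response.content)
```

Emptying lists (rather than, say, returning garbage) is a deliberately sneaky defect: the page still renders, so only a test that asserts on actual content will notice.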


Demonstrating Defect Injection for fun and profit

We're ready to rock. This screen recording shows how defect injection works with this setup.

(enlarge it to see the details!)

Here's what you're seeing:

    it("can see popular tags on homepage", function() {
      cy.get(".tag-list")
        .find(".tag-pill")
        .should("have.length.greaterThan", 1);
    });

Note: if we'd only checked that an element with the class "tag-list" had loaded, we'd have missed this error. [Screenshot: the empty tag list is easy to miss]

I'd argue that this feature isn't essential, so the graceful failure here is pretty solid, even without an error message in the UI.

Concluding musings

Hopefully the defect-injection strategy for testing your tests sparks your interest, too. There are some interesting parallels to Chaos Testing, though it's nice that this approach doesn't require actually taking down services or machines - and it's great that it maps directly to the real user experience. I imagine it would also have interesting applications for root cause analysis or for reproducing nasty bugs, since we get good control and visibility at the API layer.

The next step here would be to automate the extraction of the APIs hit by each E2E test, and to programmatically mangle their responses using MITMProxy's scripting facility. It would also be good to make the proxy setup a bit easier, or at least better documented. Do let me know if that would be useful to your organization, so I feel like finishing it :P

One last thing: this doesn't just apply to automated testing; it can also be used to check that manual test scripts are working (and being run at all - a fear many who outsource QA have). It turns out a similar technique, called "bedbugging", was employed to ensure that bored radar operators didn't miss rare events - and it went on to spawn a popular software engineering test coverage technique. Implementing this with a manual test team, and learning from the fault seeding efforts of yesteryear, would be super interesting as well!


Did you like this? Want to discuss it? More ways to play are coming soon, but for now, email to start a conversation (and get into our Slack community).
