When you’re developing new functionality on an existing website, especially on a hot new young fresh startup like Redfin, you can get really excited about moving fast and breaking things and really break things. I want to spend a little bit of time reflecting on a time that I broke things, and how load testing with Taurus helped me fix it.
What happened
A couple of months ago, I shipped a new feature to include links on our listing and property pages (an example). The change was mostly intended for SEO, to help search engines understand the relationship between the various pages we have on Redfin.com. Usually, when we ship a new feature, we put it behind a runtime feature toggle we call Bouncer, but I shipped some of my debugging code into production:
```java
if (Bouncer.isOn(Feature.SMART_INTERLINKING) || true) {
    // CRUSH THE DB!!!
}
```
Usually, we would have dialed the feature up gradually and discovered, slowly but surely, that the load this feature adds to our PostgreSQL cluster is untenable. Because of my debugging code, though, we saw the new controller go from not existing to taking a startling 15% of our total database SQL execution time overnight. I pushed a hotfix, and then had to quickly pull together a long-term solution. Because of the bug's high visibility, I had a lot of developers suggesting less expensive ways to query our database, and since the "later" in my plan to Optimize Later had arrived, it was important to come up with the solution that would put the least load on the database and return results blazingly fast.
What I tested
By the end of my initial investigation, I had come up with five different approaches to retrieving the data I needed from the database: some used a pre-computed mapping table, some used GIS queries, some used native SQL, and some used criteria queries. I added five separate methods to my Spring MVC controller:
```java
@Controller
public class ExpensiveComputationController {

    private @Inject ExpensiveComputationHelper helper;

    // Return types were elided in the original snippet; @ResponseBody Object
    // stands in here so the example compiles.
    @RequestMapping(value = "/api/variant1", method = RequestMethod.GET)
    public @ResponseBody Object doExpensiveComputation1() {
        return helper.expensiveVariant1();
    }

    @RequestMapping(value = "/api/variant2", method = RequestMethod.GET)
    public @ResponseBody Object doExpensiveComputation2() {
        return helper.expensiveVariant2();
    }

    // snip
}
```
Then, I needed to load test each of the variants. My first instinct was to use cURL to time the total time taken to return a response. I found this fantastic article on how to time curl requests, and got a reasonable bellwether for how long each variant takes:
```shell
curl -w "\ntime: %{time_total}s\n" https://redfintest.com/api/variant1
```
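The article's fuller recipe uses a format file with more of curl's timing variables. Here's a minimal sketch; every `%{...}` variable is a real curl `--write-out` variable, but the file name `curl-format.txt` is just my own convention:

```shell
# Each %{...} below is a real curl --write-out variable; curl substitutes
# them after the transfer completes.
cat > curl-format.txt <<'EOF'
    time_namelookup:  %{time_namelookup}s
       time_connect:  %{time_connect}s
 time_starttransfer:  %{time_starttransfer}s
         time_total:  %{time_total}s
EOF
# Usage (commented out so this sketch runs without network access):
# curl -w "@curl-format.txt" -o /dev/null -s https://redfintest.com/api/variant1
cat curl-format.txt
```

Breaking the total out into name lookup, connect, and time-to-first-byte makes it easier to see whether the server or the network is eating the time.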
However, sending a single request at a time is not particularly representative of how a given method or query performs, because a real production database is never running just one query; Redfin, for instance, has thousands of active, concurrent users as I write this article. To mimic that behavior, I needed a load generator to fire lots of concurrent requests and simulate real-world traffic. There are a lot of load generation tools available, like Siege, Apache JMeter, The Grinder, and Gatling. At Redfin, we recommend Siege for load testing major architectural changes, since solutions like Apache JMeter just can't generate enough load to replicate real production-like traffic, but I hoped to find a solution that was a little easier to use. Then I saw Taurus, which claimed I could write my tests once and run them using JMeter AND The Grinder AND Gatling, thanks to a DSL layered over the supported tools. Sweet!
How I tested it
Getting started with Taurus couldn't have been easier. I followed this article, but I'll also include the CliffsNotes® version here. I started out by installing the dependency:
```shell
sudo pip install bzt
```
Then I created my test:
```yaml
---
execution:
  concurrency: 25
  hold-for: 5m
  ramp-up: 1m
  scenario:
    requests:
      - url: https://redfintest.com/api/variant1?param1=val1&param2=val2
        method: GET
      - url: https://redfintest.com/api/variant1?param1=val3&param2=val4
        method: GET
      # ...snip...
```
Let’s break that down line-by-line:
```yaml
---
execution:
  concurrency: 25
```
I generated load for 25 concurrent users. This isn’t a particularly realistic number, but since I was generating the load from my laptop, and testing a locally-running process also running on my laptop, I was concerned that setting concurrent users too high would melt my laptop into my desk.
```yaml
  ramp-up: 1m
```
This gives Taurus 1 minute of ramp-up time to bring 25 concurrent users online, so that once the measured test starts, there really are 25 concurrent users making requests. It's much like stock car racing: the race doesn't start from a cold stop; the pack circles the track for a few laps before the green flag drops.
```yaml
  hold-for: 5m
  scenario:
    requests:
      - url: https://redfintest.com/api/variant1?param1=val1&param2=val2
        method: GET
      - url: https://redfintest.com/api/variant1?param1=val3&param2=val4
        method: GET
```
The test then holds at full load for 5 minutes, and, in this example, each user makes the two requests to https://redfintest.com/api/variant1 in succession before Taurus creates a new user and starts the scenario over from the beginning.
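To compare the five variants under identical load, each variant got its own run. A sketch of what a variant-2 config would look like, assuming the same shape as above (only the URLs change; the query parameters here are stand-ins):

```yaml
---
execution:
  concurrency: 25
  hold-for: 5m
  ramp-up: 1m
  scenario:
    requests:
      - url: https://redfintest.com/api/variant2?param1=val1&param2=val2
        method: GET
```

Keeping concurrency, ramp-up, and hold identical across runs is what makes the sample counts in the results table comparable.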
Then, I fired up the load test:
```shell
bzt scenario.yml
```
And I was so very pleased: all in all, going from deciding to use Taurus to having the results of a real load test took less than half an hour, and 6 minutes of that was spent actually running the test. Taurus installed all the JMeter dependencies for me, and after a short wait, I was greeted with a super awesome-looking terminal full of charts and graphs and numbers.
How I grokked the results
Taurus gave me lots of fantastic data about the amount of time it took to return results under load. The console output for one of the variants looks like this:
```
16:26:59 INFO: Samples count: 23740, 0.00% failures
16:26:59 INFO: Average times: total 0.348, latency 0.348, connect 0.000
16:26:59 INFO: Percentile 0.0%: 0.062
16:26:59 INFO: Percentile 50.0%: 0.317
16:26:59 INFO: Percentile 90.0%: 0.508
16:26:59 INFO: Percentile 95.0%: 0.670
16:26:59 INFO: Percentile 99.0%: 0.876
16:26:59 INFO: Percentile 99.9%: 1.227
16:26:59 INFO: Percentile 100.0%: 1.750
```
The top-line number, the samples count, was a reasonable approximation for how efficient a given variant was, since being able to process more requests in a fixed amount of time with a fixed number of concurrent requests means that each request must have been faster on average.
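That intuition checks out on the back of an envelope: with 25 users, a 5-minute hold, a ramp-up that contributes roughly half its duration at full strength, and the 0.348s mean from the console output above, the expected sample count falls out directly (all numbers below are taken from this run; the ramp/2 term assumes load grows linearly during ramp-up):

```shell
# Back-of-the-envelope throughput check.
awk 'BEGIN {
  users = 25      # concurrency
  hold  = 300     # hold-for, in seconds
  ramp  = 60      # ramp-up, in seconds
  mean  = 0.348   # average total response time, in seconds
  printf "expected samples: %.0f\n", users * (hold + ramp / 2) / mean
}'
```

That estimate lands within a few dozen of the 23,740 samples Taurus actually reported, which is why samples count works as a quick efficiency proxy.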
All in all, here is the table of results that came out:
| variant | 1      | 2      | 3      | 4      | 5      |
|---------|--------|--------|--------|--------|--------|
| samples | 35,299 | 35,382 | 37,970 | 4,443  | 37,669 |
| mean    | 0.234s | 0.234s | 0.218s | 1.868s | 0.219s |
| p0      | 0.039s | 0.045s | 0.029s | 0.078s | 0.040s |
| p50     | 0.220s | 0.224s | 0.207s | 1.794s | 0.210s |
| p90     | 0.301s | 0.293s | 0.287s | 2.814s | 0.282s |
| p95     | 0.351s | 0.327s | 0.333s | 3.225s | 0.322s |
| p99.5   | 0.718s | 0.703s | 0.695s | 4.131s | 0.689s |
| p99.9   | 1.182s | 1.683s | 0.849s | 5.577s | 0.961s |
| p100    | 1.722s | 3.296s | 2.049s | 6.544s | 2.041s |
But those numbers are only half of what I needed to know to make the right decision: they tell me how fast each variant is for customers, but not how much load we are putting on the database. Getting at the database statistics was more complicated, but thanks to work by some other teams here at Redfin, most of the heavy lifting had already been done for me. Here at Redfin we use a C port of Etsy's StatsD, collected by a Spring interceptor that sends data to StatsD every time we call the database, stored by Carbon, and displayed by Graphite. Through this magical setup, I was able to get the second half of what I wanted to know without doing any additional setup: I just pointed my browser at the appropriate Graphite dashboard and voilà!
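For the curious, the StatsD side of that pipeline is charmingly low-tech: every metric is a one-line UDP datagram in the form `<metric.name>:<value>|ms` for timings. A sketch, with a made-up metric name (our interceptor picks the real names):

```shell
# StatsD timing samples are plain text; the metric name below is invented
# purely for illustration.
payload='controllers.variant3.db_time:42|ms'
echo "$payload"
# Fire-and-forget UDP send, assuming a statsd daemon on the conventional
# port 8125 (commented out so the sketch runs anywhere):
# printf '%s' "$payload" | nc -u -w0 localhost 8125
```

Because the sends are UDP and fire-and-forget, the interceptor adds essentially no latency to the database calls it measures.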
Note that the order of variants on the graph is 1, 2, 2, 4, 5, 3 because I initially ran variant 2 when I should have run variant 3.
Based on the results we got, it was easy to tell that by deploying variant 3 in place of variant 4 (the version I initially shipped), we could expect 90% of users to see the component render in a tenth of the time it had been taking. Oh, and of course I shipped with automated end-to-end Selenium tests to prevent pushing bad code again, but that's a post for another day.