How Hibernate’s Lazy Loading Nearly Killed Our Email

Updated on August 6th, 2019

Our Data Addiction

On a typical Saturday afternoon in the height of the summer home-buying season, Redfin will send millions of listing update emails and push notifications to our users.

One of our biggest advantages is our ability to send these notifications within moments of an event appearing in the Multiple Listing Service (MLS) feed, the data source real estate agents use to put new homes on the market. In hot markets like San Francisco or Seattle, this timing can be critical: putting in a strong offer within hours of the initial listing can make the difference between securing a dream home and losing it in a bidding war.

Many of our customers (hundreds of thousands in fact) opt to receive daily digests rather than instant notifications throughout the day. And when these daily emails — unique to each user — are generated in the morning, our machines can be kept busy for hours trying to fetch all the required data to send them out.

Under extreme conditions, the implications of this excessive workload are twofold:

  1. Other jobs (listing importers, search indexers, etc.) can get backed up while the daily email job hogs resources.
  2. Users' emails can get delayed, potentially arriving hours after they would normally expect them.

In early 2015 — at a time when our email jobs relied almost entirely on batch-oriented operations — we began to observe exactly these two frightening scenarios:

Formatting Time Per Email (pre-improvement)

This is the kind of chart you don't want to see. Our email formatting time had inexplicably inflated threefold in just two months, and our team had no internal changes to point to and no recent hardware modifications. Nothing could explain the dangerous upward trend.

And then we found our critical dependency.

The Insidious Helper

As it turned out, helper methods and data services that our team was using to build emails were also being used by other teams to render web pages. And these services were quickly evolving behind the scenes to accommodate the growing needs of Redfin.com: Shared Search, Book It Now, and more cool new features were requiring more and more data and thus incurring additional latency with every database access.

Typically, an added latency of a few milliseconds would go unnoticed in the webapp, concealed by the much longer time it takes to download and render JavaScript on the client. But our email jobs were feeling the pain. Big time.

The problem originated from code like this:

home.isTourableByRedfin(user) // Can Redfin take this user on a tour of this home?

Because Redfin uses Hibernate to abstract away the SQL, we tend to not notice what’s really happening when we call this function — until we look.

On the surface, the function was just checking some fields on the home and user objects to determine whether the home was tourable. Under the hood, there was a lot more going on:

// Get this user’s Shared Search group.
SharedSearchGroup sharedSearchGroup = user.getSharedSearchGroup();

...

// Get agents for the Shared Search group.
Set<Agent> agents = new HashSet<>();
for (User cobuyer : sharedSearchGroup.getCobuyers()) {
    Person person = cobuyer.getPerson();
    TourRequest tourRequest = tourService.getPendingTour(person);
    agents.add(tourRequest.getAgent());
}

Because Redfin had recently introduced Shared Search, we were no longer interested only in whether we could take you on a tour, but also in whether we could take your co-buyer on a tour.

Unfortunately, every getter in the block of code above was lazily loading data via round trips to the database, thus incurring tremendous costs when executed on the order of hundreds of thousands of users at once.
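To make the cost concrete, here is a minimal sketch in plain Java (no Hibernate involved) that counts simulated round trips when every getter loads lazily. The Lazy, Cobuyer, and User classes are hypothetical stand-ins, and the assumption of one round trip per lazy getter is a simplification of what the real code did.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Toy model: each first access to a Lazy value costs one simulated round trip. */
final class LazyLoadingCost {
    /** Counts simulated database round trips. */
    static int roundTrips = 0;

    /** Hypothetical stand-in for a lazy proxy: loads on first access only. */
    static final class Lazy<T> {
        private final Supplier<T> loader;
        private T value;
        private boolean loaded;

        Lazy(Supplier<T> loader) { this.loader = loader; }

        T get() {
            if (!loaded) {
                roundTrips++;          // pretend we went to the database
                value = loader.get();
                loaded = true;
            }
            return value;
        }
    }

    static final class Cobuyer {
        final Lazy<String> person;        // stands in for cobuyer.getPerson()
        final Lazy<String> pendingAgent;  // stands in for the pending tour's agent

        Cobuyer(String name) {
            person = new Lazy<>(() -> name);
            pendingAgent = new Lazy<>(() -> "agent-for-" + name);
        }
    }

    static final class User {
        final Lazy<List<Cobuyer>> cobuyers; // stands in for getSharedSearchGroup().getCobuyers()

        User(int cobuyerCount) {
            cobuyers = new Lazy<>(() -> {
                List<Cobuyer> list = new ArrayList<>();
                for (int i = 0; i < cobuyerCount; i++) {
                    list.add(new Cobuyer("cobuyer-" + i));
                }
                return list;
            });
        }
    }

    /** Walks every lazy getter for each user and returns the round trips used. */
    static int roundTripsFor(int users, int cobuyersPerUser) {
        roundTrips = 0;
        for (int u = 0; u < users; u++) {
            User user = new User(cobuyersPerUser);
            for (Cobuyer c : user.cobuyers.get()) { // 1 trip per user
                c.person.get();                      // 1 trip per cobuyer
                c.pendingAgent.get();                // 1 trip per cobuyer
            }
        }
        return roundTrips;
    }

    public static void main(String[] args) {
        System.out.println(roundTripsFor(1, 2));        // prints 5
        System.out.println(roundTripsFor(100_000, 2));  // prints 500000
    }
}
```

Five round trips for one user is invisible on a web page. Multiplied across a batch job, the same five become hundreds of thousands of sequential waits.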

The solution was to change the data service that prefetches our User objects to eagerly fetch the data we know the isTourableByRedfin method will need during email formatting.

To accomplish this, we modified our data service to include Hibernate criteria that request that specific fields be eagerly fetched. For example:

// Return the Shared Search groups with their cobuyers eagerly fetched.
public List<SharedSearchGroup> getSharedSearchGroupsWithCobuyers(
        Collection<Long> sharedSearchGroupIds) {
    return getBlankCriteria()
        .createAlias("cobuyers", "c", JoinFragment.INNER_JOIN)
        .setFetchMode("c", FetchMode.JOIN)
        // Restrict to the requested groups (the parameter, not an undefined "ids").
        .add(Restrictions.in("id", sharedSearchGroupIds))
        .list();
}

The above statement will fetch all the cobuyers in a SharedSearchGroup in addition to the SharedSearchGroup itself. By doing so, we avoid a round trip to the database when sharedSearchGroup.getCobuyers() is called since the cobuyers are already in memory.
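The effect of that change can be sketched without Hibernate. The toy below replaces the database with an in-memory map and counts simulated queries: one per group in the lazy style, one for the whole batch in the eager style. The table contents and method names are hypothetical, chosen only for illustration.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class BatchedPrefetch {
    /** Hypothetical in-memory "table": group id -> cobuyer names. */
    static final Map<Long, List<String>> TABLE = Map.of(
        1L, List.of("ana", "ben"),
        2L, List.of("cy"),
        3L, List.of("dee", "eli"));

    /** Counts simulated queries. */
    static int queries = 0;

    /** Lazy style: one simulated query per group. */
    static List<String> loadCobuyers(long groupId) {
        queries++;
        return TABLE.getOrDefault(groupId, List.of());
    }

    /** Eager style: one simulated query (think WHERE id IN (...)) for the whole batch. */
    static Map<Long, List<String>> loadCobuyersForAll(Collection<Long> groupIds) {
        queries++;
        Map<Long, List<String>> result = new HashMap<>();
        for (Long id : groupIds) {
            result.put(id, TABLE.getOrDefault(id, List.of()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Long> ids = List.of(1L, 2L, 3L);

        queries = 0;
        for (long id : ids) {
            loadCobuyers(id);
        }
        System.out.println("lazy queries: " + queries);   // prints "lazy queries: 3"

        queries = 0;
        Map<Long, List<String>> prefetched = loadCobuyersForAll(ids);
        prefetched.get(2L); // served from memory, no further query
        System.out.println("eager queries: " + queries);  // prints "eager queries: 1"
    }
}
```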

When we rolled these changes out to production, the results were staggering:

Formatting Time Per Email (post-improvement)

Within minutes of deploy, we watched our formatting time plummet back to safe levels, exhibiting a 70% performance improvement on average. And most importantly, our customers were getting their notifications on time, right as we swung into the busiest time of the annual real estate cycle.

A Case for Isolation and Data Frugality

Walking away from this situation, it would be easy to argue in favor of eagerly fetching data wherever possible. After all, had Redfin’s data services done so, we never would have run into this performance wall in the first place.

However, we choose to fetch data lazily for several reasons:

  • Doing so yields lower memory utilization because we’re only loading the fields we need.
  • Redfin's database hardware (the storage and the connections) is fast. Like really fast. So the cost of making a round trip to the database is pretty small for the typical webapp use-case.
  • Lazy fetching prevents new fields that reference other tables from incurring unexpected costs due to new table joins.

Unfortunately, this approach seems at odds with the lessons learned from our brief, but frightening, ride on the email-formatting rollercoaster of Spring 2015.

The compromise is not to build out our own email-specific variants of Redfin helper methods but rather to minimize the scope of the database transaction. We fetch only the data we want, let Hibernate hydrate our Java POJOs, and then promptly close the database connection so that no secret return visits to the database may occur without us knowing it. (Should that happen, we would see a Hibernate error immediately in our test environment.)
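The fail-fast behavior can be illustrated with a small stand-alone sketch. ToySession and LazyField are hypothetical classes, not Hibernate's API. In Hibernate itself, touching an uninitialized lazy association after the session has closed raises a LazyInitializationException; the toy throws IllegalStateException to play the same role.

```java
import java.util.function.Supplier;

final class DetachedAccessDemo {
    /** Hypothetical stand-in for a database session. */
    static final class ToySession {
        private boolean open = true;
        boolean isOpen() { return open; }
        void close() { open = false; }
    }

    /** A field that may only be loaded while its session is still open. */
    static final class LazyField<T> {
        private final ToySession session;
        private final Supplier<T> loader;
        private T value;
        private boolean loaded;

        LazyField(ToySession session, Supplier<T> loader) {
            this.session = session;
            this.loader = loader;
        }

        T get() {
            if (!loaded) {
                if (!session.isOpen()) {
                    // A hidden return visit to the database: fail loudly.
                    throw new IllegalStateException("lazy load attempted after session close");
                }
                value = loader.get();
                loaded = true;
            }
            return value;
        }
    }

    public static void main(String[] args) {
        ToySession session = new ToySession();
        LazyField<String> name = new LazyField<>(session, () -> "Pat");
        LazyField<String> agent = new LazyField<>(session, () -> "Agent Lee");

        name.get();       // hydrated while the session is open
        session.close();  // sever the connection before the pipeline runs

        System.out.println(name.get()); // prints "Pat" (already in memory)
        try {
            agent.get();  // was never fetched, so this is a hidden database access
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

The point of the design is that an unplanned database access becomes an error in test rather than a slow creep in production latency.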

From there — with the database connection severed — we pipe our minimally hydrated data through our notifications pipeline without having to worry that some other team’s new feature will mysteriously slow down our email as it did more than a year ago.

A final lesson here is the importance of regular, automated performance testing. While we may do our best to design a system resilient to external changes, we are still susceptible to the inevitable performance anomaly. Therefore, in the wake of this experience, our team has set up the instrumentation and alarming required to detect these anomalies as they happen — and act on them immediately — before our customers are impacted.
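As a rough illustration of the kind of check such alarming might perform, the sketch below compares recent per-email formatting times against a baseline and flags a regression when the mean grows past a chosen ratio. The class name, the sample values, and the 1.5x threshold are all assumptions made for the example, not a description of Redfin's actual instrumentation.

```java
import java.util.List;

final class FormattingTimeAlarm {
    /** Arithmetic mean of the samples, or 0.0 for an empty list. */
    static double mean(List<Double> samplesMs) {
        return samplesMs.stream()
            .mapToDouble(Double::doubleValue)
            .average()
            .orElse(0.0);
    }

    /** True when the recent mean exceeds the baseline mean by more than maxRatio. */
    static boolean shouldAlarm(List<Double> baselineMs, List<Double> recentMs, double maxRatio) {
        return mean(recentMs) > maxRatio * mean(baselineMs);
    }

    public static void main(String[] args) {
        // Illustrative numbers only: a baseline near 100 ms and a threefold regression.
        List<Double> baseline = List.of(100.0, 110.0, 90.0);
        List<Double> regressed = List.of(300.0, 310.0, 290.0);
        List<Double> healthy = List.of(105.0, 95.0);

        System.out.println(shouldAlarm(baseline, regressed, 1.5)); // prints true
        System.out.println(shouldAlarm(baseline, healthy, 1.5));   // prints false
    }
}
```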

Daniel Ehrman

I'm a full stack dev on Redfin's Notifications team. I come from a background in microprocessor design — before I happily traded annual tape-outs for daily deploys.
