WHAT? THAT’S CRAZY!!!
Redfin has traditionally had Java on the backend and JS in the browser, so switching to server-side JS is a big change for us. We’ve thought long and hard about it, and we feel that switching over to a universal JS framework offers us several benefits:
We can render pages server-side. This gives users a perceived performance boost, and gives crawlers like Googlebot an actual one. Our prior solution involved routing crawler requests to machines running PhantomJS. This approach has worked for us for years, but we decided to move away from it for two main reasons: 1) developers can accidentally introduce bugs that are only reproducible in Phantom, without realizing it, and 2) it’s slower than we want, because it relies on our prior client-side JS framework and uses an actual DOM.
We can easily parallelize data fetches server-side. We can fire off multiple HTTP requests for data and do other things while we wait for the results. We can do this server-side in Java too, but it’s pretty clunky.
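Here is a minimal sketch of that kind of parallel fetch, written with modern Node’s async/await (which postdates our framework). The `fetchData` and `loadHomePageData` names are hypothetical, and timers stand in for real HTTP requests to data endpoints:

```javascript
// Hypothetical stand-in for an HTTP call to a backend data endpoint.
function fetchData(name, delayMs, payload) {
  return new Promise((resolve) => {
    setTimeout(() => resolve({ name, payload }), delayMs);
  });
}

// Starts every fetch at once, then waits for all of them together.
async function loadHomePageData(homeId) {
  const started = Date.now();
  const [details, photos, schools] = await Promise.all([
    fetchData('details', 100, { id: homeId }),
    fetchData('photos', 120, ['front.jpg', 'kitchen.jpg']),
    fetchData('schools', 80, ['Example Elementary']),
  ]);
  // Elapsed time tracks the slowest fetch (~120 ms), not the sum (~300 ms).
  return { details, photos, schools, elapsedMs: Date.now() - started };
}
```

Because all three promises are created before anything is awaited, the total wait is roughly the slowest request rather than the sum of all three.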
We get to use React. This sounds kind of silly, but we’ve wanted to replace the framework we’ve been using to build our UI for years; there just hasn’t been an alternative that appeared to be worth the churn required to rewrite everything. When we started working on our isomorphic framework, we had several teams already experimenting with React in production, and we knew it was the right call.
We can do streaming page rendering. Our framework divides the page into sections that can all be rendered independently. If a timeout is hit server-side, we’ll force-render the page with whatever data is available but leave the initial HTTP connection open and stream data to the browser as it becomes available, taking advantage of React’s lifecycle to trigger component renders when their data changes.
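The timeout-and-stream idea can be sketched without React. In this hypothetical version, each section carries a promise for its data, `write` stands in for writing to the still-open HTTP response, and late sections are sent as inline `<script>` tags that call an assumed client-side `fill` function. Our framework uses React’s lifecycle for that last step instead, so treat this as an illustration of the control flow only:

```javascript
function delay(ms, value) {
  return new Promise((resolve) => setTimeout(() => resolve(value), ms));
}

// Renders each section inline if its data arrives before the deadline;
// otherwise writes a placeholder and streams the section when it's ready.
// `write` is a stand-in for writing to an open HTTP response.
async function renderStreaming(sections, timeoutMs, write) {
  const TIMED_OUT = Symbol('timed out');
  const deadline = delay(timeoutMs, TIMED_OUT);

  // All data promises are already in flight; wait at most until the deadline.
  const firstPass = await Promise.all(
    sections.map((section) => Promise.race([section.data, deadline]))
  );

  const late = [];
  firstPass.forEach((result, i) => {
    const { id } = sections[i];
    if (result === TIMED_OUT) {
      write(`<div id="${id}">Loading...</div>`);
      late.push(sections[i]);
    } else {
      write(`<div id="${id}">${result}</div>`);
    }
  });

  // Keep the response open and push late sections as their data arrives.
  // `fill` is a hypothetical client-side function; a real implementation
  // must also escape the HTML and script content it writes.
  await Promise.all(
    late.map(async ({ id, data }) => {
      const value = await data;
      write(`<script>fill(${JSON.stringify(id)}, ${JSON.stringify(value)})</script>`);
    })
  );
}
```

With a 50 ms deadline, a section whose data takes 10 ms renders inline, while one that takes 200 ms first appears as a placeholder and is filled in later over the same connection.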
Also, speed. All of the previous points add up to a noticeable page-load improvement for both users and crawlers. This is just an all-around WIN.
We spent a lot of time figuring things out as we went along. We got a huge boost when Walk Score joined Redfin last fall. The Walk Score team had prior experience running Node in production, and brought to Redfin a new level of focus on performance and monitoring.
The following is an unordered list of the major takeaways from our transition.
Server-side JS is different from client-side JS. Globals are a much bigger problem: in the browser, each page view gets its own globals, but on the server, globals and module-level state are shared by every request a process handles. Uncaught exceptions are a bigger problem too, because they take out the whole worker process, which might be serving multiple requests simultaneously. Debugging is harder. Node-inspector is a great library, but it has some quirks that can make the experience less than optimal.
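To make the globals point concrete, here is a hypothetical handler that stashes the current user in a module-level variable. Two overlapping requests are enough to leak one user’s data into another’s response; the names and timings are made up for illustration:

```javascript
function delay(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Anti-pattern: module-level state used as if it were per-request.
// In the browser this would belong to one user's page; on the server it
// is shared by every request the process is handling.
let currentUser = null;

async function greetWithGlobal(user, fetchMs) {
  currentUser = user;
  await delay(fetchMs); // e.g. waiting on a backend data fetch
  return `Hello, ${currentUser}`; // may now belong to another request
}

// Safer: keep per-request state in an object scoped to that request.
async function greetWithContext(user, fetchMs) {
  const ctx = { user };
  await delay(fetchMs);
  return `Hello, ${ctx.user}`;
}
```

Run `greetWithGlobal('alice', 50)` and `greetWithGlobal('bob', 10)` concurrently and both calls return `Hello, bob`; the context-object version returns the right name to each.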
Node processes crash. I’m pretty sure it’s inevitable. There will be a time when that thing you thought couldn’t possibly be undefined in that one object is actually undefined, an exception is thrown, you forgot to catch it, and your child process crashes. We use domains to handle this as gracefully as possible, and we also keep statsd graphs of the number of cluster workers that have been killed. These errors also trigger an email to a list that we monitor; we file and fix them as soon as possible.
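A rough sketch of per-request error handling with domains follows. The request and response shapes, the `stats` counter, and the `handleRequest` name are hypothetical, and `domain` has since been deprecated in Node, so this shows the pattern rather than a recommendation for new code:

```javascript
// `domain` is deprecated in current Node, but it was the standard tool
// for this in the io.js/node-0.12 era and still works.
const domain = require('domain');

// Hypothetical stand-in for a statsd counter.
const stats = { requestErrors: 0 };

// Runs one request inside its own domain so an uncaught async exception
// is routed to an error handler instead of crashing the process outright.
function handleRequest(req, respond) {
  const d = domain.create();
  d.on('error', (err) => {
    stats.requestErrors += 1;
    respond(500, `Internal error: ${err.message}`);
    // A real worker would stop accepting new requests here, let in-flight
    // ones finish, then exit so the cluster can fork a replacement.
  });
  d.run(() => {
    setTimeout(() => {
      // Simulated bug: assumes `req.listing` is always defined.
      respond(200, `Price: ${req.listing.price}`);
    }, 0);
  });
}
```

A request with a `listing` gets a 200; one without it throws inside the timer, and the domain turns that into a 500 and a counter bump instead of taking down every other request on the worker.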
Monitoring is key. We had statsd timing and logging baked into the framework from early on. We’re able to identify which components on the page are slow to render, and which pages are making the slowest data requests. We also have Grafana dashboards of traffic broken out by market and request type (user or bot). When we roll a new page onto the framework, or make a change to a data endpoint, we can see the effect immediately, and roll it back if it isn’t positive.
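StatsD’s plain-text line protocol makes the timing side easy to sketch. Below, a timer is formatted as `<name>:<milliseconds>|ms` and sent over UDP (8125 is StatsD’s default port). The metric name `render.HomeCard`, the helper names, and the host are hypothetical; this isn’t our framework’s code, and a production setup would more likely use an existing statsd client library:

```javascript
const dgram = require('dgram');

// Formats a StatsD timer line: "<name>:<milliseconds>|ms".
function timerLine(name, ms) {
  return `${name}:${Math.round(ms)}|ms`;
}

// Times a synchronous function (e.g. a component render) and reports it.
function timed(name, fn, report) {
  const start = process.hrtime.bigint();
  try {
    return fn();
  } finally {
    const ms = Number(process.hrtime.bigint() - start) / 1e6;
    report(timerLine(name, ms));
  }
}

// Returns a reporter that sends each metric line as one UDP packet.
// Host and port are assumptions; 8125 is StatsD's default port.
function udpReporter(host = '127.0.0.1', port = 8125) {
  const socket = dgram.createSocket('udp4');
  socket.unref(); // don't keep the process alive just for metrics
  return (line) => socket.send(line, port, host);
}
```

In use, `timed('render.HomeCard', () => renderCard(), udpReporter())` would send one timing per render (with `renderCard` standing in for real work); UDP keeps metrics fire-and-forget, so a slow or missing StatsD server never blocks a page.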
React.render(…) is blocking. Rendering is synchronous, so a frequently rendered component that takes 300+ms to render stops Node from doing anything else for the duration. When our service gets overloaded, we see render times spike as requests back up behind one another. We haven’t had a huge problem with this yet, but it’s something we monitor closely. One of my favorite graphs is the graph of CPU usage vs. render time.
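The effect is easy to reproduce with a busy loop standing in for a slow render. In this hypothetical sketch, a zero-delay timer represents another request’s work; it can’t run until the synchronous “render” returns, so its lag is at least the render time:

```javascript
// Stand-in for a slow, synchronous render such as renderToString() on a
// large component tree: it holds the thread until it finishes.
function slowRender(ms) {
  const end = Date.now() + ms;
  let html = '';
  while (Date.now() < end) {
    html = '<div>listing</div>'; // busy work; nothing else can run meanwhile
  }
  return html;
}

// Schedules a zero-delay timer, then renders, and reports how late the
// timer actually fired. The timer models another request waiting its turn.
function measureEventLoopDelay(renderMs) {
  return new Promise((resolve) => {
    const scheduledAt = Date.now();
    setTimeout(() => resolve(Date.now() - scheduledAt), 0);
    slowRender(renderMs);
  });
}
```

Newer versions of Node expose `perf_hooks.monitorEventLoopDelay()`, which tracks this kind of lag continuously instead of via a one-off probe.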
React is still pretty cool. We’ve been using React for a little over a year, and the feedback is still positive. Rethinking old code to work in a React world isn’t always easy, but the code often comes out looking cleaner.
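The cluster module is pretty sweet. It got an upgrade in io.js/node-0.12, which made round-robin the default scheduling policy on every platform except Windows. In prior versions, the operating system decided which child process accepted each incoming connection, and the distribution between child processes wasn’t great; it appears to be much better now.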
Library authors aren’t always as careful as they should be. We’ve hit at least two issues that I’m aware of where a transitive dependency five layers down was updated in an incompatible way and promptly broke us. Neither was in a library we’d explicitly depended on. One time the library author realized the issue before we could track it down, and things magically improved; the other time we had builds failing left and right until we found the problem and filed a bug, and the author corrected it.
Library authors: please follow the guidelines outlined at http://semver.org/ and double-check that your changes don’t break anything! It’s surprising how often a small change can have unexpected ripple effects downstream.
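Why does a sloppy release reach you automatically? `npm install --save` records caret ranges like `^1.2.3` by default, and a caret range accepts any later release that semver says is compatible. Here is a simplified caret check (no pre-release tags or build metadata; npm itself uses the `semver` package), with hypothetical function names:

```javascript
// Parses "MAJOR.MINOR.PATCH" only; pre-release and build tags are
// deliberately unsupported to keep the sketch short.
function parseVersion(v) {
  const m = /^(\d+)\.(\d+)\.(\d+)$/.exec(v);
  if (!m) throw new Error(`unsupported version: ${v}`);
  return m.slice(1).map(Number);
}

// Compares two [major, minor, patch] arrays; returns -1, 0, or 1.
function compareVersions(a, b) {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] < b[i] ? -1 : 1;
  }
  return 0;
}

// True if `version` satisfies the caret range `^base`.
function satisfiesCaret(version, base) {
  const v = parseVersion(version);
  const b = parseVersion(base);
  // Upper bound (exclusive): bump the left-most non-zero component of base.
  let upper;
  if (b[0] > 0) upper = [b[0] + 1, 0, 0];
  else if (b[1] > 0) upper = [0, b[1] + 1, 0];
  else upper = [0, 0, b[2] + 1];
  return compareVersions(v, b) >= 0 && compareVersions(v, upper) < 0;
}
```

So a dependency declared as `^1.2.3` picks up `1.9.0` on the next install; if that release contains a breaking change, every project downstream breaks with it.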
The npm registry and GitHub are potential points of failure. The npm registry has had much better uptime recently than it has in the past, but it’s still a single point of failure. If the registry is down, you have three choices: 1) already have your dependencies checked in, 2) host a mirror of the registry in-house, or 3) wait until it’s back up to deploy. We chose option #2, building around npm_lazy. The mirror has been good to us, but GitHub had an issue the other day where downloads were failing. Some dependency way down the chain was downloading a tarball directly from GitHub, which apparently isn’t covered by our mirror, so we were stuck while GitHub sorted things out.
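One way to spot dependencies that bypass a mirror is to scan a lockfile for `resolved` URLs that point somewhere else. The sketch below assumes an npm-shrinkwrap.json-style tree (nested `dependencies` objects, each with a `resolved` URL) and a hypothetical mirror URL; it isn’t part of npm_lazy:

```javascript
// Hypothetical URL of an in-house registry mirror.
const MIRROR = 'http://npm-mirror.example.internal:8080/';

// Walks a lockfile-style dependency tree and returns every package whose
// tarball is fetched from somewhere other than the mirror, along with the
// chain of packages that pulled it in.
function findUnmirrored(tree, mirror = MIRROR, parents = []) {
  const found = [];
  for (const [name, dep] of Object.entries(tree.dependencies || {})) {
    const chain = [...parents, name];
    if (dep.resolved && !dep.resolved.startsWith(mirror)) {
      found.push({ chain: chain.join(' > '), resolved: dep.resolved });
    }
    found.push(...findUnmirrored(dep, mirror, chain));
  }
  return found;
}
```

Running a check like this in CI would surface a GitHub-hosted tarball five layers down before an outage does.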
Webpack mostly just works. We chose webpack for module bundling for the browser. We’ve run into issues here and there with some plugins, but it mostly works for us. We’ll be publishing more blog posts about this in the future.
There are so many choices! npm is both a blessing and a curse. For any problem that you want to solve, there are probably several options hosted on npm. Some of them are better than others, some are no longer maintained, and some of them you probably shouldn’t trust at all. Choose wisely.
And much, much more!
We have a lot more to say about the transition, but this post is long enough as it is. We’ll put up more information over the next few weeks about the specific benefits we’re seeing. If you’re hoping to see some graphs, you probably won’t be disappointed.