{"id":41508,"date":"2010-06-04T11:13:39","date_gmt":"2010-06-04T18:13:39","guid":{"rendered":"https:\/\/redfindevelop.wpengine.com\/blog\/devblog\/?p=368"},"modified":"2020-10-05T13:12:35","modified_gmt":"2020-10-05T20:12:35","slug":"evolving_a_new_analytical_platform_with_hadoop","status":"publish","type":"post","link":"https:\/\/www.redfin.com\/news\/evolving_a_new_analytical_platform_with_hadoop\/","title":{"rendered":"Engineer-to-Engineer: Evolving a New Analytical Platform with Hadoop"},"content":{"rendered":"<p>In the second installment of our San Francisco series of <a href=\"http:\/\/www.facebook.com\/pages\/Engineer-to-Engineer-San-Francisco-Tech-Talks\/119013258113825\">engineer-to-engineer lectures<\/a>, <a href=\"http:\/\/twitter.com\/hackingdata\">Jeff Hammerbacher<\/a> described the challenges of building data-intensive, distributed applications and how using <a href=\"http:\/\/hadoop.apache.org\/\">Hadoop<\/a> saved the day at Facebook.\u00a0 Speaking to an audience of approximately thirty Hadoop experts and enthusiasts hailing from all around the Bay Area, the Valley, and even Seattle, he also discussed what\u2019s wrong with today\u2019s analytical platforms and what will shape the platform of the future.<\/p>\n<p>And Jeff should know.\u00a0 After studying Mathematics at Harvard and wearing a suit as a quantitative analyst on Wall Street, he conceived, built, and led the data team at Facebook.\u00a0 He then went on to start <a href=\"http:\/\/www.cloudera.com\/\">Cloudera<\/a>, the leader in commercializing Apache Hadoop, where he currently works as Chief Scientist and VP of Products.\u00a0 Jeff also served as Contributing Editor for a book: <a href=\"http:\/\/www.amazon.com\/Beautiful-Data-Stories-Elegant-Solutions\/dp\/0596157118\/ref=sr_1_1?ie=UTF8&amp;s=books&amp;qid=1274724626&amp;sr=8-1\">Beautiful Data: The Stories Behind Elegant Data 
Solutions<\/a>, the proceeds of which are split between Creative Commons and Sunlight Labs.<\/p>\n<p><strong>The Scoop on Hadoop<\/strong><\/p>\n<p><a href=\"http:\/\/hadoop.apache.org\/\">Hadoop<\/a> is an open source framework that enables data-intensive distributed applications to efficiently process gigantic amounts of data.\u00a0 It\u2019s an open source implementation of the <a href=\"http:\/\/en.wikipedia.org\/wiki\/Mapreduce\">MapReduce<\/a> approach to processing data.\u00a0 MapReduce was invented at Google to deal with the massive quantities of data necessary to index the web. \u00a0There are two main components to the system: the Hadoop Distributed File System (HDFS), which stores and maintains data across many machines, and the MapReduce engine, which processes the data.<\/p>\n<p>But the talk didn\u2019t really go into Hadoop internals &#8212; as Jeff pointed out, the documentation is readily available online.\u00a0 Rather, the talk was about how and why Hadoop will provide the foundation on which the next-generation platform for analytics will be built.\u00a0 Making bold predictions about technology is hard.\u00a0 Jeff quoted Larry Ellison\u2019s quip that \u201cthe computer industry is the only industry that is more fashion-driven than women&#8217;s <a href=\"http:\/\/cloudcomputing.sys-con.com\/node\/694992\">fashion<\/a>.\u201d\u00a0 And yet, using real-world examples from his experience at Facebook, Jeff made a compelling case.<\/p>\n<p><strong>Bottlenecks, Costs, the Black Box, and the Kitchen Sink<\/strong><\/p>\n<p>A typical architecture for large-scale data analysis includes a data source, a data warehouse, ETL (short for \u201cextract-transform-load\u201d: the step that gets data out of and into RDBMSs and converts source data to the data warehouse\u2019s format), and business intelligence and analytics systems \u2013 all of which are usually centered around relational databases. 
\u00a0However, Jeff stressed that a relational database is a <em>specialty<\/em> and not a <em>foundation<\/em>, arguing that the abstractions it provides are no longer useful on their own for analytical data management.<\/p>\n<p>One reason is that, over the past few years, data volume has exploded, driven primarily by machine-generated logs. \u00a0By simply tweaking an Apache log, you can grow your data volume and complexity by several orders of magnitude.\u00a0 As we\u2019ll see in Facebook\u2019s case, their relational database approach simply didn\u2019t scale, and they soon needed new tools to handle the load.<\/p>\n<p>Another point Jeff made was that the percentage of data that actually gets stored in a relational database is shrinking.\u00a0 What do you do with all the unstructured data (accounting for <a href=\"http:\/\/alexbarnett.net\/blog\/archive\/2007\/03\/06\/The-Expanding-Digital-Universe.aspx\">95% of the digital universe<\/a>) that doesn\u2019t necessarily make sense to persist relationally?\u00a0 Do you still need expensive relational data warehouses and proprietary boutique servers?\u00a0 Jeff\u2019s team at Facebook made a bet on commodity hardware, which turned out to be a good move, ultimately pushing the complexity out of hardware and into the software layer.<\/p>\n<p>They also bet on open source data stores built by consumer web firms, arguing that web properties have the most representative problems: scalability and unstructured data management.\u00a0 Jeff stated that most production-quality data stores came from enterprise software firms in the mid-1990s, but now a growing percentage of the world\u2019s data is persisted in open source data stores.\u00a0 He also mentioned that a nice side effect of adopting open source solutions is that you end up with a modular collection of open tools rather than an opaque abstraction.\u00a0 Why? 
\u00a0Because there\u2019s great benefit in being able to pick and choose solutions and understand what\u2019s going on under the hood.<\/p>\n<p>Jeff noted that another problem is that, in many cases, enterprise software does not serve developers well. Many relational data warehouses simply expose SQL, but to get real traction and adoption from developers, you need more than that: you need open applications for analysis, not just a SQL interface.\u00a0 He feels that \u201cin addition, these data stores often expose a proprietary interface for application programming (e.g. PL\/SQL or TSQL), but not the full power of procedural programming.\u00a0 More programmer-friendly parallel dataflow languages await discovery, I think. \u00a0MapReduce is one (small) step in that direction.\u201d<\/p>\n<p>Where is this new platform going to come from?\u00a0 Any new platform must be centered around addressing these new user needs, which is hard to achieve by re-implementing an old spec in a new, clever way.\u00a0 He cautioned that implementing a new, successful cut of the ANSI SQL spec would be a real undertaking.\u00a0 Not only would it take ages before you had anything to show, but it would likely suffer from the same scalability problems as previous implementations.<\/p>\n<p><strong>Facebook and Hadoop are Now Friends<\/strong><\/p>\n<p>Using Facebook as a real-world example, Jeff described the challenge of measuring how changes to the site improved or impaired user experience.\u00a0 Their original data analysis system featured source data living on a horizontally partitioned MySQL tier and a cron job running Python scripts that pinged stats back to a central MySQL database.\u00a0 The main problem with this setup was that it made intensive historical analysis difficult, since the source data was spread over many machines and aggregating the data into the analytics database was a slow, inefficient process.\u00a0 Plus, when it barfed, it took three days to replay the edit 
logs in order to diagnose the problem.<\/p>\n<p>So Facebook hired a data warehouse engineer to build a 10TB Oracle warehouse.\u00a0 This worked for a bit and would\u2019ve been fine for small and medium-sized businesses, but ultimately didn\u2019t scale &#8212; particularly when they turned on impression logging, which generated over 400GB of data on the first day!\u00a0 This quickly grew to 1TB of data per day in 2007.<\/p>\n<p>You might ask: since disks are cheap, why not throw more storage at the warehouse?\u00a0 It turns out that, in addition to the problem of data volume, CPU utilization was also a bottleneck. The ETL process ended up taking more than a day to aggregate, import, and load the necessary data for analysis.\u00a0 Jeff went on to explain that proprietary ETL vendors have lots of downsides and generally don\u2019t scale well for large sets of databases (on the order of thousands, in Facebook\u2019s case).\u00a0 In addition, when \u201cwarts\u201d start to show up in proprietary software, its closed nature prevents developers from tinkering with the source to diagnose and resolve problems.<\/p>\n<p>Meanwhile, his team started to play with Hadoop on the side as an open source alternative.\u00a0 They got a Hadoop cluster to replace the data collection and processing tiers.\u00a0 So the new architecture still has multiple data sources (log files, MySQL), but they now feed into HDFS instead.\u00a0 Work is done via MapReduce and the artifacts are then published to Oracle RAC servers for consumption by business intelligence and analysis.\u00a0 Results are simultaneously published back to the MySQL tier.<\/p>\n<p style=\"text-align: center\"><img fetchpriority=\"high\" decoding=\"async\" class=\"size-full wp-image-369  aligncenter\" title=\"Data Flow Architecture at Facebook\" src=\"https:\/\/redfin.com\/blog\/devblog\/wp-content\/uploads\/sites\/3\/2010\/06\/facebook.png\" alt=\"Data Flow Architecture at Facebook\" 
width=\"571\" height=\"428\" \/><\/p>\n<p style=\"text-align: center\">From &#8220;<a href=\"http:\/\/www.slideshare.net\/royans\/facebooks-petabyte-scale-data-warehouse-using-hive-and-hadoop\">Facebook\u2019s Petabyte Scale Data Warehouse using Hive and Hadoop<\/a>&#8221;, slide 21<\/p>\n<p>Initially, this shift was met with a lot of resistance, mostly because Hadoop is Java-based and, since the majority of Facebook\u2019s services were written in C++, the developers there weren\u2019t comfortable in Java. But it wasn\u2019t long before the new platform showed its strengths:<\/p>\n<ul>\n<li>Switching to this system greatly reduced latency because the ETL process is no longer done in flight \u2013 it\u2019s done after persistence in Hadoop.<\/li>\n<li>Hadoop enabled Facebook to efficiently crunch extremely large data sets, on the order of multiple petabytes, which was impractical under the old system.<\/li>\n<li>The Hadoop data warehouse became easily accessible to developers, which turned out to be a real bonus.\u00a0 Developers previously found SQL to be an unfriendly environment because they couldn\u2019t predict the impact of running SQL (it was easy for them to hose themselves and others) and because the dev environment for SQL was crude.\u00a0 After switching, however, the team found that a lot of Facebook\u2019s developers started freely playing with the data set, which fostered innovation and led to new features.<\/li>\n<\/ul>\n<p><strong>Shaping a New Platform<\/strong><\/p>\n<p>Jeff emphasized that while Hadoop provides a great foundation for data analysis, it\u2019s not the whole story.\u00a0 Today, there are many technologies built on top of Hadoop that need to be considered for your system.\u00a0 For example, there is Hive, a system for offline analysis, and HBase, an open source implementation of Google&#8217;s BigTable, to name a couple.\u00a0 He remarked that the abstraction layer needs to be redrawn to include the functionality provided by ETL, master 
data management (MDM), stream management, reporting, online analytical processing (OLAP), and search tools, all with a unified UI.<\/p>\n<p>Jeff explained that <a href=\"http:\/\/www.microsoft.com\/sqlserver\/2008\/en\/us\/r2.aspx\">SQL Server 2008 R2<\/a> is a good model.\u00a0 SQL Server is no longer just a database \u2013 there are a bunch of associated products in the box offering a full suite of features.\u00a0 You still have the old features like SQL Server Integration Services (ETL), SQL Server (data warehouse), SQL Server Reporting Services, SQL Server Analysis Services, and full-text search.\u00a0 But now you also get a bunch of new features such as stream management (StreamInsight) providing real-time analytics, OLAP (PowerPivot) enabling rapid navigation of subsets of data, collaboration via SharePoint, MDM for integrating disparate data sources and entity resolution, and features that aid in scaling your servers out to a many-node SQL solution.\u00a0 Jeff remarked that it\u2019s \u201ckind of scary that Microsoft has started to do a lot right within the last 5 years.\u201d<\/p>\n<p>Providing a full suite of features is also what Cloudera does well, but for Hadoop.\u00a0 They\u2019re not the primary developers of this stuff (currently only 3 out of 17 contributors on HDFS), but they do an excellent job of packaging and polishing Hadoop and make their money in training, services, and support.\u00a0 And, like Microsoft, they eat their own dogfood: using the tools they build to solve their own business problems.\u00a0 Jeff joked that it\u2019s \u201cinteresting being a vendor now \u2013 I can see what we put these other vendors through [while at Facebook].\u201d<\/p>\n<p>Many thanks to Jeff for the great talk, to <a href=\"http:\/\/www.greylock.com\/\">Greylock<\/a> for helping with the logistics (and providing the delicious pizza and beer), and to everybody who came out!\u00a0 Be sure to check out the next talk on June 10<sup>th<\/sup> when our own Sasha 
Aickin, Redfin\u2019s head of user experience, will weigh <a href=\"http:\/\/www.facebook.com\/event.php?eid=114714258552626&amp;ref=mf\">HTML 5 vs. Native Apps<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In the second installment of our San Francisco series of engineer-to-engineer lectures, Jeff Hammerbacher described the challenges of building data-intensive, distributed applications and how using Hadoop saved the day at Facebook.\u00a0 Speaking to an audience of approximately thirty Hadoop experts and enthusiasts hailing from all around the Bay Area, the Valley, and even Seattle, he [&hellip;]<\/p>\n","protected":false},"author":13212,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"default","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","ast-disable-related-posts":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"footnotes":""},"categories":[57],"tags":[],"dashboard":[],"coauthors":[],"class_list":["post-41508","post","type-post","status-publish","format-standard","hentry","category-company-news"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v24.7 (Yoast SEO v27.6) - https:\/\/yoast.com\/product\/yoast-seo-premium-wordpress\/ -->\n<title>Engineer-to-Engineer: Evolving a New Analytical Platform with Hadoop - Redfin Real Estate News<\/title>\n<meta 
name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.redfin.com\/news\/evolving_a_new_analytical_platform_with_hadoop\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Engineer-to-Engineer: Evolving a New Analytical Platform with Hadoop\" \/>\n<meta property=\"og:description\" content=\"In the second installment of our San Francisco series of engineer-to-engineer lectures, Jeff Hammerbacher described the challenges of building data-intensive, distributed applications and how using Hadoop saved the day at Facebook.\u00a0 Speaking to an audience of approximately thirty Hadoop experts and enthusiasts hailing from all around the Bay Area, the Valley, and even Seattle, he [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.redfin.com\/news\/evolving_a_new_analytical_platform_with_hadoop\/\" \/>\n<meta property=\"og:site_name\" content=\"Redfin Real Estate News\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/redfin\" \/>\n<meta property=\"article:published_time\" content=\"2010-06-04T18:13:39+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2020-10-05T20:12:35+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.redfin.com\/news\/wp-content\/uploads\/2008\/03\/rss1.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"455\" \/>\n\t<meta property=\"og:image:height\" content=\"187\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Gordon Brown\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@redfin\" \/>\n<meta name=\"twitter:site\" content=\"@redfin\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Gordon Brown\" 
\/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"9 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/www.redfin.com/news\\\/evolving_a_new_analytical_platform_with_hadoop\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.redfin.com/news\\\/evolving_a_new_analytical_platform_with_hadoop\\\/\"},\"author\":{\"name\":\"Gordon Brown\",\"@id\":\"https:\\\/\\\/www.redfin.com/news\\\/#\\\/schema\\\/person\\\/9e07f078899b60d3169de086d3bb0bb5\"},\"headline\":\"Engineer-to-Engineer: Evolving a New Analytical Platform with Hadoop\",\"datePublished\":\"2010-06-04T18:13:39+00:00\",\"dateModified\":\"2020-10-05T20:12:35+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/www.redfin.com/news\\\/evolving_a_new_analytical_platform_with_hadoop\\\/\"},\"wordCount\":1815,\"commentCount\":223,\"publisher\":{\"@id\":\"https:\\\/\\\/www.redfin.com/news\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/www.redfin.com/news\\\/evolving_a_new_analytical_platform_with_hadoop\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/redfin.com\\\/blog\\\/devblog\\\/wp-content\\\/uploads\\\/sites\\\/3\\\/2010\\\/06\\\/facebook.png\",\"articleSection\":[\"Company News\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/www.redfin.com/news\\\/evolving_a_new_analytical_platform_with_hadoop\\\/#respond\"]}],\"copyrightYear\":\"2010\",\"copyrightHolder\":{\"@id\":\"https:\\\/\\\/www.redfin.com/news\\\/#organization\"}},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/www.redfin.com/news\\\/evolving_a_new_analytical_platform_with_hadoop\\\/\",\"url\":\"https:\\\/\\\/www.redfin.com/news\\\/evolving_a_new_analytical_platform_with_hadoop\\\/\",\"name\":\"Engineer-to-Engineer: Evolving a New Analytical Platform with Hadoop - Redfin 
Real Estate News\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.redfin.com/news\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/www.redfin.com/news\\\/evolving_a_new_analytical_platform_with_hadoop\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/www.redfin.com/news\\\/evolving_a_new_analytical_platform_with_hadoop\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/redfin.com\\\/blog\\\/devblog\\\/wp-content\\\/uploads\\\/sites\\\/3\\\/2010\\\/06\\\/facebook.png\",\"datePublished\":\"2010-06-04T18:13:39+00:00\",\"dateModified\":\"2020-10-05T20:12:35+00:00\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/www.redfin.com/news\\\/evolving_a_new_analytical_platform_with_hadoop\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/www.redfin.com/news\\\/evolving_a_new_analytical_platform_with_hadoop\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/www.redfin.com/news\\\/evolving_a_new_analytical_platform_with_hadoop\\\/#primaryimage\",\"url\":\"https:\\\/\\\/redfin.com\\\/blog\\\/devblog\\\/wp-content\\\/uploads\\\/sites\\\/3\\\/2010\\\/06\\\/facebook.png\",\"contentUrl\":\"https:\\\/\\\/redfin.com\\\/blog\\\/devblog\\\/wp-content\\\/uploads\\\/sites\\\/3\\\/2010\\\/06\\\/facebook.png\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/www.redfin.com/news\\\/evolving_a_new_analytical_platform_with_hadoop\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/www.redfin.com/news\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Engineer-to-Engineer: Evolving a New Analytical Platform with Hadoop\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/www.redfin.com/news\\\/#website\",\"url\":\"https:\\\/\\\/www.redfin.com/news\\\/\",\"name\":\"Redfin Real Estate News\",\"description\":\"The latest real estate news and research from technology-powered residential real estate company, 
Redfin.\",\"publisher\":{\"@id\":\"https:\\\/\\\/www.redfin.com/news\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/www.redfin.com/news\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/www.redfin.com/news\\\/#organization\",\"name\":\"Redfin\",\"url\":\"https:\\\/\\\/www.redfin.com/news\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/www.redfin.com/news\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/www.redfin.com\\\/news\\\/wp-content\\\/uploads\\\/2020\\\/10\\\/Redfin-News-Logo.png\",\"contentUrl\":\"https:\\\/\\\/www.redfin.com\\\/news\\\/wp-content\\\/uploads\\\/2020\\\/10\\\/Redfin-News-Logo.png\",\"width\":1100,\"height\":235,\"caption\":\"Redfin\"},\"image\":{\"@id\":\"https:\\\/\\\/www.redfin.com/news\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/www.facebook.com\\\/redfin\",\"https:\\\/\\\/x.com\\\/redfin\",\"https:\\\/\\\/www.instagram.com\\\/redfinrealestate\\\/\",\"https:\\\/\\\/www.linkedin.com\\\/company\\\/redfin\",\"https:\\\/\\\/www.pinterest.com\\\/redfin\\\/\",\"https:\\\/\\\/en.wikipedia.org\\\/wiki\\\/Redfin\"]},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/www.redfin.com/news\\\/#\\\/schema\\\/person\\\/9e07f078899b60d3169de086d3bb0bb5\",\"name\":\"Gordon 
Brown\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/2623c464a19391b964a400ac68800c70d618e53a6d17ad32474ee8dfb8776194?s=96&d=wp_user_avatar&r=g6494fd539cad7ea2d80762cede955ec0\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/2623c464a19391b964a400ac68800c70d618e53a6d17ad32474ee8dfb8776194?s=96&d=wp_user_avatar&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/2623c464a19391b964a400ac68800c70d618e53a6d17ad32474ee8dfb8776194?s=96&d=wp_user_avatar&r=g\",\"caption\":\"Gordon Brown\"},\"url\":\"https:\\\/\\\/www.redfin.com/news\\\/author\\\/gordon-brownredfin-com\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"Engineer-to-Engineer: Evolving a New Analytical Platform with Hadoop - Redfin Real Estate News","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.redfin.com\/news\/evolving_a_new_analytical_platform_with_hadoop\/","og_locale":"en_US","og_type":"article","og_title":"Engineer-to-Engineer: Evolving a New Analytical Platform with Hadoop","og_description":"In the second installment of our San Francisco series of engineer-to-engineer lectures, Jeff Hammerbacher described the challenges of building data-intensive, distributed applications and how using Hadoop saved the day at Facebook.\u00a0 Speaking to an audience of approximately thirty Hadoop experts and enthusiasts hailing from all around the Bay Area, the Valley, and even Seattle, he [&hellip;]","og_url":"https:\/\/www.redfin.com\/news\/evolving_a_new_analytical_platform_with_hadoop\/","og_site_name":"Redfin Real Estate 
News","article_publisher":"https:\/\/www.facebook.com\/redfin","article_published_time":"2010-06-04T18:13:39+00:00","article_modified_time":"2020-10-05T20:12:35+00:00","og_image":[{"width":455,"height":187,"url":"https:\/\/www.redfin.com\/news\/wp-content\/uploads\/2008\/03\/rss1.jpg","type":"image\/jpeg"}],"author":"Gordon Brown","twitter_card":"summary_large_image","twitter_creator":"@redfin","twitter_site":"@redfin","twitter_misc":{"Written by":"Gordon Brown","Est. reading time":"9 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.redfin.com\/news\/evolving_a_new_analytical_platform_with_hadoop\/#article","isPartOf":{"@id":"https:\/\/www.redfin.com\/news\/evolving_a_new_analytical_platform_with_hadoop\/"},"author":{"name":"Gordon Brown","@id":"https:\/\/www.redfin.com\/news\/#\/schema\/person\/9e07f078899b60d3169de086d3bb0bb5"},"headline":"Engineer-to-Engineer: Evolving a New Analytical Platform with Hadoop","datePublished":"2010-06-04T18:13:39+00:00","dateModified":"2020-10-05T20:12:35+00:00","mainEntityOfPage":{"@id":"https:\/\/www.redfin.com\/news\/evolving_a_new_analytical_platform_with_hadoop\/"},"wordCount":1815,"commentCount":223,"publisher":{"@id":"https:\/\/www.redfin.com\/news\/#organization"},"image":{"@id":"https:\/\/www.redfin.com\/news\/evolving_a_new_analytical_platform_with_hadoop\/#primaryimage"},"thumbnailUrl":"https:\/\/redfin.com\/blog\/devblog\/wp-content\/uploads\/sites\/3\/2010\/06\/facebook.png","articleSection":["Company 
News"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.redfin.com\/news\/evolving_a_new_analytical_platform_with_hadoop\/#respond"]}],"copyrightYear":"2010","copyrightHolder":{"@id":"https:\/\/www.redfin.com\/news\/#organization"}},{"@type":"WebPage","@id":"https:\/\/www.redfin.com\/news\/evolving_a_new_analytical_platform_with_hadoop\/","url":"https:\/\/www.redfin.com\/news\/evolving_a_new_analytical_platform_with_hadoop\/","name":"Engineer-to-Engineer: Evolving a New Analytical Platform with Hadoop - Redfin Real Estate News","isPartOf":{"@id":"https:\/\/www.redfin.com\/news\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.redfin.com\/news\/evolving_a_new_analytical_platform_with_hadoop\/#primaryimage"},"image":{"@id":"https:\/\/www.redfin.com\/news\/evolving_a_new_analytical_platform_with_hadoop\/#primaryimage"},"thumbnailUrl":"https:\/\/redfin.com\/blog\/devblog\/wp-content\/uploads\/sites\/3\/2010\/06\/facebook.png","datePublished":"2010-06-04T18:13:39+00:00","dateModified":"2020-10-05T20:12:35+00:00","breadcrumb":{"@id":"https:\/\/www.redfin.com\/news\/evolving_a_new_analytical_platform_with_hadoop\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.redfin.com\/news\/evolving_a_new_analytical_platform_with_hadoop\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.redfin.com\/news\/evolving_a_new_analytical_platform_with_hadoop\/#primaryimage","url":"https:\/\/redfin.com\/blog\/devblog\/wp-content\/uploads\/sites\/3\/2010\/06\/facebook.png","contentUrl":"https:\/\/redfin.com\/blog\/devblog\/wp-content\/uploads\/sites\/3\/2010\/06\/facebook.png"},{"@type":"BreadcrumbList","@id":"https:\/\/www.redfin.com\/news\/evolving_a_new_analytical_platform_with_hadoop\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.redfin.com\/news\/"},{"@type":"ListItem","position":2,"name":"Engineer-to-
Engineer: Evolving a New Analytical Platform with Hadoop"}]},{"@type":"WebSite","@id":"https:\/\/www.redfin.com\/news\/#website","url":"https:\/\/www.redfin.com\/news\/","name":"Redfin Real Estate News","description":"The latest real estate news and research from technology-powered residential real estate company, Redfin.","publisher":{"@id":"https:\/\/www.redfin.com\/news\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.redfin.com\/news\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.redfin.com\/news\/#organization","name":"Redfin","url":"https:\/\/www.redfin.com\/news\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.redfin.com\/news\/#\/schema\/logo\/image\/","url":"https:\/\/www.redfin.com\/news\/wp-content\/uploads\/2020\/10\/Redfin-News-Logo.png","contentUrl":"https:\/\/www.redfin.com\/news\/wp-content\/uploads\/2020\/10\/Redfin-News-Logo.png","width":1100,"height":235,"caption":"Redfin"},"image":{"@id":"https:\/\/www.redfin.com\/news\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/redfin","https:\/\/x.com\/redfin","https:\/\/www.instagram.com\/redfinrealestate\/","https:\/\/www.linkedin.com\/company\/redfin","https:\/\/www.pinterest.com\/redfin\/","https:\/\/en.wikipedia.org\/wiki\/Redfin"]},{"@type":"Person","@id":"https:\/\/www.redfin.com\/news\/#\/schema\/person\/9e07f078899b60d3169de086d3bb0bb5","name":"Gordon 
Brown","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/2623c464a19391b964a400ac68800c70d618e53a6d17ad32474ee8dfb8776194?s=96&d=wp_user_avatar&r=g6494fd539cad7ea2d80762cede955ec0","url":"https:\/\/secure.gravatar.com\/avatar\/2623c464a19391b964a400ac68800c70d618e53a6d17ad32474ee8dfb8776194?s=96&d=wp_user_avatar&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/2623c464a19391b964a400ac68800c70d618e53a6d17ad32474ee8dfb8776194?s=96&d=wp_user_avatar&r=g","caption":"Gordon Brown"},"url":"https:\/\/www.redfin.com\/news\/author\/gordon-brownredfin-com\/"}]}},"_links":{"self":[{"href":"https:\/\/www.redfin.com\/news\/wp-json\/wp\/v2\/posts\/41508","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.redfin.com\/news\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.redfin.com\/news\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.redfin.com\/news\/wp-json\/wp\/v2\/users\/13212"}],"replies":[{"embeddable":true,"href":"https:\/\/www.redfin.com\/news\/wp-json\/wp\/v2\/comments?post=41508"}],"version-history":[{"count":0,"href":"https:\/\/www.redfin.com\/news\/wp-json\/wp\/v2\/posts\/41508\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.redfin.com\/news\/wp-json\/wp\/v2\/media?parent=41508"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.redfin.com\/news\/wp-json\/wp\/v2\/categories?post=41508"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.redfin.com\/news\/wp-json\/wp\/v2\/tags?post=41508"},{"taxonomy":"dashboard","embeddable":true,"href":"https:\/\/www.redfin.com\/news\/wp-json\/wp\/v2\/dashboard?post=41508"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.redfin.com\/news\/wp-json\/wp\/v2\/coauthors?post=41508"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}