This past summer, I had an amazing experience working as a software engineering intern on Redfin’s Data team. My work was mainly focused on two projects: creating a pipeline to automatically import zip code region shape data into the database, and exploring the use of hashing to store low-resolution copies of all of the listing photos to reduce the time spent updating listing photo updates. My internship was by far one of the most educational experiences I have received outside of college, and made me realize how much effort was required to develop features at a well-established company as opposed to a university setting.
As part of my work on the Data team, I learned a lot about the tools Redfin uses to serve data on the site. In particular, I spent a fair amount of time with Hibernate, which is an object mapping framework that allows you to take data stored as objects and insert it in a SQL table, as well as perform the reverse operation of querying the table for rows that are then transformed into objects. On the Redfin website, when someone performs a search in a certain locality, one of the views they can use to locate neighborhoods are zip codes. At one point, some of our zip codes had started to become outdated, requiring someone to figure out which zips had changed, and how to update them on the site. For my Zip Code project, I wrote a process that automatically downloads zipped raw shape data from a remote server, performed various intermediate steps to map it into a table of updates, and queried the existing zip code data to find which zip codes changed, and finally map them to the appropriate zip code shapes that were imported at the beginning. Working on this project was a great introduction to the Redfin backend, and allowed for a lot of back and forth between myself, my mentor, and co-workers, especially when it came to writing the Mockito unit tests to guarantee my code actually worked.
The second project I worked on was primarily an in-depth investigation into how photos were downloaded and stored to be seen on the website. During the course of this project, I was able to have a lot of design discussions with my team that allowed me to choose the metrics I would use to measure how often the photos will be updated, and how long it will take for them to be downloaded from our data sources, with respect to each image type we had available. To find under what conditions the photo hashing would work, I implemented a prototype of the hashing concept in Python, and was able to explore how the metrics I defined earlier has changed from data source to data source. This ultimately identified our slowest photo import jobs that could benefit from a speedup by downloading photos from a remote server. And later I built a prototype to selectively add hashing to our image download pipeline. The project helped me understand that defining metrics and collecting data is one way to validate ideas. This is another valuable skill I learned at Redfin.
Overall, my time at Redfin was extremely memorable and fun. The other interns in the Seattle office were amazing, and getting to work with them and going on various excursions with them after work were some of the highlights of my summer. I enjoyed my time there so much, and I can’t wait to go back to Seattle.