Wednesday, June 29 • 11:42am - 12:00pm
Exploring the R / SQL boundary

Databases have a long history of delivering highly scalable solutions for storing, manipulating, and analyzing data, including transaction processing and data warehousing, while R is the most widely used language for data analytics and machine learning due to its rich ecosystem of machine learning algorithms and data manipulation capabilities. But when using these tools together, how do you decide how much processing to do in SQL before switching to R? In this talk, we will explore setting the R / SQL boundary under three scenarios: RODBC connections, dplyr data extractions, and in-database R processing. We examine the consequences of each approach for data exploration, feature engineering, modeling, and prediction. We identify common performance killers, such as excessive data movement and serial processing, and illustrate the techniques with examples from both an open source database (Postgres) and a commercial database (Microsoft SQL Server).
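
As a rough illustration of the data-movement point in the abstract, the following minimal sketch compares aggregating on the client with aggregating in the database. It is not taken from the talk: it uses Python's standard-library sqlite3 module as a stand-in for R with Postgres or SQL Server, and the `sales` table, its columns, and its rows are hypothetical.

```python
import sqlite3

# Hypothetical in-memory table standing in for a large database table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
rows = [
    ("east", 10.0), ("east", 30.0),
    ("west", 5.0), ("west", 15.0), ("west", 40.0),
]
conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)

# Option 1: pull every row to the client, then aggregate there.
pulled = conn.execute("SELECT region, amount FROM sales").fetchall()
client_totals = {}
for region, amount in pulled:
    client_totals[region] = client_totals.get(region, 0.0) + amount

# Option 2: aggregate in SQL, so only one row per region crosses the boundary.
pushed = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region"
).fetchall()
db_totals = dict(pushed)

# Both options give the same totals; they differ in how many rows were moved.
rows_moved_client = len(pulled)
rows_moved_db = len(pushed)
```

Both options produce the same per-region totals, but the second transfers one row per group instead of every row in the table. On a toy table the difference is negligible; the gap grows with table size.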

Balasubramanian Narasimhan

Stanford University


Gopi Kumar

Microsoft Corporation

Wednesday June 29, 2016 11:42am - 12:00pm PDT