Wednesday, June 29 • 10:48am - 11:06am
ETL for medium data

Packages provide users with software that extends the core functionality of R, as well as data that illustrate the use of that functionality. By design, however, the type of data that an R package on CRAN can contain is limited. First, packages are meant to be small: the data stored in a package are supposed to total less than 5 megabytes. Second, these data are static, in that CRAN allows only monthly releases. Alternative package repositories, such as GitHub, are also limited in their ability to store and deliver data that may be changing in real time. The etl package provides a CRAN-friendly framework that allows R users to work with medium data in a responsible and responsive manner. It leverages the dplyr package to facilitate Extract-Transform-Load (ETL) operations that bring real-time data into local or remote databases controllable by R users who may have little or no SQL experience. The suite of etl-dependent packages brings the world of medium data (too big to store in memory, but not so big that it won't fit on a hard drive) to a much wider audience.
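
As a rough illustration of the extract-transform-load pattern the abstract describes, here is a minimal, language-neutral sketch in Python using only the standard library. It is not the etl package and does not reproduce its API: the inline CSV, the `flights` table, and every function name below are hypothetical, chosen only to show rows streaming from a raw source into a local database that is then queried without holding the full data set in memory.

```python
import csv
import io
import sqlite3

# Hypothetical raw source; in practice this would be a downloaded file.
RAW_CSV = """flight,origin,dep_delay
AA1,JFK,5
UA2,SFO,
DL3,JFK,-2
WN4,SFO,30
"""


def extract(raw_text):
    """Extract: lazily yield one dict per CSV row."""
    yield from csv.DictReader(io.StringIO(raw_text))


def transform(rows):
    """Transform: drop rows with a missing delay and cast types."""
    for row in rows:
        if row["dep_delay"] == "":
            continue
        yield (row["flight"], row["origin"], int(row["dep_delay"]))


def load(conn, records, batch_size=2):
    """Load: insert in small batches so only one batch is held in memory."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS flights "
        "(flight TEXT, origin TEXT, dep_delay INTEGER)"
    )
    batch = []
    for record in records:
        batch.append(record)
        if len(batch) >= batch_size:
            conn.executemany("INSERT INTO flights VALUES (?, ?, ?)", batch)
            batch = []
    if batch:  # flush the final partial batch
        conn.executemany("INSERT INTO flights VALUES (?, ?, ?)", batch)
    conn.commit()


def run_etl(conn, raw_text=RAW_CSV):
    """Chain the three stages; generators keep the pipeline streaming."""
    load(conn, transform(extract(raw_text)))


def mean_delay_by_origin(conn):
    """Let the database do the aggregation instead of in-memory code."""
    rows = conn.execute(
        "SELECT origin, AVG(dep_delay) FROM flights "
        "GROUP BY origin ORDER BY origin"
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    run_etl(connection)
    print(mean_delay_by_origin(connection))
```

Passing a file path instead of `":memory:"` to `sqlite3.connect` would keep the loaded table on disk, which is the situation the abstract has in mind for data that fit on a hard drive but not in memory.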

Balasubramanian Narasimhan

Stanford University


Ben S Baumer

Smith College

Wednesday June 29, 2016 10:48am - 11:06am PDT