Wednesday, June 29 • 10:48am - 11:06am
Size of Datasets for Analytics and Implications for R

With so much hype about "big data" and the industry pushing for distributed computing over traditional single-machine tools, one wonders about the future of R. In this talk I will argue that most data analysts and data scientists do not actually work with big data the majority of the time, so using immature "big data" tools is in fact counterproductive. I will show that, contrary to widespread belief, the growth in the size of datasets used for analytics has actually been outpaced over the last 10 years by the growth in memory (RAM), making single-machine tools ever more attractive. Furthermore, base R and several widely used R packages have undergone significant performance improvements (I will present benchmarks to quantify this), making R an ideal tool for data analysis even on relatively large datasets. In particular, R has access (via CRAN packages) to excellent high-performance machine learning libraries (benchmarks will be presented), and high-performance and parallel computing facilities have been part of the R ecosystem for many years. Nevertheless, the R community should of course continue pushing the boundaries and extending R with new and ever more performant features.

Moderators
Dirk Eddelbuettel

Debian and R Projects

Speakers
Szilard Pafka

Chief Data Scientist, Epoch
Szilard studied physics in the 90s and obtained a PhD by using statistical methods to investigate the risk of financial portfolios. He then worked in a bank, quantifying and managing market risk. About a decade ago he moved to California to become the Chief Scientist of a credit...


Wednesday June 29, 2016 10:48am - 11:06am PDT
Econ 140