Tuesday, June 28 • 4:45pm - 5:03pm
How to keep your R code simple while tackling big datasets

Like many statistical analytic tools, R can be incredibly memory intensive. A simple GAM (generalized additive model) or K-nearest neighbor routine can consume many times the memory of the starting dataset. And R doesn't always behave nicely when it runs out of memory.

There are techniques to get around memory limitations, such as using partitioning tools or sampling down, but these require extra work. It would be really nice to run elegantly simple R analytics without that hassle.

Using a very large public dataset from CMS.gov, Chuck will show GAM, GLM, decision tree, random forest, and K-nearest neighbor routines that were prototyped and run on a laptop, then run unchanged against the entire dataset on a single, simple Linux instance with over a terabyte of RAM. This big computer is actually a collection of smaller off-the-shelf servers that TidalScale combines into a single virtual server with several terabytes of RAM.

Gabriela de Queiroz

Developer Advocate, IBM
Gabriela de Queiroz is a Data and AI Developer Advocate at IBM CODAIT. She is the founder of R-Ladies (Global), a worldwide organization for promoting diversity in the R community with more than 100 chapters in 35+ countries. She runs the R-Ladies San Francisco chapter and serves...


Charles Arthur Piercey

TidalScale, Inc.
