Name: How to keep your R code simple while tackling big datasets
Start: 2016-06-28T16:45:00-0700
End: 2016-06-28T17:03:00-0700

How to keep your R code simple while tackling big datasets

Like many statistical analytics tools, R can be incredibly memory-intensive. A simple GAM (generalized additive model) or k-nearest neighbor routine can consume many times the memory of the starting dataset, and R doesn't always behave nicely when it runs out of memory.

There are techniques for getting around memory limitations, such as partitioning the data or sampling it down, but these require extra work. It would be really nice to run elegantly simple R analytics without that hassle.

Using a really big public dataset from CMS.gov, Chuck will show GAM, GLM, decision tree, random forest, and k-nearest neighbor routines that were prototyped and run on a laptop, then run unchanged against the entire dataset on a single, simple Linux instance with over a terabyte of RAM. This big computer is actually a collection of smaller off-the-shelf servers that TidalScale combines into a single virtual server with several terabytes of RAM.

Moderators

Gabriela de Queiroz

Sr. Developer Advocate/Manager, IBM

Gabriela de Queiroz is a Sr. Engineering & Data Science Manager and a Sr. Developer Advocate at IBM, where she leads the CODAIT Machine Learning Team. She works on various open source projects and is actively involved with several organizations to foster an inclusive community.

Speakers

Chuck Piercey

KumoScale Product Management, Kioxia

B2B software product management & marketing. Writer: https://medium.com/@chuck1.piercey

Tuesday June 28, 2016 4:45pm - 5:03pm PDT
Barnes & McDowell & Cranston

Contributed talk, Analytics

Attendees (143)

user2016
