Tuesday, June 28 • 11:42am - 12:00pm
FlashR: Enable Parallel, Scalable Data Analysis in R


In the era of big data, R is rapidly becoming one of the most popular tools for data analysis, but the R framework is relatively slow and unable to scale to large datasets. The usual approach to speeding up an R implementation is to implement the algorithms in C or FORTRAN and provide an R wrapper. Many projects parallelize R and scale it to large datasets. For example, Revolution R Open parallelizes a limited set of matrix operations individually, which limits its performance. Others, such as Rmpi and R-Hadoop, expose a low-level programming interface to R users and require more explicit parallelization. It is challenging to provide a framework that offers a high-level programming interface while achieving efficiency. FlashR is a matrix-oriented R programming framework that supports automatic parallelization and out-of-core execution for large datasets. FlashR reimplements the matrix operations in the R base package and provides some generalized matrix operations to improve expressiveness. FlashR automatically fuses matrix operations to reduce data movement between CPU and disks. We implement machine learning algorithms such as k-means and GMM in FlashR to benchmark its performance. On a large parallel machine, both in-memory and out-of-core execution of these R implementations in FlashR significantly outperform the corresponding implementations in Spark MLlib. We believe FlashR significantly lowers the expertise required to write parallel, scalable implementations of machine learning algorithms and opens new opportunities for large-scale machine learning in R. FlashR is implemented as an R package and is released as open source (http://flashx.io/).
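To illustrate the matrix-oriented style the abstract describes, here is a minimal sketch in plain base R of the assignment step of k-means, expressed entirely with vectorized matrix operations. Since the abstract says FlashR reimplements the matrix operations of the R base package, code written in this style is the kind FlashR can parallelize; the function name `assign.clusters` is our own illustration, not part of any FlashR API.

```r
# k-means assignment step using only base-R matrix operations.
# Squared Euclidean distance ||x - c||^2 is expanded as
# ||x||^2 - 2 x.c + ||c||^2 and computed with matrix products,
# avoiding explicit loops over rows.
assign.clusters <- function(X, centers) {
  d2 <- rowSums(X^2) - 2 * (X %*% t(centers)) +
        matrix(rowSums(centers^2), nrow(X), nrow(centers), byrow = TRUE)
  max.col(-d2)  # index of the nearest center for each row of X
}

set.seed(1)
X <- rbind(matrix(rnorm(20, mean = 0), ncol = 2),   # 10 points near (0, 0)
           matrix(rnorm(20, mean = 5), ncol = 2))   # 10 points near (5, 5)
centers <- rbind(c(0, 0), c(5, 5))
labels <- assign.clusters(X, centers)
```

Loop-free formulations like this are what allow a framework to parallelize and fuse the underlying matrix operations automatically, since the whole computation is expressed as a few large dense operations rather than many small scalar ones.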


Duncan Temple Lang

University of California Davis


Da Zheng

Johns Hopkins University
I'm a PhD student in computer science at Johns Hopkins University. I work on building large-scale data-analysis frameworks, especially for graph analysis and machine learning.

Tuesday June 28, 2016 11:42am - 12:00pm PDT