Thursday, June 30 • 10:50am - 11:08am
ranger: A fast implementation of random forests for high dimensional data

Random forests are widely used in applications such as gene expression analysis, credit scoring, image processing, and genome-wide association studies (GWAS). With currently available software, the analysis of high-dimensional data is time-consuming, and for very large datasets even impossible. We therefore introduce ranger, a fast implementation of random forests that is particularly suited to high-dimensional data. We describe the implementation, illustrate its usage with examples, and compare its runtime and memory usage with those of other implementations. ranger is available as a standalone C++ application and as an R package. It is platform-independent and designed in a modular fashion. Thanks to efficient memory management, datasets on a genome-wide scale can be handled on a standard personal computer, which we illustrate by applying ranger to a real GWAS dataset. We show that ranger is a fast and memory-efficient implementation of random forests for analyzing high-dimensional data. Compared with other implementations, the runtime of ranger scales best with the number of features, samples, trees, and features tried for splitting.

Moderators
John Tamaresis

Biostatistician, Biomedical Data Science, Stanford University

Speakers
Marvin N. Wright

Universität zu Lübeck


Thursday June 30, 2016 10:50am - 11:08am PDT
Lane & Lyons & Lodato