Name: ranger: A fast implementation of random forests for high dimensional data
Start: 2016-06-30T10:50:00-0700
End: 2016-06-30T11:08:00-0700

ranger: A fast implementation of random forests for high dimensional data

Random forests are widely used in applications such as gene expression analysis, credit scoring, image processing, and genome-wide association studies (GWAS). With currently available software, the analysis of high-dimensional data is time-consuming, and for very large datasets it can be impossible. We therefore introduce ranger, a fast implementation of random forests that is particularly suited to high-dimensional data. We describe the implementation, illustrate its usage with examples, and compare its runtime and memory usage with those of other implementations. ranger is available as a standalone C++ application and as an R package. It is platform independent and designed in a modular fashion. Thanks to efficient memory management, datasets on a genome-wide scale can be handled on a standard personal computer, which we illustrate by applying ranger to a real GWAS dataset. We show that ranger is a fast and memory-efficient implementation of random forests for analyzing high-dimensional data. Compared with other implementations, the runtime of ranger scales best with the number of features, samples, trees, and features tried for splitting.
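To make the four quantities in the runtime comparison concrete (samples, features, trees, and features tried for splitting), here is a minimal Python sketch of a forest of one-split trees. It is an illustration only and is not ranger's code or algorithm; the function names and the `num_trees` and `mtry` parameters are hypothetical names chosen for this example.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty or pure list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_stump(X, y, feature_ids):
    """Search only the candidate features for the split with lowest weighted Gini."""
    best = None  # (impurity, feature, threshold, left_label, right_label)
    n = len(y)
    for f in feature_ids:
        for t in sorted(set(row[f] for row in X)):
            left = [y[i] for i in range(n) if X[i][f] <= t]
            right = [y[i] for i in range(n) if X[i][f] > t]
            if not left or not right:
                continue  # a split must send samples to both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, f, t,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    return best


def grow_forest(X, y, num_trees, mtry, seed=0):
    """Grow num_trees one-split trees, each trying mtry randomly chosen features."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])  # n samples, p features
    forest = []
    for _ in range(num_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample of the rows
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        candidates = rng.sample(range(p), mtry)  # features tried for splitting
        stump = best_stump(Xb, yb, candidates)
        if stump is None:
            # No usable split among the candidates: always predict the majority class.
            majority = Counter(yb).most_common(1)[0][0]
            stump = (0.0, 0, float("inf"), majority, majority)
        forest.append(stump)
    return forest


def predict(forest, row):
    """Majority vote over all trees in the forest."""
    votes = Counter(left if row[f] <= t else right
                    for _, f, t, left, right in forest)
    return votes.most_common(1)[0][0]
```

In this sketch the split search is repeated for every tree and every candidate feature, and each search scans the bootstrap sample, so the work grows with all of these quantities. The total number of features sets the size of the pool from which the `mtry` candidates are drawn.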

Moderators

John Tamaresis

Biostatistician, Biomedical Data Science, Stanford University

Speakers

Marvin N. Wright

Universität zu Lübeck

Thursday June 30, 2016 10:50am - 11:08am PDT
Lane & Lyons & Lodato

Contributed talk, Statistics & Big Data

user2016
