Wednesday, June 29 • 11:24am - 11:42am
Efficient in-memory non-equi joins using data.table

A join operation combines two (or more) tables on shared columns according to a condition. An equi-join is the case where this condition is defined by the binary operator $==$. It is a special case of the $\theta$-join, whose condition may use any of the binary comparison operators {<, <=, ==, >=, >}. This talk presents recent developments in the data.table package that extend its equi-join functionality to any or all of these operators very efficiently. For example, X[Y, on = .(X.a >= Y.a, X.b < Y.a)] performs a range join. Many databases are fully capable of performing both equi- and non-equi joins. The R/Bioconductor packages IRanges and GenomicRanges contain efficient implementations for dealing with interval ranges alone. However, so far there are no direct in-memory R implementations of non-equi joins that we are aware of. We believe this is an extremely useful feature that many R users can benefit from.
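
The range join described above can be sketched in R as follows. This is a minimal illustration, not the talk's own example: the tables, the column names (id, pos, label, start, end) and the values are all hypothetical. It also assumes the syntax of released data.table versions, where, to the best of our knowledge, the on clause takes bare column names (X's column on the left of each operator, Y's on the right) rather than the X. and Y. prefixes used in the abstract.

```r
library(data.table)

# Hypothetical data: point events (X) and labelled intervals (Y).
X <- data.table(id = c("e1", "e2", "e3", "e4"),
                pos = c(5L, 12L, 20L, 31L))
Y <- data.table(label = c("A", "B"),
                start = c(1L, 10L),
                end   = c(10L, 25L))

# Non-equi (range) join: for each interval in Y, find the rows of X
# whose pos satisfies start <= pos <= end.
# nomatch = 0L drops intervals that match no row of X.
# In j, x.pos refers explicitly to X's own pos column, which avoids
# ambiguity about which table the join column's values come from.
res <- X[Y, on = .(pos >= start, pos <= end), nomatch = 0L,
         .(label, id, pos = x.pos)]

# e4 (pos 31) lies in neither interval, so it does not appear in res.
print(res)
```

Selecting columns explicitly in j is a deliberate choice here: it keeps the result readable and makes clear that pos holds the matched values from X.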

Balasubramanian Narasimhan

Stanford University

Arun Srinivasan

Open Analytics

Wednesday June 29, 2016 11:24am - 11:42am PDT