This event has ended. Visit the official site or create your own event on Sched.
Click here to return to main conference site. For a one page, printable overview of the schedule, see this.
Back To Schedule
Thursday, June 30 • 1:00pm - 1:05pm
Hash Tables in R are Slow

Log in to save this to your schedule, view media, leave feedback and see who's attending!

An array hash is a cache-conscious data structure that takes advantage of hardware prefetchers for improved performance on large hash tables, those large enough to fit in main memory and larger than fast fixed size cpu caches.nnHowever, their implementation is a radical departure from standard chained hash tables. Rather than using chains of hash buckets for collision resolution, array hashes use segments of contiguous memory called dynamic arrays to store keys and values. Adding and deleting items from the hash involve copying the entire segment to new areas in memory. While this may seem wasteful and slow, it's surprisingly efficient in both time and space[2].nnIn R, hashed environments are implemented using lists with each list element (a CONS cell) acting as the hash bucket. The CONS cell is the binding agent for a symbol and value. Hashed environments are searched using the pointer address of the symbol rather than the symbol's printed name.nnR-Array-Hash takes advantage of this by implementing an integer array hash[1] to store addresses of symbols and their associated values. Care is also taken to account for whether or not a binding is locked, active, etc.nnSimilarly, R-Array-Hash reimplements R's string cache using a string array hash. This introduces the most radical change to R's API: CHAR() no longer returns an address that points to the area at the end of the SEXP. Rather it returns an address located in one of the contiguous dynamic arrays of the string hash table. Therefore, care must be taken in C code to use the address immediately since additions and deletions to the string hash could render the result of CHAR() useless. There are many areas of the code that sidestep this by calling translateChar(), which has been changed to always copy the string pointed by CHAR().

avatar for Julie Josse

Julie Josse


avatar for Jeffrey  Horner

Jeffrey Horner

Vanderbilt University Department of Biostatistics

Thursday June 30, 2016 1:00pm - 1:05pm PDT