Keynote
Tuesday, June 28
 

9:00am PDT

Forty years of S

Bell Labs in the 1970s was a hotbed of research in computing, statistics and many other fields. The conditions there encouraged the growth of the S language and influenced its content. The 40th anniversary of S is an appropriate time to relate a personal view of that scene and reflect on why S (and R) turned out as it did.


Moderators
Trevor Hastie

Speakers
Richard Becker

AT&T Labs - Research


Tuesday June 28, 2016 9:00am - 10:00am PDT
McCaw Hall

3:30pm PDT

Literate Programming
The speaker will discuss what he considers to be the most important outcome of his work developing TeX in the 1980s, namely the accidental discovery of a new approach to programming --- which caused a radical change in his own coding style. Ever since then, he has aimed to write programs for human beings (not computers) to read. The result is that the programs have fewer mistakes, they are easier to modify and maintain, and they can indeed be understood by human beings. This facilitates reproducible research, among other things.

Moderators
Susan Holmes

Professor, Statistics, Stanford
I like teaching nonparametric multivariate analyses to biologists. Reproducible research is really important to me and I make all my work available online, mostly as Rmd files. I still like to code, use Github and shiny as well as Bioconductor. I am trying to finish a book for biologists...

Speakers
Donald Knuth

Stanford University


Tuesday June 28, 2016 3:30pm - 4:30pm PDT
McCaw Hall
 
Wednesday, June 29
 

9:00am PDT

Towards a grammar of interactive graphics
I announced ggvis in 2014, but there has been little progress on it since. In this talk, I'll tell you a little bit about what I've been working on instead (data ingest, purrr, multiple models, ...) and tell you my plans for the future of ggvis. The goal is for 2016 to be the year of ggvis, and I'm going to be putting a lot of time into ggvis until it's a clear replacement for ggplot2. I'll talk about some of the new packages that will make this possible (including ggstat, ggeom, and gglayout), and how this work is also going to improve ggplot2.

Moderators
Karthik Ram

co-founder, rOpenSci
Karthik Ram is a co-founder of rOpenSci and a data science fellow at the University of California's Berkeley Institute for Data Science. Karthik primarily works on a project that develops R-based tools to facilitate open science and access to open data.

Speakers
Hadley Wickham

Chief Scientist, RStudio
Hadley is Chief Scientist at RStudio, winner of the 2019 COPSS award, and a member of the R Foundation. He builds tools (both computational and cognitive) to make data science easier, faster, and more fun. His work includes packages for data science (like the tidyverse, which includes...


Wednesday June 29, 2016 9:00am - 10:00am PDT
McCaw Hall

3:30pm PDT

Flexible and Interpretable Regression Using Convex Penalties

We consider the problem of fitting a regression model that is both flexible and interpretable. We propose two procedures for this task: the Fused Lasso Additive Model (FLAM), which is an additive model of piecewise constant fits; and Convex Regression with Interpretable Sharp Partitions (CRISP), which extends FLAM to allow for non-additivity. Both FLAM and CRISP are the solutions to convex optimization problems that can be efficiently solved. We show that FLAM and CRISP outperform competitors, such as sparse additive models (Ravikumar et al., 2009), CART (Breiman et al., 1984), and thin plate splines (Duchon, 1977), in a range of settings. We propose unbiased estimators for the degrees of freedom of FLAM and CRISP, which allow us to characterize their complexity.

This is joint work with Ashley Petersen and Noah Simon at University of Washington.



Moderators
Rob Tibshirani

Stanford University
Robert Tibshirani is a Professor in the Departments of Statistics and Health Research and Policy at Stanford University. He received a B.Math. from the University of Waterloo, an M.Sc. from the University of Toronto and a Ph.D. from Stanford University. He was a Professor at the University...

Speakers
Daniela Witten

University of Washington


Wednesday June 29, 2016 3:30pm - 4:30pm PDT
McCaw Hall
 
Thursday, June 30
 

9:00am PDT

Statistical Thinking in a Data Science Course
The intuition and experience needed for sound statistics practice can be hard to learn, and a course that combines computing, statistics, and working with data offers an excellent learning environment in this regard. Moreover, an integrated approach to data science creates opportunities to reinforce statistical thinking skills throughout the full data analysis cycle, from data acquisition and cleaning to data organization and analysis to communicating results. As a result, students gain the ability to reason computationally, actively engage in statistical problem solving, and learn how to keep abreast of new technologies as they evolve. This talk describes approaches and provides examples for teaching data science in this integrated fashion.

Moderators
John Chambers

Stanford University

Speakers
Deborah Nolan

Professor, UC Berkeley


Thursday June 30, 2016 9:00am - 10:00am PDT
McCaw Hall

2:00pm PDT

RCloud - Collaborative Environment for Visualization and Big Data Analytics
Analyzing Big Data in real life poses challenges with respect to performance, methodology and reusability. R is well known for its succinct syntax for analytic tasks as well as a plethora of tools for data analysis and visualization, but it is not always associated with scalability. In this talk we will present a scalable environment that allows the use of R (and other languages) in a collaborative setting, enabling sharing, reusability and reproducibility. In addition, it opens new possibilities for visualization and interactive graphics by providing seamless integration of JavaScript and R. Finally, the distributed nature of the design allows us to provide R tools for out-of-core data processing that interface with different back-ends, including Hadoop, without sacrificing the ease of use of R. We will also show a flexible framework for developing distributed models in R while re-using as much existing work as possible. As part of this talk we will illustrate the use of those tools on real data sets, including interactive visualization and distributed computing.

Moderators
Balasubramanian Narasimhan

Stanford University

Speakers
Simon Urbanek

AT&T Labs - Research


Thursday June 30, 2016 2:00pm - 3:00pm PDT
McCaw Hall
 