user2016

9:00am PDT

Dynamic Documents with R Markdown (Part 1)

This is an intermediate/advanced level tutorial on dynamic documents with R Markdown. It starts with the basic idea of literate programming and its role in reproducible research. Of all the document formats that knitr supports, we will focus only on R Markdown (.Rmd). We will give an overview of the existing output formats in rmarkdown, explain how to customize them, and show how to build new output format functions by extending existing formats, using the tufte and bookdown packages as examples. We will also mention other applications related to R Markdown, such as HTML widgets [Vaidyanathan et al., 2015], Shiny documents [Chang et al., 2015], and running code from other languages (C, C++, and so on).

For details, refer to tutorial description.

Speakers

Yihui Xie

Software Engineer, RStudio, PBC

Yihui Xie is a software engineer at RStudio. He earned his PhD from the Department of Statistics, Iowa State University. He has authored and co-authored several R packages, such as knitr, rmarkdown, bookdown, blogdown, and xaringan. He has published a number of books, including “Dynamic...

Monday June 27, 2016 9:00am - 10:15am PDT
SIEPR 130

Tutorial, Morning

9:00am PDT

Genome-wide association analysis and post-analytic interrogation with R (Part 1)

For complex traits, such as cardiometabolic disease, we increasingly recognize that the intergenic space between protein-coding genes (PCGs) contains highly ordered regulatory elements that control the expression and function of PCGs and can themselves be actively transcribed. Indeed, over 50% of genome-wide association studies (GWAS) of complex traits identify single nucleotide polymorphisms (SNPs) that fall in intergenic regions, and it has only recently become apparent that these regions are highly organized to perform specific functions. A next step in advancing precision medicine is careful and rigorous interrogation of the role of these regulatory elements, and of their interplay with known PCGs and environmental factors, in the heritability of complex disease phenotypes. This tutorial focuses on analytic techniques and R tools designed to uncover these complex and largely uncharacterized relationships.

For details, refer to tutorial description.

Speakers

Andrea Foulkes

Professor of Mathematics and Statistics, Mount Holyoke College

Monday June 27, 2016 9:00am - 10:15am PDT
Wallenberg Hall 124

Tutorial, Morning

9:00am PDT

Handling and analyzing spatial, spatiotemporal and movement data (Part 1)

The tutorial will introduce users to the different types of spatial data (points, lines, polygons, rasters) and demonstrate how they are read into R. It will also explain how time series data can be imported, handled, and analyzed in R. Then it will explain the different types of spatiotemporal and trajectory data, and present ways to import and analyze them.

For details, refer to tutorial description.

Speakers

Edzer Pebesma

University of Muenster

I lead the spatio-temporal modelling laboratory at the Institute for Geoinformatics, and am deputy head of the institute. I hold a PhD in geosciences, and am interested in spatial statistics, environmental modelling, geoinformatics and GI Science, semantic technology for spatial analysis...

Monday June 27, 2016 9:00am - 10:15am PDT
SIEPR 120

Tutorial, Morning

9:00am PDT

Machine Learning Algorithmic Deep Dive (Part 1)

The goal of this tutorial is to provide participants with a deep understanding of four widely used algorithms in machine learning: Generalized Linear Model (GLM), Gradient Boosting Machine (GBM), Random Forest, and Deep Neural Nets. This includes a deep dive into the algorithms in the abstract sense, and a review of the implementations of these algorithms available within the R ecosystem.

Due to their popularity, each of these algorithms has several implementations available in R. Each package author takes a unique approach to implementing the algorithm, and each package offers the user an overlapping, but not identical, set of model parameters. The tutorial will provide an in-depth analysis of how each algorithm is implemented in a handful of R packages.

After completing this tutorial, participants will understand how each of these algorithms works, and will know the available R implementations and how they differ. For example, participants will understand why the xgboost package has, in less than a year, become one of the most popular GBM packages in R, even though the gbm package has been around for years and has been widely used: what implementation tricks does xgboost use that are not (yet) used in gbm? Or why do practitioners in certain domains prefer one implementation over another? We will answer these questions and more!
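To make the GBM discussion concrete, here is a minimal sketch of gradient boosting for squared-error regression with one-split decision stumps, written in Python for illustration (the tutorial itself uses R). It is a teaching sketch, not how gbm or xgboost are implemented: the function names are hypothetical, and real packages add refinements such as deeper trees, subsampling, and regularization that are omitted here.

```python
"""Teaching sketch: gradient boosting for squared-error regression with stumps."""
from typing import List, Tuple

Stump = Tuple[float, float, float]  # (threshold, value if x <= threshold, value otherwise)


def fit_stump(x: List[float], r: List[float]) -> Stump:
    """Pick the single split on x that minimizes the squared error of fitting r."""
    best = None
    for t in sorted(set(x)):
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        if not right:  # the largest x leaves nothing on the right
            continue
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((ri - lv) ** 2 for ri in left) + sum((ri - rv) ** 2 for ri in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    if best is None:  # all x values identical: fit a constant
        m = sum(r) / len(r)
        return (float("inf"), m, m)
    return (best[1], best[2], best[3])


def predict_stump(s: Stump, xi: float) -> float:
    t, lv, rv = s
    return lv if xi <= t else rv


def gbm_fit(x, y, n_rounds=100, learning_rate=0.1):
    """Each round fits a stump to the residuals y - f(x), which are the negative
    gradient of 0.5 * (y - f)^2, and adds a shrunken copy of it to the model."""
    f0 = sum(y) / len(y)  # start from the mean of y
    pred = [f0] * len(y)
    stumps = []
    for _ in range(n_rounds):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        s = fit_stump(x, resid)
        stumps.append(s)
        pred = [pi + learning_rate * predict_stump(s, xi) for pi, xi in zip(pred, x)]
    return (f0, learning_rate, stumps)


def gbm_predict(model, xi: float) -> float:
    f0, lr, stumps = model
    return f0 + lr * sum(predict_stump(s, xi) for s in stumps)


if __name__ == "__main__":
    model = gbm_fit([1, 2, 3, 4, 5, 6], [1, 1, 1, 5, 5, 5])
    print(round(gbm_predict(model, 2), 3), round(gbm_predict(model, 5), 3))  # prints: 1.0 5.0
```

The learning rate shrinks each stump's contribution, so the fit approaches the data gradually over many rounds; trading more rounds for a smaller learning rate is one of the tuning choices that different implementations expose differently.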

For details, refer to tutorial description.

Speakers

Erin LeDell

Chief Machine Learning Scientist, H2O.ai

Monday June 27, 2016 9:00am - 10:15am PDT
Campbell Rehearsal Hall

Tutorial, Morning

9:00am PDT

MoRe than woRds, Text and Context: Language Analytics in Finance with R (Part 1)

This tutorial surveys the technology and empirics of text analytics with a focus on finance applications. We present various tools for information extraction and basic text analytics. We survey a range of techniques for classification and predictive analytics, and metrics used to assess the performance of text analytics algorithms. We then review the literature on text mining and predictive analytics in finance, and its connection to networks, covering a wide range of text sources such as blogs, news, web posts, and corporate filings. We conclude with forecasts and predictions about future directions. The tutorial will use the R programming language throughout and present many hands-on examples.
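As a hedged illustration of "basic text analytics", the Python sketch below (the tutorial itself uses R) builds a bag-of-words representation and a simple net sentiment score from word lists. The two word lists are hypothetical toy examples, not a published finance lexicon, and the scoring rule (positive hits minus negative hits, divided by the token count) is just one simple choice among many.

```python
import re
from collections import Counter

# Hypothetical toy word lists; real applications use larger, domain-specific lexicons.
POSITIVE = {"beat", "growth", "strong", "upgrade", "profit"}
NEGATIVE = {"miss", "loss", "weak", "downgrade", "lawsuit"}


def tokenize(text: str) -> list:
    """Lowercase the text and keep runs of letters (and apostrophes) as tokens."""
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(text: str) -> Counter:
    """Document-term counts: the basic representation behind many text classifiers."""
    return Counter(tokenize(text))


def net_sentiment(text: str) -> float:
    """(positive hits - negative hits) / number of tokens; 0.0 for empty text."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / len(tokens)


if __name__ == "__main__":
    headline = "Strong profit growth despite lawsuit"
    print(bag_of_words(headline).most_common(2))
    print(net_sentiment(headline))  # (3 - 1) / 5 = 0.4
```

Scores like this can serve as features for the classification and predictive models the abstract mentions.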

For details, refer to tutorial description.

Speakers

Sanjiv Das

Terry Professor of Finance and Data Science, Santa Clara University

Text Analytics, FinTech, Network Risk.

Karthik Mokashi

Santa Clara University

I am pursuing my Master's at Santa Clara University with a focus on Data Science and Business Analytics. I am passionate about improving my knowledge of the different models and the accompanying tools, and how they are applied to diverse business questions. I am working on building...

Monday June 27, 2016 9:00am - 10:15am PDT
McDowell & Cranston

Tutorial, Morning

9:00am PDT

Never Tell Me the Odds! Machine Learning with Class Imbalances (Part 1)

This tutorial will provide an overview of using R to create effective predictive models in cases where at least one class has a low event frequency. Such problems often arise in applications such as click-through rate prediction, disease prediction, chemical quantitative structure-activity modeling, network intrusion detection, and quantitative marketing. The session will step through the process of building, optimizing, testing, and comparing models that are focused on prediction. A case study is used to illustrate functionality.
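A small Python sketch (the tutorial itself uses R) of why a low event frequency matters: with a 2% event rate, a "model" that never predicts the event is 98% accurate yet finds none of the events. It also shows random down-sampling of the majority class, one common remedy; the abstract does not say which remedies the tutorial covers, and the function names here are hypothetical.

```python
import random
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def sensitivity(y_true, y_pred, positive=1):
    """Recall for the rare (positive) class: TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if tp + fn else float("nan")


def downsample(rows, labels, seed=0):
    """Randomly keep only as many rows per class as the smallest class has."""
    rng = random.Random(seed)
    by_class = {}
    for row, lab in zip(rows, labels):
        by_class.setdefault(lab, []).append(row)
    n_min = min(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for lab, members in by_class.items():
        for row in rng.sample(members, n_min):
            out_rows.append(row)
            out_labels.append(lab)
    return out_rows, out_labels


if __name__ == "__main__":
    y = [1] * 20 + [0] * 980        # 2% event rate
    never = [0] * len(y)            # a "model" that never predicts the event
    print(accuracy(y, never))       # 0.98
    print(sensitivity(y, never))    # 0.0
    _, y_bal = downsample(list(range(len(y))), y)
    print(Counter(y_bal))           # 20 rows of each class
```

Down-sampling is normally applied only to the training data, so that metrics computed on the test set still reflect the real event rate.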

For details, refer to tutorial description.

Speakers

Max Kuhn

Principal Software Engineer, Posit PBC

Max Kuhn is a software engineer at Posit PBC where he is working on improving R’s modeling capabilities and maintaining about 30 packages, including caret and tidymodels. He has a Ph.D. in Biostatistics. Max was a Senior Director of Nonclinical Statistics at Pfizer Global R&D and...

Monday June 27, 2016 9:00am - 10:15am PDT
Econ 140

Tutorial, Morning

9:00am PDT

Small Area Estimation with R (Part 1)

The tutorial will introduce different types of statistical methods for analyzing survey data to produce estimates for small domains (sometimes termed ‘small areas’). These include design-based estimators, which rely only on the study design and the observed data, and model-based estimators, which rely on an underlying model to provide estimates. The tutorial will cover both frequentist and Bayesian inference for small area estimation. All methods will be accompanied by several examples that attendees will be able to reproduce.

This tutorial will be roughly based on the tutorial presented at useR! 2008 but will include updated materials. In particular, it will cover new R packages that have appeared since then.
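The contrast between design-based and model-based estimators can be sketched in Python (the tutorial itself uses R). The direct estimator is each area's sample mean, assuming simple random sampling within areas; the model-based one shrinks it toward a pooled mean with weight γ_i = σ_u² / (σ_u² + ψ_i), where ψ_i is the sampling variance of the direct estimate, in the spirit of the Fay-Herriot area-level model. This is a deliberate simplification: σ_u² is assumed known rather than estimated, there are no covariates, the pooled sample mean stands in for a regression-synthetic estimate, and the data are hypothetical.

```python
from statistics import mean, variance


def direct_estimates(samples):
    """Design-based: each area's sample mean and its estimated variance s^2 / n.

    Assumes simple random sampling within each area and at least 2 units per area.
    """
    return {area: (mean(ys), variance(ys) / len(ys)) for area, ys in samples.items()}


def shrinkage_estimates(samples, sigma2_u):
    """Model-based (simplified Fay-Herriot style): gamma * direct + (1 - gamma) * synthetic.

    sigma2_u is the between-area variance, assumed known here; the synthetic
    estimate is simply the pooled mean of all sampled units (no covariates).
    """
    synthetic = mean(y for ys in samples.values() for y in ys)
    out = {}
    for area, (d, psi) in direct_estimates(samples).items():
        gamma = sigma2_u / (sigma2_u + psi)  # small, noisy areas get a small gamma
        out[area] = gamma * d + (1 - gamma) * synthetic
    return out


if __name__ == "__main__":
    samples = {
        "big_area": [10, 12, 11, 13, 9, 10, 12, 11, 10, 12],  # hypothetical data
        "tiny_area": [30, 20],
    }
    print(direct_estimates(samples))
    print(shrinkage_estimates(samples, sigma2_u=4.0))
```

The area with only two sampled units has a noisy direct estimate, so it is pulled strongly toward the pooled mean, while the well-sampled area barely moves; this borrowing of strength across areas is what model-based estimators trade for their reliance on a model.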

For details, refer to tutorial description.

Speakers

Virgilio Gómez-Rubio

Universidad de Castilla-La Mancha

Monday June 27, 2016 9:00am - 10:15am PDT
Lane

Tutorial, Morning

9:00am PDT

Time-to-Event Modeling as the Foundation of Multi-Channel Revenue Attribution (Part 1)

In marketing analytics, time-to-event modeling at the customer level can provide a more granular view of the incremental impact that marketing campaigns have on individuals. Addressable media can be mapped to an individual, and even aggregated data can be mapped down to individuals via various techniques (e.g., geography or DMA). To assess the incremental effect of marketing accurately, a primary modeling task is not only to estimate the magnitude of each marketing effect but also to capture its distinct decay rate.

This tutorial will describe the basic techniques for applying time-to-event statistical models to marketing analytics problems. Beginning with data preparation, sampling, outlier detection, and techniques to control for non-marketing effects, the tutorial will move on to consider various modeling strategies and methods for evaluating model effectiveness. The techniques and processes presented will mimic a typical marketing analytics workflow. We will use a random sample of (anonymized) data from a large retail firm.
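One way to picture differing decay rates inside a time-to-event model is a proportional-hazards form in which each channel's past touches add exponentially decaying exposure to the log-hazard: h(t) = h_0 · exp(Σ_c β_c Σ_{s ≤ t} e^(−λ_c (t − s))). The Python sketch below (the tutorial itself uses R) assumes a constant baseline hazard, exponential decay, and hypothetical channel names and parameter values; the abstract does not specify the model the tutorial uses.

```python
import math


def decayed_exposure(t, touch_times, decay_rate):
    """Sum of exponentially decaying effects of touches at or before time t."""
    return sum(math.exp(-decay_rate * (t - s)) for s in touch_times if s <= t)


def hazard(t, touches, betas, decay_rates, base_hazard):
    """h(t) = h0 * exp(sum over channels of beta_c * decayed exposure_c(t))."""
    lin = sum(
        betas[c] * decayed_exposure(t, touches.get(c, []), decay_rates[c]) for c in betas
    )
    return base_hazard * math.exp(lin)


def prob_event_by(T, touches, betas, decay_rates, base_hazard, dt=0.01):
    """1 - S(T) = 1 - exp(-cumulative hazard), using a left Riemann sum with step dt."""
    steps = int(round(T / dt))
    cum = sum(
        hazard(i * dt, touches, betas, decay_rates, base_hazard) * dt for i in range(steps)
    )
    return 1 - math.exp(-cum)


if __name__ == "__main__":
    # Hypothetical channels: email effects fade within days, TV effects linger for weeks.
    betas = {"email": 1.0, "tv": 1.0}
    decay = {"email": 1.0, "tv": 0.1}  # per day
    base = 0.01                        # hypothetical baseline daily purchase hazard
    for ch in ("email", "tv"):
        p = prob_event_by(30, {ch: [0.0]}, betas, decay, base)
        print(ch, round(p, 3))
    print("none", round(prob_event_by(30, {}, betas, decay, base), 3))
```

With equal β values, the slowly decaying channel produces the larger 30-day conversion probability, which is why estimating only the magnitude of each effect, without its decay rate, can misattribute revenue across channels.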

For details, refer to tutorial description.

Speakers

Tess Calvez

Neustar

Monday June 27, 2016 9:00am - 10:15am PDT
Barnes

Tutorial, Morning

9:00am PDT

Using Git and GitHub with R, RStudio, and R Markdown (Part 1)

Data analysts can use the Git version control system to manage a motley assortment of project files (e.g., data, code, and reports) in a sane way. This benefits the solo analyst and, especially, anyone who wants to communicate and collaborate with others. Git helps you organize your project over time and across different people and computers. Hosting services like GitHub, Bitbucket, and GitLab provide a home for your Git-based projects on the internet.

What's special about using R and Git(Hub)?

the active R package development community on GitHub
workflows for R scripts and R Markdown files that make it easy to share source and rendered results on GitHub
Git- and GitHub-related features of the RStudio IDE

For details, refer to tutorial description.

Speakers

Jenny Bryan

University of British Columbia, rOpenSci

Monday June 27, 2016 9:00am - 10:15am PDT
Lyons & Lodato

Tutorial, Morning

10:30am PDT

Dynamic Documents with R Markdown (Part 2)

This is an intermediate/advanced level tutorial on dynamic documents with R Markdown. It starts with the basic idea of literate programming and its role in reproducible research. Of all the document formats that knitr supports, we will focus only on R Markdown (.Rmd). We will give an overview of the existing output formats in rmarkdown, explain how to customize them, and show how to build new output format functions by extending existing formats, using the tufte and bookdown packages as examples. We will also mention other applications related to R Markdown, such as HTML widgets [Vaidyanathan et al., 2015], Shiny documents [Chang et al., 2015], and running code from other languages (C, C++, and so on).

For details, refer to tutorial description.

Speakers

Yihui Xie

Software Engineer, RStudio, PBC

Yihui Xie is a software engineer at RStudio. He earned his PhD from the Department of Statistics, Iowa State University. He has authored and co-authored several R packages, such as knitr, rmarkdown, bookdown, blogdown, and xaringan. He has published a number of books, including “Dynamic...

Monday June 27, 2016 10:30am - 12:00pm PDT
SIEPR 130

Tutorial, Morning

10:30am PDT

Genome-wide association analysis and post-analytic interrogation with R (Part 2)

For complex traits, such as cardiometabolic disease, we increasingly recognize that the intergenic space between protein-coding genes (PCGs) contains highly ordered regulatory elements that control the expression and function of PCGs and can themselves be actively transcribed. Indeed, over 50% of genome-wide association studies (GWAS) of complex traits identify single nucleotide polymorphisms (SNPs) that fall in intergenic regions, and it has only recently become apparent that these regions are highly organized to perform specific functions. A next step in advancing precision medicine is careful and rigorous interrogation of the role of these regulatory elements, and of their interplay with known PCGs and environmental factors, in the heritability of complex disease phenotypes. This tutorial focuses on analytic techniques and R tools designed to uncover these complex and largely uncharacterized relationships.

For details, refer to tutorial description.

Speakers

Andrea Foulkes

Professor of Mathematics and Statistics, Mount Holyoke College

Monday June 27, 2016 10:30am - 12:00pm PDT
Wallenberg Hall 124

Tutorial, Morning

10:30am PDT

Handling and analyzing spatial, spatiotemporal and movement data (Part 2)

The tutorial will introduce users to the different types of spatial data (points, lines, polygons, rasters) and demonstrate how they are read into R. It will also explain how time series data can be imported, handled, and analyzed in R. Then it will explain the different types of spatiotemporal and trajectory data, and present ways to import and analyze them.

For details, refer to tutorial description.

Speakers

Edzer Pebesma

University of Muenster

I lead the spatio-temporal modelling laboratory at the Institute for Geoinformatics, and am deputy head of the institute. I hold a PhD in geosciences, and am interested in spatial statistics, environmental modelling, geoinformatics and GI Science, semantic technology for spatial analysis...

Monday June 27, 2016 10:30am - 12:00pm PDT
SIEPR 120

Tutorial, Morning

10:30am PDT

Machine Learning Algorithmic Deep Dive (Part 2)

The goal of this tutorial is to provide participants with a deep understanding of four widely used algorithms in machine learning: Generalized Linear Model (GLM), Gradient Boosting Machine (GBM), Random Forest, and Deep Neural Nets. This includes a deep dive into the algorithms in the abstract sense, and a review of the implementations of these algorithms available within the R ecosystem.

Due to their popularity, each of these algorithms has several implementations available in R. Each package author takes a unique approach to implementing the algorithm, and each package offers the user an overlapping, but not identical, set of model parameters. The tutorial will provide an in-depth analysis of how each algorithm is implemented in a handful of R packages.

After completing this tutorial, participants will understand how each of these algorithms works, and will know the available R implementations and how they differ. For example, participants will understand why the xgboost package has, in less than a year, become one of the most popular GBM packages in R, even though the gbm package has been around for years and has been widely used: what implementation tricks does xgboost use that are not (yet) used in gbm? Or why do practitioners in certain domains prefer one implementation over another? We will answer these questions and more!

For details, refer to tutorial description.

Speakers

Erin LeDell

Chief Machine Learning Scientist, H2O.ai

Monday June 27, 2016 10:30am - 12:00pm PDT
Campbell Rehearsal Hall

Tutorial, Morning

10:30am PDT

MoRe than woRds, Text and Context: Language Analytics in Finance with R (Part 2)

This tutorial surveys the technology and empirics of text analytics with a focus on finance applications. We present various tools for information extraction and basic text analytics. We survey a range of techniques for classification and predictive analytics, and metrics used to assess the performance of text analytics algorithms. We then review the literature on text mining and predictive analytics in finance, and its connection to networks, covering a wide range of text sources such as blogs, news, web posts, and corporate filings. We conclude with forecasts and predictions about future directions. The tutorial will use the R programming language throughout and present many hands-on examples.

For details, refer to tutorial description.

Speakers

Sanjiv Das

Terry Professor of Finance and Data Science, Santa Clara University

Text Analytics, FinTech, Network Risk.

Karthik Mokashi

Santa Clara University

I am pursuing my Master's at Santa Clara University with a focus on Data Science and Business Analytics. I am passionate about improving my knowledge of the different models and the accompanying tools, and how they are applied to diverse business questions. I am working on building...

Monday June 27, 2016 10:30am - 12:00pm PDT
McDowell & Cranston

Tutorial, Morning

10:30am PDT

Never Tell Me the Odds! Machine Learning with Class Imbalances (Part 2)

This tutorial will provide an overview of using R to create effective predictive models in cases where at least one class has a low event frequency. Such problems often arise in applications such as click-through rate prediction, disease prediction, chemical quantitative structure-activity modeling, network intrusion detection, and quantitative marketing. The session will step through the process of building, optimizing, testing, and comparing models that are focused on prediction. A case study is used to illustrate functionality.

For details, refer to tutorial description.

Speakers

Max Kuhn

Principal Software Engineer, Posit PBC

Max Kuhn is a software engineer at Posit PBC where he is working on improving R’s modeling capabilities and maintaining about 30 packages, including caret and tidymodels. He has a Ph.D. in Biostatistics. Max was a Senior Director of Nonclinical Statistics at Pfizer Global R&D and...

Monday June 27, 2016 10:30am - 12:00pm PDT
Econ 140

Tutorial, Morning

10:30am PDT

Small Area Estimation with R (Part 2)

The tutorial will introduce different types of statistical methods for analyzing survey data to produce estimates for small domains (sometimes termed ‘small areas’). These include design-based estimators, which rely only on the study design and the observed data, and model-based estimators, which rely on an underlying model to provide estimates. The tutorial will cover both frequentist and Bayesian inference for small area estimation. All methods will be accompanied by several examples that attendees will be able to reproduce.

This tutorial will be roughly based on the tutorial presented at useR! 2008 but will include updated materials. In particular, it will cover new R packages that have appeared since then.

For details, refer to tutorial description.

Speakers

Virgilio Gómez-Rubio

Universidad de Castilla-La Mancha

Monday June 27, 2016 10:30am - 12:00pm PDT
Lane

Tutorial, Morning

10:30am PDT

Time-to-Event Modeling as the Foundation of Multi-Channel Revenue Attribution (Part 2)

In marketing analytics, time-to-event modeling at the customer level can provide a more granular view of the incremental impact that marketing campaigns have on individuals. Addressable media can be mapped to an individual, and even aggregated data can be mapped down to individuals via various techniques (e.g., geography or DMA). To assess the incremental effect of marketing accurately, a primary modeling task is not only to estimate the magnitude of each marketing effect but also to capture its distinct decay rate.

This tutorial will describe the basic techniques for applying time-to-event statistical models to marketing analytics problems. Beginning with data preparation, sampling, outlier detection, and techniques to control for non-marketing effects, the tutorial will move on to consider various modeling strategies and methods for evaluating model effectiveness. The techniques and processes presented will mimic a typical marketing analytics workflow. We will use a random sample of (anonymized) data from a large retail firm.

For details, refer to tutorial description.

Speakers

Tess Calvez

Neustar

Monday June 27, 2016 10:30am - 12:00pm PDT
Barnes

Tutorial, Morning

10:30am PDT

Using Git and GitHub with R, RStudio, and R Markdown (Part 2)

Data analysts can use the Git version control system to manage a motley assortment of project files (e.g., data, code, and reports) in a sane way. This benefits the solo analyst and, especially, anyone who wants to communicate and collaborate with others. Git helps you organize your project over time and across different people and computers. Hosting services like GitHub, Bitbucket, and GitLab provide a home for your Git-based projects on the internet.

What's special about using R and Git(Hub)?

the active R package development community on GitHub
workflows for R scripts and R Markdown files that make it easy to share source and rendered results on GitHub
Git- and GitHub-related features of the RStudio IDE

For details, refer to tutorial description.

Speakers

Jenny Bryan

University of British Columbia, rOpenSci

Monday June 27, 2016 10:30am - 12:00pm PDT
Lyons & Lodato

Tutorial, Morning