This is an intermediate/advanced-level tutorial on dynamic documents with R Markdown. It starts with the basic idea of literate programming and its role in reproducible research. Among all document formats that knitr supports, we will focus only on R Markdown (.Rmd). We will give an overview of the existing output formats in rmarkdown and explain how to customize them. We will show how to build new output format functions by extending existing formats, using the packages tufte and bookdown as examples. We will also mention other applications related to R Markdown, such as HTML widgets [Vaidyanathan et al., 2015] and Shiny documents [Chang et al., 2015], and show how to run code from other languages (C, C++, and so on).
For details, refer to tutorial description.
For complex traits, such as cardiometabolic disease, we increasingly recognize that the intergenic space between protein-coding genes (PCGs) contains highly ordered regulatory elements that control the expression and function of PCGs and can themselves be actively transcribed. Indeed, over 50% of genome-wide association studies (GWAS) of complex traits identify single nucleotide polymorphisms (SNPs) that fall in intergenic regions, and it is only recently becoming apparent that these regions are highly organized to perform specific functions. A next step in advancing precision medicine is careful and rigorous interrogation of the role of these regulatory elements, and their interplay with known PCGs and environmental factors, in the heritability of complex disease phenotypes. This tutorial focuses on analytic techniques and R tools designed to uncover these complex and largely uncharacterized relationships.
The goal of this tutorial is to provide participants with a deep understanding of four widely used algorithms in machine learning: Generalized Linear Model (GLM), Gradient Boosting Machine (GBM), Random Forest, and Deep Neural Nets. This includes a deep dive into the algorithms in the abstract sense and a review of the implementations of these algorithms available within the R ecosystem.
Due to their popularity, each of these algorithms has several implementations available in R. Each package author takes a unique approach to implementing the algorithm, and each package provides an overlapping, but not identical, set of model parameters to the user. The tutorial will provide an in-depth analysis of how each algorithm was implemented in a handful of R packages.
After completing this tutorial, participants will have an understanding of how each of these algorithms works, and knowledge of the available R implementations and how they differ. Participants will understand, for example, why the xgboost package has, in less than a year, become one of the most popular GBM packages in R, even though the gbm R package has been around for years and has been widely used. What implementation tricks does xgboost use that are not (yet) used in the gbm package? Why do some practitioners in certain domains prefer one implementation over another? We will answer these questions and more!
The tutorial will introduce different types of statistical methods for the analysis of survey data to produce estimates for small domains (sometimes termed ‘small areas’). These will include design-based estimators, which are based only on the study design and the observed data, and model-based estimators, which rely on an underlying model to provide estimates. The tutorial will cover frequentist and Bayesian inference for Small Area Estimation. All methods will be accompanied by several examples that attendees will be able to reproduce.
This tutorial will be roughly based on the tutorial presented at useR! 2008 but will include updated materials. In particular, it will cover new R packages that have appeared since then.
In the realm of marketing analytics, time-to-event modeling at the customer level can provide a more granular view of the incremental impact that marketing campaigns have on individuals. Media that is addressable can be mapped to an individual, and even aggregated data can be mapped down to an individual via various techniques (e.g., geo, DMA). To accurately assess the incremental effect of marketing, a primary task during modeling is not only to estimate the magnitude of each marketing effect, but also to capture the differing rate at which each effect decays.
This tutorial will describe the basic techniques for applying time-to-event statistical modeling to marketing analytics problems. Beginning with data preparation, sampling, outlier detection, and techniques to control for non-marketing effects, the tutorial will move on to consider various modeling strategies and methods for evaluating model effectiveness. The techniques and processes presented will mimic a typical marketing analytics workflow. We will be using a random sample of data from a large (anonymized) retail firm.
Data analysts can use the Git version control system to manage a motley assortment of project files (e.g., data, code, reports) in a sane way. This has benefits for the solo analyst and, especially, for anyone who wants to communicate and collaborate with others. Git helps you organize your project over time and across different people and computers. Hosting services like GitHub, Bitbucket, and GitLab provide a home for your Git-based projects on the internet.
What's special about using R and Git(Hub)?