The tutorials will take place on 10-11 July 2018.
Presenter  Title  Target audience  

Tuesday Morning 9:00-12:30 Break 10:30-11:00  
Paula Moraga  Disease risk modeling and visualization using R  People who are interested in health surveillance or any subject that deals with spatially referenced data  
In this tutorial we will learn how to estimate disease risk and quantify risk factors using areal and geostatistical data. We will also create interactive maps of disease risk and risk factors, and introduce presentation options such as interactive dashboards and Shiny apps. We will work through two disease mapping examples using data on malaria in the Gambia and cancer in Pennsylvania, United States. We will focus on disease risk, but the approaches covered are also applicable to other fields such as climate, ecology or crime.


Simon Jackson  Wrangling data in the Tidyverse  Beginner-to-intermediate R users who want to improve the day-to-day quality and efficiency of their data wrangling skills  
This hands-on tutorial will help beginner-to-intermediate R users take their data wrangling skills to the next level with an introduction to the Tidyverse: a collection of data science packages including dplyr, tidyr, purrr, ggplot2, and more. Using provided data sets and practical examples, you will learn how to efficiently import, tidy, and transform data to more quickly focus on tasks like visualization and modeling. 

Elizabeth Stark  Production-ready R: Getting started with R and docker  Experience with using the command line and running basic scripts is helpful but not necessary. Some prior exposure to Docker, Git and cloud computing is helpful but no in-depth knowledge is required.  
We will present some real-world data science scenarios and use these as a basis to walk participants through the process of building and deploying R-Docker apps. Participants will gain experience in writing R scripts to run as standalone Docker applications through examples, discussion and activities. We will provide code that can be used as a basis for participants' own projects. 

Scott Came  Applications with R and Docker  Attendees with some exposure to Docker interested in building multi-container networked applications using Docker and R  
In this tutorial we will explore several "advanced" scenarios of using Docker and R together to ease deployment of R applications. Attendees will gain hands-on experience building and deploying Docker images for Shiny, databases, plumber, and keras. We will also look at cloud deployment and scaling applications with Kubernetes. 

Przemyslaw Biecek  DALEX: Descriptive mAchine Learning EXplanations. Tools for exploration, validation and explanation of complex machine learning models  Applied data scientists, analysts interested in machine learning models.  
Complex machine learning models are frequently used in predictive modelling. There are many examples of random-forest-like or boosting-like models in medicine, finance, agriculture, etc. In this workshop we will show why and how one would analyse the structure of a black-box model. This will be a hands-on workshop with four parts. In each part there will be a short lecture (around 20-25 minutes) and then time for practice and discussion (around 20-25 minutes).

* Introduction: Here we will show what problems may arise from blind application of black-box models. We will also show situations in which understanding a model's structure leads to model improvements, model stability and greater trust in the model. During the hands-on part we will fit a few complex models (like xgboost, randomForest) with the mlr package and discuss basic diagnostic tools for these models.
* Conditional Explainers: In this part we will introduce techniques for understanding the marginal/conditional response of a model given one or two variables. We will cover PDP (Partial Dependence Plots) and ICE (Individual Conditional Expectations) packages for continuous variables and MPP (Merging Path Plot from the factorMerger package) for categorical variables.
* Local Explainers: In this part we will introduce techniques that explain the key factors that drive single model predictions. This covers Break Down plots for linear models (lm / glm) and tree-based models (randomForestExplainer, xgboostExplainer), along with model-agnostic approaches implemented in the live package (an extension of the LIME method).
* Global Explainers: In this part we will introduce tools for global analysis of the black-box model, like variable importance plots, interaction importance plots and tools for model diagnostics.

Literature:
Staniak, Mateusz, and Przemysław Biecek. 2017. Live: Local Interpretable (Model-Agnostic) Visual Explanations.
Sitko, Agnieszka, and Przemysław Biecek. 2017. FactorMerger: Hierarchical Algorithm for Post-Hoc Testing. https://github.com/MI2DataLab/factorMerger.
Greenwell, Brandon M. 2017. “Pdp: An R Package for Constructing Partial Dependence Plots.” The R Journal 9 (1): 421–36. https://journal.r-project.org/archive/2017/RJ-2017-016/index.html.
Goldstein, Alex, Adam Kapelner, Justin Bleich, and Emil Pitkin. 2015. “Peeking Inside the Black Box: Visualizing Statistical Learning with Plots of Individual Conditional Expectation.” Journal of Computational and Graphical Statistics 24 (1): 44–65. doi:10.1080/10618600.2014.907095.
Apley, Dan. 2017. ALEPlot: Accumulated Local Effects (Ale) Plots and Partial Dependence (Pd) Plots. https://CRAN.R-project.org/package=ALEPlot.
Ribeiro, Marco Tulio, Sameer Singh, and Carlos Guestrin. 2016. “‘Why Should I Trust You?’: Explaining the Predictions of Any Classifier.” In, 1135–44. ACM Press. doi:10.1145/2939672.2939778.
Biecek, Przemysław. 2017. BreakDown: BreakDown Plots. https://CRAN.R-project.org/package=breakDown.

Hanjo Odendaal  The ultimate online collection toolbox: Combining RSelenium and Rvest  Intermediate R users looking to explore online data collection  
rvest from Hadley Wickham has become the go-to package for all online collection or website interaction (web scraping) tasks in R. Although the package is amazing, it is not able to interact with a webpage when the page is dynamically loaded through JavaScript. For the latter, we need a browser that we 'drive' around the website to collect, load and interact with objects. Welcome to RSelenium from John Harrison. The package provides the necessary tools that allow the user to drive a web browser from R using script commands. In this tutorial, we will install RSelenium, learn basic commands, look at JavaScript tips, and see how to play well with other packages such as rvest. 

Tuesday Afternoon 1:30-5:00 Break 3:00-3:30  
Thomas Lumley  fasteR: ways to speed up R code  Intermediate R programmers interested in speeding up their code  
This workshop will cover some intermediate and advanced techniques for optimising R code. We will look at both processing speed and memory use, but will not cover converting your code into other languages (e.g. C). 

Heather Turner  Generalized Nonlinear Models using the gnm Package  People who wish to find out what generalized nonlinear models are and whether such models might be useful in their field of application  
The class of generalized linear models encompasses many tools commonly used in data analysis, including multiple linear regression, logistic regression, log-linear models, etc. But a linear predictor does not always capture the relationship we wish to model. Rather, a nonlinear predictor may provide a better description of the observed data, often with fewer and more interpretable parameters. This tutorial introduces the wider class of generalized nonlinear models (GNMs) and their implementation via the R package `gnm`. 

Kevin Kuo  Deep learning with TensorFlow and Keras  Anyone interested in deep learning  
We begin with a quick introduction to deep learning concepts, just enough to have a working vocabulary to facilitate construction of neural networks during the tutorial. The TensorFlow suite of R packages will be covered, including keras, tfestimators, and tfdatasets. Together with the participants, we build end-to-end workflows to perform classification and regression tasks using neural networks. We discuss the data preprocessing needs specific to neural network models, architectural choices, and best practices. Examples will be chosen to span a wide range of interests, including learning on structured data, time series, and unstructured text and image data. 

Johann Gagnon-Bartsch  Looking to clean your data? Learn how to Remove Unwanted Variation with R  Data analysis, Statistics, Bioinformatics and Computational Biology  
High-dimensional data often suffer from unwanted variation; for example, gene expression data commonly contain batch effects, and fMRI data commonly suffer from various systematic errors as well. Removing this unwanted variation while preserving the true signal in the data is essential to deriving the right scientific conclusions. A major complication, however, is that the factors causing the unwanted variation are often unknown and must be inferred from the data. In this tutorial we present the RUV (remove unwanted variation) package. RUV methods cover a range of approaches for removing unwanted variation depending on the purpose of the study: differential expression analysis, global data normalisation and visualisation, or classification. We also demonstrate an R Shiny application that provides an overview of the methods, along with interactive options for data visualisation and method diagnostics. 

Stephanie Kovalchik  Statistical Models for Sport in R  Beginner to intermediate R users with an interest in sports  
The workshop will cover a number of skills and statistical models that are common in sports statistics and show how each can be implemented in R. The workshop will introduce participants to a range of R packages and real sports examples. 

Dale Bryan-Brown  Spatial modelling using the ‘raster’ package  People interested in spatial modelling using satellite images, or other raster data sets. Worked examples will revolve around environmental modelling.  
The topics covered in this workshop include an introduction to 1) using R as a GIS (5 minutes), 2) point-type data (5 minutes), and 3) raster data (5 minutes). We will then build on the basic knowledge of using R as a GIS by creating raster data (10 minutes) and exploring its basic features (10 minutes). After that we will import raster data into R (10 minutes), begin to manipulate its extent, resolution, projection and values (45 minutes), and visualise the data (15 minutes). Once the group is comfortable with editing the features of raster data, we will discuss summarising raster data using point data (45 minutes). Finally, we will use this summarised data in a simple generalised linear model (45 minutes). Times are rough estimates. 

Wednesday Morning 9:00-12:30 Break 10:30-11:00  
Julie Josse  Missing values imputation  People who want to know more about how to deal with missing values in their analysis and which methods are available and implemented. Basic knowledge of PCA and linear models is required.  
The ability to easily collect and gather a large amount of data from different sources can be seen as an opportunity to better understand many processes. It has already led to breakthroughs in several application areas. However, due to the wide heterogeneity of measurements and objectives, these large databases often exhibit an extraordinarily high number of missing values. Hence, in addition to scientific questions, such data also present some important methodological and technical challenges for data analysts. In this tutorial, we give an overview of the missing values literature as well as the recent improvements that caught the attention of the community due to their ability to handle large matrices with large numbers of missing entries. We will illustrate the methods on medical, environmental and survey data. 

Carson Sievert  Interactive data visualization on the web with R  Anyone interested in interactive data visualization  
This tutorial teaches practical workflows for creating interactive web graphics which support common data analysis tasks. Through a series of examples, demos, exercises, and lecture, attendees will gain a foundation for navigating through common barriers to productivity associated with both the creation (e.g., startup cost, iteration cost, dead-end cost) and distribution (e.g., deployment cost, scaling cost, latency cost) of interactive web graphics. 

Matteo Fasiolo  Quantile Generalized Additive Models: moving beyond Gaussianity  The attendees should have a basic understanding of regression models and of the basic concepts underlying statistics and machine learning (e.g. probability densities, quantiles, etc.).  
Generalized Additive Models (GAMs) are an extension of traditional parametric regression models, which have proved highly useful for both predictive and inferential purposes in a wide variety of scientific and commercial applications. One reason behind the popularity of GAMs is that they strike an interesting balance between flexibility and interpretability, while being able to handle large data sets. The mgcv R package is arguably the state-of-the-art tool for fitting such models, hence the first half of this tutorial will introduce GAMs and mgcv in the context of electricity demand forecasting. The second part of the tutorial will show how traditional GAMs can be extended to quantile GAMs, and how the latter can be fitted using the qgam R package. By the end of the tutorial the attendees should be able to build, fit and visualize traditional or quantile GAMs, using a combination of the mgcv, qgam and mgcViz R packages. This tutorial is aimed at a broad audience of statistical modellers interested in using GAMs for predictive or inferential purposes. The models presented in the tutorial have a very wide range of applicability, hence they should be of interest to practitioners in business intelligence, ecology, linguistics, epidemiology and geoscience, to name a few. 

Maria Prokofieva  Follow Me: Introduction to social media analysis in R  The tutorial is aimed at a broad range of participants from various backgrounds (business, academics, etc.)  
The tutorial will review a range of R packages for social media analysis and will aim at teaching general principles of working with social media platforms and analysing the information found there. The social media platforms covered are Facebook, Twitter, Instagram and YouTube. Topics covered during the tutorial include:
1. Structure of the social media data (e.g. user-related data, posting-related data, hashtags)
2. Benefits and challenges of working with social media data (textual/non-textual information, large data volume, API limitations)
3. Connecting to a social media platform (e.g. authentication) and downloading data
4. Data analysis of the profile information (e.g. followers, likes, dislikes, favorites; platform dependent)
5. Data analysis of textual information (e.g. user posts, comments, dynamics, sentiment analysis, word clouds, etc.)
6. Visualisation of the social media data

Tong He  xgboost and MXNet  TBA  
TBA 

Dirk Eddelbuettel  Extending R with C++: Motivation, Introduction and Examples  Beginning to intermediate users of R who want to go further and farther  
Rcpp has become the principal venue for extending R with compiled code. It makes it easy to extend R with C or C++, spanning the range from simple one-liners to larger routines and bindings of entire external libraries. We will motivate and introduce Rcpp as a natural extension to R that provides an easy-to-use and powerful interface. Helper functions and tools including RStudio will be used to ease the creation of R extensions. Several examples will introduce basic use cases, including writing code with RcppArmadillo, which is the most widely used package on top of Rcpp. This provides a natural bridge to the more recent RcppMLPACK package (which combines the MLPACK machine learning library with the Armadillo linear algebra library), from which we will study one or two examples. 