The tutorials will take place on 10-11 July 2018. Click the tutorial for more information and register here.
There is a late-breaking change: Heather Turner will not be able to make it to Australia. Her tutorial notes are available at https://github.com/hturner/gnmdaycourse. We are lucky that Max Kuhn from RStudio has stepped in to provide an alternative tutorial for that time slot. Details are below.
Information on handling preferences: Thank you if you entered your tutorial preferences in our form. This has helped us keep tabs on numbers for each tutorial and allocate presenters to rooms based on these numbers. We are now quite sure that we can handle all preferences with our room sizes. You can change your mind now! We are not checking the records of your preferences, so you can simply go to the tutorial of your choice in the session. There is no need to enter any new preferences.
You will need your badge to get into the tutorial. It will be colour-coded depending on which session you have registered for.
Presenter  Title  Venue  Target audience  What to bring  

Tuesday Morning 9:00-12:30, Break 10:30-11:00  
Paula Moraga  Disease risk modeling and visualization using R  P8  People who are interested in health surveillance or any subject that deals with spatially referenced data  Please come to the tutorial with R and RStudio installed, and ensure you have installed the following packages: "dplyr", "ggplot2", "leaflet", "geoR", "rgdal", "raster", "sp", "spdep", "SpatialEpi", "SpatialEpiApp". The package rgdal may take a long time to install depending on the system, so this is best done ahead of time. We will also need the R package INLA; install it by typing:
install.packages("INLA", repos = "https://inla.r-inla-download.org/R/stable", dep = TRUE) 
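
As a convenience, the package list above can be checked from within R before the tutorial. The following is a minimal sketch, not part of the presenter's instructions. The variable names are illustrative, and it only reports what is missing rather than installing anything. INLA is left out because it is installed from its own repository, as shown above.

```r
# Packages named in the "What to bring" notes for this tutorial.
required <- c("dplyr", "ggplot2", "leaflet", "geoR", "rgdal", "raster",
              "sp", "spdep", "SpatialEpi", "SpatialEpiApp")

# Compare against the packages currently installed in this R library.
not_installed <- setdiff(required, rownames(installed.packages()))

if (length(not_installed) > 0) {
  message("Still to install: ", paste(not_installed, collapse = ", "))
  # install.packages(not_installed)  # uncomment to install from CRAN
} else {
  message("All listed packages are installed.")
}
```

Running this a few days ahead leaves time to sort out slow or failing installs such as rgdal.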

In this tutorial we will learn how to estimate disease risk and quantify risk factors using areal and geostatistical data. We will also create interactive maps of disease risk and risk factors, and introduce presentation options such as interactive dashboards and shiny apps. We will work through two disease mapping examples using data on malaria in the Gambia and cancer in Pennsylvania, United States. We will focus on disease risk, but the approaches covered are also applicable to other fields such as climate, ecology or crime.


Simon Jackson  Wrangling data in the Tidyverse  P11  Beginner-to-intermediate R users who want to improve the day-to-day quality and efficiency of their data wrangling skills  Please remember to bring a laptop with R and RStudio installed. To speed things up, please also install the tidyverse package and familiarise yourself with RStudio projects. Data files will be made available online at the workshop.  
This hands-on tutorial will help beginner-to-intermediate R users take their data wrangling skills to the next level with an introduction to the Tidyverse: a collection of data science packages including dplyr, tidyr, purrr, ggplot2, and more. Using provided data sets and practical examples, you will learn how to efficiently import, tidy, and transform data to more quickly focus on tasks like visualization and modeling. 

Elizabeth Stark  Production-ready R: Getting started with R and docker  P9  Experience with using the command line and running basic scripts is helpful but not necessary. Some prior exposure to docker, git and cloud computing is helpful but no in-depth knowledge is required.  The tutorial will have hands-on components, so it would be great if you can pre-install Docker on your laptop using the instructions here https://docs.docker.com/install/ . Once you have it installed you can type `docker run hello-world` at a console prompt to test it (try `sudo` if it complains). We will be testing out the RStudio containers, so when you install docker please run `docker pull rocker/rstudio` and `docker pull rocker/geospatial` to download those images beforehand. And don't worry if you don't have a laptop or can't get things installed beforehand; we can assist, or you can work alongside someone else for the exercises.  
We will present some real-world data science scenarios and use these as a basis to walk participants through the process of building and deploying R-docker apps. Participants will gain experience in writing R scripts to run as standalone docker applications through examples, discussion and activities. We will provide code that can be used as a basis for participants' own projects. 

Scott Came  Applications with R and Docker  P6  Attendees with some exposure to Docker interested in building multi-container networked applications using Docker and R  Attendees should plan to complete Part 1 of the Docker Getting Started orientation at https://docs.docker.com/get-started/ prior to the tutorial. And no worries if you are not planning to bring a laptop to the tutorial, as you can just pair up with someone else for the exercises.  
In this tutorial we will explore several "advanced" scenarios of using Docker and R together to ease deployment of R applications. Attendees will gain hands-on experience building and deploying docker images for Shiny, databases, plumber, and keras. We will also look at cloud deployment and scaling applications with Kubernetes. 

Przemyslaw Biecek  DALEX: Descriptive mAchine Learning EXplanations. Tools for exploration, validation and explanation of complex machine learning models  P10  Applied data scientists, analysts interested in machine learning models.  Please bring a laptop with R and the following libraries installed from CRAN: install.packages(c("DALEX", "breakDown", "live", "auditor", "randomForest", "ceterisParibus")) 

Complex machine learning models are frequently used in predictive modelling. There are many examples of random-forest-like or boosting-like models in medicine, finance, agriculture, etc. In this workshop we will show why and how one would analyse the structure of a black-box model. This will be a hands-on workshop with four parts. In each part there will be a short lecture (around 20-25 minutes) and then time for practice and discussion (around 20-25 minutes).

* Introduction: Here we will show what problems may arise from blind application of black-box models. We will also show situations in which understanding the structure of a model leads to model improvements, model stability and greater trust in the model. During the hands-on part we will fit a few complex models (like xgboost, randomForest) with the mlr package and discuss basic diagnostic tools for these models.
* Conditional Explainers: In this part we will introduce techniques for understanding the marginal/conditional response of a model given one or two variables. We will cover PDP (Partial Dependence Plots) and ICE (Individual Conditional Expectations) packages for continuous variables and MPP (Merging Path Plot from the factorMerger package) for categorical variables.
* Local Explainers: In this part we will introduce techniques that explain the key factors that drive single model predictions. This covers Break Down plots for linear models (lm / glm) and tree-based models (randomForestExplainer, xgboostExplainer) along with model-agnostic approaches implemented in the live package (an extension of the LIME method).
* Global Explainers: In this part we will introduce tools for global analysis of the black-box model, like variable importance plots, interaction importance plots and tools for model diagnostics.

Literature:

* Staniak, Mateusz, and Przemysław Biecek. 2017. Live: Local Interpretable (Model-Agnostic) Visual Explanations.
* Sitko, Agnieszka, and Przemysław Biecek. 2017. FactorMerger: Hierarchical Algorithm for Post-Hoc Testing. https://github.com/MI2DataLab/factorMerger.
* Greenwell, Brandon M. 2017. “Pdp: An R Package for Constructing Partial Dependence Plots.” The R Journal 9 (1): 421–36. https://journal.r-project.org/archive/2017/RJ-2017-016/index.html.
* Goldstein, Alex, Adam Kapelner, Justin Bleich, and Emil Pitkin. 2015. “Peeking Inside the Black Box: Visualizing Statistical Learning with Plots of Individual Conditional Expectation.” Journal of Computational and Graphical Statistics 24 (1): 44–65. doi:10.1080/10618600.2014.907095.
* Apley, Dan. 2017. ALEPlot: Accumulated Local Effects (Ale) Plots and Partial Dependence (Pd) Plots. https://CRAN.R-project.org/package=ALEPlot.
* Ribeiro, Marco Tulio, Sameer Singh, and Carlos Guestrin. 2016. “‘Why Should I Trust You?’: Explaining the Predictions of Any Classifier.” In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1135–44. ACM Press. doi:10.1145/2939672.2939778.
* Biecek, Przemysław. 2017. BreakDown: BreakDown Plots. https://CRAN.R-project.org/package=breakDown.

Hanjo Odendaal  The ultimate online collection toolbox: Combining RSelenium and Rvest  P7  Intermediate R users looking to explore online data collection  Installation details are at https://bit.ly/2Moe5nH  
rvest from Hadley Wickham has become the go-to package for all online collection or website interaction (web scraping) tasks in R. Although the package is amazing, it is not able to interact with a webpage when the page is dynamically loaded through JavaScript. For the latter, we need a browser that we 'drive' around the website to collect, load and interact with objects. Welcome to RSelenium from John Harrison. The package provides the necessary tools that allow the user to drive a web browser from R using script commands. In this tutorial, we will look at installing RSelenium, learning basic commands, some JavaScript tips, and how to play well with other packages like rvest. 

Tuesday Afternoon 1:30-5:00, Break 3:00-3:30  
Thomas Lumley  fasteR: ways to speed up R code  P10  Intermediate R programmers interested in speeding up their code  See instructions at https://github.com/tslumley/useRfasteR  
This workshop will cover some intermediate and advanced techniques for optimising R code. We will look at both processing speed and memory use, but will not cover converting your code into other languages (e.g. C). 
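
To give a flavour of the kind of optimisation in scope, here is a minimal base-R sketch. It is not taken from the workshop materials, and the function names and workload are hypothetical. It contrasts a loop that grows its result one element at a time with the equivalent vectorised expression.

```r
# Hypothetical workload: square each of the first n integers.
n <- 1e4
x <- seq_len(n)

# Slow pattern: the result vector is re-allocated on every iteration.
slow_squares <- function(x) {
  out <- c()
  for (v in x) out <- c(out, v^2)
  out
}

# Vectorised version: one call, one allocation.
fast_squares <- function(x) x^2

# Both versions return the same values.
stopifnot(identical(slow_squares(x), fast_squares(x)))

# Compare the elapsed times; exact numbers depend on the machine.
print(system.time(slow_squares(x)))
print(system.time(fast_squares(x)))
```

The gap between the two widens as `n` grows, because the growing-vector loop copies the whole result at each step.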

Max Kuhn  Recipes for Data Processing  P9  The tutorial is for people who do feature engineering or need to include preprocessing with their models.  From a technical standpoint, some experience in modeling and R is a good idea. Basic tidyverse syntax will be reviewed. The materials will be added a few days before the tutorial. Please install the required packages: 'AmesHousing', 'broom', 'kknn', 'recipes', 'rsample', 'tidyverse', 'yardstick', 'caret'. Notes are available at https://github.com/topepo/user2018  
R has an excellent framework for specifying models using formulas. While elegant and useful, it was designed in a time when models had small numbers of terms and complex preprocessing of data was not commonplace. As such, it has some limitations. In this tutorial, a new package called `recipes` is shown where the specification of model terms and preprocessing steps can be enumerated sequentially. The recipe can be estimated and applied to any dataset. Current options include simple transformations (log, Box-Cox, interactions, dummy variables, ...), signal extraction (PCA, PLS, ICA, MDS, ...), basis functions (splines, polynomials, ...), imputation methods, and others. An example is used to demonstrate the functionality. 
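
The sequential style described above can be sketched as follows. This is an illustrative example only, not the tutorial's own. The built-in `mtcars` data and the choice of columns and steps are assumptions made for the sketch. The data is passed to `bake()` positionally because the name of that argument has differed between `recipes` releases.

```r
library(recipes)

# Declare the outcome and predictors (column choice is an assumption).
rec <- recipe(mpg ~ disp + hp + wt, data = mtcars)

# Enumerate preprocessing steps one after another.
rec <- step_log(rec, disp)                  # simple transformation
rec <- step_center(rec, all_predictors())   # subtract training means
rec <- step_scale(rec, all_predictors())    # divide by training sds

# Estimate the steps on a training set, then apply them to a data set.
trained   <- prep(rec, training = mtcars)
processed <- bake(trained, mtcars)

head(processed)
```

Because the means and standard deviations are estimated once in `prep()`, the same trained recipe can be applied to new data without re-estimating them.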

Kevin Kuo  Deep learning with TensorFlow and Keras  P11  Anyone interested in deep learning  TBA  
We begin with a quick introduction to deep learning concepts, just enough to have a working vocabulary to facilitate construction of neural networks during the tutorial. The TensorFlow suite of R packages will be covered, including keras, tfestimators, and tfdatasets. Together with the participants, we build end-to-end workflows to perform classification and regression tasks using neural networks. We discuss the data preprocessing needs specific to neural network models, architectural choices, and best practices. Examples will be chosen to span a wide range of interests, including learning on structured data, time series, and unstructured text and image data. 

Johann Gagnon-Bartsch  Looking to clean your data? Learn how to Remove Unwanted Variation with R  P8  Data analysis, Statistics, Bioinformatics and Computational Biology  TBA  
High-dimensional data often suffer from unwanted variation; for example, gene expression data commonly contain batch effects, and fMRI data commonly suffer from various systematic errors as well. Removing this unwanted variation while preserving the true signal in the data is essential to deriving the right scientific conclusions. A major complication, however, is that the factors causing the unwanted variation are often unknown and must be inferred from the data. In this tutorial we present the RUV (remove unwanted variation) package. RUV methods cover a range of approaches for removing unwanted variation depending on the purpose of the study: differential expression analysis, global data normalisation and visualisation, or classification. We also demonstrate an R shiny application that provides an overview of the methods, along with interactive options for data visualisation and method diagnostics. 

Stephanie Kovalchik  Statistical Models for Sport in R  P6  Beginner to intermediate R users with an interest in sports  Please bring a laptop with R installed and install the following libraries from CRAN: dplyr, tidyr, ggplot2, rvest, jsonlite, stringr, mgcv, rjags, BradleyTerry2, lubridate, pitchRx. These additional libraries should be installed from GitHub using devtools: RSelenium (https://github.com/ropensci/RSelenium), deuce (https://github.com/skoval/deuce). Part of the web scraping material will also require Docker, which you can install here: https://docs.docker.com/install/.  
The workshop will cover a number of skills and statistical models that are common in sports statistics and show how each can be implemented in R. The workshop will introduce participants to a range of R packages and real sports examples. 

Dale Bryan-Brown and Brett Parker  Spatial modelling using ‘raster’ package  P7  People interested in spatial modelling using satellite images, or other raster data sets. Worked examples will revolve around environmental modelling.  TBA  
The topics covered in this workshop include an introduction to 1) using R as a GIS (5 minutes), 2) point-type data (5 minutes), and 3) raster data (5 minutes). We will then build on the basic knowledge of using R as a GIS by creating raster data (10 minutes) and exploring its basic features (10 minutes). After that we will import raster data into R (10 minutes), begin to manipulate its extent, resolution, projection and values (45 minutes), and visualise the data (15 minutes). After the group is comfortable with editing the features of raster data we will discuss summarising raster data using point data (45 minutes). Finally, we will use this summarised data in a simple generalised linear model (45 minutes). Times are rough estimates. 

Wednesday Morning 9:00-12:30, Break 10:30-11:00  
Julie Josse and Nick Tierney  Missing values imputation  P10  People who want to know more about how to deal with missing values in their analysis and which methods are available and implemented. Basic knowledge of PCA and linear models is required.  For this tutorial, remember to come with your laptop, RStudio and the following packages installed: "VIM", "naniar", "missMDA", "Amelia", "mice", "missForest", "FactoMineR", "tidyverse". Slides, course notes, data sets, and R Markdown analyses will be available on my web page: http://juliejosse.com/teaching/  
The ability to easily collect and gather a large amount of data from different sources can be seen as an opportunity to better understand many processes. It has already led to breakthroughs in several application areas. However, due to the wide heterogeneity of measurements and objectives, these large databases often exhibit an extraordinarily high number of missing values. Hence, in addition to scientific questions, such data also present some important methodological and technical challenges for data analysts. In this tutorial, we give an overview of the missing values literature as well as the recent improvements that caught the attention of the community due to their ability to handle large matrices with large numbers of missing entries. We will illustrate the methods on medical, environmental and survey data. 

Carson Sievert  Interactive data visualization on the web with R  Auditorium  Anyone interested in interactive data visualization  TBA  
This tutorial teaches practical workflows for creating interactive web graphics which support common data analysis tasks. Through a series of examples, demos, exercises, and lecture, attendees will gain a foundation for navigating through common barriers to productivity associated with both the creation (e.g., startup cost, iteration cost, dead-end cost) and distribution (e.g., deployment cost, scaling cost, latency cost) of interactive web graphics. 

Matteo Fasiolo  Quantile Generalized Additive Models: moving beyond Gaussianity  P6  The attendees should have a basic understanding of regression models and of the basic concepts underlying statistics and machine learning (e.g. probability densities, quantiles, etc).  Please bring your own laptop, with either R version 3.4.4 or 3.5 installed. On a Mac you will also need to install XQuartz. Please install the mgcViz package from CRAN and use devtools::install_github("mfasiolo/mgcFam") to install mgcFam. For some of the exercises you might also need the following packages from CRAN: languageR, gamair and e1071. 

Generalized Additive Models (GAMs) are an extension of traditional parametric regression models, which have proved highly useful for both predictive and inferential purposes in a wide variety of scientific and commercial applications. One reason behind the popularity of GAMs is that they strike an interesting balance between flexibility and interpretability, while being able to handle large data sets. The mgcv R package is arguably the state-of-the-art tool for fitting such models, hence the first half of this tutorial will introduce GAMs and mgcv, in the context of electricity demand forecasting. The second part of the tutorial will show how traditional GAMs can be extended to quantile GAMs, and how the latter can be fitted using the qgam R package. By the end of the tutorial the attendees should be able to build, fit and visualize traditional or quantile GAM models, using a combination of the mgcv, qgam and mgcViz R packages. This tutorial is aimed at a broad audience of statistical modellers, interested in using GAMs for predictive or inferential purposes. The models which will be presented in the tutorial have a very wide range of applicability, hence they should be of interest to practitioners in business intelligence, ecology, linguistics, epidemiology and geoscience to name a few. 
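
As a rough illustration of how a GAM extends a parametric regression, the sketch below fits a straight-line model and a smooth-term model to the same data with mgcv. It is not from the tutorial materials. The tutorial uses electricity demand data, whereas the built-in `mtcars` data and the variable choice here are assumptions made so that the example is self-contained.

```r
library(mgcv)

# Parametric baseline: mpg is a straight-line function of hp.
fit_lm <- lm(mpg ~ hp, data = mtcars)

# GAM: the effect of hp is a smooth function estimated from the data.
fit_gam <- gam(mpg ~ s(hp), data = mtcars, method = "REML")

# Compare the two fits; lower AIC suggests a better trade-off.
print(AIC(fit_lm, fit_gam))

# Predictions from the smooth model at two illustrative hp values.
pred <- predict(fit_gam, newdata = data.frame(hp = c(100, 200)))
print(pred)
```

The only change in the model formula is wrapping the predictor in `s()`, which is what keeps GAMs close to ordinary regression in terms of interpretability.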

Maria Prokofieva  Follow Me: Introduction to social media analysis in R  P7  The tutorial will aim at the broad range of participants from various backgrounds (business, academics, etc.)  TBA  
The tutorial will review a range of R packages for social media analysis and will aim at teaching general principles of working with social media platforms and analysing the information there. The social media platforms covered are Facebook, Twitter, Instagram and YouTube. Topics covered during the tutorial include:
1. Structure of social media data (e.g. user-related data, posting-related data, hashtags)
2. Benefits and challenges of working with social media data (textual/non-textual information, large data volume, API limitations)
3. Connecting to a social media platform (e.g. authentication) and downloading data
4. Data analysis of profile information (e.g. followers, likes, dislikes, favorites; platform dependent)
5. Data analysis of textual information (e.g. user posts, comments, dynamics, sentiment analysis, word clouds, etc.)
6. Visualisation of social media data

Tong He  xgboost and MXNet  P11  TBA  A laptop running OSX/Windows/Linux with a recent R release, preferably the latest R 3.5.0, and the R packages xgboost and mxnet. mxnet installation is a bit tricky; see details for [linux](https://mxnet.incubator.apache.org/install/index.html?platform=Linux&language=R&processor=CPU), [OSX](https://mxnet.incubator.apache.org/install/index.html?platform=MacOS&language=R&processor=CPU), [Windows](https://mxnet.incubator.apache.org/install/index.html?platform=Windows&language=R&processor=CPU)  
TBA 

Dirk Eddelbuettel  Extending R with C++: Motivation, Introduction and Examples  P8  Beginning to intermediate users of R who want to go further and farther  Now, Rcpp is a fairly big topic, and it requires a working compiler setup. This tends to be somewhere between easier-on-some and more-tedious-on-other systems, with Windows arguably the most difficult. We say a bit more about this in the Rcpp FAQ [1], and we do not need more than R itself needs when compiling packages with C/C++/Fortran code. I have found in the past that I cannot simply assume /everybody/ gets this without help, so I tend to do a bit of 'lecture' and hands-on exercise. We will see if I manage to shift the balance a little. So if you feel adventurous and want to take this on, I recommend: (1) a decent editor and environment; RStudio fits the bill for most people (2) a working compiler: Linux and macOS generally have it (though macOS keeps changing, and I don't use it myself so reach out to resources such as (3) CRAN packages Rcpp (of course) and RcppArmadillo. A simple test to see if you are good is to use Rcpp::evalCpp() on an expression: `R> Rcpp::evalCpp("6 * 7")` should return `[1] 42`. This actually creates a minuscule C++ routine around the expression, and will only show the expected result if the setup is working. When things fail, the RStudio IDE tends to show a few hints, so try that. If all this sounds insurmountable, do not despair. I still recommend the tutorial, as I think we should find time to bend your laptop to do all this during conference breaks.  
Rcpp has become the principal venue for extending R with compiled code. It makes it easy to extend R with C or C++, spanning the range from simple one-liners to larger routines and bindings of entire external libraries. We will motivate and introduce Rcpp as a natural extension to R that provides an easy-to-use and powerful interface. Helper functions and tools including RStudio will be used to ease creation of R extensions. Several examples will introduce basic use cases, including writing code with RcppArmadillo, which is the most widely used package on top of Rcpp. This provides a natural bridge to the more recent RcppMLPACK package (which combines the MLPACK machine learning library with the Armadillo linear algebra library), from which we will study one or two examples. 

Charles Gray  Are you R Curious? (** This is a FREE tutorial.)  P9  Beginning users  Your laptop and enthusiasm (installation guide is on https://github.com/softloud/rcurious/blob/master/explore/onboarding.Rmd )  
Been meaning to quit Excel and learn R for ages but not managed to find the time? Or maybe you are just not quite sure what this R thing that everyone is talking about is? This is the workshop for you. Or, have you had someone say, “It’s easy to use R. Just type R.” or “Oh, I just use the lm() function.” and thought, huh? Sometimes R users can trivialise the process of getting started. In this workshop, we aim to equip new R users with the confidence to problem-solve their way through getting set up with R and RStudio, and importing and exploring data in R. Developed through discussions amongst R-Ladies who also teach and communicate, the workshop visits the biggest potential pitfalls of your first data analysis in R, from installation issues with packages, to different data structures, to beginning exploratory data analysis and visualisation in R. 