Lightning Talk Schedule

The talks will take place on 11-13 July 2018 (click a talk of interest for its abstract). Lightning talks will be 5 minutes each, with room for discussion at the end of the session. Information for presenters is here.

Time Session Presenter Venue Title Keywords Chair
14:00 Lightning talk Tom Elliott AUD Historical data based priors for better bus arrival-time prediction models, data mining, applications, space/time, big data Thomas Lin Pedersen
We have been developing a real-time bus arrival-time prediction framework, which so far relies solely on real-time data. However, we believe we can improve predictions (especially long-range, 20+ minutes) by incorporating historical data into the priors. This is especially useful in locations with infrequent buses, or before and after peak hour when travel times increase and decrease, respectively. Using a year's worth of GPS location data from buses in Auckland, New Zealand, we explore various models to develop time-dependent priors for bus travel time.
14:05 Lightning talk Jono Tuke AUD Pachinko prediction models, data mining Thomas Lin Pedersen
Social media is a great way for people to meet, chat and organise lunch, but it can also be used to meet, chat and organise protests. In this talk, I will explain how to use a Bayesian framework to predict social unrest from social media using Twitter as an example. So far so good, Bayesian modelling of Twitter data - but this is not that talk. How do you explain your modelling to the end-user? Not only that, but get them involved in the modelling? So let me show you how we used jam jars, coloured marbles, and the idea of Pachinko to explain our models to our collaborators, and how this started a conversation that led to better models.
14:10 Lightning talk Elvina Viennet AUD Uncertainty and sensitivity analyses: application to modelling the reproduction number of an infectious disease models, performance Thomas Lin Pedersen
The usefulness of any model depends on the accuracy and reliability of its output. Uncertainty analysis (UA) and sensitivity analysis (SA) are two approaches integral to the modelling process. UA describes the range of possible outputs that derives from uncertainty in the inputs, while SA describes how sensitive the outcome variables are to variation in the inputs. These analyses allow for the identification of which parameters are important in explaining the outcome variable. Let’s consider a model for the epidemic potential of Zika virus in Australia. To build this model we included input parameters that dictate the dynamics of disease transmission, leading to an output variable that describes how likely an outbreak is to occur at time (t). We undertook the simple steps of i) UA, using a Latin hypercube sampling method to generate 100 000 samples of the epidemic potential, and ii) SA, using Partial Rank Correlation Coefficient analysis to determine the statistical relationships between each input parameter and the epidemic potential while the other input parameters are kept constant. An overview of UA and SA in the context of infectious disease modelling will be presented.
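As a rough illustration of the Latin hypercube sampling step mentioned in the abstract above, the following is a minimal sketch in plain Python (used here only for illustration; it is not the presenter's code). The function names, the parameter bounds and the sample size of 10 are all assumptions; the Partial Rank Correlation Coefficient step is not shown.

```python
import random


def latin_hypercube(n_samples, n_params, rng):
    """Return n_samples points in [0, 1)^n_params.

    Each dimension is cut into n_samples equal strata, and every
    stratum is used exactly once per dimension.
    """
    columns = []
    for _ in range(n_params):
        strata = list(range(n_samples))
        rng.shuffle(strata)  # random pairing of strata across dimensions
        columns.append([(s + rng.random()) / n_samples for s in strata])
    # Transpose from per-parameter columns to per-sample rows.
    return [list(row) for row in zip(*columns)]


def scale(unit_point, bounds):
    """Map a point from the unit cube onto (low, high) parameter ranges."""
    return [low + u * (high - low) for u, (low, high) in zip(unit_point, bounds)]


rng = random.Random(2018)
design = latin_hypercube(10, 3, rng)

# Hypothetical ranges for three input parameters (illustrative only).
bounds = [(0.0, 1.0), (5.0, 15.0), (20.0, 35.0)]
inputs = [scale(point, bounds) for point in design]
```

Each row of `inputs` would then be fed to the model to obtain one sample of the output, and the spread of those outputs gives the uncertainty analysis.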
14:15 Lightning talk Mark Wohlers AUD Deep Learning with Keras R to model Spectral Data. algorithms, models Thomas Lin Pedersen
As deep learning has become more and more popular, TensorFlow has emerged as one of the dominant frameworks in the space. Keras, which runs on top of TensorFlow as well as other popular frameworks, is a high-level Python-based API with a user-friendly interface, large community, and clear documentation including an ever increasing number of examples. Recently the Keras R interface became available, making training deep learning models in R very accessible by connecting to this large resource. This R package also makes installing dependencies relatively straightforward, although some manual installations are still required. We give an overview of our experiences in learning to train deep learning models in R, starting with more basic examples before moving on to applying them to our own near-infrared spectroscopy (NIRS)-based dataset. This involved first training Convolutional Neural Networks, as suggested by Bjerrum et al. (2017), and then experimenting with other architectures such as LSTM networks. The implementation was carried out using Keras with the TensorFlow GPU backend, while optimisation of the various hyperparameters used the rBayesianOptimization package.
14:20 Lightning talk Saras Windecker AUD The zoon R package for reproducible and shareable species distribution modelling models, reproducibility, community/education Thomas Lin Pedersen
The rapid growth of species distribution modelling (SDM) as an ecological discipline has resulted in a large and diverse set of methods and software for constructing and evaluating SDMs. The disjointed nature of the current SDM research environment hinders evaluation of new methods, synthesis of current knowledge and the dissemination of new methods to SDM users. The zoon R package aims to overcome these problems by providing a modular framework for constructing reproducible SDM workflows. Zoon modules are interoperable snippets of R code, each carrying an SDM method that zoon combines into a single analysis object. Rather than defining these modules itself, zoon draws modules from an open, version-controlled online repository. zoon makes it easy for SDM researchers to contribute modules to this repository, enabling others to rapidly deploy new methods in their own workflows or to compare alternative methods. Each workflow object created by zoon is a rerunnable record of the data, code and results of an entire SDM analysis. This can then be easily shared, scrutinised, reproduced and extended by the whole SDM research community.
14:25 Lightning talk Stephanie Kobakian AUD taipan: Woman faces machine image analysis Thomas Lin Pedersen
Image surveys made easy in R. This talk will demonstrate the use of the taipan R package. The package allows you to create a shiny app survey from a list of questions and a set of images. The app users will be able to answer questions regarding specific highlighted areas of the image and a range of question types are supported.
14:30 Lightning talk Saskia Freytag AUD Can interactive shiny tools transform biological research? web app, bioinformatics, interfaces Thomas Lin Pedersen
The amount of biological data is increasing in excess of Moore’s law and the complexity of the data is astonishing. The number of bioinformaticians, or people who specialize in the analysis of this type of data, is insufficient, giving rise to lags in project progression and leading to frustration for both bioinformatician and biologist. Interactive tools providing first-pass exploratory analysis of novel datasets or dedicated statistical analysis of databases could create much needed relief and expedite analysis in general. Indeed, since the creation of the R package shiny, many interactive bioinformatics tools aimed at biologists have been created. For example, shinyGEO lets users explore and perform rudimentary analysis on all gene expression datasets stored in the public repository Gene Expression Omnibus. However, there are still many hurdles both in the creation and the uptake of such tools. As an author of two publicly available shiny tools, I want to share my experiences in building applications, designing their interfaces and their ultimate dissemination. I also want to touch on existing perceptions of such tools in both communities and how these might impede change.
14:35 Lightning talk Roberts Jessie AUD Visualising Uncertainty for Decision Makers web app, bioinformatics, interfaces Thomas Lin Pedersen
Uncertainty is pervasive in every aspect of life. Yet despite its inescapable presence, it is under-represented in the presentation of scientific and data-derived insights to non-expert decision makers. In this lightning talk I present one potential approach to visualising uncertainty within cancer mapping.
14:40 Lightning talk Chen Amy Tzu-Yu AUD Shiny - New York Pre-Kindergarten Explorer NA Thomas Lin Pedersen
Created with Shiny, this app consists of an interactive map and visualizations for New York City pre-kindergartens. The project provides a comprehensive guide for parents who do not understand the Universal Pre-K Program or want to quickly find the closest and most suitable school for their children. This Pre-K guide gives information about each pre-K's location, contacts, meal plans, playspace, extended day care options, and enrollment restrictions. The website also allows the public to visualize whether there are enough seats for eligible kids in each borough.
Time Session Presenter Venue Title Keywords Chair
14:00 Lightning talk Yan Holtz AUD Getting rich quick with R & Cryptocurrencies? applications, streaming data Adrian Barnett
Cryptocurrency has been a hot topic recently, with the Bitcoin price reaching 20k dollars in December 2017. I recently used R to record the price of 5 cryptocurrencies for 2 months, on 5 platforms, at 10 second intervals. I then created a bot that automatically performed arbitrage: the simultaneous buying and selling of currency on different platforms in order to take advantage of differing prices. Both processes took advantage of R's potential for data visualization: the use of Dygraph (for interactive time series) and Flexdashboard (to create dashboards from R Markdown) enables the automatic generation of clean daily reports. The talk will first describe the technical process which allows the harvesting of data from both the public and private platform APIs. It will then describe how to visualize this information efficiently. It will also describe the price differences between platforms and reveal whether arbitrage is a beneficial process for cryptocurrency.
14:05 Lightning talk Ozan Cinar AUD poolR: Combining Dependent p-values bioinformatics Adrian Barnett
Combining the p-values of a series of hypothesis tests is a task that arises in a variety of research fields. There are several well-known techniques for this purpose, such as Fisher’s and Stouffer’s methods. Furthermore, multiple testing methods, such as the Bonferroni correction, can also be used for this purpose, by using the most significant corrected p-value from the set as the combined p-value. However, one important aspect of these methods (except for the Bonferroni correction) is that they assume independence among the tests and hence among the p-values to be combined. This assumption is known to be violated in some contexts. Therefore, the correlations among the p-values should be addressed appropriately. We present a new R package, poolR, that can be used for combining p-values. The package implements five methods for combining p-values. Furthermore, these methods can be modified in order to account for the dependence structure among the p-values. These modifications include adjustments based on the effective number of tests and the use of empirically-derived null distributions. Moreover, generalizations of Fisher’s and Stouffer’s methods for dependent tests are also implemented.
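To make the three named techniques concrete, here is a minimal sketch of Fisher's method, Stouffer's method and the Bonferroni-based combination in their basic, independence-assuming forms. It is written in plain Python for illustration only and is not the poolR API; the function names and the example p-values are assumptions, and none of the dependence adjustments described in the abstract are shown.

```python
import math
from statistics import NormalDist


def fisher(pvals):
    """Fisher's method: -2 * sum(log p) is chi-square with 2k df under independence."""
    k = len(pvals)
    half_stat = -sum(math.log(p) for p in pvals)  # test statistic divided by 2
    # Closed-form chi-square survival function, valid for even df = 2k.
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


def stouffer(pvals):
    """Stouffer's method: average the z-scores, rescaled by sqrt(k)."""
    nd = NormalDist()
    z = sum(nd.inv_cdf(1.0 - p) for p in pvals) / math.sqrt(len(pvals))
    return 1.0 - nd.cdf(z)


def bonferroni(pvals):
    """Smallest Bonferroni-corrected p-value, capped at 1."""
    return min(1.0, len(pvals) * min(pvals))


# Illustrative p-values only; all must lie strictly between 0 and 1.
example = [0.01, 0.04, 0.20]
combined = {
    "fisher": fisher(example),
    "stouffer": stouffer(example),
    "bonferroni": bonferroni(example),
}
```

With a single p-value, both `fisher` and `stouffer` return that p-value unchanged, which is a quick sanity check on the implementation.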
14:10 Lightning talk Tamal Kumar De AUD Handling Missing Data in Single-Case Experiments community/education, simulation, single-case, missing data Adrian Barnett
Single-case experiments have become increasingly popular in educational and behavioral research. However, analysis of single-case data is often complicated by missing or incomplete observations. This complication may lead to experiments no longer “meeting standards” set by, for example, the What Works Clearinghouse. Hence, it is important to determine optimal strategies for dealing with missing data. We conducted a simulation study in R to compare three randomization test strategies for handling missing data: (1) randomizing a missing data marker and calculating all reference statistics only for available data points; (2) estimating the missing data points by using minimum mean square error linear interpolation; and (3) multiple imputation methods based on resampling the available data points. We simulated datasets for phase designs, alternating treatments designs, and multiple baseline designs. Missingness was introduced in the simulated datasets by using multiple probabilities of data being “Missing Completely At Random”. The three strategies are compared in terms of Type I error rate and statistical power, and are compared to the operating characteristics of the complete dataset.
14:15 Lightning talk Dennis Wollersheim AUD What can you do with drugs? Exploring Pharmaceutical Benefit Scheme prescription drug usage using R visualisation, algorithms, models, databases, big data Adrian Barnett
In Australia, we have good prescription drug usage data, due to the Pharmaceutical Benefit Scheme, a population-wide government insurance program. For example, the government recently released, under a Creative Commons license, ten years of complete drug purchasing data for 10% of the population. The data is also quite clean, having been processed through a government payment system, and is therefore easy to use. R has a diverse, powerful and concise toolset, and is very useful for exploiting this dataset. It is a joy to be able to start with SQL queries on a postgres dataset and pipe through various transformations, resulting in highly specific output forms such as tables and graphs, maps and time series. This workflow is replicable, archivable, expandable, and self-documenting. Remaining challenges mostly relate to dataset complexity and domain-specific problems, such as how to use the intermittent drug usage signal to isolate and characterise the different varieties of continuous drug usage, and more generally, how to make use of the wide heterogeneity of medical information. This talk will demonstrate what we have figured out in this area.
14:20 Lightning talk Laurent Thibault AUD npbr: A Package for Nonparametric Boundary Regression in R nonparametric statistics Adrian Barnett
The package npbr is the first free specialized software for data edge and frontier analysis in the statistical literature. It provides a variety of functions for the best known and most innovative approaches to nonparametric boundary estimation. The selected methods are concerned with empirical, smoothed, unrestricted as well as constrained fits under both single and multiple shape constraints. They also cover data envelopment techniques as well as robust approaches to outliers. The routines included in npbr are user-friendly and afford a large degree of flexibility in the estimation specifications. They provide smoothing parameter selection for the modern local linear and polynomial spline methods as well as for some promising extreme value techniques. Also, they seamlessly allow for Monte Carlo comparisons among the implemented estimation procedures. This package will be very useful for statisticians and applied researchers interested in employing nonparametric boundary regression models. Its use is illustrated with a number of empirical applications and simulated examples.
14:25 Lightning talk Robert King AUD R's early growth: easy dissemination of new stats history Adrian Barnett
Why did R grow so rapidly in its early days? This is a reflection on the history of R, particularly on the reasons for its early growth. R made it easy to code and distribute new statistical techniques. To what extent does R retain this advantage today?
14:30 Lightning talk Pearse Alan Ryou AUD Exploiting the natural structure of a large spatial dataset to cut down on processing time: an example from the Great Barrier Reef history Adrian Barnett
The mantra of “working smarter, not harder” is reminiscent of the bad advice you might have received from your high school careers counsellor. However, when working with large spatial datasets, it pays to take the saying to heart. In this talk, I present the example of a geoprocessing problem I faced with data on the Great Barrier Reef. I had to delineate reef passages between all pairs of neighbouring reefs on the edge of the continental shelf, and find the average depth and slope of each of these passages. I present my methodology for exploiting the natural structure of the reef and the passages to drastically reduce the time required for each operation in R. Though the example is highly specialised, I hope it will help you to develop your own ideas for exploiting the natural spatial structure of otherwise prohibitively large datasets that you may be faced with.
14:35 Lightning talk Maschette Dale AUD Round peg in a square hole: making Southern Ocean maps NA Adrian Barnett
Polar maps are often depicted in publications as circles; despite this, there is little functionality in R packages to crop raster layers into circles. Here we present SOmap, a package to produce basic round Southern Hemisphere maps in a polar projection.
14:40 Lightning talk Sparks Adam H. AUD Establishing a community for supporting and fostering adoption of reproducible research using R NA Adrian Barnett