Enabling Bayesian inference for big data

In the last decade or so, storage capacity and the ability to process huge amounts of data have increased dramatically, making large, high-quality data sets widely accessible to practitioners. This technological innovation seriously challenges inference methodology, in particular the simulation algorithms commonly applied in Bayesian inference. These algorithms typically require repeated evaluations over the whole data set when fitting models, which precludes their use in the age of so-called big data.

The aim of this project is to develop simulation tools that scale to large data sets, both in the number of observations and in the number of parameters. We speed up the computations by subsampling, i.e. by repeatedly using small, randomly selected subsets of the data, and note that the posterior estimate we obtain is the same as the one that would be obtained if the full data set were used each time. To estimate models with a large number of parameters as well as observations, we use Hamiltonian Monte Carlo with derivatives that are also obtained by subsampling.
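To give a feel for the basic idea, the following is a minimal sketch in Python, not the estimators developed in the papers linked below. It assumes a toy normal model with unknown mean, and the helper names (`subsample_estimate`, `log_density`, `score`) are hypothetical. The full-data log-likelihood and its derivative are both sums over all n observations. Each can be estimated from m randomly drawn observations by scaling the subsample sum by n/m, and the estimates are correct on average.

```python
import math
import random


def log_density(y, mu, sigma=1.0):
    """Log-density of one observation under an assumed N(mu, sigma^2) toy model."""
    return -0.5 * math.log(2.0 * math.pi * sigma**2) - 0.5 * ((y - mu) / sigma) ** 2


def score(y, mu, sigma=1.0):
    """Derivative of log_density with respect to mu for one observation."""
    return (y - mu) / sigma**2


def subsample_estimate(term, data, m, rng):
    """Estimate sum(term(y) for y in data) from m observations drawn with replacement."""
    n = len(data)
    idx = [rng.randrange(n) for _ in range(m)]
    return (n / m) * sum(term(data[i]) for i in idx)


rng = random.Random(0)
data = [rng.gauss(1.0, 1.0) for _ in range(10_000)]  # simulated data, n = 10,000
mu = 1.0                                             # parameter value to evaluate at

# Exact quantities: one term per observation, O(n) cost per evaluation.
exact_loglik = sum(log_density(y, mu) for y in data)
exact_grad = sum(score(y, mu) for y in data)

# Subsampled quantities: O(m) cost per evaluation, here with m = 100.
reps = 2_000
loglik_estimates = [subsample_estimate(lambda y: log_density(y, mu), data, 100, rng)
                    for _ in range(reps)]
grad_estimates = [subsample_estimate(lambda y: score(y, mu), data, 100, rng)
                  for _ in range(reps)]

mean_loglik = sum(loglik_estimates) / reps
mean_grad = sum(grad_estimates) / reps

print(f"log-likelihood: exact {exact_loglik:.1f}, mean of estimates {mean_loglik:.1f}")
print(f"gradient:       exact {exact_grad:.1f}, mean of estimates {mean_grad:.1f}")
```

Each individual estimate is noisy, and only the average over many repetitions matches the full-data value. Handling that noise inside a sampler so that the posterior is still the right one is the difficult part, and it is what the papers address. This sketch makes no attempt to do so.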

Our research can be found at the following links:

Published:

http://amstat.tandfonline.com/doi/abs/10.1080/10618600.2017.1307117

Under revision:

https://arxiv.org/abs/1404.4178

https://arxiv.org/abs/1603.08232

https://arxiv.org/abs/1708.00955