Hamiltonian Monte Carlo with Energy Conserving Subsampling

Bayesian methods for statistical models rely on computing the posterior distribution of the parameters given the data, which typically does not correspond to any known distribution. The main challenge of Bayesian computation is to approximate this unknown distribution efficiently as the number of observations and parameters grows.

Hamiltonian Monte Carlo (HMC) is a modern tool for sampling from complicated, high-dimensional distributions and is currently regarded as one of the state-of-the-art methods for posterior sampling. However, HMC is computationally expensive for datasets with many observations, and previous research has shown that speeding it up through naive data subsampling destroys its ability to sample efficiently.
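
To see where the cost comes from, here is a minimal sketch of standard HMC with leapfrog integration on a toy Gaussian model; all names, settings, and the model itself are illustrative, not taken from the paper. The point is that every leapfrog step calls the gradient of the full-data log posterior, a complete pass over all N observations.

```python
# A minimal sketch of standard HMC (leapfrog + Metropolis correction) on a
# toy N(theta, 1) model with a flat prior. Illustrative only, not the
# authors' code: grad_log_post touches every observation, so each leapfrog
# step costs O(N) -- the bottleneck that motivates subsampling.
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=1.5, scale=1.0, size=100_000)  # N observations

def grad_log_post(theta):
    # Gradient of log p(theta | y): a full pass over all N observations
    return np.sum(y - theta)

def neg_energy(theta, p):
    # Negative Hamiltonian: log posterior minus kinetic energy
    return -0.5 * np.sum((y - theta) ** 2) - 0.5 * p ** 2

def hmc_step(theta, step_size=1e-4, n_leapfrog=20):
    p = rng.normal()                      # refresh momentum
    theta_new, p_new = theta, p
    # Leapfrog integration of the Hamiltonian dynamics
    p_new += 0.5 * step_size * grad_log_post(theta_new)
    for _ in range(n_leapfrog - 1):
        theta_new += step_size * p_new
        p_new += step_size * grad_log_post(theta_new)
    theta_new += step_size * p_new
    p_new += 0.5 * step_size * grad_log_post(theta_new)
    # Accept or reject based on the change in total energy
    if np.log(rng.uniform()) < neg_energy(theta_new, p_new) - neg_energy(theta, p):
        return theta_new
    return theta

theta = 0.0
for _ in range(200):
    theta = hmc_step(theta)
print(theta)  # close to the sample mean of y (about 1.5)
```

With 100,000 observations and 20 leapfrog steps, a single HMC iteration already requires 21 full scans of the data, which is why the cost grows quickly with the dataset size.
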

ACEMS researchers have found a way to make HMC compatible with data subsampling without deteriorating its sampling efficiency. Their work [1] was recently published in the Journal of Machine Learning Research, a top journal in the field. The new method, named “Hamiltonian Monte Carlo with Energy Conserving Subsampling”, achieves sampling efficiency and scalability similar to HMC's while being much faster, and it is more accurate than current machine learning methods that rely on subsampling. This work significantly advances the team's previous work [2, 3] and provides a fast and reliable method for estimating models with many more parameters.
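
A key ingredient that makes subsampling workable, developed in the team's earlier work on subsampling MCMC (Quiroz et al., 2019, listed below), is an efficient estimator of the full-data log-likelihood from a small subsample, built with control variates from a Taylor expansion around a fixed reference point. The sketch below illustrates the idea for a toy logistic regression; the data, the hard-coded reference point beta_star, and all function names are assumptions for illustration, not the authors' code.

```python
# Sketch of a control-variate ("difference") estimator of the log-likelihood
# from a subsample of size m << N, for a toy logistic regression. Illustrative
# only: beta_star stands in for a reference point such as an MLE/MAP fit.
import numpy as np

rng = np.random.default_rng(1)
N, m = 100_000, 500                      # full data size, subsample size
x = rng.normal(size=N)                   # covariates
y = rng.binomial(1, 1 / (1 + np.exp(-0.8 * x)))  # binary responses

def ell(beta, xx, yy):
    # Per-observation logistic-regression log-likelihood
    return yy * xx * beta - np.log1p(np.exp(xx * beta))

beta_star = 0.79                         # reference point (stand-in for a fit)
p_star = 1 / (1 + np.exp(-x * beta_star))
g = x * (y - p_star)                     # d ell_i / d beta at beta_star
h = -x ** 2 * p_star * (1 - p_star)      # second derivatives at beta_star
# O(N) sums for the Taylor control variates, computed once up front
s0, s1, s2 = np.sum(ell(beta_star, x, y)), np.sum(g), np.sum(h)

def loglik_estimate(beta):
    d = beta - beta_star
    # Sum of all N control variates, available in O(1) after precomputation
    q_total = s0 + s1 * d + 0.5 * s2 * d ** 2
    idx = rng.integers(0, N, size=m)     # subsample with replacement
    q_i = ell(beta_star, x[idx], y[idx]) + g[idx] * d + 0.5 * h[idx] * d ** 2
    # Difference estimator: a cheap O(m) correction on top of q_total
    return q_total + N * np.mean(ell(beta, x[idx], y[idx]) - q_i)

beta = 0.85
print(np.sum(ell(beta, x, y)), loglik_estimate(beta))  # nearly identical
```

Because the control variates q_i track each ell_i closely near beta_star, the subsample correction has small variance even when m is a tiny fraction of N. The published method [1] goes further, embedding an estimator of this kind within HMC in a way designed to keep the simulated energy conserved, which is what preserves the sampler's efficiency.
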

The research group has recently completed a paper that applies this work to develop an efficient sequential Monte Carlo sampler for large datasets [4].


Publications

Dang, K.-D., Quiroz, M., Kohn, R., Tran, M.-N., & Villani, M. (2019). Hamiltonian Monte Carlo with Energy Conserving Subsampling. Journal of Machine Learning Research, 20(100), 1-31.
Quiroz, M., Kohn, R., Villani, M., & Tran, M.-N. (2019). Speeding Up MCMC by Efficient Data Subsampling. Journal of the American Statistical Association, 114(526), 831-843. doi:10.1080/01621459.2018.1448827