Bayesian variable selection for sparse solutions.

In this project we consider the challenging task of developing fully Bayesian sparse analyses for settings in which the number of predictors exceeds the number of observations, the responses are complex, and the covariates are grouped into blocks, with sparsity in both blocks and cases. It is well established that incorporating prior knowledge of structure in the data, such as a potential grouping of the covariates, is key to more accurate prediction and improved interpretability. In genomics, for example, genes within the same pathway have similar functions and act together in regulating a biological system. The effects of these genes can add up to a larger joint effect, so they can be detected as a group (i.e., at the pathway or gene-set level). Incorporating this grouping structure is becoming increasingly common.

We will develop general Bayesian hierarchical models for variable selection with a group structure. We will investigate properties of the posterior median estimator, such as an oracle property for group variable selection, as well as asymptotic distributions. Efficient Gibbs sampling algorithms for our Bayesian models will be derived. We will illustrate the methods on challenging genetic data sets, with gene expression data as the responses and SNPs as the explanatory variables.
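
For orientation, one common way to write a hierarchical model of this kind is a group-level spike-and-slab prior. The display below is only an illustrative sketch and not necessarily the specification this project will adopt. All symbols are assumptions introduced here: G groups, the g-th with m_g covariates collected in the design block X_g and coefficient vector \beta_g, a point mass \delta_0 at zero, and hyperparameters a, b, \tau_g.

```latex
\begin{align*}
  y \mid \beta, \sigma^2
    &\sim \mathcal{N}_n\!\Big(\textstyle\sum_{g=1}^{G} X_g \beta_g,\; \sigma^2 I_n\Big), \\
  \beta_g \mid \pi_0, \tau_g^2, \sigma^2
    &\sim (1 - \pi_0)\, \mathcal{N}_{m_g}\!\big(0,\; \sigma^2 \tau_g^2 I_{m_g}\big)
          + \pi_0\, \delta_0(\beta_g), \qquad g = 1, \dots, G, \\
  \pi_0 &\sim \mathrm{Beta}(a, b).
\end{align*}
```

Under a prior of this form, the posterior of each \beta_g places positive probability on exactly zero, so a coordinate-wise posterior median can be exactly zero and can therefore act as a selection rule. The mixture structure also gives conditional distributions that are typically available in closed form, which is what makes Gibbs sampling a natural computational choice.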