\[ \newcommand{\argmin}{\operatorname{argmin}} \newcommand{\sign}{\operatorname{sign}} \newcommand{\diag}[1]{\operatorname{diag}(#1)} \newcommand{\prox}[2]{\operatorname{prox}_{#1}\left(#2\right)} \]

Introduction

We’re going to demonstrate some of the material from the BayesMAP paper [1] on a Lasso-type problem with the double-Pareto (log) penalty, given by \[ \begin{gather*} \argmin_x \left\{ F(x) = f(x) + \lambda \phi_q(x) \right\} \end{gather*} \] where \(\phi_q(x) = \sum_j \log(1+|x_j|/q)\) and \(f(x) = \frac{1}{2} \|A x - y\|^2\).

We use the Proximal Gradient algorithm (described here) to estimate the \(\argmin\).
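For reference, the standard proximal gradient update for the composite objective \(F = f + \lambda \phi_q\), with step size \(1/L_f\) where \(L_f\) is a Lipschitz constant of \(\nabla f\), is \[ \begin{gather*} x^{k+1} = \prox{(\lambda/L_f)\,\phi_q}{x^k - \tfrac{1}{L_f} \nabla f(x^k)}. \end{gather*} \]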

The basic quantities for this estimation problem are implemented by the following:

# Quadratic loss f(x) = ||A x - y||^2 / 2.
l2.loss.fun = function(x, env) {
  return(sum(crossprod(env$A %*% x - env$y)/2))
}

# Gradient of the quadratic loss: A'A x - A'y.  The cross-products are taken
# from env when they have been pre-computed there, otherwise computed on the fly.
l2.loss.fun.grad = function(x, env) {
  AtA = if (!is.null(env$AtA)) env$AtA else crossprod(env$A)
  Aty = if (!is.null(env$Aty)) env$Aty else crossprod(env$A, env$y)
  return(AtA %*% x - Aty)
}

# Full objective F(x) = f(x) + lambda * sum_j log(1 + |x_j|/q).
l2.pareto.obj.fun = function(x, env) {
  l2.loss.fun(x, env) + env$lambda * sum(log(1 + abs(x)/env$q))
}

# (Sub)gradient of the objective away from zero:
# grad f(x) + lambda * sign(x) / (q + |x|).
l2.pareto.obj.fun.grad = function(x, env) { 
  return(l2.loss.fun.grad(x, env) + env$lambda * sign(x) / (env$q + abs(x)))
}
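For reference, the penalty derivative used in the gradient above is \[ \begin{gather*} \frac{\partial}{\partial x_j}\, \lambda \log\left(1 + |x_j|/q\right) = \frac{\lambda \,\sign(x_j)}{q + |x_j|}, \qquad x_j \neq 0. \end{gather*} \]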

# Gradient term used by the MM-style routines; identical computation to
# l2.loss.fun.grad.
mm.quad.xi = function(x, env) {
  AtA = if (!is.null(env$AtA)) env$AtA else crossprod(env$A)
  Aty = if (!is.null(env$Aty)) env$Aty else crossprod(env$A, env$y)
  return(AtA %*% x - Aty)
}

# Proximal operator of lambda * log(1 + |x|/q) (a generalized soft-thresholding),
# applied element-wise.
double.pareto.prox = function(x, lambda, env) {
  d = pmax(env$q * abs(x) - lambda, 0)
  return(sign(x)/2 * (abs(x) - env$q + sqrt((env$q - abs(x))^2 + 4*d)))
}

obj.fun = l2.pareto.obj.fun
obj.fun.grad = l2.pareto.obj.fun.grad
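As a quick sanity check (toy values, not part of the original write-up), the prox maps small inputs exactly to zero and shrinks larger ones:

# Thresholding behaviour of the prox on a toy example (q and lambda are arbitrary here).
toy.env = list(q = 1)
double.pareto.prox(c(-2, -0.05, 0, 0.05, 2), lambda = 0.1, toy.env)
# entries near zero are mapped exactly to 0; the large entries are shrunk toward 0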

Simulate a Signal

The true signal \(x^*\) is sparse: 30 of its entries are non-zero, drawn from \(\operatorname{N}(0,1)\), and the rest are zero.

Our observation matrix \(A\) is 100×300 with \(\operatorname{N}(0,1/300)\) entries, and noisy observations are generated as \(y = A x^* + \epsilon\), where the noise terms are \(\operatorname{N}(0, \sigma_\epsilon)\) with \(\sigma_\epsilon = 6.9959 \times 10^{-6}\).
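The simulation chunk isn’t echoed in the rendered output; a sketch along the following lines reproduces the set-up (the seed is a placeholder, and the noise level is treated as a standard deviation):

# Sketch of the simulated problem (seed chosen arbitrarily for illustration).
set.seed(1234)
n = 100; p = 300; n.nonzero = 30
A = matrix(rnorm(n * p, sd = sqrt(1/300)), n, p)
x.true = numeric(p)
x.true[sample(p, n.nonzero)] = rnorm(n.nonzero)
sigma.eps = 6.9959e-06   # noise scale from the text, used here as a standard deviation
y = drop(A %*% x.true + rnorm(n, sd = sigma.eps))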

We determine the Lipschitz bound \(L_f\) to be 10.3453 and set \(\lambda=0.0664\) and the double-Pareto parameter \(q=1\).
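One way to obtain such a Lipschitz bound is the largest eigenvalue of \(A^\top A\); the sketch below (variable names assumed) computes it and assembles the problem environment consumed by the functions above:

# Lipschitz constant of the gradient of f: the largest eigenvalue of A'A
# (any upper bound on it would also do).
L.f = max(eigen(crossprod(A), symmetric = TRUE, only.values = TRUE)$values)

# Problem data used by the obj.fun / prox functions above.
prob.env = list(A = A, y = y, x.true = x.true,
                AtA = crossprod(A), Aty = crossprod(A, y),
                lambda = 0.0664, q = 1)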

Comparisons

We compute a baseline minimum objective value by minimizing the objective directly with BFGS, using the (sub)gradient above:

# BFGS on the full objective, started at the true signal; prob.env is passed
# through optim's '...' to obj.fun and obj.fun.grad.
opt.obj = optim(prob.env$x.true, fn=obj.fun, gr=obj.fun.grad, 
                prob.env, method="BFGS",
                control=list('maxit'=5000, 'trace'=1))

The resulting objective value is 0.9849.

The Gamma Lasso has a similar form and thus is a good candidate for comparison. We use the gamlr package in what follows.

require(gamlr)
## Loading required package: gamlr
## Loading required package: Matrix
# Fit the gamma-lasso path, starting at our lambda.
gamlr.results = gamlr(prob.env$A, prob.env$y, family="gaussian", nlambda=100, lambda.start=prob.env$lambda)
# Coefficients at the final lambda in the path, evaluated under our objective.
x.gamlr = gamlr.results$beta[, ncol(gamlr.results$beta)]
obj.fun.gamlr = obj.fun(x.gamlr, prob.env)

Algorithms

First, we source the code for our generic algorithms (Nesterov acceleration, back-tracking line search, etc.; see the algos page for more information).
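The sourced implementations aren’t reproduced here, but a minimal, un-accelerated proximal-gradient loop built from the pieces above would look roughly like this (fixed step size, no line search; prox.grad.sketch is an illustrative name):

# Plain proximal-gradient iteration with a fixed step 1/L (illustrative sketch;
# the sourced algorithms add Nesterov acceleration and back-tracking).
prox.grad.sketch = function(x0, env, L, n.iter = 500) {
  x = x0
  for (k in seq_len(n.iter)) {
    # gradient step on f, then prox step on the scaled penalty
    x = drop(double.pareto.prox(x - l2.loss.fun.grad(x, env) / L,
                                env$lambda / L, env))
  }
  return(x)
}

x.pg = prox.grad.sketch(numeric(ncol(prob.env$A)), prob.env, L = 10.3453)
obj.fun(x.pg, prob.env)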

Plots

[Two plots from chunk plot_results.]


  1. Walter Dempsey, Nicholas Polson, and Brandon Willard. “BayesMAP”.