\[ \newcommand{\argmin}{\operatorname{argmin}} \newcommand{\sign}{\operatorname{sign}} \newcommand{\diag}[1]{\operatorname{diag}(#1)} \newcommand{\prox}[2]{\operatorname{prox}_{#1}\left(#2\right)} \]

Introduction

We’re going to demonstrate some of the material from the BayesMAP paper 1 for the Lasso problem, given by \[ \begin{gather*} \argmin_x \left\{ F(x) = f(x) + \lambda \phi(x) \right\} \end{gather*} \] with \(\phi(x) = \|x\|_1\) and \(f(x)=\frac{1}{2} \|A x - y\|_2^2\).

We use the Proximal Gradient algorithm (described here) to estimate the \(\argmin\).
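For reference, each iteration takes a gradient step on the smooth term \(f\) and then applies the proximal operator of the scaled penalty; with step size \(1/L_f\), where \(L_f\) is a Lipschitz constant for \(\nabla f\), the update is \[ \begin{gather*} x_{k+1} = \prox{\lambda \phi / L_f}{x_k - \tfrac{1}{L_f} \nabla f(x_k)} \end{gather*} \] and for \(\phi = \|\cdot\|_1\) this proximal operator is exactly the component-wise soft-thresholding function implemented below.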

The basic quantities for this estimation problem are implemented by the following:

# Least-squares loss: f(x) = ||A x - y||^2 / 2
l2.loss.fun = function(x, env) {
  return(sum(crossprod(env$A %*% x - env$y)/2))
}

# Gradient of the least-squares loss: grad f(x) = A'A x - A'y.
# AtA and Aty are looked up in env and computed on the fly if absent.
l2.loss.fun.grad = function(x, env) {
  AtA = mget("AtA", envir=list2env(env), mode="numeric", 
             ifnotfound=list(function(x) crossprod(env$A)))[[1]]
  Aty = mget("Aty", envir=list2env(env), mode="numeric", 
             ifnotfound=list(function(x) crossprod(env$A, env$y)))[[1]]
  AtAx = AtA %*% x
  return(AtAx - Aty)
}

# Lasso objective: F(x) = f(x) + lambda * ||x||_1
l2.l1.obj.fun = function(x, env) {
  return(l2.loss.fun(x, env) + env$lambda * sum(abs(x)))
}

# Subgradient of the Lasso objective; sign(x) is a subgradient of ||x||_1
l2.l1.obj.fun.grad = function(x, env) {
  return(l2.loss.fun.grad(x, env) + env$lambda * sign(x))
}

# Gradient term for the quadratic majorization (MM) step;
# identical to the least-squares gradient above.
mm.quad.xi = function(x, env) {
  AtA = mget("AtA", envir=list2env(env), mode="numeric", 
             ifnotfound=list(function(x) crossprod(env$A)))[[1]]
  Aty = mget("Aty", envir=list2env(env), mode="numeric", 
             ifnotfound=list(function(x) crossprod(env$A, env$y)))[[1]]
  AtAx = AtA %*% x
  x.grad = AtAx - Aty
  return(x.grad)
}

# Soft-thresholding: the proximal operator of lambda * ||.||_1
soft.prox = function(x, lambda, ...) {
  return(sign(x) * pmax(abs(x) - lambda, 0))
}

obj.fun = l2.l1.obj.fun
obj.fun.grad = l2.l1.obj.fun.grad
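As a quick sanity check of the soft-thresholding operator (the values below are illustrative only, not output from the original session): entries whose magnitude is below the threshold are set to zero, and the rest are shrunk toward zero by the threshold.

soft.prox(c(-0.30, 0.05, 0.20), 0.1)
## [1] -0.2  0.0  0.1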

Simulate a Signal

\(x^*\) is the true signal, with noiseless observations \(y^* = A x^*\); sparsity is induced by sampling its 30 non-zero entries from \(\operatorname{N}(0,1)\).

Our observation matrix \(A\) is \(100 \times 300\), comprised of \(\operatorname{N}(0,1/300)\) entries, and observation noise is added so that \(y = y^* + \epsilon = A x^* + \epsilon\), where \(\epsilon \sim \operatorname{N}(0, \sigma_\epsilon^2)\) and \(\sigma_\epsilon =\) 0.0026.
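A minimal sketch of how such a problem could be simulated (the names prob.env, n.obs, n.vars, and n.nonzero are illustrative, not necessarily those used in the original code):

set.seed(42)
n.obs = 100; n.vars = 300; n.nonzero = 30

# Observation matrix with N(0, 1/n.vars) entries
A = matrix(rnorm(n.obs * n.vars, sd=sqrt(1/n.vars)), n.obs, n.vars)

# Sparse true signal: 30 non-zero N(0,1) entries
x.true = numeric(n.vars)
x.true[sample(n.vars, n.nonzero)] = rnorm(n.nonzero)

# Noisy observations y = A x* + eps
sigma.eps = 0.0026
y = drop(A %*% x.true) + rnorm(n.obs, sd=sigma.eps)

prob.env = list(A=A, y=y, x.true=x.true, lambda=0.0664)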

We determine the Lipschitz bound \(L_f\) to be 10.3453 and set \(\lambda=0.0664\). Effectively, the soft-thresholding step will shrink to zero any entries below \(\lambda / L_f \approx 0.0064\).
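For the squared-error loss, \(\nabla f\) is Lipschitz with constant equal to the largest eigenvalue of \(A^\top A\), so the bound and the effective threshold could be computed along these lines (a sketch using the assumed prob.env above):

# Largest eigenvalue of A'A gives a Lipschitz constant for the gradient
L.f = max(eigen(crossprod(prob.env$A), symmetric=TRUE, only.values=TRUE)$values)

# Entries of the gradient step below lambda/L.f are zeroed by the prox
prob.env$lambda / L.f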

Comparisons

We compute a baseline minimum objective value by minimizing the non-smooth objective directly with a general-purpose BFGS solver, using the subgradient in place of the gradient:

# prob.env is forwarded through optim's `...` to obj.fun and obj.fun.grad
opt.obj = optim(prob.env$x.true, fn=obj.fun, gr=obj.fun.grad, 
                prob.env, method="BFGS",
                control=list('maxit'=5000, 'trace'=1))

The resulting objective value is 1.1317.

Implementation

First, we source the code for our generic algorithms, e.g. Nesterov acceleration, back-tracking line-search, etc. (see the algos page for more information):
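The sourced code is not reproduced here, but to make the structure concrete, a bare-bones, fixed-step proximal gradient loop built from the quantities above might look like the following sketch (the prox.grad name and the loop itself are illustrative; the actual implementation adds Nesterov acceleration and back-tracking line-search):

prox.grad = function(x0, env, L, n.iter=500) {
  x = x0
  obj.vals = numeric(n.iter)
  for (k in seq_len(n.iter)) {
    # gradient step on the smooth part, then the soft-thresholding prox
    x = soft.prox(x - mm.quad.xi(x, env)/L, env$lambda/L)
    obj.vals[k] = obj.fun(x, env)
  }
  return(list(x=x, obj.vals=obj.vals))
}

# e.g., starting from zero with the Lipschitz bound from above:
# pg.result = prox.grad(numeric(ncol(prob.env$A)), prob.env, L=10.3453)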

Plots

(Two plots produced by the plot_results chunk.)


  1. Walter Dempsey, Nicholas Polson, Brandon Willard, “BayesMAP”.