\[ \newcommand{\argmin}{\operatorname{argmin}} \newcommand{\sign}{\operatorname{sign}} \newcommand{\diag}[1]{\operatorname{diag}(#1)} \newcommand{\prox}[2]{\operatorname{prox}_{#1}\left(#2\right)} \]
We’re going to demonstrate some of the material from the BayesMAP paper [1] for the Lasso problem, given by \[ \begin{gather*} \argmin_x \left\{ F(x) = f(x) + \lambda \phi(x) \right\} \end{gather*} \] with \(\phi(x) = \|x\|_1\) and \(f(x) = \frac{1}{2} \|A x - y\|^2\).
We use the Proximal Gradient algorithm (described here) to estimate the \(\argmin\).
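Concretely, with a step size \(1/L_f\), where \(L_f\) is a Lipschitz constant for \(\nabla f\), each proximal gradient iteration takes a gradient step on the smooth term \(f\) and then applies the proximal operator of the \(\ell_1\) penalty, which is simply component-wise soft-thresholding: \[ \begin{gather*} x^{k+1} = \prox{\frac{\lambda}{L_f} \|\cdot\|_1}{x^k - \tfrac{1}{L_f} \nabla f(x^k)}, \qquad \left[\prox{\tau \|\cdot\|_1}{z}\right]_i = \sign(z_i) \max(|z_i| - \tau,\, 0). \end{gather*} \] The accelerated and line-searched variants used later modify this basic step.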
The basic quantities for this estimation problem are implemented by the following:
# Smooth part of the objective: f(x) = ||A x - y||^2 / 2
l2.loss.fun = function(x, env) {
  return(sum(crossprod(env$A %*% x - env$y)/2))
}

# Gradient of the smooth part: grad f(x) = A^T A x - A^T y.
# A^T A and A^T y are looked up in env and only recomputed if absent.
l2.loss.fun.grad = function(x, env) {
  AtA = mget("AtA", env=list2env(env), mode="numeric",
             ifnotfound=list(function(x) crossprod(env$A)))[[1]]
  Aty = mget("Aty", env=list2env(env), mode="numeric",
             ifnotfound=list(function(x) crossprod(env$A, env$y)))[[1]]
  AtAx = AtA %*% x
  return(AtAx - Aty)
}
# Full Lasso objective: F(x) = f(x) + lambda * ||x||_1
l2.l1.obj.fun = function(x, env) {
  return(l2.loss.fun(x, env) + env$lambda * sum(abs(x)))
}

# A subgradient of F, using sign(x) as the subgradient of ||x||_1
l2.l1.obj.fun.grad = function(x, env) {
  return(l2.loss.fun.grad(x, env) + env$lambda * sign(x))
}
# Gradient of f(x) (identical to l2.loss.fun.grad), packaged as the xi
# term used by the quadratic majorization (MM) step
mm.quad.xi = function(x, env) {
  AtA = mget("AtA", env=list2env(env), mode="numeric",
             ifnotfound=list(function(x) crossprod(env$A)))[[1]]
  Aty = mget("Aty", env=list2env(env), mode="numeric",
             ifnotfound=list(function(x) crossprod(env$A, env$y)))[[1]]
  AtAx = AtA %*% x
  x.grad = AtAx - Aty
  return(x.grad)
}
# Soft-thresholding: the proximal operator of lambda * ||x||_1
soft.prox = function(x, lambda, ...) {
  return(sign(x) * pmax(abs(x) - lambda, 0))
}

obj.fun = l2.l1.obj.fun
obj.fun.grad = l2.l1.obj.fun.grad
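As a rough illustration of how these pieces fit together, here is a minimal fixed-step proximal gradient loop. This is only a sketch: it assumes prob.env is a list carrying A, y, and lambda, and that L.f is a Lipschitz bound on \(\nabla f\) (computed below); the experiments themselves use the generic, accelerated implementations sourced at the end of this section.
# Illustrative sketch only: fixed-step proximal gradient for the Lasso.
# Assumes prob.env holds A, y and lambda, and that L.f bounds the
# Lipschitz constant of the gradient of f.
prox.grad.sketch = function(x0, prob.env, L.f, n.iter=500) {
  x = x0
  for (k in seq_len(n.iter)) {
    x.grad = l2.loss.fun.grad(x, prob.env)                  # gradient step on f
    x = soft.prox(x - x.grad / L.f, prob.env$lambda / L.f)  # l1 proximal step
  }
  return(x)
}
Starting from, say, x0 = rep(0, ncol(prob.env$A)), the iterates should drive obj.fun toward the baseline objective value computed below.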
\(x^*\) is the true signal: sparsity is induced by giving it 30 non-zero entries sampled from \(\operatorname{N}(0,1)\), and the noiseless observations are \(y^* = A x^*\).
Our observation matrix \(A\) is \(100 \times 300\), comprised of \(\operatorname{N}(0, 1/300)\) entries, and observation noise is added so that \(y = y^* + \epsilon\), where \(\epsilon \sim \operatorname{N}(0, \sigma^2)\) and \(\sigma =\) 0.0026.
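For concreteness, a problem instance matching this description could be generated along the following lines; the helper names (n, p, n.nonzero, sigma) and the plain-list form of prob.env are assumptions for illustration, not the original setup code.
# Hypothetical reconstruction of the problem setup described above
set.seed(1)
n = 100; p = 300; n.nonzero = 30; sigma = 0.0026
A = matrix(rnorm(n * p, sd=sqrt(1/p)), n, p)        # N(0, 1/300) entries
x.true = numeric(p)
x.true[sample(p, n.nonzero)] = rnorm(n.nonzero)     # 30 non-zero N(0,1) signals
y = A %*% x.true + rnorm(n, sd=sigma)               # noisy observations
prob.env = list(A=A, y=y, x.true=x.true, lambda=0.0664)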
We determine the Lipschitz constant \(L_f\) of \(\nabla f\) to be 10.3453 and set \(\lambda=0.0664\). Effectively, each proximal step soft-thresholds at \(\lambda / L_f \approx 0.0064\), so components below that magnitude are shrunk to zero.
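Since \(\nabla f(x) = A^\top A x - A^\top y\), the largest eigenvalue of \(A^\top A\) is a valid Lipschitz constant; a short way to compute it (an illustration of one reasonable choice, not necessarily how the reported value was obtained):
# Lipschitz constant of grad f: largest eigenvalue of A^T A
L.f = max(eigen(crossprod(prob.env$A), symmetric=TRUE, only.values=TRUE)$values)
lambda.eff = prob.env$lambda / L.f  # effective soft-threshold level, roughly 0.0064 here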
We compute a baseline minimum objective value by minimizing the objective directly with BFGS, supplying the \(\ell_1\) subgradient defined above:
# BFGS baseline; prob.env is passed through ... to obj.fun and obj.fun.grad
opt.obj = optim(prob.env$x.true, fn=obj.fun, gr=obj.fun.grad,
                prob.env, method="BFGS",
                control=list(maxit=5000, trace=1))
The resulting objective value is 1.1317.
First, we source the code for our generic algorithms, e.g. Nesterov acceleration, back-tracking line search, etc. (see the algos page for more information):
[1] Walter Dempsey, Nicholas Polson, Brandon Willard, “BayesMAP”.