\[ \newcommand{\argmin}{\operatorname{argmin}} \newcommand{\sign}{\operatorname{sign}} \newcommand{\diag}[1]{\operatorname{diag}(#1)} \newcommand{\prox}[2]{\operatorname{prox}_{#1}\left(#2\right)} \]
We’re going to demonstrate some of the material from the BayesMAP paper 1 for a Lasso-type problem, given by \[ \begin{gather*} \argmin_x \left\{ F(x) = f(x) + \lambda \phi_q(x) \right\} \end{gather*} \] with the double-Pareto penalty \(\phi_q(x) = \sum_j \log(1+|x_j|/q)\) and \(f(x) = \frac{1}{2} \|A x - b\|^2\).
We use the Proximal Gradient algorithm (described here) to estimate the \(\argmin\).
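In its basic, un-accelerated form, each iteration takes a gradient step on \(f\) followed by the proximal operator of the scaled penalty, \[ x^{k+1} = \prox{\frac{\lambda}{L_f}\,\phi_q}{x^{k} - \frac{1}{L_f}\nabla f(x^{k})} \] where \(1/L_f\) is the step size and \(L_f\) is a Lipschitz constant for \(\nabla f\).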
The basic quantities for this estimation problem are implemented by the following:
# Squared-error loss f(x) = ||A x - y||^2 / 2.
l2.loss.fun = function(x, env) {
  return(sum(crossprod(env$A %*% x - env$y)/2))
}

# Gradient of the loss, A^T A x - A^T y; A^T A and A^T y are looked up in the
# problem environment if present, otherwise computed on the fly.
l2.loss.fun.grad = function(x, env) {
  AtA = mget("AtA", env=list2env(env), mode="numeric",
             ifnotfound=list(function(nm) crossprod(env$A)))[[1]]
  Aty = mget("Aty", env=list2env(env), mode="numeric",
             ifnotfound=list(function(nm) crossprod(env$A, env$y)))[[1]]
  AtAx = AtA %*% x
  return(AtAx - Aty)
}
# Full objective: loss plus the double-Pareto penalty.
l2.pareto.obj.fun = function(x, env) {
  l2.loss.fun(x, env) + env$lambda * sum(log(1 + abs(x)/env$q))
}

# Gradient of the objective away from zero; each coordinate of the penalty
# contributes lambda * sign(x_j) / (q + |x_j|).
l2.pareto.obj.fun.grad = function(x, env) {
  return(l2.loss.fun.grad(x, env) + env$lambda * sign(x)/(env$q + abs(x)))
}
# Gradient of the smooth loss, A^T A x - A^T y, kept separate here as the
# xi term of the quadratic majorization step.
mm.quad.xi = function(x, env) {
  AtA = mget("AtA", env=list2env(env), mode="numeric",
             ifnotfound=list(function(nm) crossprod(env$A)))[[1]]
  Aty = mget("Aty", env=list2env(env), mode="numeric",
             ifnotfound=list(function(nm) crossprod(env$A, env$y)))[[1]]
  AtAx = AtA %*% x
  x.grad = AtAx - Aty
  return(x.grad)
}
# Proximal operator of lambda * log(1 + |.|/q): a generalized thresholding
# rule that shrinks small entries of x to zero.
double.pareto.prox = function(x, lambda, env) {
  d = pmax(env$q * abs(x) - lambda, 0)
  return(sign(x)/2 * (abs(x) - env$q + sqrt((env$q - abs(x))^2 + 4*d)))
}
obj.fun = l2.pareto.obj.fun
obj.fun.grad = l2.pareto.obj.fun.grad
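As a quick check of the closed form in double.pareto.prox: for a scalar \(z > 0\), away from the thresholding region, the stationarity condition defining \(\prox{\lambda \phi_q}{z}\) is \[ x - z + \frac{\lambda}{q + x} = 0 \quad\Longleftrightarrow\quad x^2 + (q - z)x + (\lambda - qz) = 0 , \] whose non-negative root is \(x = \frac{1}{2}\left[(z - q) + \sqrt{(z + q)^2 - 4\lambda}\right]\). Since \((q - z)^2 + 4(qz - \lambda) = (q + z)^2 - 4\lambda\), this is the expression computed above, with the pmax(..., 0) term handling the shrink-to-zero case.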
\(x^*\) is the true signal, so that \(y^* = A x^*\) is the noiseless observation, and sparsity is induced by sampling 30 non-zero \(\operatorname{N}(0,1)\) terms.
Our observation matrix \(A\) is \(100 \times 300\), comprised of \(\operatorname{N}(0,1/300)\) entries, and noise is added so that the observed \(y = y^* + \epsilon\), where \(\epsilon \sim \operatorname{N}(0, \sigma^2 I)\) and \(\sigma^2 = 6.9959 \times 10^{-6}\).
We determine the Lipschitz bound \(L_f\) for \(\nabla f\) to be 10.3453, set \(\lambda=0.0664\), and take the double-Pareto parameter \(q=1\).
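A sketch of how such a problem environment might be built follows; the variable names, the seed, and the use of a plain list (which the list2env-based accessors above expect) are illustrative assumptions, since the original setup code is not shown:

# Hypothetical reconstruction of the (unshown) problem setup described above.
set.seed(1)                                             # assumed seed
n = 100; p = 300; n.nonzero = 30
A = matrix(rnorm(n * p, sd=sqrt(1/p)), n, p)            # N(0, 1/300) entries
x.true = numeric(p)
x.true[sample(p, n.nonzero)] = rnorm(n.nonzero)         # 30 non-zero N(0,1) terms
y = drop(A %*% x.true) + rnorm(n, sd=sqrt(6.9959e-6))   # noise level quoted in the text
L.f = max(eigen(crossprod(A), symmetric=TRUE, only.values=TRUE)$values)
prob.env = list(A=A, y=y, x.true=x.true, lambda=0.0664, q=1, L.f=L.f)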
We compute a baseline minimum objective value by applying BFGS (via optim) to the objective, using obj.fun.grad as a (sub)gradient and starting from the true signal:
opt.obj = optim(prob.env$x.true, fn=obj.fun, gr=obj.fun.grad,
                prob.env, method="BFGS",
                control=list(maxit=5000, trace=1))
The resulting objective value is 0.9849.
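As a point of reference for the proximal gradient runs, a minimal fixed-step version of the loop built from the quantities above might look like the following sketch; the step size 1/prob.env$L.f and the iteration count are illustrative assumptions, and the accelerated, line-searched implementations sourced below are what the experiments actually use:

# Illustrative, fixed-step proximal gradient loop (not the sourced implementation);
# assumes the problem environment carries a Lipschitz bound L.f.
prox.grad.sketch = function(x0, env, n.iter=500) {
  x = x0
  step = 1/env$L.f
  for (k in seq_len(n.iter)) {
    x.grad = l2.loss.fun.grad(x, env)            # gradient step on the smooth loss
    x = double.pareto.prox(x - step * x.grad,    # proximal step on the penalty
                           step * env$lambda, env)
  }
  return(x)
}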
The Gamma Lasso has a similar form and thus is a good candidate for comparison. We use the gamlr package in what follows.
require(gamlr)
## Loading required package: gamlr
## Loading required package: Matrix
gamlr.results = gamlr(prob.env$A, prob.env$y, family="gaussian", nlambda=100, lambda.start=prob.env$lambda)
# Coefficients at the last lambda on the gamlr path, evaluated under our objective.
x.gamlr = gamlr.results$beta[, ncol(gamlr.results$beta)]
obj.fun.gamlr = obj.fun(x.gamlr, prob.env)
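The gamlr fit can then be compared with the BFGS baseline under the same objective, e.g.:

# Objective value of the BFGS baseline (0.9849 above) versus the gamlr
# solution evaluated under the double-Pareto objective.
c(baseline=opt.obj$value, gamlr=obj.fun.gamlr)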
First, we source the code for our generic algorithms, e.g., Nesterov acceleration and back-tracking line search (see the algos page for more information):
“BayesMAP”, Walter Dempsey, Nicholas Polson, and Brandon Willard.↩