Abstract
In this paper, we consider the problem of training structured neural networks (NN) with nonsmooth regularization (e.g., $\ell_1$-norm) and constraints (e.g., interval constraints). We formulate training as a constrained nonsmooth nonconvex optimization problem, and propose a convergent proximal-type stochastic gradient descent (Prox-SGD) algorithm. We show that, under properly selected learning rates, with probability 1, every limit point of the sequence generated by the proposed Prox-SGD algorithm is a stationary point. Finally, to support the theoretical analysis and demonstrate the flexibility of Prox-SGD, we show by extensive numerical tests how Prox-SGD can be used to train either sparse or binary neural networks through an appropriate choice of the regularization function and constraint set.
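To give a concrete sense of the kind of update the abstract describes, the following is a minimal sketch of a generic proximal stochastic gradient step for $\ell_1$ regularization with interval (box) constraints. It is not the exact Prox-SGD update analyzed in the paper (which uses momentum-style averaged gradients and separate step-size sequences); all function and variable names here are illustrative.

```python
import numpy as np

def prox_l1(v, thresh):
    """Soft-thresholding: the proximal operator of thresh * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - thresh, 0.0)

def prox_sgd_step(x, stoch_grad, lr, reg, lo=None, hi=None):
    """One generic proximal stochastic gradient step:
    a gradient step on the smooth loss, the prox of the l1 regularizer,
    then Euclidean projection onto the interval constraint [lo, hi]."""
    x = prox_l1(x - lr * stoch_grad, lr * reg)
    if lo is not None or hi is not None:
        x = np.clip(x, lo, hi)  # projection onto the box constraint
    return x

# Toy usage: one step on random weights with a stand-in minibatch gradient.
rng = np.random.default_rng(0)
w = rng.normal(size=5)
g = rng.normal(size=5)
w = prox_sgd_step(w, g, lr=0.1, reg=0.5, lo=-1.0, hi=1.0)
```

For a one-dimensional box containing the soft-thresholded point's feasible region, clipping after soft-thresholding coincides with the exact prox of the sum of the $\ell_1$ term and the interval indicator, which is why this composition is a common practical shortcut.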