Abstract:
Training a one-node neural network with the ReLU activation function via optimization, which we refer to as the ON-ReLU problem, is a fundamental problem in machine learn...Show MoreMetadata
Abstract:
Training a one-node neural network with the ReLU activation function via optimization, which we refer to as the ON-ReLU problem, is a fundamental problem in machine learning. In this paper, we begin by proving the NP-hardness of the ON-ReLU problem. We then present an approximation algorithm to solve the ON-ReLU problem, whose running time is O(nk) where n is the number of samples, and k is a predefined integral constant as an algorithm parameter. We analyze the performance of this algorithm under two regimes and show that: (1) given any arbitrary set of training samples, the algorithm guarantees an (n/k)-approximation for the ON-ReLU problem - to the best of our knowledge, this is the first time that an algorithm guarantees an approximation ratio for arbitrary data scenario; thus, in the ideal case (i.e., when the training error is zero) the approximation algorithm achieves the globally optimal solution for the ON-ReLU problem; and (2) given training sample with Gaussian noise, the same approximation algorithm achieves a much better asymptotic approximation ratio which is independent of the number of samples n. Extensive numerical studies show that our approximation algorithm can perform better than the gradient descent algorithm. Our numerical results also show that the solution of the approximation algorithm can provide a good initialization for gradient descent, which can significantly improve the performance.performance.
Published in: IEEE Transactions on Signal Processing ( Volume: 68)