Abstract:
Deep Neural Networks (DNNs) that employ multiple convolutional layers have shown remarkable accuracy in image recognition, image reconstruction, audio classification, and other machine learning applications. However, a theoretical framework to explain the internal mechanism of these DNNs remains elusive. There have been attempts to use Information Theory to crack open the DNN "black box" by showing that information is squeezed through an Information Bottleneck (IB) formed by the layers of the DNN. IB analysis in the literature is mostly based on fully connected networks, and analysis of convolutional neural networks remains extremely sparse. Moreover, when the IB behavior of DNNs is analyzed, the inputs and outputs of each layer are typically vectorized, so the spatial and temporal properties of the images are ignored while computing the mutual information. In this work, we analyze DNNs that consist of convolutional layers. Each convolutional kernel, together with the corresponding activation function, is viewed as an Information Channel (IC). We use the spatiotemporal properties of the images to compute the Mutual Information (MI) between the channel input and output and demonstrate that DNNs generalize and learn by reducing the Shannon capacity of these ICs while maximizing prediction accuracy.
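The abstract does not specify the paper's MI estimator, so the following is only a minimal, hypothetical sketch of the idea it describes: treat one convolutional kernel plus a ReLU as an information channel and estimate the MI between the channel input (here, the centre pixel of each receptive field) and the channel output with a simple histogram plug-in estimator. The binning scheme, the choice of channel input, and the function names are all assumptions, not the authors' method.

```python
import numpy as np

def mutual_information(x, y, bins=16):
    """Histogram plug-in estimate of I(X;Y) in bits for two 1-D samples."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()                      # joint distribution
    px = pxy.sum(axis=1, keepdims=True)            # marginal of X
    py = pxy.sum(axis=0, keepdims=True)            # marginal of Y
    nz = pxy > 0                                   # avoid log(0)
    return float((pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])).sum())

rng = np.random.default_rng(0)
image = rng.normal(size=(32, 32))                  # toy single-channel input
kernel = rng.normal(size=(3, 3))                   # one convolutional kernel

# "Channel" output: valid 3x3 convolution followed by ReLU
out = np.zeros((30, 30))
for i in range(30):
    for j in range(30):
        out[i, j] = np.sum(image[i:i + 3, j:j + 3] * kernel)
out = np.maximum(out, 0.0)

# "Channel" input: the centre pixel of each 3x3 receptive field
inp = image[1:31, 1:31]

mi = mutual_information(inp.ravel(), out.ravel())  # non-negative, in bits
```

Under the paper's hypothesis, tracking an estimate like `mi` per kernel over training epochs would show the channels' information throughput shrinking as the network generalizes.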
Published in: 2023 IEEE 33rd International Workshop on Machine Learning for Signal Processing (MLSP)
Date of Conference: 17-20 September 2023
Date Added to IEEE Xplore: 23 October 2023