Abstract:
Optimization techniques for neural network architectures targeting the edge are complex and intricate, which makes them far from universal. Edge computing and artificial intelligence overlap to enhance data security by enabling data processing at the source, mitigating the risks associated with data transfer. As data security concerns grow among governments worldwide, AI on the edge has become a highly relevant field of modern research. There is a pressing need to harness the power of Convolutional Neural Networks (CNNs) and Transformer networks on resource-constrained edge devices. Although many pruning and quantization techniques have been proposed for CNNs, they may not be directly applicable to Transformers because of their different computation patterns. This paper explores the implications of two fundamental techniques: pruning and quantization. We conduct a comparative analysis of how well optimization techniques originally designed for CNNs apply to Transformers for real-world edge deployment. Experimental results show that a significant improvement in compression ratio can be achieved while the accuracy of the Transformer models is maintained.
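As a minimal illustrative sketch (not the paper's method), the snippet below shows how magnitude-based pruning and post-training dynamic quantization, as provided by PyTorch's built-in utilities, can be applied to a small Transformer encoder. The layer dimensions, 30% sparsity level, and choice of int8 dynamic quantization are assumptions for illustration only, not the configurations evaluated in the paper.

```python
# Illustrative sketch: unstructured magnitude pruning plus dynamic int8
# quantization of a toy Transformer encoder. All hyperparameters here are
# assumptions for demonstration, not the paper's experimental settings.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# A small Transformer encoder standing in for the models under study.
model = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True),
    num_layers=2,
)

# Pruning: zero out the 30% smallest-magnitude weights in every Linear layer.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # make the induced sparsity permanent

# Quantization: convert eligible Linear layers to dynamic int8 for inference.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# Sanity check on a dummy batch of 16 tokens with embedding size 256.
x = torch.randn(1, 16, 256)
with torch.no_grad():
    out = quantized(x)
print(out.shape)  # torch.Size([1, 16, 256])
```

In practice, the compression ratio is then measured by comparing the serialized sizes (or parameter counts and bit-widths) of the original and optimized models, and accuracy is re-evaluated on the target task to confirm it is maintained.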
Date of Conference: 19-22 May 2024
Date Added to IEEE Xplore: 02 July 2024