Abstract:
Optimization techniques for neural network architectures targeting the edge are complex and intricate, which makes them far from universal. Edge computing and artificial intelligence overlap to enhance data security by enabling data processing at the source, mitigating the risks associated with data transfer. As data security concerns grow among governments worldwide, AI on the edge has become a highly relevant field of modern research. There is a pressing need to harness the power of Convolutional Neural Networks (CNNs) and Transformer networks on resource-constrained edge devices. Although many pruning and quantization techniques have been proposed for CNNs, they may not be directly applicable to Transformers because of their different computation patterns. This paper explores the implications of two fundamental techniques: pruning and quantization. We conduct a comparative analysis of how well optimization techniques originally designed for CNNs apply to Transformers for real-world edge deployment. Experimental results show that a significant improvement in compression ratio can be achieved while the accuracy of the Transformer models is maintained.
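As a minimal illustrative sketch (not the paper's method), the snippet below shows how magnitude-based pruning and post-training dynamic quantization, as provided by PyTorch's built-in utilities, can be applied to a small Transformer encoder. The layer dimensions, 30% sparsity level, and choice of int8 dynamic quantization are assumptions for illustration only, not the configurations evaluated in the paper.

```python
# Illustrative sketch: unstructured magnitude pruning plus dynamic int8
# quantization of a toy Transformer encoder. All hyperparameters here are
# assumptions for demonstration, not the paper's experimental settings.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# A small Transformer encoder standing in for the models under study.
model = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True),
    num_layers=2,
)

# Pruning: zero out the 30% smallest-magnitude weights in every Linear layer.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # make the induced sparsity permanent

# Quantization: convert eligible Linear layers to dynamic int8 for inference.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# Sanity check on a dummy batch of 16 tokens with embedding size 256.
x = torch.randn(1, 16, 256)
with torch.no_grad():
    out = quantized(x)
print(out.shape)  # torch.Size([1, 16, 256])
```

In practice, the compression ratio is then measured by comparing the serialized sizes (or parameter counts and bit-widths) of the original and optimized models, and accuracy is re-evaluated on the target task to confirm it is maintained.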
Date of Conference: 19-22 May 2024
Date Added to IEEE Xplore: 02 July 2024