Equiangular Basis Vectors: A Novel Paradigm for Classification Tasks

Published in: International Journal of Computer Vision

Abstract

In this paper, we propose Equiangular Basis Vectors (EBVs) as a novel training paradigm of deep learning for image classification tasks. Differing from prominent training paradigms, e.g., k-way classification layers (mapping the learned representations to the label space) and deep metric learning (quantifying sample similarity), our method generates normalized vector embeddings as "predefined classifiers", which act as the fixed learning targets corresponding to different categories. By minimizing the spherical distance between the embedding of an input and its categorical EBV during training, predictions can be obtained at inference by identifying the categorical EBV with the smallest distance. More importantly, by directly adding EBVs of equal status for newly added categories on the basis of existing EBVs, our method exhibits strong scalability for handling large increases in the number of training categories in open-environment machine learning. In experiments, we evaluate EBVs on diverse computer vision tasks with large-scale real-world datasets, including classification on ImageNet-1K, object detection on COCO, and semantic segmentation on ADE20K. We further collected a dataset consisting of 100,000 categories to validate the superior performance of EBVs when handling a large number of categories. Comprehensive experiments validate both the effectiveness and scalability of our EBVs. Our method won first place in the 2022 DIGIX Global AI Challenge; the code and all associated logs are open-source and available at https://github.com/aassxun/Equiangular-Basis-Vectors.
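The prediction rule described above can be sketched in a few lines (a minimal NumPy illustration under our own assumptions: `W` below is a random unit-norm matrix standing in for trained EBVs, and `predict` is a hypothetical helper, not the authors' implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
d, N = 64, 10  # feature dimension, number of categories

# Fixed "predefined classifiers": one unit vector per category.
# (Random placeholders here; the paper constrains them to be near-equiangular.)
W = rng.normal(size=(d, N))
W /= np.linalg.norm(W, axis=0, keepdims=True)

def predict(z: np.ndarray, W: np.ndarray) -> int:
    """Pick the categorical EBV with the smallest spherical distance,
    i.e., the largest cosine similarity to the embedding z."""
    z = z / np.linalg.norm(z)
    return int(np.argmax(W.T @ z))

# An embedding lying close to category 3's EBV is assigned to category 3.
z = W[:, 3] + 0.01 * rng.normal(size=d)
label = predict(z, W)
```

Adding a new category amounts to appending one more unit column to `W`, which is the scalability property emphasized in the abstract.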


Data Availability

Image data in Tables 1, 3 and 4 were extracted from ImageNet-1K (Deng et al., 2009). Image data in Table 2 were extracted from ImageNet-1K (Deng et al., 2009), CUB200-2011 (Wah et al., 2011) and Aircraft (Maji et al., 2013). Image data in Tables 5 and 6 were extracted from COCO 2017 (Lin et al., 2014). Image data in Tables 7 and 8 were extracted from ADE20K (Zhou et al., 2019). Image data in Table 9 were extracted from the citizen science website iNaturalist (www.inaturalist.org). Image data in Table 10 were extracted from CIFAR-10 and CIFAR-100 (Cao et al., 2019). Image data in Table 11 were extracted from iNaturalist 2018 (Van Horn et al., 2018). Image data in Tables 12 and 13 were extracted from ISUN (Xu et al., 2015), Place365 (Zhou et al., 2017), Texture (Cimpoi et al., 2014), SVHN (Netzer et al., 2011), LSUN-Crop (Yu et al., 2015) and LSUN-Resize (Yu et al., 2015).

Notes

  1. www.inaturalist.org.

  2. www.inaturalist.org.

References

  • Bao, H., Dong, L., Piao, S., & Wei, F. (2021). Beit: Bert pre-training of image transformers. arXiv preprint arXiv:2106.08254

  • Bellet, A., Habrard, A., & Sebban, M. (2013). A survey on metric learning for feature vectors and structured data. arXiv preprint arXiv:1306.6709

  • Cao, K., Wei, C., Gaidon, A., Arechiga, N., & Ma, T. (2019). Learning imbalanced datasets with label-distribution-aware margin loss. Advances in Neural Information Processing Systems, pp. 1567–1578

  • Caron, M., Touvron, H., Misra, I., Jégou, H., Mairal, J., Bojanowski, P., & Joulin, A. (2021). Emerging properties in self-supervised vision transformers. In Proceedings of the IEEE/CVF international conference on computer vision, pp. 9650–9660

  • Chapelle, O., Haffner, P., & Vapnik, V. N. (1999). Support vector machines for histogram-based image classification. IEEE Transactions on Neural Networks, 10(5), 1055–1064.


  • Chen, T., Kornblith, S., Norouzi, M., & Hinton, G. (2020). A simple framework for contrastive learning of visual representations. In International Conference on Machine Learning, pp. 1597–1607

  • Chen, K., Wang, J., Pang, J., Cao, Y., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., Xu, J., Zhang, Z., Cheng, D., Zhu, C., Cheng, T., Zhao, Q., Li, B., Lu, X., Zhu, R., Wu, Y., Dai, J., Wang, J., Shi, J., Ouyang, W., Loy, C.C., & Lin, D. (2019). MMDetection: Open MMLab detection toolbox and benchmark. arXiv preprint arXiv:1906.07155

  • Cimpoi, M., Maji, S., Kokkinos, I., Mohamed, S., & Vedaldi, A. (2014). Describing textures in the wild. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3606–3613

  • Contributors, M. (2020). MMSegmentation: Open MMLab semantic segmentation toolbox and benchmark. Available online: https://github.com/open-mmlab/mmsegmentation (Retrieved on 18 May 2022)

  • Cortes, C., Mohri, M., & Rostamizadeh, A. (2012). Algorithms for learning kernels based on centered alignment. Journal of Machine Learning Research, 13, 795–828.


  • De Boer, P. T., Kroese, D. P., Mannor, S., & Rubinstein, R. Y. (2005). A tutorial on the cross-entropy method. Annals of Operations Research, 134(1), 19–67.


  • Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., & Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. In IEEE conference on computer vision and pattern recognition, pp. 248–255

  • Deng, J., Guo, J., Xue, N., & Zafeiriou, S. (2019). Arcface: Additive angular margin loss for deep face recognition. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 4690–4699

  • Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., & Houlsby, N. (2020). An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929

  • Elad, M. (2010). Sparse and redundant representations: from theory to applications in signal and image processing, vol. 2. Springer

  • Ericson, T., & Zinoviev, V. (2001). Codes on Euclidean spheres. Elsevier

  • Glazyrin, A., & Yu, W. H. (2018). Upper bounds for s-distance sets and equiangular lines. Advances in Mathematics, 330, 810–833.


  • Gretton, A., Fukumizu, K., Teo, C., Song, L., Schölkopf, B., & Smola, A. (2007). A kernel statistical test of independence. Advances in neural information processing systems, pp. 585–592

  • Guo, Y., Wang, X., Chen, Y., & Yu, S.X. (2022). Clipped hyperbolic classifiers are super-hyperbolic classifiers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11–20

  • He, K., Chen, X., Xie, S., Li, Y., Dollár, P., & Girshick, R. (2022). Masked autoencoders are scalable vision learners. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 16000–16009

  • He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask R-CNN. In IEEE international conference on computer vision, pp. 2961–2969

  • He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778

  • Hinton, G., Vinyals, O., & Dean, J. (2015). Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531

  • Hu, G., Xu, Z., Wang, G., Zeng, B., Liu, Y., & Lei, Y. (2021). Forecasting energy consumption of long-distance oil products pipeline based on improved fruit fly optimization algorithm and support vector regression. Energy, 224, 120153.

  • Jiang, Z., Tidor, J., Yao, Y., Zhang, S., & Zhao, Y. (2021). Equiangular lines with a fixed angle. Annals of Mathematics, 194(3), 729–743.


  • Johannes, H. (1948). Equilateral point-sets in elliptic two- and three-dimensional spaces. Nieuw Arch. Wiskunde, 22(2), 355–362.


  • Kaya, M., & Bilge, H. Ş. (2019). Deep metric learning: A survey. Symmetry, 11(9), 1066.


  • Kingma, D.P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980

  • Kirillov, A., Girshick, R., He, K., & Dollár, P. (2019). Panoptic feature pyramid networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 6399–6408

  • Kolesnikov, A., Beyer, L., Zhai, X., Puigcerver, J., Yung, J., Gelly, S., & Houlsby, N. (2020). Big Transfer (BiT): General visual representation learning. In European Conference Computer Vision pp. 491–507. Springer

  • Kornblith, S., Norouzi, M., Lee, H., & Hinton, G. (2019). Similarity of neural network representations revisited. In International conference on machine learning, pp. 3519–3529

  • Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2017). ImageNet classification with deep convolutional neural networks. Communications of the ACM, 60(6), 84–90.


  • LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324.


  • Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., & Zitnick, C.L. (2014). Microsoft COCO: Common objects in context. In European Conference Computer Vision., pp. 740–755

  • Liu, Z., Hu, H., Lin, Y., Yao, Z., Xie, Z., Wei, Y., Ning, J., Cao, Y., Zhang, Z., Dong, L., et al. (2022). Swin transformer v2: Scaling up capacity and resolution. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 12009–12019

  • Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., & Guo, B. (2021). Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision, pp. 10012–10022

  • Liu, Z., Mao, H., Wu, C.Y., Feichtenhofer, C., Darrell, T., & Xie, S. (2022). A ConvNet for the 2020s. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 11976–11986

  • Liu, W., Wang, X., Owens, J., & Li, Y. (2020). Energy-based out-of-distribution detection. Advances in Neural Information Processing Systems, 33, 21464–21475.


  • Loshchilov, I., & Hutter, F. (2017). Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101

  • Lu, D., & Weng, Q. (2007). A survey of image classification methods and techniques for improving classification performance. International Journal of Remote Sensing, 28(5), 823–870.


  • Maji, S., Rahtu, E., Kannala, J., Blaschko, M., & Vedaldi, A. (2013). Fine-grained visual classification of aircraft. arXiv preprint arXiv:1306.5151

  • Maxwell, A. E., Warner, T. A., & Fang, F. (2018). Implementation of machine-learning classification in remote sensing: An applied review. International Journal of Remote Sensing, 39(9), 2784–2817.


  • McCallum, A., Freitag, D., & Pereira, F.C. (2000). Maximum entropy markov models for information extraction and segmentation. In: International Conference on Machine Learning., pp. 591–598

  • Mensink, T., Verbeek, J., Perronnin, F., & Csurka, G. (2013). Distance-based image classification: Generalizing to new classes at near-zero cost. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(11), 2624–2637.


  • Mettes, P., Van der Pol, E., & Snoek, C. (2019). Hyperspherical prototype networks. Advances in neural information processing systems

  • Müller, S.G., & Hutter, F. (2021). Trivialaugment: Tuning-free yet state-of-the-art data augmentation. In Proceedings of the IEEE/CVF international conference on computer vision, pp. 774–782

  • Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., & Ng, A.Y. (2011). Reading digits in natural images with unsupervised feature learning. In Advances in Neural Information Processing Systems Workshops, pp. 1–9

  • Pernici, F., Bruni, M., Baecchi, C., & Del Bimbo, A. (2021). Regular polytope networks. Advances in Neural Information Processing Systems, pp. 4373–4387

  • Ranjan, R., Castillo, C.D., & Chellappa, R. (2017). L2-constrained softmax loss for discriminative face verification. arXiv preprint arXiv:1703.09507

  • Rao, H., Leung, C., & Miao, C. (2023). Hierarchical skeleton meta-prototype contrastive learning with hard skeleton mining for unsupervised person re-identification. International Journal of Computer Vision, pp. 1–23

  • Rawat, W., & Wang, Z. (2017). Deep convolutional neural networks for image classification: A comprehensive review. Neural Computation, 29(9), 2352–2449.


  • Renes, J. M., Blume-Kohout, R., Scott, A. J., & Caves, C. M. (2004). Symmetric informationally complete quantum measurements. Journal of Mathematical Physics, 45(6), 2171–2180.


  • Robbins, H., & Monro, S. (1951). A stochastic approximation method. The Annals of Mathematical Statistics pp. 400–407

  • Rudin, W. (1953). Principles of mathematical analysis

  • Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, C. A., & Fei-Fei, L. (2015). ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115(3), 211–252.


  • Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., & Batra, D. (2017). Grad-CAM: Visual explanations from deep networks via gradient-based localization. In International Conference on Computer Vision, pp. 618–626

  • Shen, Y., Sun, X., & Wei, X.S. (2023). Equiangular basis vectors. arXiv preprint arXiv:2303.11637

  • Snell, J., Swersky, K., & Zemel, R. (2017). Prototypical networks for few-shot learning. Advances in Neural Information Processing Systems, pp. 4080–4090

  • Strohmer, T., & Heath, R. W., Jr. (2003). Grassmannian frames with applications to coding and communication. Applied and Computational Harmonic Analysis, 14(3), 257–275.


  • Szegedy, C., Toshev, A., & Erhan, D. (2013). Deep neural networks for object detection. Advances in Neural Information Processing Systems., pp. 2553–2561

  • Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., & Wojna, Z. (2016). Rethinking the inception architecture for computer vision. In The IEEE / CVF Computer Vision and Pattern Recognition Conference., pp. 2818–2826

  • Tammes, P. M. L. (1930). On the origin of number and arrangement of the places of exit on the surface of pollen-grains. Recueil Des Travaux Botaniques Néerlandais, 27(1), 1–84.


  • Tulyakov, S., Jaeger, S., Govindaraju, V., & Doermann, D. (2008). Review of classifier combination methods. In Machine Learning in Document Analysis and Recognition, pp. 361–386

  • van Lint, J. H., & Seidel, J. J. (1966). Equilateral point sets in elliptic geometry. Indagationes Mathematicae, 28(3), 335–348.


  • Van der Maaten, L., & Hinton, G. (2008). Visualizing data using t-SNE. Journal of Machine Learning Research, 9(11)

  • Van Horn, G., Mac Aodha, O., Song, Y., Cui, Y., Sun, C., Shepard, A., Adam, H., Perona, P., & Belongie, S. (2018). The iNaturalist species classification and detection dataset. In: The IEEE / CVF Computer Vision and Pattern Recognition Conference, pp. 8769–8778

  • Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, pp. 6000–6010

  • Vryniotis, V. (2021). How to train State-of-The-Art models using TorchVision’s latest primitives. https://pytorch.org/blog/how-to-train-state-of-the-art-models-using-torchvision-latest-primitives/

  • Wah, C., Branson, S., Welinder, P., Perona, P., & Belongie, S. (2011). The Caltech-UCSD birds-200-2011 dataset. Tech. Report CNS-TR-2011-001

  • Wang, J., Song, Y., Leung, T., Rosenberg, C., Wang, J., Philbin, J., Chen, B., & Wu, Y. (2014). Learning fine-grained image similarity with deep ranking. In The IEEE / CVF Computer Vision and Pattern Recognition Conference, pp. 1386–1393

  • Wang, H., Wang, Y., Zhou, Z., Ji, X., Gong, D., Zhou, J., Li, Z., & Liu, W. (2018). Cosface: Large margin cosine loss for deep face recognition. In: The IEEE / CVF Computer Vision and Pattern Recognition Conference, pp. 5265–5274

  • Wang, F., Xiang, X., Cheng, J., & Yuille, A.L. (2017). Normface: L2 hypersphere embedding for face verification. In: ACM International Conference Multimedia, pp. 1041–1049

  • Weinberger, K. Q., & Saul, L. K. (2009). Distance metric learning for large margin nearest neighbor classification. Journal of Machine Learning Research, 10(2), 207–244.


  • Wei, X. S., Song, Y. Z., Mac Aodha, O., Wu, J., Peng, Y., Tang, J., Yang, J., & Belongie, S. (2022). Fine-grained image analysis with deep learning: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(12), 8927–8948.


  • Wen, Y., Zhang, K., Li, Z., & Qiao, Y. (2016). A discriminative feature learning approach for deep face recognition. In European Conference Computer Vision, pp. 499–515

  • Wightman, R., Touvron, H., & Jégou, H. (2021). ResNet strikes back: An improved training procedure in timm. arXiv preprint arXiv:2110.00476

  • Wu, Z., Xiong, Y., Yu, S.X., & Lin, D. (2018). Unsupervised feature learning via non-parametric instance discrimination. In The IEEE / CVF Computer Vision and Pattern Recognition Conference, pp. 3733–3742

  • Xiao, T., Liu, Y., Zhou, B., Jiang, Y., & Sun, J. (2018). Unified perceptual parsing for scene understanding. In European Conference Computer Vision., pp. 418–434

  • Xu, P., Ehinger, K.A., Zhang, Y., Finkelstein, A., Kulkarni, S.R., & Xiao, J. (2015). Turkergaze: Crowdsourcing saliency with webcam based eye tracking. arXiv preprint arXiv:1504.06755

  • Xu, W., Xian, Y., Wang, J., Schiele, B., & Akata, Z. (2022). Attribute prototype network for any-shot learning. International Journal of Computer Vision, 130(7), 1735–1753.


  • Yang, Y., Xie, L., Chen, S., Li, X., Lin, Z., & Tao, D. (2022). Do we really need a learnable classifier at the end of deep neural network? arXiv preprint arXiv:2203.09081

  • Ye, H. J., Zhan, D. C., Jiang, Y., & Zhou, Z. H. (2022). Heterogeneous few-shot model rectification with semantic mapping. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(11), 3878–3891.


  • Yu, F., Seff, A., Zhang, Y., Song, S., Funkhouser, T., & Xiao, J. (2015). Lsun: Construction of a large-scale image dataset using deep learning with humans in the loop. arXiv preprint arXiv:1506.03365

  • Yun, S., Han, D., Oh, S.J., Chun, S., Choe, J., & Yoo, Y. (2019). CutMix: Regularization strategy to train strong classifiers with localizable features. In International Conference on Computer Vision, pp. 6023–6032

  • Zagoruyko, S., & Komodakis, N. (2016). Wide residual networks. arXiv preprint arXiv:1605.07146

  • Zhai, X., Kolesnikov, A., Houlsby, N., & Beyer, L. (2022). Scaling vision transformers. In The IEEE / CVF Computer Vision and Pattern Recognition Conference pp. 12104–12113

  • Zhang, H., Cisse, M., Dauphin, Y.N., & Lopez-Paz, D. (2017). mixup: Beyond empirical risk minimization. arXiv preprint arXiv:1710.09412

  • Zhong, Z., Zheng, L., Kang, G., Li, S., & Yang, Y. (2020). Random erasing data augmentation. In: AAAI, pp. 13001–13008

  • Zhou, Z. H. (2016). Learnware: On the future of machine learning. Frontiers of Computer Science,10(4), 589–590.

  • Zhou, B., Cui, Q., Wei, X.S., & Chen, Z.M. (2020). BBN: Bilateral-branch network with cumulative learning for long-tailed visual recognition. In The IEEE / CVF Computer Vision and Pattern Recognition Conference, pp. 9719–9728

  • Zhou, B., Lapedriza, A., Khosla, A., Oliva, A., & Torralba, A. (2017). Places: A 10 million image database for scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence,40(6), 1452–1464.

  • Zhou, H.Y., Lu, C., Chen, C., Yang, S., & Yu, Y. (2023). A unified visual information preservation framework for self-supervised pre-training in medical image analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(7), 8020–8035.

  • Zhou, B., Zhao, H., Puig, X., Xiao, T., Fidler, S., Barriuso, A., & Torralba, A. (2019). Semantic understanding of scenes through the ADE20k dataset. International Journal of Computer Vision, 127(3), 302–321.


Acknowledgements

The authors would like to thank the editor and the anonymous reviewers for their critical and constructive comments and suggestions. This work was supported by the National Key R&D Program of China (2021YFA1001100), the National Natural Science Foundation of China under Grant 62272231, and the Fundamental Research Funds for the Central Universities (4009002401).

Author information


Corresponding author

Correspondence to Xiu-Shen Wei.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Communicated by Zhouchen Lin.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A


Proof of Why a Unit Hypersphere is Effective: Following Sects. 3.3 and 3.4, we have M sample-label pairs in N classes. \(\varvec{W}\in \mathbb {R}^{d\times N}\) denotes the EBV matrix, and \(\hat{\varvec{w}}_{i}\in \mathbb {R}^{d}\) represents each categorical EBV, where \(i \in \{1,2,\ldots ,N\}\) and \(\Vert \hat{\varvec{w}}_{i}\Vert =1\). Since we have already assumed that all samples are well-separated, we directly use \(\hat{\varvec{w}}_{i}\) to represent the i-th class' feature. We emphasize once again that the categorical EBV for each class remains unchanged throughout the training process, which distinguishes our method from prototype-based methods.
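For intuition, a set of nearly equiangular unit vectors can be obtained numerically. The sketch below is a toy illustration under our own assumptions (a simple penalty on pairwise inner products, with hypothetical helpers `pairwise_loss` and `make_ebvs`); it is not the paper's actual generation algorithm:

```python
import numpy as np

def pairwise_loss(W: np.ndarray) -> float:
    """Sum of squared off-diagonal cosines between the unit columns of W."""
    G = W.T @ W
    np.fill_diagonal(G, 0.0)
    return float((G ** 2).sum())

def make_ebvs(d: int, N: int, steps: int = 2000, lr: float = 0.1, seed: int = 0):
    """Push N unit vectors in R^d toward a nearly equiangular configuration
    by projected gradient descent on pairwise_loss."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(d, N))
    W /= np.linalg.norm(W, axis=0, keepdims=True)
    for _ in range(steps):
        G = W.T @ W
        np.fill_diagonal(G, 0.0)
        W = W - lr * (W @ G) / N                       # gradient of 0.5 * loss
        W /= np.linalg.norm(W, axis=0, keepdims=True)  # project back to the sphere
    return W

W = make_ebvs(16, 32)  # N = 2d: more vectors than an orthonormal basis allows
```

The columns stay unit-norm throughout, matching the requirement \(\Vert \hat{\varvec{w}}_{i}\Vert =1\) above, and the optimization drives the off-diagonal cosines toward a small common magnitude.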

Assuming that every class has the same number of samples, the softmax loss is defined as:

$$\begin{aligned} \mathcal {L}_{\texttt {softmax}} = -\frac{1}{N}\sum _{i=1}^N{\log \frac{e^{\hat{\varvec{w}}_{i}^{\top } \hat{\varvec{w}}_{i}}}{\sum _{j=1}^N{e^{\hat{\varvec{w}}_{i}^{\top } \hat{\varvec{w}}_{j}}}}}\,. \end{aligned}$$
(15)

Additionally, since we adopt a temperature hyper-parameter \(\tau \) within the softmax loss, its formulation becomes:

$$\begin{aligned} \mathcal {L}_{\mathcal {S}} = -\frac{1}{N}\sum _{i=1}^N{\log \frac{e^{\frac{\hat{\varvec{w}}_{i}^{\top } \hat{\varvec{w}}_{i}}{\tau }}}{\sum _{j=1}^N {e^{\frac{\hat{\varvec{w}}_{i}^{\top } \hat{\varvec{w}}_{j}}{\tau }}}}}\,. \end{aligned}$$
(16)

Dividing both the numerator and denominator by \(e^{\frac{\hat{\varvec{w}}_{i}^{\top } \hat{\varvec{w}}_{i}}{\tau }} = e^{\frac{1}{\tau }}\), we obtain:

$$\begin{aligned} \begin{aligned} \mathcal {L}_\mathcal {S}&= -\frac{1}{N}\sum _{i=1}^N {\log \frac{1}{1 + \sum _{j=1,j\ne i}^N {e^{\frac{\hat{\varvec{w}}_{i}^{\top } \hat{\varvec{w}}_{j}}{\tau } - \frac{1}{\tau }}}}}\\&= \frac{1}{N}\sum _{i=1}^N {\log \left( {1 + \sum _{j=1,j\ne i}^N {e^{\frac{\hat{\varvec{w}}_{i}^{\top } \hat{\varvec{w}}_{j} - 1 }{\tau }}}} \right) }\,. \end{aligned} \end{aligned}$$
(17)

Evidently, \(f(x) = e^x\) is a convex function, so by Jensen's inequality \(\frac{1}{N}\sum _{i=1}^N {e^{x_i}} \ge e^{\frac{1}{N}\sum _{i=1}^{N}{x_i}}\) for any \(x_1,\ldots ,x_N\). Applying this to the inner sum over the \(N-1\) terms with \(j\ne i\), we have:

$$\begin{aligned} \begin{aligned} \mathcal {L}_\mathcal {S} \ge \frac{1}{N}\sum _{i=1}^N {\log \left( {1 + (N-1) e^{\frac{1}{N-1} \sum \limits _{j=1,j\ne i}^N{( \frac{\hat{\varvec{w}}_{i}^{\top } \hat{\varvec{w}}_{j} - 1 }{\tau } )}}}\right) }. \end{aligned} \end{aligned}$$
(18)
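The Jensen step used in the inequality above can be checked numerically (a quick sanity check with random values, not part of the proof):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=1000)

# Convexity of f(x) = e^x: the mean of exponentials dominates
# the exponential of the mean (Jensen's inequality).
lhs = np.mean(np.exp(x))
rhs = np.exp(np.mean(x))
```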

The equality holds if and only if all \(\frac{\hat{\varvec{w}}_{i}^{\top } \hat{\varvec{w}}_{j}}{\tau }\), \(1\le i < j \le N\), have the same value, i.e., features from different classes are pairwise equidistant. However, it has been proved that, for a fixed common angle, the maximum number of equiangular lines grows only linearly with the dimension d as \(d \rightarrow \infty \) (Jiang et al., 2021). Since we consider a number of categories much larger than the feature dimension d (e.g., 1000 classes with feature dimension 100 in Table 3, and 100,000 classes with feature dimension 5000 in Table 9; note that 'EBVs Dimension' equals the feature dimension), this equality cannot hold in practice. Following Wang et al. (2017), we therefore take the feature dimension into consideration and tighten the previous inequality.

Similar to \(f(x) = e^x\), the function \(s(x) = \log (1 + C e^x)\) is also convex when \(C>0\), so that \(\frac{1}{N}\sum _{i=1}^N{\log (1 + C e^{x_i})} \ge \log (1 + C e^{\frac{1}{N}\sum _{i=1}^{N}{x_i}})\). Then we have:

$$\begin{aligned} \begin{aligned}&\mathcal {L}_\mathcal {S} \ge \log \left( 1+ \left( N-1 \right) e^{\frac{1}{N(N-1)}\sum \limits _{i=1}^N\sum \limits _{j=1,j\ne i}^N{(\frac{\hat{\varvec{w}}_{i}^{\top } \hat{\varvec{w}}_{j} - 1}{\tau })}} \right) \\&= \log \left( 1+ \left( N-1 \right) e^{\left( \frac{1}{N(N-1)}\sum \limits _{i=1}^N\sum \limits _{j=1,j\ne i}^N{\frac{\hat{\varvec{w}}_{i}^{\top } \hat{\varvec{w}}_{j}}{\tau }}\right) - \frac{1}{\tau }} \right) \,. \end{aligned} \end{aligned}$$
(19)

This equality holds if and only if, for every \(\hat{\varvec{w}}_i\), the sum of inner products with the other classes' weights, \(\sum _{j=1,j\ne i}^N{\hat{\varvec{w}}_{i}^{\top } \hat{\varvec{w}}_{j}}\), takes the same value.

Note that

$$\begin{aligned} \left\| \sum _{i=1}^N{\hat{\varvec{w}}_i}\right\| _2^2 = N + \sum _{i=1}^N\sum _{j=1,j\ne i}^N{\hat{\varvec{w}}_i^{\top } \hat{\varvec{w}}_j}, \end{aligned}$$
(20)

so

$$\begin{aligned} \sum _{i=1}^N\sum _{j=1,j\ne i}^N{\frac{\hat{\varvec{w}}_{i}^{\top } \hat{\varvec{w}}_{j}}{\tau }} \ge -\frac{N}{\tau }. \end{aligned}$$
(21)

The equality holds if and only if \(\sum _{i=1}^N{\hat{\varvec{w}}_i}=\textbf{0}\). Thus,

$$\begin{aligned} \begin{aligned} \mathcal {L}_\mathcal {S}&\ge \log \left( 1+ \left( N-1 \right) e^{-\frac{N}{\tau N(N-1)}-\frac{1}{\tau }}\right) \\&=\log \left( 1+ \left( N-1 \right) e^{-\frac{N}{\tau (N-1)}}\right) . \end{aligned} \end{aligned}$$
(22)

Taking \(N=1000\) as an example, with \(\tau \) set to 0.07 as mentioned in Sect. 4.1.1, the lower bound is around 0.00062. That is, the temperature hyper-parameter already resolves the problem that the softmax loss would otherwise be trapped at a very high value on the training set when the features and weights are normalized to 1, so it is safe to keep the predefined EBVs on the surface of a unit hypersphere.
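The identity (20), the bound (21), and the value of the lower bound (22) can be verified numerically (a sanity check with random unit vectors standing in for the \(\hat{\varvec{w}}_i\); not part of the proof):

```python
import numpy as np

rng = np.random.default_rng(0)
d, N, tau = 64, 1000, 0.07

# Random unit columns standing in for the w_i.
W = rng.normal(size=(d, N))
W /= np.linalg.norm(W, axis=0, keepdims=True)

# Identity (20): ||sum_i w_i||^2 = N + sum_{i != j} w_i^T w_j.
G = W.T @ W
off_sum = G.sum() - np.trace(G)
assert np.isclose(np.linalg.norm(W.sum(axis=1)) ** 2, N + off_sum)

# Bound (21) follows because a squared norm is non-negative.
assert off_sum / tau >= -N / tau

# Lower bound (22) for N = 1000, tau = 0.07 is about 0.00062.
lower = np.log1p((N - 1) * np.exp(-N / (tau * (N - 1))))
```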

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Shen, Y., Sun, X., Wei, XS. et al. Equiangular Basis Vectors: A Novel Paradigm for Classification Tasks. Int J Comput Vis 133, 372–397 (2025). https://doi.org/10.1007/s11263-024-02189-2
