Abstract
In this paper, we propose Equiangular Basis Vectors (EBVs), a novel deep learning training paradigm for image classification tasks. Differing from prominent training paradigms, e.g., k-way classification layers (mapping the learned representations to the label space) and deep metric learning (quantifying sample similarity), our method generates normalized vector embeddings as "predefined classifiers", which act as fixed learning targets corresponding to different categories. By minimizing the spherical distance between the embedding of an input and its categorical EBV during training, predictions can be obtained at inference time by identifying the categorical EBV with the smallest distance. More importantly, by directly adding EBVs of equal status for newly introduced categories alongside the existing EBVs, our method exhibits strong scalability when the number of training categories grows sharply in open-environment machine learning. In experiments, we evaluate EBVs on diverse computer vision tasks with large-scale real-world datasets, including classification on ImageNet-1K, object detection on COCO, and semantic segmentation on ADE20K. We further collected a dataset consisting of 100,000 categories to validate the superior performance of EBVs when handling a large number of categories. Comprehensive experiments validate both the effectiveness and scalability of our EBVs. Our method won first place in the 2022 DIGIX Global AI Challenge; the code along with all associated logs is open source and available at https://github.com/aassxun/Equiangular-Basis-Vectors.
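The core mechanism described above can be sketched in a few lines of NumPy. This is an illustrative re-implementation, not the authors' released code: `make_ebvs` uses a simple projected-gradient heuristic to spread the fixed class vectors on the unit sphere, and all function names are ours.

```python
import numpy as np

def make_ebvs(num_classes, dim, steps=200, lr=0.5, seed=0):
    # Illustrative sketch: start from random unit vectors and iteratively
    # push down the pairwise cosines so the vectors become close to
    # equiangular (not the authors' exact optimization procedure).
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((num_classes, dim))
    W /= np.linalg.norm(W, axis=1, keepdims=True)
    for _ in range(steps):
        G = W @ W.T
        np.fill_diagonal(G, 0.0)
        # Gradient step on the sum of squared off-diagonal cosines,
        # followed by re-projection onto the unit sphere.
        W -= lr / num_classes * (G @ W)
        W /= np.linalg.norm(W, axis=1, keepdims=True)
    return W

def predict(features, ebvs):
    # Classify by the EBV with the largest cosine similarity,
    # i.e., the smallest spherical distance.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    return np.argmax(f @ ebvs.T, axis=1)

W = make_ebvs(10, 16)
print(predict(W, W))  # [0 1 2 3 4 5 6 7 8 9]
```

Because the EBVs are fixed rather than learned, adding a new category only requires generating one more unit vector; no existing classifier weights change.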
Data Availability
Image data in Tables 1, 3 and 4 were extracted from ImageNet-1K (Deng et al., 2009). Image data in Table 2 were extracted from ImageNet-1K (Deng et al., 2009), CUB200-2011 (Wah et al., 2011) and Aircraft (Maji et al., 2013). Image data in Tables 5 and 6 were extracted from COCO 2017 (Lin et al., 2014). Image data in Tables 7 and 8 were extracted from ADE20K (Zhou et al., 2019). Image data in Table 9 were extracted from the citizen science website iNaturalist (www.inaturalist.org). Image data in Table 10 were extracted from CIFAR-10 and CIFAR-100 (Cao et al., 2019). Image data in Table 11 were extracted from iNaturalist 2018 (Van Horn et al., 2018). Image data in Tables 12 and 13 were extracted from ISUN (Xu et al., 2015), Place365 (Zhou et al., 2017), Texture (Cimpoi et al., 2014), SVHN (Netzer et al., 2011), LSUN-Crop (Yu et al., 2015) and LSUN-Resize (Yu et al., 2015).
References
Bao, H., Dong, L., Piao, S., & Wei, F. (2021). Beit: Bert pre-training of image transformers. arXiv preprint arXiv:2106.08254
Bellet, A., Habrard, A., & Sebban, M. (2013). A survey on metric learning for feature vectors and structured data. arXiv preprint arXiv:1306.6709
Cao, K., Wei, C., Gaidon, A., Arechiga, N., & Ma, T. (2019). Learning imbalanced datasets with label-distribution-aware margin loss. Advances in Neural Information Processing Systems, pp. 1567–1578
Caron, M., Touvron, H., Misra, I., Jégou, H., Mairal, J., Bojanowski, P., & Joulin, A. (2021). Emerging properties in self-supervised vision transformers. In Proceedings of the IEEE/CVF international conference on computer vision, pp. 9650–9660
Chapelle, O., Haffner, P., & Vapnik, V. N. (1999). Support vector machines for histogram-based image classification. IEEE Transactions on Neural Networks, 10(5), 1055–1064.
Chen, T., Kornblith, S., Norouzi, M., & Hinton, G. (2020). A simple framework for contrastive learning of visual representations. In International Conference on Machine Learning, pp. 1597–1607
Chen, K., Wang, J., Pang, J., Cao, Y., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., Xu, J., Zhang, Z., Cheng, D., Zhu, C., Cheng, T., Zhao, Q., Li, B., Lu, X., Zhu, R., Wu, Y., Dai, J., Wang, J., Shi, J., Ouyang, W., Loy, C.C., & Lin, D. (2019). MMDetection: Open MMLab detection toolbox and benchmark. arXiv preprint arXiv:1906.07155
Cimpoi, M., Maji, S., Kokkinos, I., Mohamed, S., & Vedaldi, A. (2014). Describing textures in the wild. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3606–3613
MMSegmentation Contributors. (2020). MMSegmentation: Open MMLab semantic segmentation toolbox and benchmark. Available online: https://github.com/open-mmlab/mmsegmentation (Retrieved on 18 May 2022)
Cortes, C., Mohri, M., & Rostamizadeh, A. (2012). Algorithms for learning kernels based on centered alignment. Journal of Machine Learning Research, 13, 795–828.
De Boer, P. T., Kroese, D. P., Mannor, S., & Rubinstein, R. Y. (2005). A tutorial on the cross-entropy method. Annals of Operations Research, 134(1), 19–67.
Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., & Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. In IEEE conference on computer vision and pattern recognition, pp. 248–255
Deng, J., Guo, J., Xue, N., & Zafeiriou, S. (2019). Arcface: Additive angular margin loss for deep face recognition. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 4690–4699
Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., & Houlsby, N. (2020). An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929
Elad, M. (2010). Sparse and redundant representations: from theory to applications in signal and image processing, vol. 2. Springer
Ericson, T., & Zinoviev, V. (2001). Codes on Euclidean spheres. Elsevier
Glazyrin, A., & Yu, W. H. (2018). Upper bounds for s-distance sets and equiangular lines. Advances in Mathematics, 330, 810–833.
Gretton, A., Fukumizu, K., Teo, C., Song, L., Schölkopf, B., & Smola, A. (2007). A kernel statistical test of independence. Advances in neural information processing systems, pp. 585–592
Guo, Y., Wang, X., Chen, Y., & Yu, S.X. (2022). Clipped hyperbolic classifiers are super-hyperbolic classifiers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11–20
He, K., Chen, X., Xie, S., Li, Y., Dollár, P., & Girshick, R. (2022). Masked autoencoders are scalable vision learners. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 16000–16009
He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask R-CNN. In IEEE international conference on computer vision, pp. 2961–2969
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778
Hinton, G., Vinyals, O., & Dean, J. (2015). Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531
Hu, G., Xu, Z., Wang, G., Zeng, B., Liu, Y., & Lei, Y. (2021). Forecasting energy consumption of long-distance oil products pipeline based on improved fruit fly optimization algorithm and support vector regression. Energy, 224, 120153.
Jiang, Z., Tidor, J., Yao, Y., Zhang, S., & Zhao, Y. (2021). Equiangular lines with a fixed angle. Annals of Mathematics, 194(3), 729–743.
Haantjes, J. (1948). Equilateral point-sets in elliptic two- and three-dimensional spaces. Nieuw Arch. Wiskunde, 22(2), 355–362.
Kaya, M., & Bilge, H. Ş. (2019). Deep metric learning: A survey. Symmetry, 11(9), 1066.
Kingma, D.P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980
Kirillov, A., Girshick, R., He, K., & Dollár, P. (2019). Panoptic feature pyramid networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 6399–6408
Kolesnikov, A., Beyer, L., Zhai, X., Puigcerver, J., Yung, J., Gelly, S., & Houlsby, N. (2020). Big Transfer (BiT): General visual representation learning. In European Conference Computer Vision pp. 491–507. Springer
Kornblith, S., Norouzi, M., Lee, H., & Hinton, G. (2019). Similarity of neural network representations revisited. In International conference on machine learning, pp. 3519–3529
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2017). ImageNet classification with deep convolutional neural networks. Communications of the ACM, 60(6), 84–90.
LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324.
Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., & Zitnick, C.L. (2014). Microsoft COCO: Common objects in context. In European Conference Computer Vision., pp. 740–755
Liu, Z., Hu, H., Lin, Y., Yao, Z., Xie, Z., Wei, Y., Ning, J., Cao, Y., Zhang, Z., Dong, L., et al. (2022). Swin transformer v2: Scaling up capacity and resolution. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 12009–12019
Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., & Guo, B. (2021). Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision, pp. 10012–10022
Liu, Z., Mao, H., Wu, C.Y., Feichtenhofer, C., Darrell, T., & Xie, S. (2022). A ConvNet for the 2020s. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 11976–11986
Liu, W., Wang, X., Owens, J., & Li, Y. (2020). Energy-based out-of-distribution detection. Advances in Neural Information Processing Systems, 33, 21464–21475.
Loshchilov, I., & Hutter, F. (2017). Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101
Lu, D., & Weng, Q. (2007). A survey of image classification methods and techniques for improving classification performance. International Journal of Remote Sensing, 28(5), 823–870.
Maji, S., Rahtu, E., Kannala, J., Blaschko, M., & Vedaldi, A. (2013). Fine-grained visual classification of aircraft. arXiv preprint arXiv:1306.5151
Maxwell, A. E., Warner, T. A., & Fang, F. (2018). Implementation of machine-learning classification in remote sensing: An applied review. International Journal of Remote Sensing, 39(9), 2784–2817.
McCallum, A., Freitag, D., & Pereira, F.C. (2000). Maximum entropy markov models for information extraction and segmentation. In: International Conference on Machine Learning., pp. 591–598
Mensink, T., Verbeek, J., Perronnin, F., & Csurka, G. (2013). Distance-based image classification: Generalizing to new classes at near-zero cost. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(11), 2624–2637.
Mettes, P., Van der Pol, E., & Snoek, C. (2019). Hyperspherical prototype networks. Advances in neural information processing systems
Müller, S.G., & Hutter, F. (2021). Trivialaugment: Tuning-free yet state-of-the-art data augmentation. In Proceedings of the IEEE/CVF international conference on computer vision, pp. 774–782
Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., & Ng, A.Y. (2011). Reading digits in natural images with unsupervised feature learning. Advances in Neural Information Processing Systems Workshops, pp. 1–9
Pernici, F., Bruni, M., Baecchi, C., & Del Bimbo, A. (2021). Regular polytope networks. Advances in Neural Information Processing Systems, pp. 4373–4387
Ranjan, R., Castillo, C.D., & Chellappa, R. (2017). L2-constrained softmax loss for discriminative face verification. arXiv preprint arXiv:1703.09507
Rao, H., Leung, C., & Miao, C. (2023). Hierarchical skeleton meta-prototype contrastive learning with hard skeleton mining for unsupervised person re-identification. International Journal of Computer Vision, pp. 1–23
Rawat, W., & Wang, Z. (2017). Deep convolutional neural networks for image classification: A comprehensive review. Neural Computation, 29(9), 2352–2449.
Renes, J. M., Blume-Kohout, R., Scott, A. J., & Caves, C. M. (2004). Symmetric informationally complete quantum measurements. Journal of Mathematical Physics, 45(6), 2171–2180.
Robbins, H., & Monro, S. (1951). A stochastic approximation method. The Annals of Mathematical Statistics pp. 400–407
Rudin, W. (1953). Principles of mathematical analysis
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A. C., & Fei-Fei, L. (2015). ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115(3), 211–252.
Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., & Batra, D. (2017). Grad-CAM: Visual explanations from deep networks via gradient-based localization. In International Conference on Computer Vision, pp. 618–626
Shen, Y., Sun, X., & Wei, X.S. (2023). Equiangular basis vectors. arXiv preprint arXiv:2303.11637
Snell, J., Swersky, K., & Zemel, R. (2017). Prototypical networks for few-shot learning. Advances in Neural Information Processing Systems, pp. 4080–4090
Strohmer, T., & Heath, R. W., Jr. (2003). Grassmannian frames with applications to coding and communication. Applied and Computational Harmonic Analysis, 14(3), 257–275.
Szegedy, C., Toshev, A., & Erhan, D. (2013). Deep neural networks for object detection. Advances in Neural Information Processing Systems., pp. 2553–2561
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., & Wojna, Z. (2016). Rethinking the inception architecture for computer vision. In The IEEE / CVF Computer Vision and Pattern Recognition Conference., pp. 2818–2826
Tammes, P. M. L. (1930). On the origin of number and arrangement of the places of exit on the surface of pollen-grains. Recueil Des Travaux Botaniques Néerlandais, 27(1), 1–84.
Tulyakov, S., Jaeger, S., Govindaraju, V., & Doermann, D. (2008). Review of classifier combination methods. Machine learning in document analysis and recognition. pp. 361–386
van Lint, J. H., & Seidel, J. J. (1966). Equilateral point sets in elliptic geometry. Indagationes Mathematicae, 28(3), 335–348.
Van der Maaten, L., & Hinton, G. (2008). Visualizing data using t-SNE. Journal of Machine Learning Research, 9(11), 2579–2605.
Van Horn, G., Mac Aodha, O., Song, Y., Cui, Y., Sun, C., Shepard, A., Adam, H., Perona, P., & Belongie, S. (2018). The iNaturalist species classification and detection dataset. In: The IEEE / CVF Computer Vision and Pattern Recognition Conference, pp. 8769–8778
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, pp. 6000–6010
Vryniotis, V. (2021). How to train State-of-The-Art models using TorchVision’s latest primitives. https://pytorch.org/blog/how-to-train-state-of-the-art-models-using-torchvision-latest-primitives/
Wah, C., Branson, S., Welinder, P., Perona, P., & Belongie, S. (2011). The Caltech-UCSD birds-200-2011 dataset. Tech. Report CNS-TR-2011-001
Wang, J., Song, Y., Leung, T., Rosenberg, C., Wang, J., Philbin, J., Chen, B., & Wu, Y. (2014). Learning fine-grained image similarity with deep ranking. In The IEEE / CVF Computer Vision and Pattern Recognition Conference, pp. 1386–1393
Wang, H., Wang, Y., Zhou, Z., Ji, X., Gong, D., Zhou, J., Li, Z., & Liu, W. (2018). Cosface: Large margin cosine loss for deep face recognition. In: The IEEE / CVF Computer Vision and Pattern Recognition Conference, pp. 5265–5274
Wang, F., Xiang, X., Cheng, J., & Yuille, A.L. (2017). Normface: L2 hypersphere embedding for face verification. In: ACM International Conference Multimedia, pp. 1041–1049
Weinberger, K. Q., & Saul, L. K. (2009). Distance metric learning for large margin nearest neighbor classification. Journal of Machine Learning Research, 10(2), 207–244.
Wei, X. S., Song, Y. Z., Mac Aodha, O., Wu, J., Peng, Y., Tang, J., Yang, J., & Belongie, S. (2022). Fine-grained image analysis with deep learning: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(12), 8927–8948.
Wen, Y., Zhang, K., Li, Z., & Qiao, Y. (2016). A discriminative feature learning approach for deep face recognition. In European Conference Computer Vision, pp. 499–515
Wightman, R., Touvron, H., & Jégou, H. (2021). ResNet strikes back: An improved training procedure in timm. arXiv preprint arXiv:2110.00476
Wu, Z., Xiong, Y., Yu, S.X., & Lin, D. (2018). Unsupervised feature learning via non-parametric instance discrimination. In The IEEE / CVF Computer Vision and Pattern Recognition Conference, pp. 3733–3742
Xiao, T., Liu, Y., Zhou, B., Jiang, Y., & Sun, J. (2018). Unified perceptual parsing for scene understanding. In European Conference Computer Vision., pp. 418–434
Xu, P., Ehinger, K.A., Zhang, Y., Finkelstein, A., Kulkarni, S.R., & Xiao, J. (2015). Turkergaze: Crowdsourcing saliency with webcam based eye tracking. arXiv preprint arXiv:1504.06755
Xu, W., Xian, Y., Wang, J., Schiele, B., & Akata, Z. (2022). Attribute prototype network for any-shot learning. International Journal of Computer Vision, 130(7), 1735–1753.
Yang, Y., Xie, L., Chen, S., Li, X., Lin, Z., & Tao, D. (2022). Do we really need a learnable classifier at the end of deep neural network? arXiv preprint arXiv:2203.09081
Ye, H. J., Zhan, D. C., Jiang, Y., & Zhou, Z. H. (2022). Heterogeneous few-shot model rectification with semantic mapping. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(11), 3878–3891.
Yu, F., Seff, A., Zhang, Y., Song, S., Funkhouser, T., & Xiao, J. (2015). Lsun: Construction of a large-scale image dataset using deep learning with humans in the loop. arXiv preprint arXiv:1506.03365
Yun, S., Han, D., Oh, S.J., Chun, S., Choe, J., & Yoo, Y. (2019). CutMix: Regularization strategy to train strong classifiers with localizable features. In International Conference on Computer Vision, pp. 6023–6032
Zagoruyko, S., & Komodakis, N. (2016). Wide residual networks. arXiv preprint arXiv:1605.07146
Zhai, X., Kolesnikov, A., Houlsby, N., & Beyer, L. (2022). Scaling vision transformers. In The IEEE / CVF Computer Vision and Pattern Recognition Conference pp. 12104–12113
Zhang, H., Cisse, M., Dauphin, Y.N., & Lopez-Paz, D. (2017). mixup: Beyond empirical risk minimization. arXiv preprint arXiv:1710.09412
Zhong, Z., Zheng, L., Kang, G., Li, S., & Yang, Y. (2020). Random erasing data augmentation. In: AAAI, pp. 13001–13008
Zhou, Z. H. (2016). Learnware: On the future of machine learning. Frontiers of Computer Science, 10(4), 589–590.
Zhou, B., Cui, Q., Wei, X.S., & Chen, Z.M. (2020). BBN: Bilateral-branch network with cumulative learning for long-tailed visual recognition. In The IEEE / CVF Computer Vision and Pattern Recognition Conference, pp. 9719–9728
Zhou, B., Lapedriza, A., Khosla, A., Oliva, A., & Torralba, A. (2017). Places: A 10 million image database for scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(6), 1452–1464.
Zhou, H.Y., Lu, C., Chen, C., Yang, S., & Yu, Y. (2023). A unified visual information preservation framework for self-supervised pre-training in medical image analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(7), 8020–8035.
Zhou, B., Zhao, H., Puig, X., Xiao, T., Fidler, S., Barriuso, A., & Torralba, A. (2019). Semantic understanding of scenes through the ADE20k dataset. International Journal of Computer Vision, 127(3), 302–321.
Acknowledgements
The authors would like to thank the editor and the anonymous reviewers for their critical and constructive comments and suggestions. This work was supported by the National Key R&D Program of China (2021YFA1001100), the National Natural Science Foundation of China under Grant 62272231, and the Fundamental Research Funds for the Central Universities (4009002401).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Communicated by Zhouchen Lin.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix A
Proof of Why A Unit Hypersphere is Effective: Following Sects. 3.3 and 3.4, we have M sample-label pairs in N classes. \(\varvec{W}\in \mathbb {R}^{d\times N}\) denotes the EBV matrix, while \(\hat{\varvec{w}}_{i}\in \mathbb {R}^{d}\) represents each categorical EBV, where \(i \in \{1,2,\ldots ,N\}\) and \(\Vert \hat{\varvec{w}}_{i}\Vert =1\). Since we have already assumed that all samples are well separated, we directly use \(\hat{\varvec{w}}_{i}\) to represent the i-th class' feature. We emphasize once more that the categorical EBV for each class remains unchanged throughout the training process, which distinguishes our method from prototype-based methods.
As every class is assumed to contain the same number of samples, the softmax loss can be written as:
\[\mathcal {L} = -\frac{1}{N}\sum _{i=1}^{N} \log \frac{e^{\hat{\varvec{w}}_{i}^{\top } \hat{\varvec{w}}_{i}}}{\sum _{j=1}^{N} e^{\hat{\varvec{w}}_{i}^{\top } \hat{\varvec{w}}_{j}}}\,.\]
Additionally, as we adopt a temperature hyper-parameter \(\tau \) within the softmax loss, its formulation becomes:
\[\mathcal {L} = -\frac{1}{N}\sum _{i=1}^{N} \log \frac{e^{\hat{\varvec{w}}_{i}^{\top } \hat{\varvec{w}}_{i}/\tau }}{\sum _{j=1}^{N} e^{\hat{\varvec{w}}_{i}^{\top } \hat{\varvec{w}}_{j}/\tau }}\,.\]
Dividing both the numerator and denominator by \(e^{\frac{\hat{\varvec{w}}_{i}^{\top } \hat{\varvec{w}}_{i}}{\tau }} = e^{\frac{1}{\tau }}\) yields:
\[\mathcal {L} = \frac{1}{N}\sum _{i=1}^{N} \log \Big (1 + \sum _{j=1, j\ne i}^{N} e^{(\hat{\varvec{w}}_{i}^{\top } \hat{\varvec{w}}_{j}-1)/\tau }\Big )\,.\]
Evidently, \(f(x) = e^x\) is a convex function, so by Jensen's inequality \(\frac{1}{n}\sum _{k=1}^n {e^{x_k}} \ge e^{\frac{1}{n}\sum _{k=1}^{n}{x_k}}\). Applying this to the \(N-1\) terms of the inner sum, we have:
\[\mathcal {L} \ge \frac{1}{N}\sum _{i=1}^{N} \log \Big (1 + (N-1)\, e^{\frac{1}{N-1}\sum _{j\ne i} (\hat{\varvec{w}}_{i}^{\top } \hat{\varvec{w}}_{j}-1)/\tau }\Big )\,.\]
The equality holds if and only if all \(\frac{\hat{\varvec{w}}_{i}^{\top } \hat{\varvec{w}}_{j}}{\tau }\), \(1\le i < j \le N\), take the same value, i.e., features from different classes are pairwise equidistant. However, it has been proved that, for a fixed common angle, the maximum number of equiangular lines grows only linearly with the dimension d as \(d \rightarrow \infty \) (Jiang et al., 2021). Since we consider a number of categories much larger than the feature dimension d (e.g., 1000 classes with feature dimension 100 in Table 3, and 100,000 classes with feature dimension 5000 in Table 9; note that 'EBVs Dimension' equals the feature dimension), this equality cannot hold in practice. Following Wang et al. (2017), we therefore take the feature dimension into consideration and refine the previous inequality.
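The dimension constraint invoked here can be checked directly: N unit vectors with a common pairwise cosine \(c \ne 1\) have Gram matrix \((1-c)I + cJ\), whose rank is at least \(N-1\), so such a configuration cannot be realized in \(\mathbb {R}^d\) once \(N > d+1\). A small numerical illustration (our own sketch, not part of the original proof; the function name is ours):

```python
import numpy as np

def equal_cosine_gram_rank(n, c):
    # Gram matrix of n unit vectors whose pairwise cosines all equal c.
    # Realizing the configuration in R^d requires rank(G) <= d.
    G = np.full((n, n), c)
    np.fill_diagonal(G, 1.0)
    return np.linalg.matrix_rank(G)

print(equal_cosine_gram_rank(101, -1.0 / 100))  # 100: regular simplex, fits in R^100
print(equal_cosine_gram_rank(1000, 0.1))        # 1000: cannot be embedded in R^100
```

The second call shows that 1000 exactly-equiangular vectors would need at least 1000 dimensions, far exceeding the feature dimension 100 used in Table 3.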
Similarly to \(f(x) = e^x\), the softplus-style function \(s(x) = \log (1 + C e^x)\) is also convex when \(C>0\), so that \(\frac{1}{N}\sum _{i=1}^N{\log (1 + C e^{x_i})} \ge \log (1 + C e^{\frac{1}{N}\sum _{i=1}^{N}{x_i}})\). We then have:
\[\mathcal {L} \ge \log \Big (1 + (N-1)\, e^{\frac{1}{N(N-1)}\sum _{i=1}^{N}\sum _{j\ne i} (\hat{\varvec{w}}_{i}^{\top } \hat{\varvec{w}}_{j}-1)/\tau }\Big )\,.\]
This equality holds if and only if, for every \(\hat{\varvec{w}}_i\), the sum of inner products with the other classes' weights, \(\sum _{j=1,j\ne i}^N{\hat{\varvec{w}}_{i}^{\top } \hat{\varvec{w}}_{j}}\), takes the same value.
Note that
\[0 \le \Big \Vert \sum _{i=1}^N{\hat{\varvec{w}}_{i}}\Big \Vert ^2 = N + \sum _{i=1}^N\sum _{j=1, j\ne i}^N{\hat{\varvec{w}}_{i}^{\top } \hat{\varvec{w}}_{j}}\,,\]
so
\[\sum _{i=1}^N\sum _{j=1, j\ne i}^N{\hat{\varvec{w}}_{i}^{\top } \hat{\varvec{w}}_{j}} \ge -N\,.\]
The equality holds if and only if \(\sum _{i=1}^N{\hat{\varvec{w}}_{i}}=\textbf{0}\). Thus,
\[\mathcal {L} \ge \log \Big (1 + (N-1)\, e^{\frac{-N - N(N-1)}{N(N-1)\tau }}\Big ) = \log \Big (1 + (N-1)\, e^{-\frac{N}{(N-1)\tau }}\Big )\,.\]
Taking \(N=1000\) as an example: as mentioned in Sect. 4.1.1, \(\tau \) is set to 0.07, so the lower bound is around 0.00062. That is, the temperature hyper-parameter already resolves the problem that the softmax loss would otherwise be trapped at a very high value on the training set when both features and weights are normalized to 1, so it is fine to keep the predefined EBVs on the surface of a unit hypersphere.
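The closing numerical claim can be verified directly from the final bound \(\log (1 + (N-1)\, e^{-N/((N-1)\tau )})\). A short check (for verification only; the function name is ours):

```python
import math

def softmax_loss_lower_bound(n_classes, tau):
    # Lower bound on the temperature-scaled softmax loss when all
    # features and class vectors are unit-normalized:
    #   L >= log(1 + (N - 1) * exp(-N / ((N - 1) * tau)))
    n = n_classes
    return math.log(1.0 + (n - 1) * math.exp(-n / ((n - 1) * tau)))

print(round(softmax_loss_lower_bound(1000, 0.07), 5))  # 0.00062, matching the text
print(softmax_loss_lower_bound(1000, 1.0) > 5.0)       # without temperature the bound stays high
```

With \(\tau = 1\) the bound exceeds 5, illustrating why normalized features and weights trap the plain softmax loss at a high value, while \(\tau = 0.07\) drives the bound to nearly zero.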
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Shen, Y., Sun, X., Wei, XS. et al. Equiangular Basis Vectors: A Novel Paradigm for Classification Tasks. Int J Comput Vis 133, 372–397 (2025). https://doi.org/10.1007/s11263-024-02189-2