
Dynamic feature weighting for multi-label classification problems

  • Regular Paper
  • Published in: Progress in Artificial Intelligence

Abstract

This paper proposes a dynamic feature weighting approach for multi-label classification problems. Dynamic weights play a vital role in such problems because the weight assigned to each feature may depend on the query. To capture this dependency, we optimize our previously proposed dynamic weighting function through a non-convex formulation, which yields several useful properties. In particular, minimizing the proposed objective function draws samples with similar label sets closer together while pushing them away from samples with dissimilar label sets. To learn the parameters of the weighting functions, we propose an iterative gradient descent algorithm that minimizes the traditional leave-one-out error rate. We then embed the learned weighting function into a popular multi-label classifier, ML-kNN, and evaluate its performance on a set of benchmark datasets. A distributed implementation of the proposed method on Spark is also presented to address the computational cost on large-scale datasets. Finally, we compare the results with several related state-of-the-art methods; the experiments show that the proposed method consistently outperforms them.
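The classifier into which the learned weighting is embedded, ML-kNN, extends nearest-neighbour classification to label sets via a maximum-a-posteriori rule over neighbour label counts. The sketch below is a deliberately simplified stand-in, not the paper's method: the `weights` argument takes the place of the learned, query-dependent weighting function (which is not reproduced here), and a plain majority vote over the neighbours' label sets replaces ML-kNN's full MAP estimate.

```python
import numpy as np

def weighted_knn_multilabel(X_train, Y_train, x_query, weights, k=3):
    """Multi-label k-NN under a per-query feature weighting (illustrative sketch).

    X_train : (n, d) feature matrix, Y_train : (n, q) binary label matrix,
    weights : (d,) non-negative importance of each feature for this query.
    In the paper these weights come from a learned dynamic weighting function;
    here they are simply supplied by the caller.
    """
    # Weighted Euclidean distance from the query to every training sample.
    diff = X_train - x_query
    d = np.sqrt((weights * diff ** 2).sum(axis=1))
    knn = np.argsort(d)[:k]  # indices of the k nearest neighbours
    # A label is predicted relevant if more than half of the neighbours carry it
    # (a simple vote standing in for ML-kNN's MAP rule).
    return (Y_train[knn].sum(axis=0) * 2 > k).astype(int)
```

For example, with two well-separated clusters carrying disjoint label sets, a query near the first cluster inherits that cluster's labels; raising the weight of a feature makes distances along it dominate the neighbour selection.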



Corresponding author

Correspondence to Maryam Dialameh.


Appendix 1

The derivative of the \(J\)-function with respect to \(\mathrm{MC}_{j}^{2}\) (the \(j\)th row of \(\mathrm{MC}^{2}\)) is:

$$ \frac{\partial J}{\partial \mathrm{MC}_{j}^{2}} = \frac{1}{M}\sum_{x \in X} \mathbb{S}_{b}'\left( u_{x} \right)\frac{\partial u_{x}}{\partial \mathrm{MC}_{j}^{2}} $$
(21)
$$ \frac{\partial u_{x}}{\partial \mathrm{MC}_{j}^{2}} = \frac{1}{2 d^{2}\left( x, x^{\ne} \right)}\left( \frac{1}{u_{x}}\left( \varphi_{j}^{=} \right)^{2} \odot \frac{\partial f\left( \varphi^{=} \right)}{\partial \mathrm{MC}_{j}^{2}} - u_{x}\left( \varphi_{j}^{\ne} \right)^{2} \odot \frac{\partial f\left( \varphi^{\ne} \right)}{\partial \mathrm{MC}_{j}^{2}} \right) $$
(22)
$$ \begin{aligned} \frac{\partial f\left( \varphi^{=} \right)}{\partial \mathrm{MC}_{j}^{2}} & = \frac{-1}{V}\left[ \mathrm{MB}_{j,1}^{2}\,\mathrm{FW}_{j,1}\left( \varphi_{j}^{=} \right)\left( 1 - \mathcal{S}_{j,1}^{2}\left( \varphi_{j}^{=} \right) \right), \ldots, \mathrm{MB}_{j,V}^{2}\,\mathrm{FW}_{j,V}\left( \varphi_{j}^{=} \right)\left( 1 - \mathcal{S}_{j,V}^{2}\left( \varphi_{j}^{=} \right) \right) \right]^{\mathrm{T}} \\ \frac{\partial f\left( \varphi^{\ne} \right)}{\partial \mathrm{MC}_{j}^{2}} & = \frac{-1}{V}\left[ \mathrm{MB}_{j,1}^{2}\,\mathrm{FW}_{j,1}\left( \varphi_{j}^{\ne} \right)\left( 1 - \mathcal{S}_{j,1}^{2}\left( \varphi_{j}^{\ne} \right) \right), \ldots, \mathrm{MB}_{j,V}^{2}\,\mathrm{FW}_{j,V}\left( \varphi_{j}^{\ne} \right)\left( 1 - \mathcal{S}_{j,V}^{2}\left( \varphi_{j}^{\ne} \right) \right) \right]^{\mathrm{T}} \end{aligned} $$
(23)

The derivative of the \(J\)-function with respect to \(\mathrm{MB}_{j}^{1}\) (the \(j\)th row of \(\mathrm{MB}^{1}\)) is:

$$ \frac{\partial J}{\partial \mathrm{MB}_{j}^{1}} = \frac{1}{M}\sum_{x \in X} \mathbb{S}_{b}'\left( u_{x} \right)\frac{\partial u_{x}}{\partial \mathrm{MB}_{j}^{1}} $$
(24)
$$ \frac{\partial u_{x}}{\partial \mathrm{MB}_{j}^{1}} = \frac{1}{2 d^{2}\left( x, x^{\ne} \right)}\left( \frac{1}{u_{x}}\left( \varphi_{j}^{=} \right)^{2} \odot \frac{\partial f\left( \varphi^{=} \right)}{\partial \mathrm{MB}_{j}^{1}} - u_{x}\left( \varphi_{j}^{\ne} \right)^{2} \odot \frac{\partial f\left( \varphi^{\ne} \right)}{\partial \mathrm{MB}_{j}^{1}} \right) $$
(25)
$$ \frac{\partial f\left( \varphi^{=} \right)}{\partial \mathrm{MB}_{j}^{1}} = \frac{-1}{V}\left[ \left( \mathrm{MC}_{j,1}^{1} - \varphi_{j}^{=} \right)\mathcal{S}_{j,1}^{1}\left( \varphi_{j}^{=} \right)\left( 1 - \mathcal{S}_{j,1}^{1}\left( \varphi_{j}^{=} \right) \right)\mathcal{S}_{j,1}^{2}\left( \varphi_{j}^{=} \right), \ldots, \left( \mathrm{MC}_{j,V}^{1} - \varphi_{j}^{=} \right)\mathcal{S}_{j,V}^{1}\left( \varphi_{j}^{=} \right)\left( 1 - \mathcal{S}_{j,V}^{1}\left( \varphi_{j}^{=} \right) \right)\mathcal{S}_{j,V}^{2}\left( \varphi_{j}^{=} \right) \right]^{\mathrm{T}} $$
(26)
$$ \frac{\partial f\left( \varphi^{\ne} \right)}{\partial \mathrm{MB}_{j}^{1}} = \frac{-1}{V}\left[ \left( \mathrm{MC}_{j,1}^{1} - \varphi_{j}^{\ne} \right)\mathcal{S}_{j,1}^{1}\left( \varphi_{j}^{\ne} \right)\left( 1 - \mathcal{S}_{j,1}^{1}\left( \varphi_{j}^{\ne} \right) \right)\mathcal{S}_{j,1}^{2}\left( \varphi_{j}^{\ne} \right), \ldots, \left( \mathrm{MC}_{j,V}^{1} - \varphi_{j}^{\ne} \right)\mathcal{S}_{j,V}^{1}\left( \varphi_{j}^{\ne} \right)\left( 1 - \mathcal{S}_{j,V}^{1}\left( \varphi_{j}^{\ne} \right) \right)\mathcal{S}_{j,V}^{2}\left( \varphi_{j}^{\ne} \right) \right]^{\mathrm{T}} $$
(27)

The derivative of the \(J\)-function with respect to \(\mathrm{MB}_{j}^{2}\) (the \(j\)th row of \(\mathrm{MB}^{2}\)) is:

$$ \frac{\partial J}{\partial \mathrm{MB}_{j}^{2}} = \frac{1}{M}\sum_{x \in X} \mathbb{S}_{b}'\left( u_{x} \right)\frac{\partial u_{x}}{\partial \mathrm{MB}_{j}^{2}} $$
(28)
$$ \frac{\partial u_{x}}{\partial \mathrm{MB}_{j}^{2}} = \frac{1}{2 d^{2}\left( x, x^{\ne} \right)}\left( \frac{1}{u_{x}}\left( \varphi_{j}^{=} \right)^{2} \odot \frac{\partial f\left( \varphi^{=} \right)}{\partial \mathrm{MB}_{j}^{2}} - u_{x}\left( \varphi_{j}^{\ne} \right)^{2} \odot \frac{\partial f\left( \varphi^{\ne} \right)}{\partial \mathrm{MB}_{j}^{2}} \right) $$
(29)
$$ \frac{\partial f\left( \varphi^{=} \right)}{\partial \mathrm{MB}_{j}^{2}} = \frac{-1}{V}\left[ \left( \mathrm{MC}_{j,1}^{2} - \varphi_{j}^{=} \right)\mathcal{S}_{j,1}^{2}\left( \varphi_{j}^{=} \right)\left( 1 - \mathcal{S}_{j,1}^{2}\left( \varphi_{j}^{=} \right) \right)\mathcal{S}_{j,1}^{1}\left( \varphi_{j}^{=} \right), \ldots, \left( \mathrm{MC}_{j,V}^{2} - \varphi_{j}^{=} \right)\mathcal{S}_{j,V}^{2}\left( \varphi_{j}^{=} \right)\left( 1 - \mathcal{S}_{j,V}^{2}\left( \varphi_{j}^{=} \right) \right)\mathcal{S}_{j,V}^{1}\left( \varphi_{j}^{=} \right) \right]^{\mathrm{T}} $$
(30)
$$ \frac{\partial f\left( \varphi^{\ne} \right)}{\partial \mathrm{MB}_{j}^{2}} = \frac{-1}{V}\left[ \left( \mathrm{MC}_{j,1}^{2} - \varphi_{j}^{\ne} \right)\mathcal{S}_{j,1}^{2}\left( \varphi_{j}^{\ne} \right)\left( 1 - \mathcal{S}_{j,1}^{2}\left( \varphi_{j}^{\ne} \right) \right)\mathcal{S}_{j,1}^{1}\left( \varphi_{j}^{\ne} \right), \ldots, \left( \mathrm{MC}_{j,V}^{2} - \varphi_{j}^{\ne} \right)\mathcal{S}_{j,V}^{2}\left( \varphi_{j}^{\ne} \right)\left( 1 - \mathcal{S}_{j,V}^{2}\left( \varphi_{j}^{\ne} \right) \right)\mathcal{S}_{j,V}^{1}\left( \varphi_{j}^{\ne} \right) \right]^{\mathrm{T}}. $$
(31)
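Equations (21)–(31) instantiate one shared chain-rule template, once per parameter block. Writing that template once also makes the resulting descent step explicit; the step size \(\eta\) below is a generic assumption for illustration, not a value specified in the appendix:

```latex
% Shared gradient template over the three parameter blocks
% \Theta \in \{\mathrm{MC}^{2},\, \mathrm{MB}^{1},\, \mathrm{MB}^{2}\}:
\frac{\partial J}{\partial \Theta_{j}}
  = \frac{1}{M}\sum_{x \in X} \mathbb{S}_{b}'\left( u_{x} \right)
    \frac{\partial u_{x}}{\partial \Theta_{j}},
\qquad
% generic gradient-descent update with an assumed step size \eta
\Theta_{j} \leftarrow \Theta_{j} - \eta\,\frac{\partial J}{\partial \Theta_{j}} .
```

The blocks differ only in the inner factor \(\partial f / \partial \Theta_{j}\), given by (23), (26)–(27), and (30)–(31) respectively.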


Cite this article

Dialameh, M., Hamzeh, A. Dynamic feature weighting for multi-label classification problems. Prog Artif Intell 10, 283–295 (2021). https://doi.org/10.1007/s13748-021-00237-3
