
Dynamic feature weighting for multi-label classification problems

  • Regular Paper
  • Published in: Progress in Artificial Intelligence

Abstract

This paper proposes a dynamic feature weighting approach for multi-label classification problems. Dynamic weights play a vital role in such problems because the weight assigned to each feature may depend on the query. To capture this dependency, we optimize our previously proposed dynamic weighting function through a non-convex formulation, which yields several useful properties. In particular, minimizing the proposed objective function draws samples with similar label sets closer together while pushing them away from samples with dissimilar label sets. To learn the parameters of the weighting functions, we propose an iterative gradient descent algorithm that minimizes the traditional leave-one-out error rate. We then embed the learned weighting function into a popular multi-label classifier, ML-kNN, and evaluate its performance on a set of benchmark datasets. A distributed implementation of the proposed method on Spark is also presented to address the computational cost on large-scale datasets. Finally, we compare the results with several related state-of-the-art methods; the experiments show that the proposed method consistently outperforms them.
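The classifier into which the learned weighting is embedded, ML-kNN, extends nearest-neighbour classification to label sets via a maximum-a-posteriori rule over neighbour label counts. The sketch below is a deliberately simplified stand-in, not the paper's method: the `weights` argument takes the place of the learned, query-dependent weighting function (which is not reproduced here), and a plain majority vote over the neighbours' label sets replaces ML-kNN's full MAP estimate.

```python
import numpy as np

def weighted_knn_multilabel(X_train, Y_train, x_query, weights, k=3):
    """Multi-label k-NN under a per-query feature weighting (illustrative sketch).

    X_train : (n, d) feature matrix, Y_train : (n, q) binary label matrix,
    weights : (d,) non-negative importance of each feature for this query.
    In the paper these weights come from a learned dynamic weighting function;
    here they are simply supplied by the caller.
    """
    # Weighted Euclidean distance from the query to every training sample.
    diff = X_train - x_query
    d = np.sqrt((weights * diff ** 2).sum(axis=1))
    knn = np.argsort(d)[:k]  # indices of the k nearest neighbours
    # A label is predicted relevant if more than half of the neighbours carry it
    # (a simple vote standing in for ML-kNN's MAP rule).
    return (Y_train[knn].sum(axis=0) * 2 > k).astype(int)
```

For example, with two well-separated clusters carrying disjoint label sets, a query near the first cluster inherits that cluster's labels; raising the weight of a feature makes distances along it dominate the neighbour selection.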



Corresponding author

Correspondence to Maryam Dialameh.


Appendix 1

The derivative of the \(J\)-function with respect to \(\mathrm{MC}_{j}^{2}\) (the \(j\)th row of \(\mathrm{MC}^{2}\)) is:

$$ \frac{\partial J}{\partial \mathrm{MC}_{j}^{2}} = \frac{1}{M}\sum_{x \in X} \mathbb{S}_{b}'\left( u_{x} \right)\frac{\partial u_{x}}{\partial \mathrm{MC}_{j}^{2}} $$
(21)
$$ \frac{\partial u_{x}}{\partial \mathrm{MC}_{j}^{2}} = \frac{1}{2 d^{2}\left( x, x^{\ne} \right)}\left( \frac{1}{u_{x}}\left( \varphi_{j}^{=} \right)^{2} \odot \frac{\partial f\left( \varphi^{=} \right)}{\partial \mathrm{MC}_{j}^{2}} - u_{x}\left( \varphi_{j}^{\ne} \right)^{2} \odot \frac{\partial f\left( \varphi^{\ne} \right)}{\partial \mathrm{MC}_{j}^{2}} \right) $$
(22)
$$ \begin{aligned} \frac{\partial f\left( \varphi^{=} \right)}{\partial \mathrm{MC}_{j}^{2}} & = \frac{-1}{V}\left[ \mathrm{MB}_{j,1}^{2}\,\mathrm{FW}_{j,1}\left( \varphi_{j}^{=} \right)\left( 1 - \mathcal{S}_{j,1}^{2}\left( \varphi_{j}^{=} \right) \right), \ldots, \mathrm{MB}_{j,V}^{2}\,\mathrm{FW}_{j,V}\left( \varphi_{j}^{=} \right)\left( 1 - \mathcal{S}_{j,V}^{2}\left( \varphi_{j}^{=} \right) \right) \right]^{\mathrm{T}} \\ \frac{\partial f\left( \varphi^{\ne} \right)}{\partial \mathrm{MC}_{j}^{2}} & = \frac{-1}{V}\left[ \mathrm{MB}_{j,1}^{2}\,\mathrm{FW}_{j,1}\left( \varphi_{j}^{\ne} \right)\left( 1 - \mathcal{S}_{j,1}^{2}\left( \varphi_{j}^{\ne} \right) \right), \ldots, \mathrm{MB}_{j,V}^{2}\,\mathrm{FW}_{j,V}\left( \varphi_{j}^{\ne} \right)\left( 1 - \mathcal{S}_{j,V}^{2}\left( \varphi_{j}^{\ne} \right) \right) \right]^{\mathrm{T}} \end{aligned} $$
(23)

The derivative of the \(J\)-function with respect to \(\mathrm{MB}_{j}^{1}\) (the \(j\)th row of \(\mathrm{MB}^{1}\)) is:

$$ \frac{\partial J}{\partial \mathrm{MB}_{j}^{1}} = \frac{1}{M}\sum_{x \in X} \mathbb{S}_{b}'\left( u_{x} \right)\frac{\partial u_{x}}{\partial \mathrm{MB}_{j}^{1}} $$
(24)
$$ \frac{\partial u_{x}}{\partial \mathrm{MB}_{j}^{1}} = \frac{1}{2 d^{2}\left( x, x^{\ne} \right)}\left( \frac{1}{u_{x}}\left( \varphi_{j}^{=} \right)^{2} \odot \frac{\partial f\left( \varphi^{=} \right)}{\partial \mathrm{MB}_{j}^{1}} - u_{x}\left( \varphi_{j}^{\ne} \right)^{2} \odot \frac{\partial f\left( \varphi^{\ne} \right)}{\partial \mathrm{MB}_{j}^{1}} \right) $$
(25)
$$ \frac{\partial f\left( \varphi^{=} \right)}{\partial \mathrm{MB}_{j}^{1}} = \frac{-1}{V}\left[ \left( \mathrm{MC}_{j,1}^{1} - \varphi_{j}^{=} \right)\mathcal{S}_{j,1}^{1}\left( \varphi_{j}^{=} \right)\left( 1 - \mathcal{S}_{j,1}^{1}\left( \varphi_{j}^{=} \right) \right)\mathcal{S}_{j,1}^{2}\left( \varphi_{j}^{=} \right), \ldots, \left( \mathrm{MC}_{j,V}^{1} - \varphi_{j}^{=} \right)\mathcal{S}_{j,V}^{1}\left( \varphi_{j}^{=} \right)\left( 1 - \mathcal{S}_{j,V}^{1}\left( \varphi_{j}^{=} \right) \right)\mathcal{S}_{j,V}^{2}\left( \varphi_{j}^{=} \right) \right]^{\mathrm{T}} $$
(26)
$$ \frac{\partial f\left( \varphi^{\ne} \right)}{\partial \mathrm{MB}_{j}^{1}} = \frac{-1}{V}\left[ \left( \mathrm{MC}_{j,1}^{1} - \varphi_{j}^{\ne} \right)\mathcal{S}_{j,1}^{1}\left( \varphi_{j}^{\ne} \right)\left( 1 - \mathcal{S}_{j,1}^{1}\left( \varphi_{j}^{\ne} \right) \right)\mathcal{S}_{j,1}^{2}\left( \varphi_{j}^{\ne} \right), \ldots, \left( \mathrm{MC}_{j,V}^{1} - \varphi_{j}^{\ne} \right)\mathcal{S}_{j,V}^{1}\left( \varphi_{j}^{\ne} \right)\left( 1 - \mathcal{S}_{j,V}^{1}\left( \varphi_{j}^{\ne} \right) \right)\mathcal{S}_{j,V}^{2}\left( \varphi_{j}^{\ne} \right) \right]^{\mathrm{T}} $$
(27)

The derivative of the \(J\)-function with respect to \(\mathrm{MB}_{j}^{2}\) (the \(j\)th row of \(\mathrm{MB}^{2}\)) is:

$$ \frac{\partial J}{\partial \mathrm{MB}_{j}^{2}} = \frac{1}{M}\sum_{x \in X} \mathbb{S}_{b}'\left( u_{x} \right)\frac{\partial u_{x}}{\partial \mathrm{MB}_{j}^{2}} $$
(28)
$$ \frac{\partial u_{x}}{\partial \mathrm{MB}_{j}^{2}} = \frac{1}{2 d^{2}\left( x, x^{\ne} \right)}\left( \frac{1}{u_{x}}\left( \varphi_{j}^{=} \right)^{2} \odot \frac{\partial f\left( \varphi^{=} \right)}{\partial \mathrm{MB}_{j}^{2}} - u_{x}\left( \varphi_{j}^{\ne} \right)^{2} \odot \frac{\partial f\left( \varphi^{\ne} \right)}{\partial \mathrm{MB}_{j}^{2}} \right) $$
(29)
$$ \frac{\partial f\left( \varphi^{=} \right)}{\partial \mathrm{MB}_{j}^{2}} = \frac{-1}{V}\left[ \left( \mathrm{MC}_{j,1}^{2} - \varphi_{j}^{=} \right)\mathcal{S}_{j,1}^{2}\left( \varphi_{j}^{=} \right)\left( 1 - \mathcal{S}_{j,1}^{2}\left( \varphi_{j}^{=} \right) \right)\mathcal{S}_{j,1}^{1}\left( \varphi_{j}^{=} \right), \ldots, \left( \mathrm{MC}_{j,V}^{2} - \varphi_{j}^{=} \right)\mathcal{S}_{j,V}^{2}\left( \varphi_{j}^{=} \right)\left( 1 - \mathcal{S}_{j,V}^{2}\left( \varphi_{j}^{=} \right) \right)\mathcal{S}_{j,V}^{1}\left( \varphi_{j}^{=} \right) \right]^{\mathrm{T}} $$
(30)
$$ \frac{\partial f\left( \varphi^{\ne} \right)}{\partial \mathrm{MB}_{j}^{2}} = \frac{-1}{V}\left[ \left( \mathrm{MC}_{j,1}^{2} - \varphi_{j}^{\ne} \right)\mathcal{S}_{j,1}^{2}\left( \varphi_{j}^{\ne} \right)\left( 1 - \mathcal{S}_{j,1}^{2}\left( \varphi_{j}^{\ne} \right) \right)\mathcal{S}_{j,1}^{1}\left( \varphi_{j}^{\ne} \right), \ldots, \left( \mathrm{MC}_{j,V}^{2} - \varphi_{j}^{\ne} \right)\mathcal{S}_{j,V}^{2}\left( \varphi_{j}^{\ne} \right)\left( 1 - \mathcal{S}_{j,V}^{2}\left( \varphi_{j}^{\ne} \right) \right)\mathcal{S}_{j,V}^{1}\left( \varphi_{j}^{\ne} \right) \right]^{\mathrm{T}}. $$
(31)
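Equations (21)–(31) instantiate one shared chain-rule template, once per parameter block. Writing that template once also makes the resulting descent step explicit; the step size \(\eta\) below is a generic assumption for illustration, not a value specified in the appendix:

```latex
% Shared gradient template over the three parameter blocks
% \Theta \in \{\mathrm{MC}^{2},\, \mathrm{MB}^{1},\, \mathrm{MB}^{2}\}:
\frac{\partial J}{\partial \Theta_{j}}
  = \frac{1}{M}\sum_{x \in X} \mathbb{S}_{b}'\left( u_{x} \right)
    \frac{\partial u_{x}}{\partial \Theta_{j}},
\qquad
% generic gradient-descent update with an assumed step size \eta
\Theta_{j} \leftarrow \Theta_{j} - \eta\,\frac{\partial J}{\partial \Theta_{j}} .
```

The blocks differ only in the inner factor \(\partial f / \partial \Theta_{j}\), given by (23), (26)–(27), and (30)–(31) respectively.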


Cite this article

Dialameh, M., Hamzeh, A. Dynamic feature weighting for multi-label classification problems. Prog Artif Intell 10, 283–295 (2021). https://doi.org/10.1007/s13748-021-00237-3
