
Partial Multi-label Learning via Constraint Clustering

  • Conference paper
Neural Information Processing (ICONIP 2023)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1965)


Abstract

Multi-label learning (MLL) refers to a learning task where each instance is associated with a set of labels. However, in most real-world applications the labeling process is expensive and time-consuming. Partial multi-label learning (PML) refers to MLL where only a part of the candidate labels are correctly annotated and the rest are false-positive labels. The main purpose of PML is to learn from such data and predict unseen multi-label data at a lower annotation cost. To address the ambiguities in the label set, popular existing PML research attempts to estimate a label confidence for each candidate label. These methods mainly perform disambiguation by considering the correlations among labels and/or features. However, because of noisy labels in PML, the true correlations among labels are corrupted, and these methods can easily be misled by noisy false-positive labels. In this paper, we propose a Partial Multi-Label learning method via Constraint Clustering (PML-CC) that addresses PML based on the underlying structure of the data. PML-CC gradually extracts high-confidence labels and then uses them to extract the remaining labels. To find the high-confidence labels, it solves PML as a clustering task, treating the information extracted in previous steps as constraints. In each step, PML-CC updates the extracted labels and uses them to extract the other labels. Experimental results show that our method successfully tackles PML tasks and outperforms state-of-the-art methods on artificial and real-world datasets.


Notes

  1. http://mulan.sourceforge.net/datasets.html and https://meka.sourceforge.net/datasets.


Acknowledgment

This work has been partially supported by the Volkswagen Foundation.

Author information

Corresponding author

Correspondence to Sajjad Kamali Siahroudi.


Appendix

1.1 Proof of Formula

In this section, the details of the optimization of Eq. (1) are given. The goal of this optimization is to find the optimal values for the cluster centers (\([Z]_{k\times m\times L}\)) and the fuzzy memberships (\([U]_{N\times k \times L}\)), where k is the number of classes for each label, m is the size of the feature vector, L is the number of labels, and N is the number of instances. Since each label is binary, we set \(k=2\). To keep the derivation easy to follow, the procedure is described for a single label; thus we consider the cluster centers \([Z]_{2\times m}\) and the fuzzy memberships \([U]_{N\times 2}\) for one label only and repeat the procedure for the remaining labels. Equation (1) does not have a closed-form solution, so an alternating optimization approach is used. Equation (1) is a constrained non-linear optimization problem; by using Lagrange multipliers, the following function is obtained:

$$\begin{aligned} J(U,Z) ={}& \sum _{j=1}^{2}\sum _{i=1}^{N}U_{ij}\,d_{ij}(x_i,z_j) + A_1 \sum _{j=1}^{2}\sum _{i=1}^{N}U_{ij}\log U_{ij} + A_2\sum _{m \mid X_m \in C^i}\;\sum _{\substack{n \mid X_n \in C^i \\ m\ne n}}\;\sum _{k=1}^{2}\;\sum _{\substack{p=1 \\ p\ne k}}^{K}U_{mk}U_{np}\\ & + A_3\sum _{m \mid X_m \in C^i}\;\sum _{\substack{n \mid X_n \in C^j \\ i\ne j}}\;\sum _{k=1}^{2}U_{mk}U_{nk} + \sum _{i=1}^{N} \lambda _i \Bigl( \sum _{j=1}^{2} U_{ij}-1\Bigr) \end{aligned}$$
(11)
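For concreteness, the following is a minimal NumPy sketch of evaluating the constraint-clustering objective of Eq. (11) for a single label, omitting the Lagrange-multiplier term (which only enforces row-stochastic memberships). It assumes a squared Euclidean distance for \(d_{ij}\) and encodes the constraint sets \(C^i\) as lists of must-link and cannot-link index pairs; all function and variable names are illustrative and not part of the paper.

```python
import numpy as np

def objective(X, Z, U, must_link, cannot_link, A1, A2, A3):
    """Evaluate J(U, Z) of Eq. (11) for one label, without the Lagrange term.

    X: (N, m) instances, Z: (2, m) cluster centers, U: (N, 2) row-stochastic memberships.
    must_link / cannot_link: lists of (i, n) index pairs (illustrative encoding of the C^i sets).
    """
    # Weighted distance term: sum_j sum_i U_ij * d_ij(x_i, z_j), with d assumed squared Euclidean
    D = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)            # shape (N, 2)
    dist_term = (U * D).sum()
    # Entropy regularizer: A1 * sum_ij U_ij log U_ij (clipped for numerical safety)
    entropy_term = A1 * (U * np.log(np.clip(U, 1e-12, None))).sum()
    # A2 term: must-linked pairs assigned to *different* clusters are penalized
    ml = sum(U[i, k] * U[n, p]
             for i, n in must_link for k in range(2) for p in range(2) if p != k)
    # A3 term: cannot-linked pairs assigned to the *same* cluster are penalized
    cl = sum(U[i, k] * U[n, k] for i, n in cannot_link for k in range(2))
    return dist_term + entropy_term + A2 * ml + A3 * cl
```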

Lemma 1. The optimal value for \(U_{ij}\) when Z is fixed is:

$$\begin{aligned} U_{ij}= \dfrac{\exp \left( \frac{-d_{ij}(x_i,z_j)-A_2\psi _{ij}-A_3\varPsi _{ij}}{A_1}\right) }{\sum _{p=1}^2\exp \left( \frac{-d_{ip}(x_i,z_p)-A_2\psi _{ip}-A_3\varPsi _{ip}}{A_1}\right) } \end{aligned}$$
(12)

Proof. To find the optimal value for each \(U_{ij}\), we take the derivative of Eq. (11) with respect to each \(U_{ij}\) and set it to zero:

$$\begin{aligned} &\frac{\partial J (U,Z)}{\partial U_{ij}}=0\\ &d_{ij}(x_i,z_j)+A_1\bigl(1+\log U_{ij}\bigr)+A_2\Bigl(\sum \limits _{\substack{n \mid X_i \in C^m,\, X_n \in C^m \\ n\ne i}} \sum \limits _{\substack{k=1 \\ k\ne j}}^{2} U_{nk}\Bigr)+A_3\Bigl(\sum \limits _{\substack{n \mid X_i \in C^m,\, X_n \in C^p \\ m\ne p}}U_{nj}\Bigr)+\lambda _i=0 \end{aligned}$$
(13)

By setting \(\psi _{ij}\) and \(\varPsi _{ij} \) as follows:

$$\begin{aligned} \psi _{ij}= \sum \limits _{\substack{n \mid X_i \in C^m,\, X_n \in C^m \\ n\ne i}} \sum \limits _{\substack{k=1 \\ k\ne j}}^{K} U_{nk},\qquad \varPsi _{ij} = \sum \limits _{\substack{n \mid X_i \in C^m,\, X_n \in C^p \\ m\ne p}}U_{nj} \end{aligned}$$
(14)

By solving Eq. (13), \(U_{ij}\) will be obtained as follows:

$$\begin{aligned} U_{ij}=\exp (-1)\exp \left( \frac{-d_{ij}(x_i,z_j)-A_2\psi _{ij}-A_3\varPsi _{ij}}{A_1}\right) \exp \left(\frac{-\lambda _i}{A_1}\right) \end{aligned}$$
(15)

Since \(\sum _{j=1}^{2} U_{ij}=1\), the Lagrange multiplier can be obtained as follows:

$$\begin{aligned} &\sum _{j=1}^{2}\exp (-1)\exp \left( \frac{-d_{ij}(x_i,z_j)-A_2\psi _{ij}-A_3\varPsi _{ij}}{A_1}\right) \exp \left(\frac{-\lambda _i}{A_1}\right)\\ &\quad =\exp (-1)\exp \left(\frac{-\lambda _i}{A_1}\right)\sum _{j=1}^{2}\exp \left( \frac{-d_{ij}(x_i,z_j)-A_2\psi _{ij}-A_3\varPsi _{ij}}{A_1}\right) =1\\ &\quad \Rightarrow \exp \left(\frac{-\lambda _i}{A_1}\right)=\frac{1}{\exp (-1)\sum _{j=1}^{2}\exp \left( \frac{-d_{ij}(x_i,z_j)-A_2\psi _{ij}-A_3\varPsi _{ij}}{A_1}\right) } \end{aligned}$$
(16)

Substituting Eq. (16) into Eq. (15) yields the closed-form solution for \(U_{ij}\) (Eq. (12)), which completes the proof of the lemma.
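The closed form of Eq. (12) can be read as a softmax with temperature \(A_1\) over the penalized distances. Below is a minimal NumPy sketch of one such membership update for a single label, using the current memberships on the right-hand side as a single fixed-point sweep; the squared Euclidean distance and the pair-list encoding of the constraints are assumptions, and all names are illustrative.

```python
import numpy as np

def update_memberships(X, Z, U, must_link, cannot_link, A1, A2, A3):
    """One sweep of the membership update in Eq. (12) for a single label.

    X: (N, m) instances, Z: (2, m) cluster centers, U: (N, 2) current memberships.
    must_link / cannot_link: lists of (i, n) index pairs (illustrative constraint encoding).
    """
    n_clusters = U.shape[1]
    # d_ij: assumed squared Euclidean distance between instance i and center j
    D = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)            # shape (N, 2)

    # psi_ij (Eq. 14): memberships of must-linked partners to the *other* cluster
    psi = np.zeros_like(U)
    for i, n in must_link:
        for j in range(n_clusters):
            psi[i, j] += U[n].sum() - U[n, j]
            psi[n, j] += U[i].sum() - U[i, j]

    # Psi_ij (Eq. 14): memberships of cannot-linked partners to the *same* cluster
    Psi = np.zeros_like(U)
    for i, n in cannot_link:
        Psi[i] += U[n]
        Psi[n] += U[i]

    # Softmax form of Eq. (12); the exp(-1) factor cancels in the normalization
    logits = -(D + A2 * psi + A3 * Psi) / A1
    logits -= logits.max(axis=1, keepdims=True)                       # numerical stability
    E = np.exp(logits)
    return E / E.sum(axis=1, keepdims=True)
```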

Lemma 2. If the fuzzy memberships U are fixed, the optimal values for the cluster centers Z are given by Eq. (17).

$$\begin{aligned} Z_{jp}=\frac{\sum _{i=1}^{N}U_{ij}x_{ip}}{\sum _{i=1}^{N} U_{ij}} \end{aligned}$$
(17)

Proof. Again, the alternating approach is used. With U fixed, the optimal value for Z is obtained by taking the derivative of Eq. (11) with respect to each cluster center and setting it to zero.

$$\begin{aligned} \frac{\partial J (U,Z)}{\partial Z_{jp}}=0\;\Rightarrow \; \sum _{i=1}^{N}2U_{ij}(Z_{jp}-x_{ip})=0\;\Rightarrow \; Z_{jp}=\frac{\sum _{i=1}^{N}U_{ij}x_{ip}}{\sum _{i=1}^{N} U_{ij}} \end{aligned}$$
(18)
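Eq. (17) is simply a membership-weighted mean of the instances, so the corresponding update is a one-liner. The sketch below continues the NumPy example above and assumes X and U are the arrays defined there; the function name is illustrative.

```python
def update_centers(X, U):
    """Center update of Eq. (17): Z_jp = sum_i U_ij * x_ip / sum_i U_ij."""
    # (2, N) @ (N, m) -> (2, m), then divide each row by its total membership mass
    return (U.T @ X) / U.sum(axis=0)[:, None]
```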

Lemma 3. U and Z are a local optimum of J(U, Z) if \(U_{ij}\) and \(Z_{jp}\) are calculated using Eqs. (12) and (17) and \(A_1, A_2, A_3 > 0\).

Proof. Let J(U) denote J(U, Z) with Z fixed, let J(Z) denote J(U, Z) with U fixed, and let \(A_1, A_2, A_3 > 0\). The Hessian matrices H(J(U)) and H(J(Z)) are then calculated as follows:

$$\begin{aligned} h_{fg,ij}(J(U))=\frac{\partial }{\partial U_{fg}}\, \frac{\partial J(U)}{\partial U_{ij}}= \begin{cases} \dfrac{A_1}{U_{ij}}, & \text {if } f=i,\ g=j\\ 0, & \text {otherwise} \end{cases} \end{aligned}$$
(19)
$$\begin{aligned} h_{fg,jp}(J(Z))=\frac{\partial }{\partial Z_{fg}}\, \frac{\partial J(Z)}{\partial Z_{jp}}= \begin{cases} \sum _{i=1}^{N}2U_{ij}, & \text {if } f=j,\ g=p\\ 0, & \text {otherwise} \end{cases} \end{aligned}$$
(20)

Equations (19) and (20) show that H(J(U)) and H(J(Z)) are diagonal matrices. Since \(A_1>0\) and \(0< U_{ij} \le 1\), the Hessian matrices are positive definite. Thus Eqs. (12) and (17) are sufficient conditions to minimize J(U) and J(Z).
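Putting the two lemmas together, the alternating optimization can be sketched as follows, reusing `update_memberships` and `update_centers` from the earlier sketches. The random initialization, default hyper-parameter values, and stopping rule are illustrative choices rather than the paper's settings; Lemma 3 only guarantees that each alternating step cannot increase J, so the loop converges to a local optimum.

```python
import numpy as np

def fit_single_label(X, must_link, cannot_link, A1=1.0, A2=1.0, A3=1.0,
                     n_iter=100, tol=1e-6, seed=0):
    """Alternate the updates of Eqs. (12) and (17) for one label until U stabilizes."""
    rng = np.random.default_rng(seed)
    U = rng.dirichlet(np.ones(2), size=X.shape[0])   # random row-stochastic start
    Z = update_centers(X, U)
    for _ in range(n_iter):
        U_new = update_memberships(X, Z, U, must_link, cannot_link, A1, A2, A3)
        Z = update_centers(X, U_new)
        if np.abs(U_new - U).max() < tol:            # memberships have stabilized
            U = U_new
            break
        U = U_new
    return U, Z
```

Applying such a routine independently to each of the L labels (with k = 2 clusters per label) mirrors the per-label decomposition described at the start of this appendix.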

1.2 Additional Experimental Results

Tables 4, 5, and 6 show the performance of our proposed method in terms of ranking loss, one error, and coverage, respectively.

Table 4. The results of our proposed method and other competitors in terms of ranking loss on real-world and synthetic datasets (mean ± standard deviation).
Table 5. The results of our proposed method and other competitors in terms of one error on real-world and synthetic datasets (mean ± standard deviation).
Table 6. The results of PML-CC and other competitors in terms of coverage on real-world and synthetic datasets (mean ± standard deviation).


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Siahroudi, S.K., Kudenko, D. (2024). Partial Multi-label Learning via Constraint Clustering. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Communications in Computer and Information Science, vol 1965. Springer, Singapore. https://doi.org/10.1007/978-981-99-8145-8_35


  • DOI: https://doi.org/10.1007/978-981-99-8145-8_35

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-8144-1

  • Online ISBN: 978-981-99-8145-8

  • eBook Packages: Computer Science, Computer Science (R0)
