
Partial Multi-label Learning via Constraint Clustering

  • Conference paper
Neural Information Processing (ICONIP 2023)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1965)


Abstract

Multi-label learning (MLL) refers to a learning task where each instance is associated with a set of labels. However, in most real-world applications the labeling process is expensive and time-consuming. Partial multi-label learning (PML) refers to MLL where only a part of the candidate labels are correctly annotated and the rest are false-positive labels. The main purpose of PML is to learn from such data and predict unseen multi-label data at a lower annotation cost. To address the ambiguities in the label set, popular existing PML research attempts to estimate a label confidence for each candidate label. These methods mainly perform disambiguation by considering the correlations among labels and/or features. However, because of noisy labels in PML, the true correlations among labels are corrupted, and these methods can easily be misled by noisy false-positive labels. In this paper, we propose a Partial Multi-Label learning method via Constraint Clustering (PML-CC) that addresses PML based on the underlying structure of the data. PML-CC gradually extracts high-confidence labels and then uses them to extract the remaining labels. To find the high-confidence labels, it solves PML as a clustering task, treating the information extracted in previous steps as constraints. In each step, PML-CC updates the extracted labels and uses them to extract the other labels. Experimental results show that our method successfully tackles PML tasks and outperforms state-of-the-art methods on artificial and real-world datasets.


Notes

  1. http://mulan.sourceforge.net/datasets.html and https://meka.sourceforge.net/datasets.


Acknowledgment

This work has been partially supported by the Volkswagen Foundation.

Author information

Corresponding author

Correspondence to Sajjad Kamali Siahroudi.


Appendix

1.1 Proof of Formula

In this section, the details of the optimization of Eq. (1) are given. The goal of this optimization is to find the optimal values for the cluster centers (\([Z]_{k\times m\times L}\)) and the fuzzy memberships (\([U]_{N\times k \times L}\)), where k is the number of classes for each label, m is the size of the feature vector, L is the number of labels, and N is the number of instances. Since each label is binary, we set \(k=2\). To keep the derivation easy to follow, the procedure is described for a single label; thus we consider the cluster centers \([Z]_{2\times m}\) and the fuzzy memberships \([U]_{N\times 2}\) for one label only and repeat the procedure for the remaining labels. Equation (1) does not have a closed-form solution, so an alternating optimization approach is used. Equation (1) is a constrained non-linear optimization problem; by using Lagrange multipliers, the following function is obtained:

$$\begin{aligned} J(U,Z) ={}& \sum _{j=1}^{2}\sum _{i=1}^{N}U_{ij}\,d_{ij}(x_i,z_j) + A_1 \sum _{j=1}^{2}\sum _{i=1}^{N}U_{ij}\log U_{ij} + A_2\sum _{m \mid X_m \in C^i}\;\sum _{\substack{n \mid X_n \in C^i \\ m\ne n}}\;\sum _{k=1}^{2}\;\sum _{\substack{p=1 \\ p\ne k}}^{K}U_{mk}U_{np}\\ & + A_3\sum _{m \mid X_m \in C^i}\;\sum _{\substack{n \mid X_n \in C^j \\ i\ne j}}\;\sum _{k=1}^{2}U_{mk}U_{nk} + \sum _{i=1}^{N} \lambda _i \Bigl( \sum _{j=1}^{2} U_{ij}-1\Bigr) \end{aligned}$$
(11)
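For concreteness, the following is a minimal NumPy sketch of evaluating the constraint-clustering objective of Eq. (11) for a single label, omitting the Lagrange-multiplier term (which only enforces row-stochastic memberships). It assumes a squared Euclidean distance for \(d_{ij}\) and encodes the constraint sets \(C^i\) as lists of must-link and cannot-link index pairs; all function and variable names are illustrative and not part of the paper.

```python
import numpy as np

def objective(X, Z, U, must_link, cannot_link, A1, A2, A3):
    """Evaluate J(U, Z) of Eq. (11) for one label, without the Lagrange term.

    X: (N, m) instances, Z: (2, m) cluster centers, U: (N, 2) row-stochastic memberships.
    must_link / cannot_link: lists of (i, n) index pairs (illustrative encoding of the C^i sets).
    """
    # Weighted distance term: sum_j sum_i U_ij * d_ij(x_i, z_j), with d assumed squared Euclidean
    D = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)            # shape (N, 2)
    dist_term = (U * D).sum()
    # Entropy regularizer: A1 * sum_ij U_ij log U_ij (clipped for numerical safety)
    entropy_term = A1 * (U * np.log(np.clip(U, 1e-12, None))).sum()
    # A2 term: must-linked pairs assigned to *different* clusters are penalized
    ml = sum(U[i, k] * U[n, p]
             for i, n in must_link for k in range(2) for p in range(2) if p != k)
    # A3 term: cannot-linked pairs assigned to the *same* cluster are penalized
    cl = sum(U[i, k] * U[n, k] for i, n in cannot_link for k in range(2))
    return dist_term + entropy_term + A2 * ml + A3 * cl
```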

Lemma 1. The optimal value for \(U_{ij}\) when Z is fixed is:

$$\begin{aligned} U_{ij}= \dfrac{\exp \left( \frac{-d_{ij}(x_i,z_j)-A_2\psi _{ij}-A_3\varPsi _{ij}}{A_1}\right) }{\sum _{p=1}^2\exp \left( \frac{-d_{ip}(x_i,z_p)-A_2\psi _{ip}-A_3\varPsi _{ip}}{A_1}\right) } \end{aligned}$$
(12)

Proof. To find the optimal value for each \(U_{ij}\), we take the derivative of Eq. (11) with respect to each \(U_{ij}\) and set it to zero:

$$\begin{aligned} &\frac{\partial J (U,Z)}{\partial U_{ij}}=0\\ &d_{ij}(x_i,z_j)+A_1\bigl(1+\log U_{ij}\bigr)+A_2\Bigl(\sum \limits _{\substack{n \mid X_i \in C^m,\, X_n \in C^m \\ n\ne i}} \sum \limits _{\substack{k=1 \\ k\ne j}}^{2} U_{nk}\Bigr)+A_3\Bigl(\sum \limits _{\substack{n \mid X_i \in C^m,\, X_n \in C^p \\ m\ne p}}U_{nj}\Bigr)+\lambda _i=0 \end{aligned}$$
(13)

By setting \(\psi _{ij}\) and \(\varPsi _{ij} \) as follows:

$$\begin{aligned} \psi _{ij}= \sum \limits _{\substack{n \mid X_i \in C^m,\, X_n \in C^m \\ n\ne i}} \sum \limits _{\substack{k=1 \\ k\ne j}}^{K} U_{nk},\qquad \varPsi _{ij} = \sum \limits _{\substack{n \mid X_i \in C^m,\, X_n \in C^p \\ m\ne p}}U_{nj} \end{aligned}$$
(14)

By solving Eq. (13), \(U_{ij}\) will be obtained as follows:

$$\begin{aligned} U_{ij}=\exp (-1)\exp \left( \frac{-d_{ij}(x_i,z_j)-A_2\psi _{ij}-A_3\varPsi _{ij}}{A_1}\right) \exp \left(\frac{-\lambda _i}{A_1}\right) \end{aligned}$$
(15)

Since \(\sum _{j=1}^{2} U_{ij}=1\), the Lagrange multiplier can be obtained as follows:

$$\begin{aligned} &\sum _{j=1}^{2}\exp (-1)\exp \left( \frac{-d_{ij}(x_i,z_j)-A_2\psi _{ij}-A_3\varPsi _{ij}}{A_1}\right) \exp \left(\frac{-\lambda _i}{A_1}\right)\\ &\quad =\exp (-1)\exp \left(\frac{-\lambda _i}{A_1}\right)\sum _{j=1}^{2}\exp \left( \frac{-d_{ij}(x_i,z_j)-A_2\psi _{ij}-A_3\varPsi _{ij}}{A_1}\right) =1\\ &\quad \Rightarrow \exp \left(\frac{-\lambda _i}{A_1}\right)=\frac{1}{\exp (-1)\sum _{j=1}^{2}\exp \left( \frac{-d_{ij}(x_i,z_j)-A_2\psi _{ij}-A_3\varPsi _{ij}}{A_1}\right) } \end{aligned}$$
(16)

Substituting Eq. (16) into Eq. (15) yields the closed-form solution for \(U_{ij}\) (Eq. (12)), which completes the proof of the lemma.
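The closed form of Eq. (12) can be read as a softmax with temperature \(A_1\) over the penalized distances. Below is a minimal NumPy sketch of one such membership update for a single label, using the current memberships on the right-hand side as a single fixed-point sweep; the squared Euclidean distance and the pair-list encoding of the constraints are assumptions, and all names are illustrative.

```python
import numpy as np

def update_memberships(X, Z, U, must_link, cannot_link, A1, A2, A3):
    """One sweep of the membership update in Eq. (12) for a single label.

    X: (N, m) instances, Z: (2, m) cluster centers, U: (N, 2) current memberships.
    must_link / cannot_link: lists of (i, n) index pairs (illustrative constraint encoding).
    """
    n_clusters = U.shape[1]
    # d_ij: assumed squared Euclidean distance between instance i and center j
    D = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)            # shape (N, 2)

    # psi_ij (Eq. 14): memberships of must-linked partners to the *other* cluster
    psi = np.zeros_like(U)
    for i, n in must_link:
        for j in range(n_clusters):
            psi[i, j] += U[n].sum() - U[n, j]
            psi[n, j] += U[i].sum() - U[i, j]

    # Psi_ij (Eq. 14): memberships of cannot-linked partners to the *same* cluster
    Psi = np.zeros_like(U)
    for i, n in cannot_link:
        Psi[i] += U[n]
        Psi[n] += U[i]

    # Softmax form of Eq. (12); the exp(-1) factor cancels in the normalization
    logits = -(D + A2 * psi + A3 * Psi) / A1
    logits -= logits.max(axis=1, keepdims=True)                       # numerical stability
    E = np.exp(logits)
    return E / E.sum(axis=1, keepdims=True)
```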

Lemma 2. If the fuzzy memberships U are fixed, the optimal values for the cluster centers Z are given by Eq. (17).

$$\begin{aligned} Z_{jp}=\frac{\sum _{i=1}^{N}U_{ij}x_{ip}}{\sum _{i=1}^{N} U_{ij}} \end{aligned}$$
(17)

Proof. Again, the alternating approach is used. With U fixed, the optimal value for Z is obtained by taking the derivative of Eq. (11) with respect to each cluster center and setting it to zero.

$$\begin{aligned} \frac{\partial J (U,Z)}{\partial Z_{jp}}=0\;\Rightarrow \; \sum _{i=1}^{N}2U_{ij}(Z_{jp}-x_{ip})=0\;\Rightarrow \; Z_{jp}=\frac{\sum _{i=1}^{N}U_{ij}x_{ip}}{\sum _{i=1}^{N} U_{ij}} \end{aligned}$$
(18)
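Eq. (17) is simply a membership-weighted mean of the instances, so the corresponding update is a one-liner. The sketch below continues the NumPy example above and assumes X and U are the arrays defined there; the function name is illustrative.

```python
def update_centers(X, U):
    """Center update of Eq. (17): Z_jp = sum_i U_ij * x_ip / sum_i U_ij."""
    # (2, N) @ (N, m) -> (2, m), then divide each row by its total membership mass
    return (U.T @ X) / U.sum(axis=0)[:, None]
```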

Lemma 3. U and Z are a local optimum of J(U, Z) if \(U_{ij}\) and \(Z_{jp}\) are calculated using Eqs. (12) and (17) and \(A_1, A_2, A_3 > 0\).

Proof. Let J(U) denote J(U, Z) with Z fixed, let J(Z) denote J(U, Z) with U fixed, and let \(A_1, A_2, A_3 > 0\). The Hessian matrices H(J(U)) and H(J(Z)) are then calculated as follows:

$$\begin{aligned} h_{fg,ij}(J(U))=\frac{\partial }{\partial U_{fg}}\, \frac{\partial J(U)}{\partial U_{ij}}= \begin{cases} \dfrac{A_1}{U_{ij}}, & \text {if } f=i,\ g=j\\ 0, & \text {otherwise} \end{cases} \end{aligned}$$
(19)
$$\begin{aligned} h_{fg,jp}(J(Z))=\frac{\partial }{\partial Z_{fg}}\, \frac{\partial J(Z)}{\partial Z_{jp}}= \begin{cases} \sum _{i=1}^{N}2U_{ij}, & \text {if } f=j,\ g=p\\ 0, & \text {otherwise} \end{cases} \end{aligned}$$
(20)

Equations (19) and (20) show that H(J(U)) and H(J(Z)) are diagonal matrices. Since \(A_1>0\) and \(0< U_{ij} \le 1\), the Hessian matrices are positive definite. Thus Eqs. (12) and (17) are sufficient conditions to minimize J(U) and J(Z).
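Putting the two lemmas together, the alternating optimization can be sketched as follows, reusing `update_memberships` and `update_centers` from the earlier sketches. The random initialization, default hyper-parameter values, and stopping rule are illustrative choices rather than the paper's settings; Lemma 3 only guarantees that each alternating step cannot increase J, so the loop converges to a local optimum.

```python
import numpy as np

def fit_single_label(X, must_link, cannot_link, A1=1.0, A2=1.0, A3=1.0,
                     n_iter=100, tol=1e-6, seed=0):
    """Alternate the updates of Eqs. (12) and (17) for one label until U stabilizes."""
    rng = np.random.default_rng(seed)
    U = rng.dirichlet(np.ones(2), size=X.shape[0])   # random row-stochastic start
    Z = update_centers(X, U)
    for _ in range(n_iter):
        U_new = update_memberships(X, Z, U, must_link, cannot_link, A1, A2, A3)
        Z = update_centers(X, U_new)
        if np.abs(U_new - U).max() < tol:            # memberships have stabilized
            U = U_new
            break
        U = U_new
    return U, Z
```

Applying such a routine independently to each of the L labels (with k = 2 clusters per label) mirrors the per-label decomposition described at the start of this appendix.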

1.2 Additional Experimental Results

Tables 4, 5, and 6 show the performance of our proposed method in terms of ranking loss, one error, and coverage, respectively.

Table 4. The results of our proposed method and other competitors in terms of ranking loss on real-world and synthetic datasets (mean ± standard deviation).
Table 5. The results of our proposed method and other competitors in terms of one error on real-world and synthetic datasets (mean ± standard deviation).
Table 6. The results of PML-CC and other competitors in terms of coverage on real-world and synthetic datasets (mean ± standard deviation).


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Siahroudi, S.K., Kudenko, D. (2024). Partial Multi-label Learning via Constraint Clustering. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Communications in Computer and Information Science, vol 1965. Springer, Singapore. https://doi.org/10.1007/978-981-99-8145-8_35


  • DOI: https://doi.org/10.1007/978-981-99-8145-8_35

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-8144-1

  • Online ISBN: 978-981-99-8145-8

  • eBook Packages: Computer Science, Computer Science (R0)
