Matrix Factorization for Identifying Noisy Labels of Multi-label Instances

Chen, Xia; Yu, Guoxian; Domeniconi, Carlotta; Wang, Jun; Zhang, Zili

doi:10.1007/978-3-319-97310-4_58

Xia Chen (ORCID: orcid.org/0000-0002-8223-5641),
Guoxian Yu (ORCID: orcid.org/0000-0002-1667-6705),
Carlotta Domeniconi,
Jun Wang (ORCID: orcid.org/0000-0002-5890-0365) &
Zili Zhang

Conference paper
First Online: 27 July 2018

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11013)

Abstract

Current work on multi-label learning generally assumes that the given labels are noise-free. However, obtaining noise-free labels is difficult and often impractical. In this paper, we study how to identify the subset of relevant labels within a set of candidate labels given as annotations to instances, and we introduce a matrix-factorization-based method called MF-INL. MF-INL first decomposes the original instance-label association matrix into two low-rank matrices using nonnegative matrix factorization, with feature-based and label-based constraints that retain the geometric structure of the instances and the label correlations. It then reconstructs the association matrix as the product of the decomposed matrices and identifies the associations with the lowest confidence as noisy. An empirical study on real-world multi-label datasets with injected noisy labels shows that MF-INL identifies noisy labels more accurately than other related solutions and is robust to the choice of input parameters. We also demonstrate empirically that both the feature-based and the label-based constraints contribute to the performance of MF-INL.
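
The factorize, reconstruct, and rank pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' MF-INL: it uses plain nonnegative matrix factorization with Lee-Seung multiplicative updates and omits the feature-based and label-based constraints, whose exact form is not given on this page. The function names `nmf` and `flag_noisy_associations`, the parameter names, and the toy data are all assumptions made for the example.

```python
import numpy as np


def nmf(Y, rank, n_iter=500, eps=1e-9, seed=0):
    """Plain NMF (assumed stand-in for MF-INL's constrained factorization).

    Approximates the nonnegative matrix Y (instances x labels) by U @ V.T
    using multiplicative updates for the squared Frobenius loss.
    """
    rng = np.random.default_rng(seed)
    n, q = Y.shape
    U = rng.random((n, rank)) + eps  # instance factors, kept nonnegative
    V = rng.random((q, rank)) + eps  # label factors, kept nonnegative
    for _ in range(n_iter):
        U *= (Y @ V) / (U @ (V.T @ V) + eps)
        V *= (Y.T @ U) / (V @ (U.T @ U) + eps)
    return U, V


def flag_noisy_associations(Y, rank, n_flag, **kwargs):
    """Return the n_flag observed (instance, label) pairs with the lowest
    reconstructed confidence, together with the reconstructed matrix."""
    U, V = nmf(Y, rank, **kwargs)
    Y_hat = U @ V.T                    # reconstructed association matrix
    rows, cols = np.nonzero(Y)         # only given labels can be noisy
    order = np.argsort(Y_hat[rows, cols])[:n_flag]
    flagged = [(int(rows[i]), int(cols[i])) for i in order]
    return flagged, Y_hat


if __name__ == "__main__":
    # Toy data (hypothetical): two groups of instances with disjoint label
    # sets, plus one injected label that does not fit instance 0's group.
    Y = np.zeros((20, 6))
    Y[:10, :3] = 1
    Y[10:, 3:] = 1
    Y[0, 5] = 1
    flagged, _ = flag_noisy_associations(Y, rank=2, n_flag=1)
    print("lowest-confidence association:", flagged)
```

Only associations present in the input are ranked, since the task is to find which of the given candidate labels are noisy rather than to predict missing ones. The rank and the number of flagged associations are free parameters in this sketch.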

Notes

1. http://mulan.sourceforge.net/datasets-mlc.html

Acknowledgments

This work is supported by the Natural Science Foundation of China (61741217 and 61402378), the Natural Science Foundation of CQ CSTC (cstc2016jcyjA0351), the Open Research Project of Hubei Key Laboratory of Intelligent Geo-Information Processing (KLIGIP-2017A05), and the Chongqing Graduate Student Research Innovation Project (No. CYS18089).

Author information

Authors and Affiliations

College of Computer and Information Sciences, Southwest University, Chongqing, 400715, China
Xia Chen, Guoxian Yu, Jun Wang & Zili Zhang
Department of Computer Science, George Mason University, Fairfax, VA, 22030, USA
Carlotta Domeniconi

Corresponding author

Correspondence to Guoxian Yu.

Editor information

Editors and Affiliations

Southeast University, Nanjing, China
Xin Geng
University of Tasmania, Hobart, Tasmania, Australia
Byeong-Ho Kang

About this paper

Cite this paper

Chen, X., Yu, G., Domeniconi, C., Wang, J., Zhang, Z. (2018). Matrix Factorization for Identifying Noisy Labels of Multi-label Instances. In: Geng, X., Kang, BH. (eds) PRICAI 2018: Trends in Artificial Intelligence. PRICAI 2018. Lecture Notes in Computer Science, vol 11013. Springer, Cham. https://doi.org/10.1007/978-3-319-97310-4_58

DOI: https://doi.org/10.1007/978-3-319-97310-4_58
Published: 27 July 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-97309-8
Online ISBN: 978-3-319-97310-4
eBook Packages: Computer Science, Computer Science (R0)
