Matrix Factorization for Identifying Noisy Labels of Multi-label Instances

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11013)

Abstract

Current efforts in multi-label learning generally assume that the given labels are noise-free. However, obtaining noise-free labels is difficult and often impractical. In this paper, we study how to identify the subset of relevant labels among the candidate labels annotated to each instance, and introduce a matrix factorization based method called MF-INL. MF-INL first decomposes the original instance-label association matrix into two low-rank matrices using nonnegative matrix factorization, with feature-based and label-based constraints that retain the geometric structure of instances and the correlations among labels. It then reconstructs the association matrix as the product of the two factor matrices, and identifies the associations with the lowest confidence as noisy associations. An empirical study on real-world multi-label datasets with injected noisy labels shows that MF-INL identifies noisy labels more accurately than other related solutions and is robust to input parameters. We also demonstrate empirically that both the feature-based and the label-based constraints contribute to the performance of MF-INL.
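
The factorize, reconstruct, and rank pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' MF-INL implementation: it uses plain nonnegative matrix factorization with the standard multiplicative updates and omits the feature-based and label-based constraints, whose exact form is not given on this page. The function names `nmf` and `flag_noisy`, the parameters `rank`, `n_iter`, and `n_flag`, and the choice to flag a fixed number of associations are all assumptions made for illustration.

```python
import numpy as np


def nmf(Y, rank, n_iter=500, eps=1e-9, seed=0):
    """Plain NMF, Y ~= U @ V.T, using multiplicative updates.

    Assumption: no feature-based or label-based regularizers are used here,
    unlike the method described in the abstract.
    """
    rng = np.random.default_rng(seed)
    n, q = Y.shape
    U = rng.random((n, rank)) + eps  # instance factors, nonnegative
    V = rng.random((q, rank)) + eps  # label factors, nonnegative
    for _ in range(n_iter):
        # Updates for the squared Frobenius reconstruction error.
        U *= (Y @ V) / (U @ (V.T @ V) + eps)
        V *= (Y.T @ U) / (V @ (U.T @ U) + eps)
    return U, V


def flag_noisy(Y, rank, n_flag, **kwargs):
    """Return the n_flag observed (instance, label) pairs that receive the
    lowest score in the low-rank reconstruction of Y."""
    U, V = nmf(Y, rank, **kwargs)
    scores = U @ V.T                 # reconstructed association matrix
    rows, cols = np.nonzero(Y)       # only given associations are candidates
    order = np.argsort(scores[rows, cols])[:n_flag]
    return [(int(rows[i]), int(cols[i])) for i in order]
```

A typical call would pass a binary instance-label matrix `Y`, a small `rank`, and the number of associations to inspect, for example `flag_noisy(Y, rank=2, n_flag=10)`. Restricting the ranking to the nonzero entries of `Y` reflects the setting in the abstract, where noisy labels are sought among the annotations already given to each instance.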

Notes

  1. http://mulan.sourceforge.net/datasets-mlc.html

Acknowledgments

This work is supported by Natural Science Foundation of China (61741217 and 61402378), Natural Science Foundation of CQ CSTC (cstc2016jcyjA0351), Open Research Project of Hubei Key Laboratory of Intelligent Geo-Information Processing (KLIGIP-2017A05) and Chongqing Graduate Student Research Innovation Project [No. CYS18089].

Author information

Corresponding author

Correspondence to Guoxian Yu.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Chen, X., Yu, G., Domeniconi, C., Wang, J., Zhang, Z. (2018). Matrix Factorization for Identifying Noisy Labels of Multi-label Instances. In: Geng, X., Kang, BH. (eds) PRICAI 2018: Trends in Artificial Intelligence. PRICAI 2018. Lecture Notes in Computer Science, vol 11013. Springer, Cham. https://doi.org/10.1007/978-3-319-97310-4_58

  • DOI: https://doi.org/10.1007/978-3-319-97310-4_58

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-97309-8

  • Online ISBN: 978-3-319-97310-4

  • eBook Packages: Computer Science, Computer Science (R0)
