
Using Knowledge Graph to Handle Label Imperfection

  • Conference paper
Trends and Applications in Knowledge Discovery and Data Mining (PAKDD 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8643)

Abstract

The performance of classification tasks relies heavily on data quality, yet in real-world data label noise inevitably exists because of data entry errors, transmission errors and the subjectivity of taggers. Different methods have been proposed to deal with label imperfection, including robust algorithms that avoid overfitting, filtering mechanisms that remove noisy instances and correction mechanisms that revise noisy labels. In this paper, we propose an approach based on a knowledge graph to detect and correct label errors in training data. Experiments on a medical Q&A data set show that our knowledge-graph-based approach is effective at improving both classification performance and data quality. The results also show that our approach works at relatively high noise levels and can be applied to other data mining tasks that demand deep understanding.
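As an illustration of the general idea only (not the authors' algorithm), the sketch below shows one way a knowledge graph could drive a correction mechanism: entities mentioned in a question vote for the categories they are linked to, and a label is replaced only when the graph clearly favours a different class. The toy graph, the category names, the whitespace tokenizer (which would not handle unsegmented Chinese text) and the `min_support` threshold are all hypothetical assumptions.

```python
from collections import Counter

# Hypothetical toy knowledge graph: each entity maps to the set of
# categories (e.g. hospital departments) it is linked to. The entries
# are illustrative assumptions, not data from the paper.
KNOWLEDGE_GRAPH: dict[str, set[str]] = {
    "insulin": {"endocrinology"},
    "diabetes": {"endocrinology"},
    "thyroid": {"endocrinology"},
    "asthma": {"respiratory"},
    "cough": {"respiratory"},
    "inhaler": {"respiratory"},
    "rash": {"dermatology"},
    "eczema": {"dermatology"},
}


def kg_category_votes(text: str) -> Counter:
    """Count, per category, the KG entity mentions found in the text."""
    votes: Counter = Counter()
    # Naive whitespace tokenization with punctuation stripped (an assumption;
    # Chinese text would need a word segmenter first).
    for token in text.lower().split():
        for category in KNOWLEDGE_GRAPH.get(token.strip("?,.!"), set()):
            votes[category] += 1
    return votes


def correct_label(text: str, label: str, min_support: int = 2) -> str:
    """Relabel an instance only when KG evidence clearly favours another class.

    The given label is kept if the KG finds no entities, if the best-supported
    category already matches the label, or if that category has fewer than
    `min_support` votes (a hypothetical threshold).
    """
    votes = kg_category_votes(text)
    if not votes:
        return label  # no knowledge available: trust the given label
    best, support = votes.most_common(1)[0]
    if best != label and support >= min_support and support > votes[label]:
        return best
    return label


if __name__ == "__main__":
    noisy_data = [
        ("Can insulin help my diabetes?", "respiratory"),      # relabeled to endocrinology
        ("Which inhaler is best for asthma?", "respiratory"),  # kept: KG agrees
        ("Is my cough serious?", "dermatology"),  # kept: only one supporting entity
        ("What does a doctor do?", "dermatology"),  # kept: no KG evidence
    ]
    for text, label in noisy_data:
        print(f"{label:>13} -> {correct_label(text, label):<13} | {text}")
```

Requiring several agreeing entities, rather than relabeling on any single match, is a simple guard against the correction step introducing new errors of its own; a filtering variant would drop such instances instead of relabeling them.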

Acknowledgements

This work was supported by the NSFC (No. 61272099, 61261160502 and 61202025), Shanghai Excellent Academic Leaders Plan (No. 11XD1402900), the Program for Changjiang Scholars and Innovative Research Team in University of China (IRT1158, PCSIRT), the Scientific Innovation Act of STCSM (No. 13511504200), Singapore NRF (CREATE E2S2), and the EU FP7 CLIMBER project (No. PIRSES-GA-2012-318939).

Author information

Correspondence to Yi Liu.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Liu, Y., Li, H., Chen, Y. (2014). Using Knowledge Graph to Handle Label Imperfection. In: Peng, WC., et al. Trends and Applications in Knowledge Discovery and Data Mining. PAKDD 2014. Lecture Notes in Computer Science, vol 8643. Springer, Cham. https://doi.org/10.1007/978-3-319-13186-3_32

  • DOI: https://doi.org/10.1007/978-3-319-13186-3_32

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-13185-6

  • Online ISBN: 978-3-319-13186-3

  • eBook Packages: Computer Science, Computer Science (R0)