
Active Learning Based Entity Resolution Using Markov Logic

  • Conference paper
  • First Online:

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9652)

Abstract

Entity resolution is a common data cleaning and data integration problem that involves determining which records in one or more data sets refer to the same real-world entities. It has numerous applications for commercial, academic and government organisations. For most practical entity resolution applications, training data does not exist, which limits the type of classification models that can be applied. This also prevents complex techniques such as Markov logic networks from being used on real-world problems. In this paper we apply an active learning based technique to generate training data for a Markov logic network based entity resolution model and to learn the weights for the formulae in the Markov logic network. We evaluate our technique on real-world data sets and show that we can generate balanced training data and also learn approximate weights for the formulae in the Markov logic network.
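
The abstract does not specify the query strategy or the formulae used, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes a pairwise Markov logic network whose formulae have the form Pred(a, b) ⇒ Match(a, b), plus a unit clause Match(a, b), with no formulae linking different record pairs (so no transitivity). Under that restriction each Match atom is conditionally independent given the evidence, and its probability is a logistic function of the weights of the evidence predicates that hold. Weights are learned by gradient ascent on the conditional log-likelihood with a small L2 penalty. New pairs are chosen by plain uncertainty sampling, a standard active learning heuristic. The predicates, toy records, the `eid` ground-truth field and the `oracle` function are all hypothetical.

```python
import itertools
import math
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]
Pair = Tuple[Record, Record]

# Hypothetical evidence predicates. Each stands for a ground formula
#   Pred(a, b) => Match(a, b)
# in a pairwise MLN that also has the unit clause Match(a, b) (the bias b0).
FORMULAS: List[Callable[[Record, Record], bool]] = [
    lambda a, b: a["surname"] == b["surname"],        # SameSurname
    lambda a, b: a["first"][:1] == b["first"][:1],    # SameFirstInitial
    lambda a, b: a["city"] == b["city"],              # SameCity
]

def sigmoid(z: float) -> float:
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def features(a: Record, b: Record) -> List[float]:
    # n_i = 1.0 if evidence predicate i holds for the pair, else 0.0
    return [1.0 if f(a, b) else 0.0 for f in FORMULAS]

def match_prob(w: List[float], b0: float, a: Record, b: Record) -> float:
    # Setting Match(a, b) true gains w_i for every formula whose predicate
    # holds (it would be violated otherwise), plus the unit-clause weight b0
    z = b0 + sum(wi * ni for wi, ni in zip(w, features(a, b)))
    return sigmoid(z)

def learn_weights(labelled: List[Tuple[Pair, float]], epochs: int = 500,
                  lr: float = 0.5, lam: float = 0.01) -> Tuple[List[float], float]:
    # Gradient ascent on the mean conditional log-likelihood with an L2
    # penalty (Gaussian prior) on the formula weights; the bias is unpenalised
    w, b0 = [0.0] * len(FORMULAS), 0.0
    n = len(labelled)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for (a, b), y in labelled:
            err = y - match_prob(w, b0, a, b)
            gb += err
            for i, ni in enumerate(features(a, b)):
                gw[i] += err * ni
        w = [wi + lr * (g / n - lam * wi) for wi, g in zip(w, gw)]
        b0 += lr * gb / n
    return w, b0

def active_learn(pairs: List[Pair], oracle: Callable[[Record, Record], float],
                 seed_idx: List[int], budget: int):
    # Start from a labelled seed, then repeatedly ask the oracle to label the
    # unlabelled pair whose match probability is closest to 0.5
    idx = list(seed_idx)
    labelled = [(pairs[i], oracle(*pairs[i])) for i in idx]
    w, b0 = learn_weights(labelled)
    for _ in range(budget):
        pool = [j for j in range(len(pairs)) if j not in idx]
        if not pool:
            break
        q = min(pool, key=lambda j: abs(match_prob(w, b0, *pairs[j]) - 0.5))
        idx.append(q)
        labelled.append((pairs[q], oracle(*pairs[q])))
        w, b0 = learn_weights(labelled)
    return w, b0, idx

# Hypothetical toy records; "eid" is hidden ground truth used only by the oracle
RECORDS: List[Record] = [
    {"eid": "e1", "first": "John", "surname": "Smith", "city": "Canberra"},
    {"eid": "e1", "first": "Jon", "surname": "Smith", "city": "Canberra"},
    {"eid": "e2", "first": "Mary", "surname": "Jones", "city": "Sydney"},
    {"eid": "e2", "first": "M.", "surname": "Jones", "city": "Sydney"},
    {"eid": "e3", "first": "Peter", "surname": "Smith", "city": "Sydney"},
    {"eid": "e4", "first": "Paul", "surname": "Brown", "city": "Canberra"},
]
PAIRS: List[Pair] = list(itertools.combinations(RECORDS, 2))

def oracle(a: Record, b: Record) -> float:
    # Stand-in for a human labeller
    return 1.0 if a["eid"] == b["eid"] else 0.0

if __name__ == "__main__":
    # Seed with one match (pair 0) and one non-match (pair 1)
    w, b0, idx = active_learn(PAIRS, oracle, seed_idx=[0, 1], budget=4)
    print("queried pairs:", idx)
    print("weights:", [round(x, 2) for x in w], "bias:", round(b0, 2))
```

Seeding with one known match and one known non-match is itself an assumption made here so that the first model is not trivially one-sided. Dropping the no-transitivity restriction couples the Match atoms, so match probabilities would then require approximate joint MLN inference (for example MC-SAT) rather than this closed-form logistic expression.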

Author information

Corresponding author

Correspondence to Jeffrey Fisher.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Fisher, J., Christen, P., Wang, Q. (2016). Active Learning Based Entity Resolution Using Markov Logic. In: Bailey, J., Khan, L., Washio, T., Dobbie, G., Huang, J., Wang, R. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2016. Lecture Notes in Computer Science, vol 9652. Springer, Cham. https://doi.org/10.1007/978-3-319-31750-2_27

  • DOI: https://doi.org/10.1007/978-3-319-31750-2_27

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-31749-6

  • Online ISBN: 978-3-319-31750-2

  • eBook Packages: Computer Science, Computer Science (R0)