Active Learning Based Entity Resolution Using Markov Logic

Fisher, Jeffrey; Christen, Peter; Wang, Qing

doi:10.1007/978-3-319-31750-2_27

Active Learning Based Entity Resolution Using Markov Logic

Jeffrey Fisher¹⁹,
Peter Christen¹⁹ &
Qing Wang¹⁹

Conference paper
First Online: 12 April 2016

3315 Accesses
5 Citations

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 9652))

Abstract

Entity resolution is a common data cleaning and data integration problem that involves determining which records in one or more data sets refer to the same real-world entities. It has numerous applications for commercial, academic and government organisations. For most practical entity resolution applications, training data does not exist which limits the type of classification models that can be applied. This also prevents complex techniques such as Markov logic networks from being used on real-world problems. In this paper we apply an active learning based technique to generate training data for a Markov logic network based entity resolution model and learn the weights for the formulae in a Markov logic network. We evaluate our technique on real-world data sets and show that we can generate balanced training data and learn and also learn approximate weights for the formulae in the Markov logic network.

This is a preview of subscription content, log in via an institution.

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

References

Arasu, A., Götz, M., Kaushik, R.: On active learning of record matching packages. In: ACM SIGMOD, pp. 783–794, Indianapolis (2010)
Google Scholar
Bellare, K., Iyengar, S., Parameswaran, A.G., Rastogi, V.: Active sampling for entity matching. In: ACM SIGKDD. ACM (2012)
Google Scholar
Bhattacharya, I., Getoor, L.: Collective entity resolution in relational data. ACM TKDD 1(1), 5 (2007)
Article Google Scholar
Christen, V.: Data Matching: Concepts and Techniques for Record Linkage, Entity Resolution, and Duplicate Detection. Data-Centric Systems and Applications. Springer, Heidelberg (2012)
Book Google Scholar
Christen, P.: A survey of indexing techniques for scalable record linkage and deduplication. IEEE TKDE 24(9), 1537–1555 (2012)
Google Scholar
Christen, P., Vatsalan, D., Fu, Z.: Advanced record linkage methods and privacy aspects for population reconstruction - a survey and case studies. In: Bloothooft, G., Christen, P., Mandemakers, K., Schraagen, M. (eds.) Population Reconstruction, pp. 87–110. Springer, Switzerland (2015)
Chapter Google Scholar
Dal Bianco, G., Galante, R., Gonalves, M., Canuto, S., Heuser, C.: A practical and effective sampling selection strategy for large scale deduplication. IEEE KDE 27(9), 2305–2319 (2015)
Google Scholar
Du, J., Ling, C.: Active learning with human-like noisy oracle. In: IEEE ICDM, pp. 797–802 (2010)
Google Scholar
Elmagarmid, A.K., Ipeirotis, P.G., Verykios, V.S.: Duplicate record detection: a survey. IEEE TKDE 19(1), 1–16 (2007)
Google Scholar
Fisher, J., Christen, P., Wang, Q., Rahm, V.: A clustering-based framework to control block sizes for entity resolution. In: ACM SIGKDD (2015)
Google Scholar
Fu, Z., Christen, P., Zhou, J.: A graph matching method for historical census household linkage. In: Tseng, V.S., Ho, T.B., Zhou, Z.-H., Chen, A.L.P., Kao, H.-Y. (eds.) PAKDD 2014, Part I. LNCS, vol. 8443, pp. 485–496. Springer, Heidelberg (2014)
Chapter Google Scholar
Hernandez, M.A., Stolfo, S.J.: Real-world data is dirty: Data cleansing and the merge/purge problem. DMKD 2(1), 9–37 (1998)
Google Scholar
Huynh, T.N., Mooney, R.J.: Discriminative structure and parameter learning for Markov logic networks. In: ACM ICML (2008)
Google Scholar
Huynh, T.N., Mooney, R.J.: Online max-margin weight learning for Markov logic networks. In: SDM, pp. 642–651 (2011)
Google Scholar
Kalashnikov, D., Mehrotra, S.: Domain-independent data cleaning via analysis of entity-relationship graph. ACM TODS 31(2), 716–767 (2006)
Article Google Scholar
Kok, S., Domingos, P.: Learning the structure of Markov logic networks. In: ACM ICML (2005)
Google Scholar
Köpcke, H., Rahm, E.: Frameworks for entity matching: a comparison. Data Knowl. Eng. 69(2), 197–210 (2010)
Article Google Scholar
MacKay, D.J.: Information-based objective functions for active data selection. Neural Comput. 4(4), 590–604 (1992)
Article Google Scholar
Mihalkova, L., Mooney, R.: Learning to disambiguate search queries from short sessions. In: Buntine, W., Grobelnik, M., Mladenić, D., Shawe-Taylor, J. (eds.) ECML PKDD 2009, Part II. LNCS, vol. 5782, pp. 111–127. Springer, Heidelberg (2009)
Chapter Google Scholar
On, B.W., Elmacioglu, E., Lee, D., Kang, J., Pei, J.: Improving grouped-entity resolution using quasi-cliques. In: IEEE ICDM, pp. 1008–1015 (2006)
Google Scholar
Rastogi, V., Dalvi, N., Garofalakis, M.: Large-scale collective entity matching. VLDB Endowment 4, 208–218 (2011)
Article Google Scholar
Richardson, M., Domingos, P.: Markov logic networks. Mach. Learn. 62(1–2), 107–136 (2006)
Article Google Scholar
Sarawagi, S., Bhamidipaty, A.: Interactive deduplication using active learning. In: ACM SIGKDD (2002)
Google Scholar
Settles, B.: Active learning literature survey. Computer Sciences Technical Report 1648, University of Wisconsin, Madison (2010)
Google Scholar
Settles, B., Craven, M.: An analysis of active learning strategies for sequence labeling tasks. In: ACL Empirical methods in NLP (2008)
Google Scholar
Singla, P., Domingos, P.: Discriminative training of Markov logic networks. AAAI 5, 868–873 (2005)
Google Scholar
Singla, P., Domingos, P.: Entity resolution with Markov logic. In: IEEE ICDM, pp. 572–582 (2006)
Google Scholar
Wang, J., Kraska, T., Franklin, M.J., Feng, J.: CrowdER: crowdsourcing entity resolution. Proc. VLDB Endow. 5(11), 1483–1494 (2012)
Article Google Scholar
Wang, Q., Vatsalan, D., Christen, P.: Efficient interactive training selection for large-scale entity resolution. In: Cao, T., Lim, E.-P., Zhou, Z.-H., Ho, T.-B., Cheung, D., Motoda, H. (eds.) PAKDD 2015. LNCS, vol. 9078, pp. 562–573. Springer, Heidelberg (2015)
Chapter Google Scholar

Download references

Author information

Authors and Affiliations

Research School of Computer Science, Australian National University, Canberra, ACT, 0200, Australia
Jeffrey Fisher, Peter Christen & Qing Wang

Authors

Jeffrey Fisher
View author publications
You can also search for this author in PubMed Google Scholar
Peter Christen
View author publications
You can also search for this author in PubMed Google Scholar
Qing Wang
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Jeffrey Fisher .

Editor information

Editors and Affiliations

The University of Melbourne, Melbourne, Victoria, Australia
James Bailey
The University of Texas at Dallas, Richardson, Texas, USA
Latifur Khan
Osaka University, Osaka, Japan
Takashi Washio
University of Auckland, Auckland, New Zealand
Gill Dobbie
Shenzhen University, Shenzhen, China
Joshua Zhexue Huang
Massey University, Auckland, New Zealand
Ruili Wang

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Fisher, J., Christen, P., Wang, Q. (2016). Active Learning Based Entity Resolution Using Markov Logic. In: Bailey, J., Khan, L., Washio, T., Dobbie, G., Huang, J., Wang, R. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2016. Lecture Notes in Computer Science(), vol 9652. Springer, Cham. https://doi.org/10.1007/978-3-319-31750-2_27

Download citation

DOI: https://doi.org/10.1007/978-3-319-31750-2_27
Published: 12 April 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-31749-6
Online ISBN: 978-3-319-31750-2
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics