Topological Features Based Entity Disambiguation

  • Regular Paper
Journal of Computer Science and Technology

Abstract

This work proposes an unsupervised, topological-feature-based solution for entity disambiguation. Most existing studies leverage semantic information to resolve ambiguous references. However, semantic information is not always available: it may be withheld for privacy reasons or be too expensive to obtain. We therefore consider a setting in which only the relationships between references are available. We propose a structural similarity algorithm based on random walk with restart (RWR) to measure the similarity between references. Disambiguation is then treated as a clustering problem, and a family of graph-walk-based clustering algorithms is introduced to group ambiguous references. Extensive experiments on two real datasets show that our solution achieves higher accuracy than two state-of-the-art approaches.
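The abstract does not give the algorithms in detail, so the following is only a minimal sketch of the general idea under stated assumptions. It computes RWR proximity on an unweighted, undirected reference graph with an assumed restart probability of 0.15, symmetrizes the score by averaging the two directed proximities (an assumption, not necessarily the paper's definition), and groups references with a simple single-link threshold rule, which is a stand-in and not the paper's graph-walk-based clustering. All function names, parameter values, and the toy graph are hypothetical.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected, unweighted adjacency map from (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def rwr_scores(adj, seed, restart=0.15, tol=1e-10, max_iter=1000):
    """Random walk with restart (RWR) from `seed`.

    Iterates r <- (1 - c) * W r + c * e_seed, where W is the column-normalised
    adjacency matrix and c = `restart` is the (assumed) restart probability.
    """
    nodes = list(adj)
    r = {v: 0.0 for v in nodes}
    r[seed] = 1.0
    for _ in range(max_iter):
        nxt = {v: 0.0 for v in nodes}
        for u in nodes:
            if not adj[u]:
                continue  # a node without neighbours passes no mass on
            share = (1.0 - restart) * r[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        nxt[seed] += restart  # jump back to the seed with probability c
        if sum(abs(nxt[v] - r[v]) for v in nodes) < tol:
            return nxt
        r = nxt
    return r


def structural_similarity(adj, a, b, restart=0.15):
    """Assumed symmetrisation: average of the two directed RWR proximities."""
    return 0.5 * (rwr_scores(adj, a, restart)[b] + rwr_scores(adj, b, restart)[a])


def cluster_references(adj, refs, threshold, restart=0.15):
    """Single-link grouping: merge references whose similarity >= threshold.

    A simple stand-in, not the graph-walk-based clustering from the paper.
    """
    parent = {x: x for x in refs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    scores = {x: rwr_scores(adj, x, restart) for x in refs}
    for i, a in enumerate(refs):
        for b in refs[i + 1:]:
            if 0.5 * (scores[a][b] + scores[b][a]) >= threshold:
                parent[find(a)] = find(b)
    groups = defaultdict(list)
    for x in refs:
        groups[find(x)].append(x)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical toy graph: three references that share one ambiguous name.
# ref1 and ref2 share coauthors; ref3 sits in a separate community.
adj = build_graph([
    ("ref1", "alice"), ("ref1", "bob"), ("ref2", "alice"), ("ref2", "bob"),
    ("ref3", "carol"), ("ref3", "dave"), ("carol", "dave"),
])
print(round(structural_similarity(adj, "ref1", "ref2"), 4))  # 0.1953
print(cluster_references(adj, ["ref1", "ref2", "ref3"], threshold=0.05))
# [['ref1', 'ref2'], ['ref3']]
```

References that reach each other through shared neighbours receive high RWR proximity, while references in disconnected parts of the graph score zero. The single-link threshold rule keeps the sketch short, but its output depends heavily on the chosen threshold.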

Author information

Correspondence to Chen-Chen Sun.

Additional information

This work is supported by the National Basic Research 973 Program of China under Grant No. 2012CB316201, the Fundamental Research Funds for the Central Universities of China under Grant No. N120816001, and the National Natural Science Foundation of China under Grant Nos. 61472070 and 61402213.

About this article

Cite this article

Sun, CC., Shen, DR., Kou, Y. et al. Topological Features Based Entity Disambiguation. J. Comput. Sci. Technol. 31, 1053–1068 (2016). https://doi.org/10.1007/s11390-016-1679-6

  • DOI: https://doi.org/10.1007/s11390-016-1679-6
