DOI: 10.1145/2467696.2467709 · ACM Conferences · JCDL Conference Proceedings
Research article

A relevance feedback approach for the author name disambiguation problem

Published: 22 July 2013

ABSTRACT

This paper presents a new name disambiguation method that exploits user feedback on ambiguous references across iterations. An unsupervised step is used to define pure training samples, and a hybrid supervised step is employed to learn a classification model for assigning references to authors. Our classification scheme combines the Optimum-Path Forest (OPF) classifier with complex reference similarity functions generated by a Genetic Programming framework. Experiments demonstrate that the proposed method yields better results than state-of-the-art disambiguation methods on two traditional datasets.
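The abstract combines two components: a similarity function over reference attributes and the supervised Optimum-Path Forest (OPF) classifier. The Python sketch below illustrates supervised OPF as described by Papa et al.: build a minimum spanning tree over the training samples, take the endpoints of edges joining different labels as prototypes, then let the prototypes compete through the f_max path cost (the cost of a path is its largest edge weight). It is not the authors' implementation. The `Reference` representation, the `jaccard` and `combined_distance` helpers, and the fixed weights inside `combined_distance` are hypothetical stand-ins for the similarity functions the paper evolves with Genetic Programming, and the toy references are invented for illustration.

```python
import heapq
import math
from typing import Callable, Dict, List, Optional, Set, Tuple

# Hypothetical representation: each citation record maps an attribute
# name to a set of normalized tokens.
Reference = Dict[str, Set[str]]
Distance = Callable[[Reference, Reference], float]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two token sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def combined_distance(r1: Reference, r2: Reference) -> float:
    """Hypothetical stand-in for a GP-evolved similarity function.

    The fixed weights are illustrative; the paper evolves the combination.
    """
    sim = (0.5 * jaccard(r1["coauthors"], r2["coauthors"])
           + 0.3 * jaccard(r1["title"], r2["title"])
           + 0.2 * jaccard(r1["venue"], r2["venue"]))
    return 1.0 - sim


def opf_train(samples: List[Reference], labels: List[str],
              dist: Distance) -> Tuple[List[float], List[str]]:
    """Supervised OPF training on the complete graph of training samples."""
    n = len(samples)
    # 1) Prim's algorithm: minimum spanning tree of the complete graph.
    in_tree, best, parent = [False] * n, [math.inf] * n, [-1] * n
    best[0] = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        for v in range(n):
            if not in_tree[v]:
                d = dist(samples[u], samples[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    # 2) Prototypes: both endpoints of every MST edge joining different labels.
    prototypes = set()
    for v in range(n):
        if parent[v] >= 0 and labels[parent[v]] != labels[v]:
            prototypes.add(parent[v])
            prototypes.add(v)
    if not prototypes:  # single-label training set: pick any root (assumption)
        prototypes.add(0)
    # 3) Dijkstra-like propagation with the f_max path cost:
    #    cost(t) = min over s of max(cost(s), d(s, t)); labels follow the root.
    cost = [math.inf] * n
    forest_label = list(labels)
    for p in prototypes:
        cost[p] = 0.0
    heap = [(0.0, p) for p in prototypes]
    heapq.heapify(heap)
    done = [False] * n
    while heap:
        c, s = heapq.heappop(heap)
        if done[s] or c > cost[s]:
            continue  # stale heap entry
        done[s] = True
        for t in range(n):
            if not done[t]:
                new_cost = max(cost[s], dist(samples[s], samples[t]))
                if new_cost < cost[t]:
                    cost[t], forest_label[t] = new_cost, forest_label[s]
                    heapq.heappush(heap, (new_cost, t))
    return cost, forest_label


def opf_classify(x: Reference, samples: List[Reference], cost: List[float],
                 forest_label: List[str], dist: Distance) -> Optional[str]:
    """Assign x the label of the training node offering the cheapest f_max path."""
    best_cost, best_label = math.inf, None
    for s, sample in enumerate(samples):
        c = max(cost[s], dist(sample, x))
        if c < best_cost:
            best_cost, best_label = c, forest_label[s]
    return best_label


# Toy usage: two hypothetical authors sharing an ambiguous name.
train_refs = [
    {"coauthors": {"goncalves", "laender"}, "title": {"name", "disambiguation"}, "venue": {"jcdl"}},
    {"coauthors": {"goncalves", "veloso"}, "title": {"author", "disambiguation"}, "venue": {"jcdl"}},
    {"coauthors": {"falcao", "papa"}, "title": {"optimum", "path", "forest"}, "venue": {"pattern", "recognition"}},
    {"coauthors": {"falcao", "torres"}, "title": {"image", "retrieval"}, "venue": {"pattern", "recognition", "letters"}},
]
train_labels = ["A", "A", "B", "B"]
costs, forest_labels = opf_train(train_refs, train_labels, combined_distance)
query = {"coauthors": {"goncalves"}, "title": {"disambiguation"}, "venue": {"jcdl"}}
print(opf_classify(query, train_refs, costs, forest_labels, combined_distance))
```

On this toy data the query reference is assigned to author A. In a relevance-feedback loop like the one the abstract describes, references the user confirms or corrects could be appended to `train_refs` and `train_labels` and `opf_train` re-run on the next iteration; that retraining strategy is an assumption, not the paper's stated procedure. Training here makes O(n²) distance calls and classification O(n), which is fine for a sketch but not for large collections.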


Published in

JCDL '13: Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries
July 2013
480 pages
ISBN: 9781450320771
DOI: 10.1145/2467696

              Copyright © 2013 ACM

              Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

JCDL '13 paper acceptance rate: 28 of 95 submissions (29%). Overall acceptance rate: 415 of 1,482 submissions (28%).
