
Ranking in Genealogy: Search Results Fusion at Ancestry

Published: 25 July 2019
DOI: 10.1145/3292500.3330772

ABSTRACT

Genealogy research is the study of family history using available resources such as historical records. Ancestry provides its customers with one of the world's largest online genealogical indexes, comprising billions of records from a wide range of sources, including vital records such as birth and death certificates, census records, and court and probate records, among many others. Search at Ancestry aims to return relevant records from various record types, allowing our subscribers to build their family trees, research their family history, and make meaningful discoveries about their ancestors from diverse perspectives.

In a modern search engine designed for genealogical study, ranking search results so that the most relevant information surfaces first is a daunting challenge. In particular, the disparity among historical records makes it inherently difficult to score records of different types in an equitable fashion. Herein, we provide an overview of our solutions to this record disparity problem in the Ancestry search engine. Specifically, we introduce customized coordinate ascent (customized CA) to speed up ranking within a specific record type. We then propose stochastic search (SS), which linearly combines ranked results federated across content from various record types. Furthermore, we propose a novel information retrieval metric, normalized cumulative entropy (NCE), to measure the diversity of results. In offline experiments using real customer data at Ancestry, we demonstrate the effectiveness of these two algorithms in terms of relevance (measured by NDCG) and, where applicable, diversity (measured by NCE).
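To make the fusion and diversity ideas above concrete, the sketch below shows a minimal, hypothetical version of the two ingredients the abstract mentions: a linear combination of per-record-type scores (the kind of combination whose weights a stochastic-search procedure would tune) and an entropy-based diversity measure computed over record types at a rank cutoff. Everything here is an illustrative assumption: the function names (fuse_linear, normalized_entropy_at_k), the normalization choice, and the toy data are ours, and the snippet does not reproduce the paper's actual customized CA, SS, or NCE definitions.

```python
import math
from collections import defaultdict


def fuse_linear(result_lists, weights):
    """Linearly combine per-record-type ranked lists into one global ranking.

    result_lists: dict mapping record_type -> list of (record_id, score) pairs,
                  each list already ranked within its own record type.
    weights:      dict mapping record_type -> non-negative linear weight
                  (illustrative stand-ins for the weights a stochastic-search
                  procedure would tune).
    Returns one list of (record_id, record_type, fused_score), best first.
    """
    fused = []
    for rtype, ranked in result_lists.items():
        w = weights.get(rtype, 0.0)
        for record_id, score in ranked:
            fused.append((record_id, rtype, w * score))
    return sorted(fused, key=lambda item: item[2], reverse=True)


def normalized_entropy_at_k(ranking, k):
    """Entropy-based diversity of record types among the top-k results.

    NOTE: this is not the paper's NCE formula, only an assumed stand-in:
    Shannon entropy of the record-type distribution in the top k, normalized
    by the maximum entropy achievable with the types that appear there.
    """
    top = ranking[:k]
    if not top:
        return 0.0
    counts = defaultdict(int)
    for _, rtype, _ in top:
        counts[rtype] += 1
    total = sum(counts.values())
    probs = [c / total for c in counts.values()]
    entropy = -sum(p * math.log(p) for p in probs)
    max_entropy = math.log(len(counts)) if len(counts) > 1 else 1.0
    return entropy / max_entropy


# Toy usage with two hypothetical record types.
if __name__ == "__main__":
    per_type_results = {
        "census": [("c1", 0.90), ("c2", 0.70)],
        "birth":  [("b1", 0.85), ("b2", 0.60)],
    }
    fused = fuse_linear(per_type_results, {"census": 1.0, "birth": 0.9})
    print(fused)
    print(normalized_entropy_at_k(fused, k=4))
```

With these toy weights, records from the two types interleave in the fused list, and the diversity score at k=4 is 1.0 because the top four results split evenly across the two record types.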


Published in

KDD '19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, July 2019, 3305 pages
ISBN: 9781450362016
DOI: 10.1145/3292500
Copyright © 2019 ACM


Publisher: Association for Computing Machinery, New York, NY, United States
