research-article

Ranking in Genealogy: Search Results Fusion at Ancestry

Authors:
Peng Jiang (Ancestry, San Francisco, CA, USA)
Yingrui Yang (Ancestry, San Francisco, CA, USA)
Gann Bierner (Ancestry, San Francisco, CA, USA)
Fengjie Alex Li (Ancestry, San Francisco, CA, USA)
Ruhan Wang (Ancestry, San Francisco, CA, USA)
Azadeh Moghtaderi (Ancestry, San Francisco, CA, USA)

KDD '19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, July 2019, Pages 2754–2764. https://doi.org/10.1145/3292500.3330772

Published: 25 July 2019

ABSTRACT

Genealogy research is the study of family history using available resources such as historical records. Ancestry provides its customers with one of the world's largest online genealogical indexes, with billions of records from a wide range of sources, including vital records such as birth and death certificates, census records, and court and probate records, among many others. Search at Ancestry aims to return relevant records from various record types, allowing our subscribers to build their family trees, research their family history, and make meaningful discoveries about their ancestors from diverse perspectives.

In a modern search engine designed for genealogical study, ranking search results so that the most relevant information appears first is a daunting challenge. In particular, the disparity in historical records makes it inherently difficult to score records in an equitable fashion. Herein, we provide an overview of our solutions to such record disparity problems in the Ancestry search engine. Specifically, we introduce customized coordinate ascent (customized CA) to speed up ranking within a specific record type. We then propose stochastic search (SS), which linearly combines ranked results federated across contents from various record types. Furthermore, we propose a novel information retrieval metric, normalized cumulative entropy (NCE), to measure the diversity of results. We demonstrate the effectiveness of these two algorithms in terms of relevance (by NDCG) and, where applicable, diversity (by NCE) in offline experiments using real customer data at Ancestry.
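To make the two evaluation axes named above concrete, here is a minimal Python sketch. The `ndcg` function follows the standard exponential-gain definition of NDCG. The `normalized_entropy` function is only an illustrative stand-in for a diversity score: the abstract does not define NCE, so this is an assumption (plain Shannon entropy of the record-type distribution, scaled to [0, 1]) and not the authors' metric. All function and parameter names are hypothetical.

```python
import math
from collections import Counter


def dcg(gains):
    """Discounted cumulative gain of graded relevance labels, in rank order."""
    return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(gains))


def ndcg(gains, k=None):
    """NDCG@k: DCG of the given order divided by the DCG of the ideal order."""
    k = len(gains) if k is None else k
    ideal = dcg(sorted(gains, reverse=True)[:k])
    return dcg(gains[:k]) / ideal if ideal > 0 else 0.0


def normalized_entropy(record_types, num_types):
    """Shannon entropy of the record types in a result list, scaled to [0, 1].

    Assumption: an illustrative diversity proxy, not the paper's NCE.
    num_types is the number of record types that could appear.
    """
    if num_types < 2 or not record_types:
        return 0.0
    n = len(record_types)
    counts = Counter(record_types)
    h = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return h / math.log2(num_types)
```

A result list that is already sorted by relevance scores an NDCG of 1, and a list drawn evenly from every available record type reaches the maximum entropy score of 1. A list from a single record type scores 0 on the diversity proxy regardless of its relevance.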

Index Terms

Information systems → Information retrieval → Retrieval models and ranking

Published in
KDD '19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
July 2019
3305 pages
ISBN: 9781450362016
DOI: 10.1145/3292500
General Chairs: Ankur Teredesai (KenSci), Vipin Kumar (University of Minnesota)
Program Chairs: Ying Li (EV Analysis Corporation), Rómer Rosales (LinkedIn), Evimaria Terzi (Boston University), George Karypis (University of Minnesota)
Copyright © 2019 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
diversity metric
federated search
genealogy
learning to rank
Acceptance Rates
KDD '19 paper acceptance rate: 110 of 1,200 submissions (9%)
Overall acceptance rate: 1,133 of 8,635 submissions (13%)