DOI: 10.1145/1390334.1390531
poster

Generalising multiple capture-recapture to non-uniform sample sizes

Published: 20 July 2008

ABSTRACT

Algorithms in distributed information retrieval often rely on accurate knowledge of the size of a collection. The "multiple capture-recapture" method of Shokouhi et al. is one of the more reliable algorithms for determining collection size, but it relies on samples with a uniform number of documents. Such uniform samples are often hard to obtain in a working system.

A simple generalisation of multiple capture-recapture does not rely on uniform sample sizes. Simulations show it is as accurate as the original method even when sample sizes vary considerably, making it a useful technique in real tools.
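The abstract does not give the estimator itself, so the following is only a minimal sketch of one way a capture-recapture estimate can be written for samples of differing sizes. It assumes that each sample is a uniform random draw from the same collection, so that two samples of sizes a and b drawn from N documents are expected to share about ab/N documents. Summing over all pairs of samples and solving for N gives the estimate below. The function name `estimate_collection_size` and the input format (lists of document identifiers) are hypothetical, and this is not necessarily the exact formulation used in the poster.

```python
from itertools import combinations


def estimate_collection_size(samples):
    """Estimate collection size from samples of possibly different sizes.

    Assumption (not taken from the source): each sample is a uniform random
    draw from the collection, so a pair of samples with sizes a and b is
    expected to share a * b / N documents.
    """
    sets = [set(sample) for sample in samples]
    pair_products = 0  # sum of |A| * |B| over all pairs of samples
    duplicates = 0     # number of shared documents summed over all pairs
    for a, b in combinations(sets, 2):
        pair_products += len(a) * len(b)
        duplicates += len(a & b)
    if duplicates == 0:
        # With no overlap at all, the estimate is undefined.
        raise ValueError("no documents shared between any pair of samples")
    return pair_products / duplicates


# Three samples of sizes 4, 3 and 5, given as document identifiers.
samples = [[1, 2, 3, 4], [3, 4, 5], [1, 5, 6, 7, 8]]
print(estimate_collection_size(samples))  # 11.75
```

When every one of T samples has the same size k, the numerator in this sketch reduces to T(T-1)k²/2, so the uniform-size case is a special case of the same expression.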

References

  1. K. Bharat and A. Broder. A technique for measuring the relative size and overlap of public web search engines. In Proc. WWW, 1998.
  2. J. Callan and M. Connell. Query-based sampling of text databases. ACM Trans. Info. Systems, 19(2), 2001.
  3. K.-L. Liu, A. Santoso, C. Yu, W. Meng, and C. Zhang. Discovering the representative of a search engine. In Proc. CIKM, 2001. Poster.
  4. M. Shokouhi, J. Zobel, F. Scholer, and S. M. M. Tahaghoghi. Capturing collection size for distributed non-cooperative retrieval. In Proc. ACM SIGIR, 2006.
  5. P. Thomas and D. Hawking. Evaluating sampling methods for uncooperative collections. In Proc. ACM SIGIR, 2007.

Published in

SIGIR '08: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
July 2008
934 pages
ISBN: 9781605581644
DOI: 10.1145/1390334

Copyright © 2008 ACM

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

Overall acceptance rate: 792 of 3,983 submissions, 20%