DOI: 10.1145/2348283.2348429
demonstration

ChatNoir: a search engine for the ClueWeb09 corpus

Published: 12 August 2012

Abstract

We present the ChatNoir search engine, which indexes the entire English part of the ClueWeb09 corpus. After Carnegie Mellon's Indri system, ChatNoir is the second publicly available search engine for this corpus. It implements the classic BM25F information retrieval model including PageRank and spam likelihood. The search engine is scalable and returns the first results within three seconds, which is significantly faster than Indri. A convenient API allows for implementing reproducible experiments based on retrieving documents from the ClueWeb09 corpus. The search engine has successfully completed a load test involving 100,000 queries.
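
To make the BM25F model named in the abstract concrete, the following is a minimal sketch of field-weighted BM25 scoring on a toy corpus. It is not ChatNoir's implementation: the field names, field weights, the `b` and `k1` parameters, the non-negative idf variant, and the helper names `collection_stats` and `bm25f_score` are all assumptions made for illustration. The abstract does not say how PageRank and spam likelihood enter the ranking, so they are left out here.

```python
import math

# Hypothetical field weights and length-normalisation parameters,
# chosen for illustration only (not taken from the paper).
FIELD_WEIGHTS = {"title": 3.0, "body": 1.0}
FIELD_B = {"title": 0.5, "body": 0.75}
K1 = 1.2

# Toy corpus: each document maps a field name to a list of tokens.
DOCS = [
    {"title": ["black", "cat"], "body": ["the", "cat", "sat", "on", "the", "mat"]},
    {"title": ["dog", "news"], "body": ["a", "cat", "chased", "the", "dog"]},
    {"title": ["weather"], "body": ["rain", "is", "expected", "today"]},
]


def collection_stats(docs):
    """Return (average field lengths, document frequencies) for a corpus."""
    avg_field_len = {
        field: sum(len(d.get(field, [])) for d in docs) / len(docs)
        for field in FIELD_WEIGHTS
    }
    doc_freq = {}
    for d in docs:
        # A term counts once per document, whichever field it occurs in.
        terms = {t for field in FIELD_WEIGHTS for t in d.get(field, [])}
        for t in terms:
            doc_freq[t] = doc_freq.get(t, 0) + 1
    return avg_field_len, doc_freq


def bm25f_score(query_terms, doc, avg_field_len, doc_freq, num_docs):
    """Score one document against a query with a simple BM25F variant."""
    score = 0.0
    for term in set(query_terms):
        # Combine per-field term frequencies into one weighted pseudo-frequency.
        pseudo_tf = 0.0
        for field, weight in FIELD_WEIGHTS.items():
            tokens = doc.get(field, [])
            tf = tokens.count(term)
            if tf == 0:
                continue
            b = FIELD_B[field]
            length_norm = 1.0 - b + b * len(tokens) / avg_field_len[field]
            pseudo_tf += weight * tf / length_norm
        if pseudo_tf == 0.0:
            continue
        df = doc_freq.get(term, 0)
        # Non-negative idf variant; the +1 inside the log is an assumption.
        idf = math.log(1.0 + (num_docs - df + 0.5) / (df + 0.5))
        # Saturation is applied once, after the fields have been combined.
        score += idf * pseudo_tf / (K1 + pseudo_tf)
    return score


avg_len, df = collection_stats(DOCS)
scores = [bm25f_score(["cat"], d, avg_len, df, len(DOCS)) for d in DOCS]
```

In this sketch the first document ranks highest for the query `cat` because the term occurs in its more heavily weighted title field. The key design point of BM25F is that term frequencies are merged across fields before the saturation step, rather than scoring each field separately and summing the results.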

    Published In

    SIGIR '12: Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
    August 2012
    1236 pages
    ISBN: 9781450314725
    DOI: 10.1145/2348283

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. ClueWeb09
    2. search engine
    3. trec

    Qualifiers

    • Demonstration

    Conference

    SIGIR '12

    Acceptance Rates

    Overall Acceptance Rate 792 of 3,983 submissions, 20%

    Article Metrics

    • Downloads (last 12 months): 11
    • Downloads (last 6 weeks): 0
    Reflects downloads up to 16 Feb 2025

    Cited By

    • (2024) Investigating the Effects of Sparse Attention on Cross-Encoders. Advances in Information Retrieval (173-190). DOI: 10.1007/978-3-031-56027-9_11. Online publication date: 24-Mar-2024
    • (2023) Users Meet Clarifying Questions: Toward a Better Understanding of User Interactions for Search Clarification. ACM Transactions on Information Systems 41:1 (1-25). DOI: 10.1145/3524110. Online publication date: 9-Jan-2023
    • (2023) Asking Clarifying Questions: To benefit or to disturb users in Web search? Information Processing & Management 60:2 (103176). DOI: 10.1016/j.ipm.2022.103176. Online publication date: Mar-2023
    • (2022) An Efficient Approach to Retrieve Information for Desktop Search Engine. Intelligent Computing and Applications (387-396). DOI: 10.1007/978-981-19-4162-7_36. Online publication date: 14-Nov-2022
    • (2022) Query Expansion, Argument Mining and Document Scoring for an Efficient Question Answering System. Experimental IR Meets Multilinguality, Multimodality, and Interaction (162-174). DOI: 10.1007/978-3-031-13643-6_13. Online publication date: 25-Aug-2022
    • (2021) The information retrieval anthology 2021. ACM SIGIR Forum 55:1 (1-18). DOI: 10.1145/3476415.3476417. Online publication date: 16-Jul-2021
    • (2021) Predicting essay quality from search and writing behavior. Journal of the Association for Information Science and Technology 72:7 (839-852). DOI: 10.1002/asi.24451. Online publication date: 9-Jun-2021
    • (2020) Argument Retrieval from Web. Experimental IR Meets Multilinguality, Multimodality, and Interaction (75-81). DOI: 10.1007/978-3-030-58219-7_7. Online publication date: 22-Sep-2020
    • (2019) Evolution of the PAN Lab on Digital Text Forensics. Information Retrieval Evaluation in a Changing World (461-485). DOI: 10.1007/978-3-030-22948-1_19. Online publication date: 14-Aug-2019
    • (2018) Elastic ChatNoir: Search Engine for the ClueWeb and the Common Crawl. Advances in Information Retrieval (820-824). DOI: 10.1007/978-3-319-76941-7_83. Online publication date: 1-Mar-2018
