DOI: 10.1145/3133811.3133817
research-article

Evaluation of Full-Text Retrieval System Using Collection of Serially Evolved Documents

Published: 17 August 2017

ABSTRACT

Finding a document that is similar to a specified query document within a large document database is one of the important issues in the Big Data era, as most available data is in the form of unstructured text. Our test collection consists of two parts. In the first part, the texts were written by people who artificially plagiarized a source document through a linear pipelined procedure. In the second part, the texts were generated by software that inserts, deletes, and substitutes parts of an input document to produce a similar document. This document set is called the Serially Evolved Documents (SED). We propose two new measures, Order Preserving Precision (OPP) and Order Preserving Recall (OPR), which compute how well the evolutionary order is preserved among the output documents returned by the IR system under evaluation. Using this test collection, we evaluated KONAN, a document retrieval system for Korean documents.
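
The software-generated part of the collection can be pictured with a small sketch. The abstract does not give the generator's details, so everything below is an assumption made for illustration: word-level edits, a fixed number of edits per generation, a toy replacement vocabulary, and the hypothetical names `evolve` and `serially_evolved`. It shows only the general idea of a chain in which each document is derived from its predecessor by insertions, deletions, and substitutions.

```python
import random


def evolve(words, rng, n_edits=3, vocab=("alpha", "beta", "gamma")):
    """Return a copy of `words` after n_edits random word-level edits.

    Assumption: edits are single-word insert/delete/substitute operations;
    the paper's actual edit granularity is not stated in the abstract.
    """
    out = list(words)
    for _ in range(n_edits):
        op = rng.choice(("insert", "delete", "substitute"))
        if op == "insert" or not out:
            # Also used when the document is empty, so later edits stay valid.
            out.insert(rng.randint(0, len(out)), rng.choice(vocab))
        elif op == "delete":
            del out[rng.randrange(len(out))]
        else:
            out[rng.randrange(len(out))] = rng.choice(vocab)
    return out


def serially_evolved(seed_text, generations, seed=0):
    """Build a chain: document k+1 is an edited copy of document k."""
    rng = random.Random(seed)  # fixed seed keeps the chain reproducible
    chain = [seed_text.split()]
    for _ in range(generations):
        chain.append(evolve(chain[-1], rng))
    return [" ".join(doc) for doc in chain]
```

Calling `serially_evolved("a b c d e f g h", 4)` returns five documents, the original followed by four successive descendants; the position of a document in the list is its evolutionary order, which is the ordering an order-aware measure would compare against a retrieval system's ranking.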

      • Published in

        ICIBE '17: Proceedings of the 3rd International Conference on Industrial and Business Engineering
        August 2017
        107 pages
        ISBN: 9781450353519
        DOI: 10.1145/3133811

        Copyright © 2017 ACM

        © 2017 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Qualifiers

        • research-article
        • Research
        • Refereed limited