research-article

Evaluation of Full-Text Retrieval System Using Collection of Serially Evolved Documents

Authors:
Hwan-Gue Cho (Dep. of Computer Sci. and Eng., PUSAN National Univ., Korea)
Hae-Sung Tak (Dep. of Computer Sci. and Eng., PUSAN National Univ., Korea)
Han-Ho Kim (Dep. of Computer Sci. and Eng., PUSAN National Univ., Korea)
Yeoneo Kim (Programming Language, Laboratory, PUSAN National Univ., Korea)
Young-Ju Shin (Korea Institute of Science and Technology Information, Korea)
Chulsu Lim (Korea Institute of Science and Technology Information, Korea)
Kwang-Nam Choi (Korea Institute of Science and Technology Information, Korea)

ICIBE '17: Proceedings of the 3rd International Conference on Industrial and Business Engineering, August 2017, Pages 40–45. https://doi.org/10.1145/3133811.3133817

Published: 17 August 2017

ABSTRACT

Finding a document that is similar to a specified query document within a large document database is one of the important issues in the Big Data era, as most available data is in the form of unstructured text. Our test collection consists of two parts. In the first part, texts were produced by hand using an artificial plagiarism approach that follows a linear pipelined procedure. In the second part, texts are generated by software that inserts, deletes, and substitutes certain parts of the target documents to make a similar document from an input document. This document set is known as the Serially Evolved Documents (SED). We propose two new methods, Order Preserving Precision (OPP) and Order Preserving Recall (OPR), to compute how well the evolutionary order is kept among the output documents obtained from the subject IR system. Using these test texts, we evaluated KONAN, a document retrieval system for Korean documents.
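
The software-generated part of the collection, as described above, can be illustrated with a minimal sketch. It assumes word-level edits, a fixed number of edits per generation, and a toy replacement vocabulary; none of these details are given in the abstract, and the names `evolve` and `make_sed` are hypothetical, not taken from the paper.

```python
import random


def evolve(tokens, rng, n_edits=3, vocab=("alpha", "beta", "gamma")):
    """Return a copy of `tokens` after `n_edits` random edit operations.

    Each operation is an insertion, a deletion, or a substitution of one
    token. The vocabulary and the edit count are illustrative assumptions.
    """
    out = list(tokens)
    for _ in range(n_edits):
        op = rng.choice(("insert", "delete", "substitute"))
        if op == "insert" or not out:
            # Also used as a fallback when there is nothing to delete or replace.
            out.insert(rng.randrange(len(out) + 1), rng.choice(vocab))
        elif op == "delete":
            del out[rng.randrange(len(out))]
        else:
            out[rng.randrange(len(out))] = rng.choice(vocab)
    return out


def make_sed(seed_text, generations=5, n_edits=3, seed=0):
    """Build a chain d0, d1, ..., dn in which each document is derived
    from its immediate predecessor (a serially evolved sequence)."""
    rng = random.Random(seed)  # fixed seed keeps the chain reproducible
    chain = [seed_text.split()]
    for _ in range(generations):
        chain.append(evolve(chain[-1], rng, n_edits))
    return [" ".join(doc) for doc in chain]


if __name__ == "__main__":
    for i, doc in enumerate(make_sed("the quick brown fox jumps over the lazy dog")):
        print(i, doc)
```

Because every document is derived only from its predecessor, the position in the chain gives a known evolutionary order, which is the kind of ground truth an order-sensitive evaluation needs.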

Index Terms

1. Information systems
  1. Information retrieval
    1. Evaluation of retrieval results
      1. Retrieval effectiveness
  2. Information systems applications
    1. Digital libraries and archives

Published in

ICIBE '17: Proceedings of the 3rd International Conference on Industrial and Business Engineering
August 2017
107 pages
ISBN:9781450353519
DOI:10.1145/3133811

Copyright © 2017 ACM
© 2017 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Document Searching
Information Retrieval
Performance Evaluation
Text Similarity
Qualifiers
- research-article
- Research
- Refereed limited