DOI: 10.1145/3129676.3129686
poster

Study on a Text Reuse Measurement Method Using Expanded Index Term

Published: 20 September 2017

Abstract

Text reuse has become prominent as information content is digitized, owing to the spread of the Internet and smartphones. Text reuse takes varied and complex forms, including insertion, deletion, and replacement of text and changes in word order. Moreover, inspecting texts drawn from many sources requires a method that runs within a reasonable amount of time and with a reasonable amount of resources. This work attempts to improve the accuracy of text reuse measurement by using expanded index terms, expanding the range of sentences inspected for reuse, and circularizing words, in order to resolve the problem of reused sentences that go undetected when terms are replaced with similar ones. The efficiency of the proposed method was demonstrated through a comparative evaluation against existing reuse inspection methods.
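
The idea of expanding index terms so that a replaced similar term still counts as a match can be illustrated with a small sketch. This is not the authors' implementation: the `SIMILAR_TERMS` table, the function names, and the simple term-containment score are all assumptions made for illustration. In practice the similar-term table might be built from nearest neighbours in a word-embedding model such as Word2Vec, and the matching step might use a sequence-based algorithm such as Greedy String Tiling rather than set overlap.

```python
from typing import Dict, Set

# Hypothetical similar-term table (an assumption for this sketch); it stands
# in for neighbours that an embedding model such as Word2Vec might supply.
SIMILAR_TERMS: Dict[str, Set[str]] = {
    "car": {"automobile", "vehicle"},
    "quick": {"fast", "rapid"},
}


def index_terms(sentence: str) -> Set[str]:
    """Lowercased whitespace-split terms with surrounding punctuation stripped."""
    return {w.strip(".,;:!?").lower() for w in sentence.split()} - {""}


def expand(terms: Set[str]) -> Set[str]:
    """Add every term from any similar-term group that the input touches."""
    expanded = set(terms)
    for head, similar in SIMILAR_TERMS.items():
        group = {head} | similar
        if terms & group:
            expanded |= group
    return expanded


def reuse_score(source: str, suspect: str, use_expansion: bool = True) -> float:
    """Fraction of the suspect sentence's terms that the source covers."""
    src = index_terms(source)
    sus = index_terms(suspect)
    if not sus:
        return 0.0
    if use_expansion:
        src = expand(src)
    return len(sus & src) / len(sus)


if __name__ == "__main__":
    source = "The quick car stopped."
    suspect = "The fast automobile stopped."
    print(reuse_score(source, suspect, use_expansion=False))
    print(reuse_score(source, suspect, use_expansion=True))
```

In this toy example the suspect sentence replaces two terms with similar ones, so plain term matching covers only half of its terms, while matching against the expanded term set covers all of them.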

Published In

RACS '17: Proceedings of the International Conference on Research in Adaptive and Convergent Systems
September 2017
324 pages
ISBN:9781450350273
DOI:10.1145/3129676
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Badges

  • Best Poster

Author Tags

  1. Expanded Index Term
  2. Greedy String Tiling Algorithm
  3. Text Reuse
  4. Word2Vec

Qualifiers

  • Poster
  • Research
  • Refereed limited

Conference

RACS '17

Acceptance Rates

RACS '17 Paper Acceptance Rate 48 of 207 submissions, 23%;
Overall Acceptance Rate 393 of 1,581 submissions, 25%
