DOI: 10.1145/3582768.3582787
research-article

DIFFSTRACT: distinguishing the content of texts

Published: 27 June 2023

ABSTRACT

Automatically generating summaries of texts is now almost routine. Identifying the differences between the statements of two publications, by contrast, is still a problem: for the most part, it requires a human being to read and evaluate at least excerpts of the relevant passages. Finding such a text differentiation with appropriate tools is becoming an increasingly interesting and important task for coping effectively with the daily flood of information on the WWW. For years, co-occurrence graphs have been a proven means of deriving statements of various kinds from texts, and so-called text-representing centroids (TRCs) have often been an effective tool for identifying, comparing and categorizing texts or text sections. The present article examines how a different form of co-occurrence graph can be constructed and put to use. First, co-occurrence graphs are built from a larger corpus and from various individual texts or groups of texts. The difference graphs calculated from them can then be used to create summaries that precisely characterize the differences between texts. Experimental results indicate that this new method works well.
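The abstract does not spell out how the graphs are built or compared, so the Python below is only a minimal sketch under stated assumptions, not the authors' method. It assumes sentence-level co-occurrence weighted by the Dice coefficient, a difference graph that keeps the edges noticeably stronger in one text than in another (a hypothetical threshold rule; the second input could equally be a larger reference corpus), and a TRC-style centroid chosen as the node with the smallest mean shortest-path distance to a set of terms, with edge distance taken as 1/weight. All function names and the `threshold` parameter are hypothetical.

```python
import heapq
from collections import Counter, defaultdict
from itertools import combinations

def dice_graph(sentences):
    """Co-occurrence graph (assumed construction): terms sharing a sentence are
    linked, weighted by Dice = 2*n_ab / (n_a + n_b), where n_x counts the
    sentences containing x. Edge keys are alphabetically ordered term pairs."""
    term_freq, pair_freq = Counter(), Counter()
    for sentence in sentences:
        terms = sorted(set(sentence))
        term_freq.update(terms)
        pair_freq.update(combinations(terms, 2))
    return {(a, b): 2 * n / (term_freq[a] + term_freq[b])
            for (a, b), n in pair_freq.items()}

def difference_graph(graph_a, graph_b, threshold=0.0):
    """Hypothetical rule: keep edges of A whose weight exceeds their weight
    in B (0 if absent from B) by more than `threshold`."""
    diff = {}
    for edge, weight in graph_a.items():
        gain = weight - graph_b.get(edge, 0.0)
        if gain > threshold:
            diff[edge] = gain
    return diff

def _adjacency(graph):
    """Undirected adjacency map; a strong association becomes a short distance."""
    adj = defaultdict(dict)
    for (a, b), weight in graph.items():
        adj[a][b] = adj[b][a] = 1.0 / weight
    return adj

def _distances_from(adj, source):
    """Dijkstra shortest-path distances from `source` to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for neighbour, w in adj[node].items():
            nd = d + w
            if nd < dist.get(neighbour, float("inf")):
                dist[neighbour] = nd
                heapq.heappush(heap, (nd, neighbour))
    return dist

def centroid(graph, terms):
    """TRC-style centroid (assumed definition): the node with the smallest mean
    distance to `terms`. Only nodes reaching every term qualify; returns None
    if none does. Ties keep the alphabetically first candidate."""
    if not terms:
        return None
    adj = _adjacency(graph)
    best, best_score = None, float("inf")
    for candidate in sorted(adj):
        dist = _distances_from(adj, candidate)
        if all(t in dist for t in terms):
            score = sum(dist[t] for t in terms) / len(terms)
            if score < best_score:
                best, best_score = candidate, score
    return best

# Toy input: pre-tokenised sentences of two hypothetical texts.
text_a = [["graph", "centroid", "text"],
          ["graph", "difference", "summary"],
          ["difference", "summary"]]
text_b = [["graph", "centroid", "text"], ["graph", "text"]]

diff = difference_graph(dice_graph(text_a), dice_graph(text_b), threshold=0.4)
print(sorted(diff))  # [('difference', 'graph'), ('difference', 'summary'), ('graph', 'summary')]
print(centroid(diff, ["difference", "summary", "graph"]))  # difference
```

In this toy run the edges shared by both texts drop out and only the terms that `text_a` adds survive in the difference graph. How the article turns difference graphs into summaries is not described in the abstract, so that step is deliberately left out of the sketch.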

Published in

ACM Other conferences
NLPIR '22: Proceedings of the 2022 6th International Conference on Natural Language Processing and Information Retrieval
December 2022
241 pages
ISBN: 9781450397629
DOI: 10.1145/3582768

Copyright © 2022 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery
New York, NY, United States


Qualifiers: research-article, Research, Refereed limited