DOI: 10.1145/2980258.2980442
research-article

Automatic Keyword Extraction for Text Summarization in e-Newspapers

Published: 25 August 2016

Editorial Notes

NOTICE OF CONCERN: ACM has received evidence that casts doubt on the integrity of the peer review process for the ICIA 2016 Conference. As a result, ACM is issuing a Notice of Concern for all papers published and strongly suggests that the papers from this Conference not be cited in the literature until ACM's investigation has concluded and final decisions have been made regarding the integrity of the peer review process for this Conference.

ABSTRACT

Summarization is the process of reducing a text document to a summary that retains the most important points of the original. Extractive summarizers work on the given text to extract the sentences that best convey its message. Most extractive summarization techniques revolve around finding keywords and extracting the sentences that contain more keywords than the rest. Keyword extraction is usually done by extracting relevant words that occur more frequently than others, with stress on the important ones. Manual extraction or annotation of keywords is a tedious, error-prone process that demands considerable effort and time. In this paper, we propose an algorithm to extract keywords automatically for text summarization in e-newspaper datasets. The results of the proposed algorithm are compared across articles with similar titles in four different e-newspapers to check the similarity and consistency of the summaries.
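The general idea described in the abstract (frequency-based keywords, then sentences ranked by how many keywords they contain) can be illustrated with a minimal sketch. This is not the algorithm proposed in the paper, whose details are not given here; it is a generic baseline under stated assumptions. The function names, the tiny stopword list, and the sentence-splitting rule are all hypothetical choices made for illustration.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption; real systems use larger lists).
STOPWORDS = {
    "the", "a", "an", "of", "to", "in", "and", "is", "are", "was",
    "that", "it", "on", "for", "with", "as", "by", "this",
}


def tokenize(text):
    """Lowercase the text and return alphabetic words that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def extract_keywords(text, top_n=5):
    """Return the top_n most frequent non-stopword words (ties keep first-seen order)."""
    counts = Counter(tokenize(text))
    return [word for word, _ in counts.most_common(top_n)]


def summarize(text, num_sentences=2, top_n=5):
    """Pick the sentences containing the most keyword occurrences, in original order."""
    keywords = set(extract_keywords(text, top_n))
    # Naive sentence split on whitespace that follows ., ! or ? (an assumption).
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    scored = [
        (sum(1 for w in tokenize(s) if w in keywords), i, s)
        for i, s in enumerate(sentences)
    ]
    # Highest score first; earlier sentences win ties.
    best = sorted(scored, key=lambda t: (-t[0], t[1]))[:num_sentences]
    return [s for _, _, s in sorted(best, key=lambda t: t[1])]
```

Calling `summarize(article_text, num_sentences=3)` on the text of a news article would return up to three of its sentences. Raw frequency is only one possible keyword score; weighting schemes such as tf-idf could be substituted in `extract_keywords` without changing the sentence-selection step.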

  • Published in

    ICIA-16: Proceedings of the International Conference on Informatics and Analytics
    August 2016
    868 pages
    ISBN:9781450347563
    DOI:10.1145/2980258

    Copyright © 2016 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • research-article
    • Research
    • Refereed limited
