DOI: 10.1145/2682571.2797077
Short paper

Automatic Document Classification using Summarization Strategies

Published: 08 September 2015

Abstract

Automatic text summarization, the task of producing a shorter text from one or several documents, may provide an efficient way to classify documents automatically. This paper assesses the 15 most widely used automatic text summarization methods from a text classification perspective. Experiments with a naive Bayes classifier show that some of the tested methods are better suited to this task than others.
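The pipeline described in the abstract has two stages: each document is first condensed by an extractive summarizer, and the summary, rather than the full text, is then given to a naive Bayes classifier. The following is a minimal sketch of that idea under stated assumptions, not one of the 15 methods assessed in the paper. It assumes a simple word-frequency sentence scorer (in the spirit of Luhn-style frequency scoring) and a multinomial naive Bayes with Laplace smoothing; the function names `summarize`, `train_nb`, and `classify`, the stopword list, and the toy corpus are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Tiny illustrative stopword list (an assumption; real systems use a larger one).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it",
             "on", "for", "with", "was", "this"}


def tokenize(text):
    """Lowercase alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def summarize(text, k=2):
    """Hypothetical extractive summarizer: keep the k sentences whose words
    are, on average, most frequent in the whole document."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(tokenize(text))

    def sentence_score(sentence):
        words = tokenize(sentence)
        # Average rather than sum, so long sentences are not favoured automatically.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: sentence_score(sentences[i]), reverse=True)
    # Emit the selected sentences in their original order.
    return " ".join(sentences[i] for i in sorted(ranked[:k]))


def train_nb(docs, labels):
    """Multinomial naive Bayes: document counts and word counts per class."""
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(tokenize(doc))
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_docs, word_counts, vocab


def classify(model, doc):
    """Return the class with the highest log-posterior, using Laplace smoothing."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_docs.items():
        total_words = sum(word_counts[label].values())
        log_post = math.log(n_docs / total_docs)  # log prior
        for w in tokenize(doc):
            if w in vocab:  # words never seen in training are ignored
                log_post += math.log((word_counts[label][w] + 1) /
                                     (total_words + len(vocab)))
        if log_post > best_score:
            best_label, best_score = label, log_post
    return best_label


# Toy corpus (made up for illustration): the last sentence of each text is off-topic.
TRAIN = [
    ("The match was close. The team won the match. The weather was mild.", "sports"),
    ("The goal came late. A late goal won the game. Prices were high.", "sports"),
    ("The market fell. Investors sold shares in the market. The team was tired.", "finance"),
    ("Shares rose. Investors bought shares and bonds. The goal was fun.", "finance"),
]

if __name__ == "__main__":
    summaries = [summarize(text) for text, _ in TRAIN]
    model = train_nb(summaries, [label for _, label in TRAIN])
    for s in summaries:
        print(s)
    print(classify(model, "The team scored a late goal."))
```

With this toy corpus the summarizer keeps the first two sentences of each text, so the off-topic words (for example "team" in a finance text and "goal" in another) never reach the classifier's training counts; run as a script, it prints the four summaries followed by `sports`. Comparing summarization methods in this setting would amount to swapping `summarize` for a different sentence-scoring function while keeping the classifier fixed.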


Cited By

  • (2022) Classifying Documents based on Formal and Informal Writing Styles using Machine Learning Algorithms. 2022 2nd International Conference on Advanced Research in Computing (ICARC), pages 373-378. DOI: 10.1109/ICARC54489.2022.9753774. Online publication date: 23 Feb 2022.

    Published In

    DocEng '15: Proceedings of the 2015 ACM Symposium on Document Engineering
    September 2015
    248 pages
    ISBN:9781450333078
    DOI:10.1145/2682571
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. automatic text summarization
    2. extrinsic summarization evaluation
    3. text classification

    Conference

    DocEng '15
    DocEng '15: ACM Symposium on Document Engineering 2015
    September 8-11, 2015
    Lausanne, Switzerland

    Acceptance Rates

    DocEng '15 Paper Acceptance Rate 11 of 31 submissions, 35%;
    Overall Acceptance Rate 194 of 564 submissions, 34%

    Article Metrics

    • Downloads (last 12 months): 2
    • Downloads (last 6 weeks): 0
    Reflects downloads up to 19 Feb 2025
