short-paper

Automatic Document Classification using Summarization Strategies

Authors: Rafael Ferreira, Rafael Dueire Lins, Luciano Cabral, Steven J. Simske, Marcelo Riss

DocEng '15: Proceedings of the 2015 ACM Symposium on Document Engineering

Pages 69 - 72

https://doi.org/10.1145/2682571.2797077

Published: 08 September 2015

Abstract

Automatic text summarization, the task of creating a shorter text from one or several documents, may provide an efficient way to classify documents automatically. This paper assesses the 15 most widely used methods for automatic text summarization from the text classification perspective. Experiments with a naive Bayes classifier show that some of the tested methods are better suited to this task than others.
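
The summarize-then-classify idea in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the word-frequency sentence scorer, the small stopword list, the toy training texts, and the names `summarize` and `NaiveBayes` are hypothetical and are not taken from the paper, whose 15 summarization methods and experimental setup are not described on this page.

```python
import math
import re
from collections import Counter, defaultdict

# Tiny illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"the", "a", "an", "and", "as", "in", "of", "to"}

def tokenize(text):
    """Lowercase the text and return alphabetic tokens minus stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

def summarize(text, n_sentences=1):
    """Keep the n sentences with the highest average word frequency.

    A simple word-frequency scorer chosen for illustration only.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(tokenize(text))

    def score(sentence):
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    top = sorted(sentences, key=score, reverse=True)[:n_sentences]
    # Emit the selected sentences in their original order.
    return " ".join(s for s in sentences if s in top)

class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_logp = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            logp = math.log(n_docs / total_docs)
            for w in tokenize(text):
                if w in self.vocab:  # skip words never seen in training
                    logp += math.log((counts[w] + 1) / denom)
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label

# Toy corpus (made up for this sketch).
train = [
    ("The team won the match. The striker scored two goals in the match.", "sports"),
    ("The coach praised the players. The players trained before the game.", "sports"),
    ("The bank raised interest rates. Higher rates and higher prices slow the market.", "finance"),
    ("Investors sold shares. The market fell as shares lost value.", "finance"),
]

# Train and classify on summaries rather than on the full documents.
model = NaiveBayes().fit([summarize(t) for t, _ in train], [y for _, y in train])
prediction = model.predict(
    summarize("Rates moved the market. Traders watched the market closely today.")
)
print(prediction)  # finance
```

Swapping `summarize` for a different sentence-scoring strategy while holding the classifier fixed is one way to compare summarizers by their downstream classification quality, which is the kind of comparison the abstract describes.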

Cited By

Karunarathna K., Rupasingha R., Kumara B. (2022). Classifying Documents based on Formal and Informal Writing Styles using Machine Learning Algorithms. 2022 2nd International Conference on Advanced Research in Computing (ICARC), pages 373-378. Online publication date: 23-Feb-2022.
https://doi.org/10.1109/ICARC54489.2022.9753774

Index Terms

Automatic Document Classification using Summarization Strategies
1. Applied computing
  1. Document management and text processing

Published In

DocEng '15: Proceedings of the 2015 ACM Symposium on Document Engineering

September 2015

248 pages

ISBN:9781450333078

DOI:10.1145/2682571

General Chair: Christine Vanoirbeek, EPFL, Switzerland

Program Chair: Pierre Genevès, CNRS, France

Copyright © 2015 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

DocEng '15

Sponsor:

SIGWEB

DocEng '15: ACM Symposium on Document Engineering 2015

September 8 - 11, 2015

Lausanne, Switzerland

Acceptance Rates

DocEng '15 Paper Acceptance Rate 11 of 31 submissions, 35%;

Overall Acceptance Rate 194 of 564 submissions, 34%
