Paper
4 February 2013
Using clustering and a modified classification algorithm for automatic text summarization
Abdelkrime Aries, Houda Oufaida, Omar Nouali
Proceedings Volume 8658, Document Recognition and Retrieval XX; 865811 (2013) https://doi.org/10.1117/12.2004001
Event: IS&T/SPIE Electronic Imaging, 2013, Burlingame, California, United States
Abstract
In this paper we describe a modified classification method for extractive summarization. The classifier in this method does not need a separate training corpus; it learns from the input text itself. First, we cluster the document's sentences to exploit the diversity of topics; then we apply a learning algorithm (here, Naive Bayes) to the clusters, treating each cluster as a class. After obtaining the classification model, we calculate the score of each sentence in each class, using a scoring model derived from the classification algorithm. These scores are then used to reorder the sentences, and the top-ranked ones are extracted as the output summary. We conducted experiments on a corpus of scientific papers and compared our results to those of another summarization system, UNIS.1 We also examined the impact of tuning the clustering threshold on the resulting summary, as well as the impact of adding more features to the classifier. We found that the method performs well, and that adding new features (which is simple with this method) can improve the summary's accuracy.
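The pipeline outlined in the abstract (cluster the sentences, treat each cluster as a class, train Naive Bayes on the input text itself, score and rank the sentences) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the clustering algorithm, the feature set, or the scoring formula, so the single-pass cosine-threshold clustering, the bag-of-words feature, the Laplace smoothing, and the length-normalized log-probability score below are all assumptions chosen for illustration, as are the function names and default parameter values.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cluster_sentences(bags, threshold):
    """Single-pass threshold clustering (an assumption, not the paper's method).

    A sentence joins the first cluster that contains a sentence at least
    `threshold`-similar to it; otherwise it starts a new cluster.
    Returns a list of clusters, each a list of sentence indices.
    """
    clusters = []
    for i, bag in enumerate(bags):
        for members in clusters:
            if any(cosine(bag, bags[j]) >= threshold for j in members):
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def train_naive_bayes(bags, clusters):
    """Estimate Laplace-smoothed P(word | cluster), one model per cluster."""
    vocab = set().union(*bags)
    models = []
    for members in clusters:
        counts = Counter()
        for i in members:
            counts.update(bags[i])
        total = sum(counts.values()) + len(vocab)
        models.append({w: (counts[w] + 1) / total for w in vocab})
    return models


def score_sentence(bag, models):
    """Hypothetical score: sum over clusters of the mean log P(word | cluster)."""
    words = list(bag.elements())
    if not words:
        return float("-inf")
    return sum(sum(math.log(m[w]) for w in words) / len(words) for m in models)


def summarize(sentences, threshold=0.3, n=2):
    """Rank sentences by score and return the n highest-scoring ones."""
    bags = [Counter(tokenize(s)) for s in sentences]
    clusters = cluster_sentences(bags, threshold)
    models = train_naive_bayes(bags, clusters)
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: score_sentence(bags[i], models),
        reverse=True,
    )
    return [sentences[i] for i in ranked[:n]]
```

Calling `summarize(sentences, threshold=0.3, n=2)` on a list of sentence strings returns the two highest-scoring sentences. Raising `threshold` produces more, smaller clusters (and therefore more classes), which is the kind of tuning the abstract refers to; extra features could be added by extending what each bag counts.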
© (2013) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Abdelkrime Aries, Houda Oufaida, and Omar Nouali "Using clustering and a modified classification algorithm for automatic text summarization", Proc. SPIE 8658, Document Recognition and Retrieval XX, 865811 (4 February 2013); https://doi.org/10.1117/12.2004001
CITATIONS
Cited by 3 scholarly publications.
KEYWORDS
Feature extraction
Bismuth
Genetic algorithms
Algorithm development
Machine learning
Current controlled current source
Electronic imaging