Abstract
Text summarization is the task of shortening a text document while retaining its overall meaning and information. A good summary should highlight the main concepts of the document. Many statistical, location-based and linguistic techniques are available for text summarization. This paper describes a novel hybrid technique for automatic summarization of Punjabi text. Punjabi is an official language of the state of Punjab in India, and very few linguistic resources are available for it. The proposed summarization system is a hybrid of conceptual, statistical, location-based and linguistic features for Punjabi text. The system uses four new location-based features and two new statistical features (entropy measure and Z score), and the results are encouraging. A support vector machine-based classifier is also used to classify Punjabi sentences into summary and non-summary sentences and to handle imbalanced data. The synthetic minority over-sampling technique (SMOTE) is applied to over-sample the minority class. The results of the proposed system are compared with several baseline systems, and its F score, precision, recall and ROUGE-2 score are reasonably good compared with those of the baseline systems. Moreover, the summary quality of the proposed system is comparable to that of the gold summary.
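
The over-sampling step named in the abstract can be illustrated with a minimal sketch of the core SMOTE idea: a synthetic minority-class point is placed at a random position on the line segment between a minority sample and one of its nearest minority-class neighbours. This is a simplified variant written for illustration, not the authors' implementation. The function name `smote`, its parameters, and the two-dimensional feature vectors are assumptions and do not come from the paper.

```python
import math
import random


def smote(minority, n_synthetic, k=2, seed=0):
    """Create n_synthetic points by interpolating between minority samples.

    Simplified, hypothetical sketch: the base sample is drawn at random
    rather than iterating over every minority sample in turn.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority-class neighbours of the base sample (itself excluded)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical feature vectors for summary (minority-class) sentences.
summary_sentences = [[0.9, 0.8], [0.8, 0.7], [0.7, 0.9]]
new_points = smote(summary_sentences, n_synthetic=4)
```

Because each synthetic point is interpolated between two existing minority samples, it stays inside the region those samples already occupy. The augmented minority class would then be combined with the majority class before training the classifier.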

Cite this article
Gupta, V., Kaur, N. A Novel Hybrid Text Summarization System for Punjabi Text. Cogn Comput 8, 261–277 (2016). https://doi.org/10.1007/s12559-015-9359-3
DOI: https://doi.org/10.1007/s12559-015-9359-3