Abstract
Creating a coherent summary of a text is a challenging task in Natural Language Processing (NLP). Various automatic text summarization techniques have been developed for both abstractive and extractive summarization. This study focuses on extractive summarization, which selects representative sentences or paragraphs from the original text and combines them into a form shorter than the source document(s) to produce a summary. Methods used for extractive summarization include graph-theoretic approaches, machine learning, Latent Semantic Analysis (LSA), neural networks, clustering, and fuzzy logic. In this paper, a semantic graph-based approach, SGATS (Semantic Graph-based approach for Automatic Text Summarization), is proposed to generate extractive summaries. The approach constructs a semantic graph of the original Hindi text document by establishing semantic relationships between its sentences, using the Hindi WordNet ontology as a background knowledge source. Once the semantic graph is constructed, fourteen different graph-theoretic measures are applied to rank the document's sentences according to their semantic scores. The approach is applied to two datasets from different domains: tourism and health. Its performance is compared with the state-of-the-art TextRank algorithm and with human-annotated summaries, and it is evaluated using the widely accepted ROUGE measures. The results show that the proposed system outperforms TextRank on the health-domain corpus and gives comparable results on the tourism corpus. Further, correlation coefficients are computed between eight different graph measures, and most of these measures are found to be highly correlated.
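The sketch below illustrates the general idea of ranking sentences on a semantic graph. It is a minimal sketch under stated assumptions and does not reproduce SGATS itself. A hypothetical toy lexicon (`TOY_SYNSETS`) stands in for Hindi WordNet. The sentences are English and whitespace-tokenized for readability. Two sentences are linked when they share concepts, with the edge weight equal to the number of shared concepts. Each sentence is scored by weighted degree, which is only one possible graph measure. The exact semantic relation and the fourteen measures used in the paper are not shown here.

```python
from itertools import combinations

# Hypothetical toy lexicon standing in for Hindi WordNet: word -> concept id.
TOY_SYNSETS = {
    "doctor": "DOCTOR", "physician": "DOCTOR",
    "fever": "FEVER", "temperature": "FEVER",
    "treated": "TREAT", "cured": "TREAT",
}
# Hypothetical stopword list; function words should not create edges.
STOPWORDS = {"the", "a", "was", "by"}


def concepts(sentence, synsets):
    """Map content words to concept ids; unknown words stand for themselves."""
    return {synsets.get(tok, tok) for tok in sentence.lower().split()
            if tok not in STOPWORDS}


def build_semantic_graph(sentences, synsets):
    """Undirected weighted graph stored as an adjacency dict.

    Nodes are sentence indices; an edge joins two sentences that share at
    least one concept and is weighted by the number of shared concepts.
    """
    concept_sets = [concepts(s, synsets) for s in sentences]
    graph = {i: {} for i in range(len(sentences))}
    for i, j in combinations(range(len(sentences)), 2):
        shared = len(concept_sets[i] & concept_sets[j])
        if shared:
            graph[i][j] = graph[j][i] = shared
    return graph


def weighted_degree(graph):
    """Score each sentence by the total weight of its edges."""
    return {node: sum(nbrs.values()) for node, nbrs in graph.items()}


def summarize(sentences, synsets, k):
    """Return the k highest-scoring sentences in their original order."""
    scores = weighted_degree(build_semantic_graph(sentences, synsets))
    # Ties are broken by sentence position (earlier sentences first).
    top = sorted(scores, key=lambda n: (-scores[n], n))[:k]
    return [sentences[i] for i in sorted(top)]


sentences = [
    "the doctor treated the fever",
    "a physician cured the high temperature",
    "tourists visited the old fort",
    "the doctor arrived quickly",
]
print(summarize(sentences, TOY_SYNSETS, k=2))
# -> ['the doctor treated the fever', 'a physician cured the high temperature']
```

The first two sentences share no surface content words apart from "the". They are still strongly linked because the lexicon maps "physician", "cured" and "temperature" onto the same concepts as "doctor", "treated" and "fever". Swapping `weighted_degree` for another centrality, such as closeness or betweenness, changes only the scoring function and leaves the graph construction untouched.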
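ROUGE-N recall is the core quantity in the ROUGE family of evaluation measures. It counts how many of the reference summary's n-grams also appear in the system summary. The sketch below is a minimal version that makes several simplifying assumptions:

- It uses a single reference summary.
- It tokenizes on whitespace.
- It applies no stemming or stopword removal.

The official ROUGE toolkit adds these options, together with multiple references and precision and F-measure variants.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """ROUGE-N recall of one candidate summary against one reference.

    Matches are clipped: an n-gram counts at most as often as it occurs
    in the reference.
    """
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum((cand & ref).values())  # Counter & keeps the minimum count
    return overlap / total


reference = "the doctor treated the fever"
candidate = "the doctor cured the fever"
print(rouge_n_recall(candidate, reference, n=1))  # 0.8
print(rouge_n_recall(candidate, reference, n=2))  # 0.5
```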
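The abstract does not name the correlation coefficients used to compare graph measures. As an assumption, the sketch below computes two common choices between the scores that two measures assign to the same sentences:

- Pearson's coefficient.
- Spearman's rank coefficient, computed as Pearson's coefficient on average ranks so that tied scores are handled.

The score lists are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson's coefficient computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores that two graph measures assign to the same four sentences.
degree_scores = [4, 4, 0, 2]
closeness_scores = [0.9, 0.8, 0.1, 0.5]
print(round(spearman(degree_scores, closeness_scores), 4))  # 0.9487
```

Spearman's coefficient only asks whether two measures order the sentences the same way. This is the property that matters when each measure is used to pick the top-ranked sentences for a summary.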
Index Terms
- SGATS: Semantic Graph-based Automatic Text Summarization from Hindi Text Documents