Abstract
Text summarization compresses large volumes of information into clear, succinct summaries from which knowledge is easier to grasp and extract. Summarization tasks are broadly divided into two types: extractive and abstractive. This paper takes up extractive summarization for the Dogri language. The goal of extractive summarization is to extract the key parts of a text and present them in a meaningful form. We present the Dogri Extractive Summarization Model, which employs statistical features at the word level and the sentence level to extract important sentences from a given document. Word-level features include the presence of common nouns, proper nouns, and numerical information, together with term frequency-inverse sentence frequency (TF-ISF); sentence-level features include sentence position, sentence length, and similarity to the news title. A linear combination of these feature scores forms the final score of each sentence. Sentences are then ranked by this score, and the final summary is generated according to the compression ratio. Results are reported for five compression ratios (70%, 50%, 30%, 20%, and 10%) using three ROUGE scores: ROUGE-1, ROUGE-2, and ROUGE-L. A comparative analysis of the proposed model with text summarization systems for other Indian languages, namely Hindi, Bengali, Punjabi, and Kannada, is also presented.
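The scoring pipeline summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the equal feature weights, the max-based normalization, and the reading of the compression ratio as the fraction of sentences retained are all assumptions made for illustration. The noun and numerical-information features are omitted because they would require a Dogri part-of-speech tagger, and sentences are assumed to be already tokenized.

```python
import math
from collections import Counter


def tf_isf(sentences):
    """Mean TF-ISF of the words in each tokenized sentence."""
    n = len(sentences)
    # Sentence frequency: number of sentences in which each word occurs.
    sf = Counter(word for sent in sentences for word in set(sent))
    scores = []
    for sent in sentences:
        tf = Counter(sent)
        total = sum(tf[w] * math.log(n / sf[w]) for w in tf)
        scores.append(total / len(sent) if sent else 0.0)
    return scores


def title_similarity(sentence, title):
    """Fraction of title words that also occur in the sentence."""
    title_words = set(title)
    if not title_words:
        return 0.0
    return len(title_words & set(sentence)) / len(title_words)


def normalize(values):
    """Scale values into [0, 1] by dividing by the maximum."""
    top = max(values, default=0.0)
    return [v / top if top > 0 else 0.0 for v in values]


def summarize(sentences, title, keep_ratio, weights=(1.0, 1.0, 1.0, 1.0)):
    """Return (indices of selected sentences in document order, all scores).

    keep_ratio is assumed to be the fraction of sentences retained;
    the equal default weights are a placeholder, not values from the paper.
    """
    n = len(sentences)
    if n == 0:
        return [], []
    features = [
        normalize(tf_isf(sentences)),                     # TF-ISF
        [(n - i) / n for i in range(n)],                  # position: earlier scores higher
        normalize([len(s) for s in sentences]),           # length in tokens
        [title_similarity(s, title) for s in sentences],  # similarity to title
    ]
    # Final score: linear combination of the feature scores.
    scores = [sum(w * f[i] for w, f in zip(weights, features)) for i in range(n)]
    k = max(1, round(keep_ratio * n))
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k]), scores


if __name__ == "__main__":
    # Placeholder English tokens stand in for tokenized Dogri text.
    title = ["dogri", "summary"]
    doc = [
        ["dogri", "summary", "model", "ranks", "sentences"],
        ["weather", "is", "nice"],
        ["the", "model", "scores", "each", "sentence", "for", "the", "summary"],
        ["ok"],
    ]
    selected, _ = summarize(doc, title, keep_ratio=0.5)
    print(selected)
```

Returning the selected indices in document order, rather than in rank order, keeps the extracted sentences in their original sequence, which is a common choice for extractive summaries.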







Funding
Not applicable.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
About this article
Cite this article
Gandotra, S., Arora, B. & Kumar, Y. Design and development of Dogri extractive summarization model for automated summary generation. Int J Digit Libr 26, 6 (2025). https://doi.org/10.1007/s00799-025-00412-0
DOI: https://doi.org/10.1007/s00799-025-00412-0