Design and development of Dogri extractive summarization model for automated summary generation

Gandotra, Sonam; Arora, Bhavna; Kumar, Yogesh

doi:10.1007/s00799-025-00412-0


Published: 22 February 2025

Volume 26, article number 6, (2025)

International Journal on Digital Libraries


Abstract

Text summarization compresses large amounts of information into clear, succinct summaries that make knowledge easier to grasp and extract. Summarization tasks are broadly divided into two types: extractive and abstractive. This paper takes up extractive summarization for the Dogri language. The goal of extractive summarization is to extract the key portions of a text and assemble them into a meaningful form. The paper presents the Dogri Extractive Summarization Model. Statistical features, comprising sentence-level and word-level features, are used to extract important sentences from a given document. Word-level features include the presence of common nouns, proper nouns, and numerical information, together with term frequency-inverse sentence frequency (TF-ISF), whereas sentence-level features include sentence position, sentence length, and similarity to the news title. A linear combination of these feature scores forms the final score of each sentence. Sentences are then ranked by this score, and the final summary is generated according to the compression ratio. Results are reported for five compression ratios (70%, 50%, 30%, 20%, and 10%) using three ROUGE scores: ROUGE-1, ROUGE-2, and ROUGE-L. A comparative analysis of the proposed Dogri Extractive Summarization Model with text summarization systems for other Indian languages, namely Hindi, Bengali, Punjabi, and Kannada, is also presented.
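The scoring pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the equal feature weights, the per-feature normalization, the whitespace tokenizer, and the reading of "compression ratio" as the fraction of sentences retained are all assumptions made for illustration. The common-noun and proper-noun features are left out because they would need a Dogri part-of-speech tagger, and the sample text is English only to keep the example readable.

```python
import math
from collections import Counter

PUNCT = ".,;:!?\"'()[]।"  # assumed punctuation set; includes the Devanagari danda


def tokenize(text):
    """Lowercased whitespace tokens with surrounding punctuation stripped."""
    tokens = (w.strip(PUNCT).lower() for w in text.split())
    return [t for t in tokens if t]


def tf_isf(tokenized):
    """Per-sentence mean of tf(w, s) * log(N / sf(w)) over the sentence's distinct words."""
    n = len(tokenized)
    # sf[w] = number of sentences that contain word w
    sf = Counter(w for sent in tokenized for w in set(sent))
    scores = []
    for sent in tokenized:
        tf = Counter(sent)
        total = sum(tf[w] * math.log(n / sf[w]) for w in tf)
        scores.append(total / len(tf) if tf else 0.0)
    return scores


def normalize(values):
    """Scale values into [0, 1] by dividing by the maximum."""
    top = max(values, default=0.0)
    return [v / top if top > 0 else 0.0 for v in values]


def score_sentences(sentences, title, weights=None):
    """Linear combination of feature scores; equal weights are an assumption."""
    weights = weights or {"tfisf": 1.0, "position": 1.0, "length": 1.0,
                          "title": 1.0, "numeric": 1.0}
    tokenized = [tokenize(s) for s in sentences]
    title_words = set(tokenize(title))
    n = len(sentences)
    features = {
        "tfisf": normalize(tf_isf(tokenized)),
        # Earlier sentences score higher (assumed position scheme).
        "position": [1.0 - i / n for i in range(n)],
        "length": normalize([float(len(t)) for t in tokenized]),
        # Share of title words that also occur in the sentence.
        "title": [len(set(t) & title_words) / len(title_words) if title_words else 0.0
                  for t in tokenized],
        # 1 if any token contains a digit, else 0.
        "numeric": [1.0 if any(ch.isdigit() for w in t for ch in w) else 0.0
                    for t in tokenized],
    }
    return [sum(weights[name] * features[name][i] for name in weights)
            for i in range(n)]


def summarize(sentences, title, ratio):
    """Keep the top-scoring fraction `ratio` of sentences, in document order."""
    if not sentences:
        return []
    scores = score_sentences(sentences, title)
    k = max(1, round(ratio * len(sentences)))
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]


title = "Flood relief in Jammu"
sentences = [
    "Flood relief camps opened in Jammu on Monday.",
    "Officials said 250 families were moved.",
    "The weather was cloudy.",
    "Relief work will continue.",
]
for ratio in (0.5, 0.1):
    print(ratio, summarize(sentences, title, ratio))
```

Returning the selected sentences in their original order, rather than in score order, keeps the extract readable. Feature weights tuned on a corpus could be passed to `score_sentences` through the `weights` argument in place of the equal defaults.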


Data availability

https://github.com/Sonam2/Dogri_Corpus. https://kaggle.com/datasets/9f5501bbed7f1eb687290e725b9afef1aff2269b270f9256b933c3faf853ea53.



Funding

Not applicable.

Author information

Authors and Affiliations

Department of Higher Education, Cluster University of Jammu, Jammu, J&K, India
Sonam Gandotra
Department of Computer Science and IT, Central University of Jammu, Jammu, India
Bhavna Arora
Department of CSE, School of Technology, Pandit Deendayal Energy University, Gandhinagar, Gujarat, India
Yogesh Kumar


Corresponding author

Correspondence to Yogesh Kumar.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Gandotra, S., Arora, B. & Kumar, Y. Design and development of Dogri extractive summarization model for automated summary generation. Int J Digit Libr 26, 6 (2025). https://doi.org/10.1007/s00799-025-00412-0


Received: 21 July 2023
Revised: 23 September 2024
Accepted: 19 January 2025
Published: 22 February 2025
DOI: https://doi.org/10.1007/s00799-025-00412-0
