
Design and development of Dogri extractive summarization model for automated summary generation

  • Published:
International Journal on Digital Libraries

Abstract

Text summarization is an important technique that condenses large amounts of information into clear, succinct summaries, making the content easier to grasp and to extract knowledge from. Summarization methods are broadly divided into two types: extractive and abstractive. This paper addresses extractive summarization for the Dogri language, where the goal is to select the most important sentences of a text and assemble them into a meaningful summary. The proposed Dogri Extractive Summarization Model uses statistical features, at both the word level and the sentence level, to identify the important sentences in a document. Word-level features include the presence of common nouns, proper nouns, and numerical information, and term frequency-inverse sentence frequency (TF-ISF); sentence-level features include sentence position, sentence length, and similarity to the news title. The final score of each sentence is a linear combination of these feature scores. Sentences are then ranked by this score, and the final summary is generated according to the chosen compression ratio. Results are reported for five compression ratios (70%, 50%, 30%, 20%, and 10%) using ROUGE-1, ROUGE-2, and ROUGE-L scores. A comparative analysis of the proposed model against text summarization systems for other Indian languages, namely Hindi, Bengali, Punjabi, and Kannada, is also presented.
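The scoring pipeline described in the abstract can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the exact feature formulas and weights are not given here, so the sketch assumes equal weights and normalizes each feature to roughly [0, 1]. It also assumes sentences end with the Devanagari danda (।) or with ., ? or !, that tokens are whitespace-separated, and that the compression ratio is the fraction of sentences kept. The noun features use lookups in caller-supplied `common_nouns` and `proper_nouns` sets, a hypothetical stand-in for a Dogri part-of-speech tagger. All function names are hypothetical. The ROUGE helpers compute recall only and omit the precision, F-measure, stemming, and stop-word options of the official toolkit.

```python
import math
import re
from collections import Counter

# Assumed delimiters: the Devanagari danda (।) plus ".", "?" and "!".
SENT_SPLIT = re.compile(r"[।.?!]+")
# In Python 3, \d matches any Unicode decimal digit, including Devanagari ०-९.
DIGIT = re.compile(r"\d")


def split_sentences(text):
    """Split text into non-empty, stripped sentences."""
    return [s.strip() for s in SENT_SPLIT.split(text) if s.strip()]


def score_sentences(sentences, title, common_nouns=frozenset(),
                    proper_nouns=frozenset(), weights=None):
    """Score each sentence as a weighted sum of normalized features."""
    if not sentences:
        return []
    n = len(sentences)
    toks = [s.split() for s in sentences]  # whitespace tokenization (assumption)
    title_words = set(title.split())
    # Sentence frequency: number of sentences that contain each word.
    sf = Counter(w for t in toks for w in set(t))
    # TF-ISF: sum of tf(w, s) * log(N / sf(w)) over the words of a sentence.
    tfisf = [sum(c * math.log(n / sf[w]) for w, c in Counter(t).items())
             for t in toks]
    max_tfisf = max(tfisf) or 1.0
    max_len = max(len(t) for t in toks) or 1
    feats = []
    for i, t in enumerate(toks):
        size = len(t) or 1
        feats.append({
            "tf_isf": tfisf[i] / max_tfisf,
            "position": (n - i) / n,  # earlier sentences score higher
            "length": len(t) / max_len,
            "title_sim": len(title_words & set(t)) / (len(title_words) or 1),
            "numeric": sum(1 for w in t if DIGIT.search(w)) / size,
            # Lexicon lookups stand in for a Dogri POS tagger (hypothetical).
            "common_noun": sum(1 for w in t if w in common_nouns) / size,
            "proper_noun": sum(1 for w in t if w in proper_nouns) / size,
        })
    if weights is None:
        weights = {name: 1.0 for name in feats[0]}  # equal weights (assumption)
    return [sum(weights[k] * f[k] for k in weights) for f in feats]


def summarize(text, title, ratio=0.3, **feature_args):
    """Keep the top `ratio` fraction of sentences, in document order."""
    sentences = split_sentences(text)
    if not sentences:
        return []
    scores = score_sentences(sentences, title, **feature_args)
    k = max(1, round(ratio * len(sentences)))
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]


def ngrams(words, n):
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """ROUGE-N recall: clipped n-gram overlap / n-grams in the reference."""
    cand, ref = ngrams(candidate.split(), n), ngrams(reference.split(), n)
    total = sum(ref.values())
    return sum((cand & ref).values()) / total if total else 0.0


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b):
            cur.append(prev[j] + 1 if x == y else max(prev[j + 1], cur[j]))
        prev = cur
    return prev[-1]


def rouge_l_recall(candidate, reference):
    """ROUGE-L recall: LCS length / number of reference tokens."""
    ref = reference.split()
    return lcs_length(candidate.split(), ref) / len(ref) if ref else 0.0


if __name__ == "__main__":
    doc = ("Rain hit Jammu today. The river rose 3 metres. "
           "People watched. Rain may continue in Jammu.")
    print(summarize(doc, "Rain in Jammu", ratio=0.5))
    # ['Rain hit Jammu today', 'Rain may continue in Jammu']
```

The demo uses a toy English text only because the sketch is script-agnostic; the same calls apply unchanged to Devanagari Dogri input. Returning the selected sentences in document order, rather than score order, is a common choice for extractive summaries because it preserves the flow of a news article.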


Figs. 1–7


Data availability

https://github.com/Sonam2/Dogri_Corpus
https://kaggle.com/datasets/9f5501bbed7f1eb687290e725b9afef1aff2269b270f9256b933c3faf853ea53


Funding

Not applicable.

Author information


Corresponding author

Correspondence to Yogesh Kumar.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Gandotra, S., Arora, B. & Kumar, Y. Design and development of Dogri extractive summarization model for automated summary generation. Int J Digit Libr 26, 6 (2025). https://doi.org/10.1007/s00799-025-00412-0


  • Received:

  • Revised:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s00799-025-00412-0

Keywords