ABSTRACT
Summarization is a Natural Language Processing application that may seem trivial to a person, but as the quantity of available information keeps growing, an automated "helper" that can summarize it has become a necessity. Most existing studies in automatic text summarization have focused primarily on English, with only some recent attempts in other major languages. To the best of our knowledge, no prior approach handles automatic summarization of Albanian documents. This paper fills this gap by implementing a novel extractive summarization system designed specifically for the Albanian language. We show experimentally that enriching the summarization system with language-dependent elements improves the system's performance and compression rate.