research-article

A Novel Document Summarization System for Albanian Language

Authors:
Evis Trandafili, Department of Computer Engineering, Polytechnic University of Tirana, Albania
Hakik Paci, Department of Computer Engineering, Polytechnic University of Tirana, Albania
Elona Karaj, Department of Computer Engineering, Polytechnic University of Tirana, Albania

CompSysTech '19: Proceedings of the 20th International Conference on Computer Systems and Technologies, June 2019, Pages 273–277. https://doi.org/10.1145/3345252.3345275

Published: 21 June 2019

ABSTRACT

Summarization is a Natural Language Processing task that may seem trivial to a person, but as the quantity of available information grows continuously, an automatic "helper" that summarizes it has become a necessity. Most existing research on automatic text summarization has focused primarily on English, with only some recent attempts in other major languages. To the best of our knowledge, no prior approach handles automatic summarization of Albanian documents. This paper fills this gap by implementing a novel extractive summarization system designed specifically for the Albanian language. We show experimentally that enriching the summarization system with language-dependent elements improves the system's performance and its compression rate.
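For readers unfamiliar with the terms, the following is a minimal sketch of frequency-based extractive summarization. It is not the authors' system. It assumes that one "language-dependent element" is a stopword list (the Albanian list below is a small illustrative subset) and that the compression rate is measured as summary length over source length in words. All function names are hypothetical.

```python
import re
from collections import Counter

# Illustrative subset of Albanian function words; an assumption, not the paper's list.
STOPWORDS = {"dhe", "në", "të", "e", "për", "me", "një", "i", "se", "që"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Return lowercase word tokens (Unicode letters such as ë and ç are kept)."""
    return re.findall(r"\w+", sentence.lower())


def summarize(text, max_sentences=2, stopwords=STOPWORDS):
    """Pick the highest-scoring sentences and return them in document order."""
    sentences = split_sentences(text)
    tokens = [[w for w in tokenize(s) if w not in stopwords] for s in sentences]
    freq = Counter(w for toks in tokens for w in toks)
    # Sentence score: average document frequency of its non-stopword tokens.
    scores = [
        sum(freq[w] for w in toks) / len(toks) if toks else 0.0
        for toks in tokens
    ]
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen = sorted(ranked[:max_sentences])
    return [sentences[i] for i in chosen]


def compression_rate(text, summary_sentences):
    """Summary length divided by source length, both counted in words."""
    source_len = len(tokenize(text))
    summary_len = sum(len(tokenize(s)) for s in summary_sentences)
    return summary_len / source_len if source_len else 0.0
```

Under these assumptions, a lower compression rate means a shorter summary relative to the source. Filtering stopwords keeps very frequent function words from dominating the sentence scores, which is one way a language-specific resource can change which sentences are extracted.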

Index Terms

1. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Summarization

Published in
CompSysTech '19: Proceedings of the 20th International Conference on Computer Systems and Technologies
June 2019
365 pages
ISBN:9781450371490
DOI:10.1145/3345252
Editors:
Tzvetomir Vassilev,
Angel Smrikarov
Copyright © 2019 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Albanian language
Extractive Summarization
Information Retrieval and Extraction
Natural Language Processing
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 241 of 492 submissions, 49%