Research article
DOI: 10.1145/2644866.2644890

A platform for language independent summarization

Published: 16 September 2014

Abstract

The text data available on the Internet is vast not only in volume but also in its diversity of subject, quality, and language. These factors make it difficult to extract useful information from it efficiently. Automatic text summarization is a possible solution, because it sifts the relevant information in documents by producing shorter versions of the text. However, most of the techniques and tools available for automatic text summarization are designed only for English, which is a severe restriction. Existing multilingual platforms support at most two languages. This paper proposes a language-independent summarization platform that provides corpus acquisition, language classification, translation, and text summarization for 25 different languages.


Published In

DocEng '14: Proceedings of the 2014 ACM symposium on Document engineering
September 2014
226 pages
ISBN: 9781450329491
DOI: 10.1145/2644866

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. classification
  2. multilingual summarization
  3. platform
  4. translation

Qualifiers

  • Research-article

Funding Sources

  • Hewlett-Packard-Brazil and UFPE

Conference

DocEng '14: ACM Symposium on Document Engineering 2014
September 16 - 19, 2014
Fort Collins, Colorado, USA

Acceptance Rates

DocEng '14 Paper Acceptance Rate 15 of 41 submissions, 37%;
Overall Acceptance Rate 194 of 564 submissions, 34%

Cited By

  • (2022) An Overview of Indian Language Datasets Used for Text Summarization. ICT with Intelligent Applications, pp. 693-703. DOI: 10.1007/978-981-19-3571-8_63. Online publication date: 1-Oct-2022
  • (2021) The impact of automatic text translation on classification of online discussions for social and cognitive presences. LAK21: 11th International Learning Analytics and Knowledge Conference, pp. 77-87. DOI: 10.1145/3448139.3448147. Online publication date: 12-Apr-2021
  • (2021) A Survey of Automatic Text Summarization: Progress, Process and Challenges. IEEE Access 9, pp. 156043-156070. DOI: 10.1109/ACCESS.2021.3129786. Online publication date: 2021
  • (2019) The CNN-Corpus. Proceedings of the ACM Symposium on Document Engineering 2019, pp. 1-10. DOI: 10.1145/3342558.3345388. Online publication date: 23-Sep-2019
  • (2019) Text summarization from legal documents. Artificial Intelligence Review 51:3, pp. 371-402. DOI: 10.1007/s10462-017-9566-2. Online publication date: 1-Mar-2019
  • (2018) Automatic cohesive summarization with pronominal anaphora resolution. Computer Speech and Language 52:C, pp. 141-164. DOI: 10.1016/j.csl.2018.05.004. Online publication date: 1-Nov-2018
  • (2016) Mobile Summarizer and News Summary Navigator. Proceedings of the 2016 ACM Symposium on Document Engineering, pp. 107-110. DOI: 10.1145/2960811.2967156. Online publication date: 13-Sep-2016
  • (2015) Automatic Summarization of News Articles in Mobile Devices. Proceedings of the 2015 Fourteenth Mexican International Conference on Artificial Intelligence (MICAI), pp. 8-13. DOI: 10.1109/MICAI.2015.8. Online publication date: 25-Oct-2015
