Characterizing User-Generated Text Content Mining: A Systematic Mapping Study of the Portuguese Language

Souza, Ellen; Castro, Dayvid; Vitório, Douglas; Teles, Ingryd; Oliveira, Adriano L. I.; Gusmão, Cristine

doi:10.1007/978-3-319-31232-3_96


Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 444)


Abstract

Unstructured data accounts for more than 80% of enterprise data and is growing exponentially at an annual rate of 60%. Text mining refers to the process of discovering new, previously unknown, and potentially useful information from a variety of unstructured data, including user-generated text content (UGTC). Given that Portuguese is one of the most widely spoken languages in the world and the second most frequent language on Twitter, the goal of this work is to map the landscape of current studies that apply text mining to UGTC in Portuguese. The systematic mapping review method was applied to search for, select, and extract data from the included studies. Our manual and automated searches retrieved 6075 studies published up to 2014, of which 35 were included. Text classification accounts for 79% of all text mining tasks, with Naïve Bayes as the main classifier and Twitter as the main data source.
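The abstract names Naïve Bayes as the classifier most often used for text classification in the included studies. As a minimal illustration only (it is not taken from any of the reviewed studies), the sketch below implements a multinomial Naïve Bayes classifier with add-one (Laplace) smoothing in plain Python. The regex tokenizer, the toy Portuguese training sentences, and the `pos`/`neg` labels are hypothetical assumptions chosen for the example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and keep runs of word characters. In Python 3, \w is
    # Unicode-aware, so accented Portuguese letters ("ótimo", "não") survive.
    return re.findall(r"\w+", text.lower())


class MultinomialNB:
    """Multinomial Naive Bayes with additive (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing constant; 1.0 gives add-one smoothing

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # class -> token -> count
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        # Total token count per class, used in the smoothed denominator.
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def log_posterior(self, text):
        # Unnormalized log P(class | text) = log P(class) + sum log P(token | class).
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        scores = {}
        for c, n_c in self.class_counts.items():
            score = math.log(n_c / n_docs)  # log prior
            denom = self.total_words[c] + self.alpha * v
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # skip tokens never seen during training
                score += math.log((self.word_counts[c][tok] + self.alpha) / denom)
            scores[c] = score
        return scores

    def predict(self, text):
        scores = self.log_posterior(text)
        return max(scores, key=scores.get)


# Hypothetical toy data standing in for labeled Portuguese posts.
train_texts = [
    "adorei o produto, ótimo atendimento",
    "entrega rápida, muito bom",
    "péssimo serviço, não recomendo",
    "produto chegou quebrado, horrível",
]
train_labels = ["pos", "pos", "neg", "neg"]

clf = MultinomialNB().fit(train_texts, train_labels)
print(clf.predict("atendimento ótimo e entrega rápida"))  # → pos
print(clf.predict("não recomendo, produto horrível"))     # → neg
```

Working in log space avoids numeric underflow when many small per-token probabilities are multiplied, and the smoothing term keeps a single unseen class/token pair from zeroing out a class.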



Author information

Authors and Affiliations

MiningBR Research Group, Federal Rural University of Pernambuco (UFRPE), Serra Talhada, PE, Brazil
Ellen Souza, Dayvid Castro, Douglas Vitório & Ingryd Teles
Centro de Informática, Federal University of Pernambuco (CIn-UFPE), Recife, PE, Brazil
Adriano L. I. Oliveira
Programa de Pós-Graduação em Engenharia Biomédica, Centro de Tecnologia e Geociências, Federal University of Pernambuco (CTG-UFPE), Recife, PE, Brazil
Cristine Gusmão


Corresponding author

Correspondence to Ellen Souza.

Editor information

Editors and Affiliations

DEI/FCT, Universidade de Coimbra, Coimbra, Portugal
Álvaro Rocha
ISEGI, Universidade Nova de Lisboa, Lisboa, Portugal
Ana Maria Correia
College of Engineering, The Ohio State University, Columbus, Ohio, USA
Hojjat Adeli
DSI, Universidade do Minho, Guimarães, Portugal
Luis Paulo Reis
Rua Dom Manoel de Medeiros, s/n, Dois Irmãos, Universidade Federal Rural de Pernambuco, Recife, Brazil
Marcelo Mendonça Teixeira


About this paper

Cite this paper

Souza, E., Castro, D., Vitório, D., Teles, I., Oliveira, A.L.I., Gusmão, C. (2016). Characterizing User-Generated Text Content Mining: A Systematic Mapping Study of the Portuguese Language. In: Rocha, Á., Correia, A., Adeli, H., Reis, L., Mendonça Teixeira, M. (eds) New Advances in Information Systems and Technologies. Advances in Intelligent Systems and Computing, vol 444. Springer, Cham. https://doi.org/10.1007/978-3-319-31232-3_96


DOI: https://doi.org/10.1007/978-3-319-31232-3_96
Published: 02 March 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-31231-6
Online ISBN: 978-3-319-31232-3
eBook Packages: Engineering (R0)
