An Automated Corpus Annotation Experiment in Brazilian Portuguese for Sentiment Analysis in Public Security

de Carvalho, Victor Diogho Heuer; Nepomuceno, Thyago Celso Cavalcante; Costa, Ana Paula Cabral Seixas

doi:10.1007/978-3-030-46224-6_8


Part of the book series: Lecture Notes in Business Information Processing (LNBIP, volume 384)

Included in the following conference series:

International Conference on Decision Support System Technology


Abstract

This paper presents an experiment to produce an automatically annotated corpus using a pre-existing annotated corpus and machine learning classification methods. A search for pre-existing annotated corpora in Brazilian Portuguese found six corpora, one of which was selected as the training dataset. A set of tweets was collected in a specific area of Recife (Pernambuco, Brazil) using keywords related to types of crime and emphasizing certain places in that area. Preprocessing tasks were applied to both the pre-existing corpus and the collected tweets. Latent Dirichlet Allocation was applied for topic modeling, followed by Multinomial Naïve Bayes, Linear Support Vector Machines, and Logistic Regression for sentiment polarity classification. Cross-validation indicated that Linear Support Vector Machines was the most accurate of the three methods for the specific training set used, and this method was then used to create the new annotated corpus on the selected public security topic.
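
The train, cross-validate, then annotate loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it covers only one of the three classifiers (Multinomial Naïve Bayes, written in plain Python with Laplace smoothing), it skips the preprocessing and topic modeling steps, and the tiny labeled corpus, the function names (`train_nb`, `predict_nb`, `cross_val_accuracy`, `annotate`) and the fold scheme are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

# Toy labeled corpus, invented for illustration only (not the chapter's data).
LABELED = [
    ("bairro seguro e tranquilo", "pos"),
    ("policiamento bom na praia", "pos"),
    ("rua tranquila e bem iluminada", "pos"),
    ("noite tranquila no bairro", "pos"),
    ("assalto de novo na avenida", "neg"),
    ("roubo de celular na praia", "neg"),
    ("medo de assalto na rua", "neg"),
    ("roubo e violencia no bairro", "neg"),
]


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real preprocessing)."""
    return text.lower().split()


def train_nb(samples, alpha=1.0):
    """Fit a multinomial Naive Bayes model with Laplace smoothing."""
    class_counts = Counter(label for _, label in samples)
    word_counts = defaultdict(Counter)
    for text, label in samples:
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return {"classes": class_counts, "words": word_counts,
            "vocab": vocab, "alpha": alpha, "n": len(samples)}


def predict_nb(model, text):
    """Return the label with the highest log posterior for `text`."""
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label in sorted(model["classes"]):
        score = math.log(model["classes"][label] / model["n"])
        total = sum(model["words"][label].values())
        for tok in tokenize(text):
            if tok not in model["vocab"]:
                continue  # ignore tokens never seen in training
            count = model["words"][label][tok]
            score += math.log((count + model["alpha"])
                              / (total + model["alpha"] * vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def cross_val_accuracy(samples, k=4):
    """Mean accuracy over k folds; fold i holds out samples i, i+k, ..."""
    scores = []
    for i in range(k):
        test = samples[i::k]
        train = [s for j, s in enumerate(samples) if j % k != i]
        model = train_nb(train)
        hits = sum(predict_nb(model, text) == label for text, label in test)
        scores.append(hits / len(test))
    return sum(scores) / k


def annotate(model, texts):
    """Attach a predicted polarity label to each unlabeled text."""
    return [(text, predict_nb(model, text)) for text in texts]


if __name__ == "__main__":
    print("cross-validated accuracy:", cross_val_accuracy(LABELED))
    final_model = train_nb(LABELED)
    for text, label in annotate(final_model, ["assalto na avenida", "bairro seguro"]):
        print(label, "<-", text)
```

In a comparison like the one the abstract reports, the same cross-validation routine would be run once per candidate classifier, and only the best-scoring one would be retrained on the full labeled corpus and passed to the annotation step.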



Acknowledgment

This paper was funded in part by the Coordination for the Improvement of Higher Education Personnel (Brazil) – Finance Code 001, and by the National Council for Scientific and Technological Development (Brazil).

Author information

Authors and Affiliations

Universidade Federal de Alagoas, Delmiro Gouveia, AL, Brazil
Victor Diogho Heuer de Carvalho
Universidade Federal de Pernambuco, Recife, PE, Brazil
Victor Diogho Heuer de Carvalho & Ana Paula Cabral Seixas Costa
Universidade Federal de Pernambuco, Caruaru, PE, Brazil
Thyago Celso Cavalcante Nepomuceno


Corresponding author

Correspondence to Victor Diogho Heuer de Carvalho.

Editor information

Editors and Affiliations

University of Zaragoza, Zaragoza, Spain
José María Moreno-Jiménez
University of Namur, Namur, Belgium
Isabelle Linden
SimTech Simulation Technology, Graz, Austria
Fatima Dargam
Loughborough University, Loughborough, UK
Uchitha Jayawickrama

About this paper

Cite this paper

de Carvalho, V.D.H., Nepomuceno, T.C.C., Costa, A.P.C.S. (2020). An Automated Corpus Annotation Experiment in Brazilian Portuguese for Sentiment Analysis in Public Security. In: Moreno-Jiménez, J., Linden, I., Dargam, F., Jayawickrama, U. (eds) Decision Support Systems X: Cognitive Decision Support Systems and Technologies. ICDSST 2020. Lecture Notes in Business Information Processing, vol 384. Springer, Cham. https://doi.org/10.1007/978-3-030-46224-6_8


DOI: https://doi.org/10.1007/978-3-030-46224-6_8
Published: 18 May 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-46223-9
Online ISBN: 978-3-030-46224-6
eBook Packages: Computer Science, Computer Science (R0)
