An Automated Corpus Annotation Experiment in Brazilian Portuguese for Sentiment Analysis in Public Security

  • Conference paper
Decision Support Systems X: Cognitive Decision Support Systems and Technologies (ICDSST 2020)

Abstract

This paper presents an experiment to produce an automatically annotated corpus using a pre-existing annotated corpus and machine learning classification methods. A search for pre-existing annotated corpora in Brazilian Portuguese found six corpora, one of which was selected as the training dataset. A set of tweets was collected in a specific area of Recife (Pernambuco, Brazil) using keywords related to types of crime and to specific places in that area. Preprocessing tasks were applied to both the pre-existing corpus and the collected tweets. Latent Dirichlet Allocation was applied for topic modeling, followed by Multinomial Naïve Bayes, Linear Support Vector Machines, and Logistic Regression for sentiment polarity classification. Cross-validation indicated that Linear Support Vector Machines was the most accurate of the three classification methods for the specific training set used, and this method was then used to create the new annotated corpus on the selected public security topic.
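
The core idea of the abstract, training a polarity classifier on an existing annotated corpus and using it to label newly collected tweets, can be illustrated with a minimal sketch. It is not the authors' implementation: it uses only Multinomial Naïve Bayes (one of the three methods named above, chosen here because it fits in a few lines of pure Python), it omits the topic modeling and cross-validation steps, and the toy sentences, labels, and the names `tokenize`, `MultinomialNB`, and `annotate` are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real preprocessing)."""
    return text.lower().split()


class MultinomialNB:
    """Multinomial Naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # word frequencies per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_counts):
            # log prior + sum of smoothed log likelihoods
            score = math.log(self.class_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            denom = total_words + self.alpha * len(self.vocab)
            for word in tokenize(text):
                if word in self.vocab:  # words unseen in training are skipped
                    count = self.word_counts[label][word]
                    score += math.log((count + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def annotate(model, texts):
    """Attach a predicted polarity label to each unlabeled text."""
    return [(text, model.predict(text)) for text in texts]


# Hypothetical toy data standing in for the pre-existing annotated corpus.
train_texts = [
    "bairro seguro e tranquilo",
    "policia rapida e eficiente",
    "assalto violento na rua",
    "roubo e medo na rua",
]
train_labels = ["positive", "positive", "negative", "negative"]

# Hypothetical unlabeled tweets standing in for the collected set.
unlabeled = ["rua com assalto e roubo", "bairro tranquilo e seguro"]

model = MultinomialNB().fit(train_texts, train_labels)
annotated = annotate(model, unlabeled)
for text, label in annotated:
    print(f"{label}\t{text}")
```

In a setting like the one described, the classifier would be chosen by cross-validating several candidates on the annotated corpus before it is applied to the unlabeled tweets; the sketch skips that comparison and shows only the final annotation step.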

Acknowledgment

This paper was funded in part by the Coordination for the Improvement of Higher Education Personnel (Brazil) – Finance Code 001, and by the National Council for Scientific and Technological Development (Brazil).

Author information

Corresponding author

Correspondence to Victor Diogho Heuer de Carvalho.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

de Carvalho, V.D.H., Nepomuceno, T.C.C., Costa, A.P.C.S. (2020). An Automated Corpus Annotation Experiment in Brazilian Portuguese for Sentiment Analysis in Public Security. In: Moreno-Jiménez, J., Linden, I., Dargam, F., Jayawickrama, U. (eds) Decision Support Systems X: Cognitive Decision Support Systems and Technologies. ICDSST 2020. Lecture Notes in Business Information Processing, vol 384. Springer, Cham. https://doi.org/10.1007/978-3-030-46224-6_8

  • DOI: https://doi.org/10.1007/978-3-030-46224-6_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-46223-9

  • Online ISBN: 978-3-030-46224-6

  • eBook Packages: Computer Science, Computer Science (R0)
