TexRep: A Text Mining Framework for Online Reputation Monitoring

  • Special Feature
New Generation Computing

Abstract

This work aims to understand, formalize and explore the scientific challenges of using unstructured text data from different Web sources for Online Reputation Monitoring. We present TexRep, an adaptable text mining framework specifically tailored for Online Reputation Monitoring that can be reused in multiple application scenarios, from politics to finance. The framework collects texts from online media, such as Twitter, identifies entities of interest, and classifies sentiment polarity and intensity. It supports multiple data aggregation methods, as well as visualization and modeling techniques that can be used for both descriptive analytics, such as analyzing how political polls evolve over time, and predictive analytics, such as predicting election outcomes. We present case studies that illustrate and validate TexRep for Online Reputation Monitoring. In particular, we evaluate the TexRep Entity Filtering and Sentiment Analysis modules using well-known external benchmarks. We also present an illustrative application of TexRep in the political domain.
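As a rough illustration of the aggregation step described above, the following minimal sketch assumes that each collected message has already been filtered to an entity and labeled with a polarity. The `Message` record, the `daily_sentiment_ratio` function, the label names, and the choice of a positive-share ratio are all hypothetical and introduced here for illustration only; they are not taken from TexRep itself.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Message:
    """Hypothetical record for one classified message (not TexRep's schema)."""
    day: date
    entity: str
    polarity: str  # assumed labels: "positive", "negative", or "neutral"


def daily_sentiment_ratio(messages):
    """Return {(entity, day): positive / (positive + negative)}.

    Neutral messages are ignored, so an (entity, day) pair that has
    no positive or negative messages does not appear in the result.
    """
    counts = defaultdict(lambda: {"positive": 0, "negative": 0})
    for msg in messages:
        if msg.polarity in ("positive", "negative"):
            counts[(msg.entity, msg.day)][msg.polarity] += 1

    return {
        key: c["positive"] / (c["positive"] + c["negative"])
        for key, c in counts.items()
    }


# Made-up sample data, for illustration only.
sample = [
    Message(date(2017, 1, 1), "candidate_a", "positive"),
    Message(date(2017, 1, 1), "candidate_a", "positive"),
    Message(date(2017, 1, 1), "candidate_a", "positive"),
    Message(date(2017, 1, 1), "candidate_a", "negative"),
    Message(date(2017, 1, 1), "candidate_a", "neutral"),
    Message(date(2017, 1, 2), "candidate_a", "negative"),
    Message(date(2017, 1, 2), "candidate_b", "neutral"),
]
ratios = daily_sentiment_ratio(sample)
```

A per-day series of this kind could then be plotted for descriptive analytics or used as an input feature for a predictive model. Other aggregate functions, for example ones that weight messages by sentiment intensity, would fit the same shape.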

Author information

Corresponding author

Correspondence to Pedro Saleiro.

About this article

Cite this article

Saleiro, P., Rodrigues, E.M., Soares, C. et al. TexRep: A Text Mining Framework for Online Reputation Monitoring. New Gener. Comput. 35, 365–389 (2017). https://doi.org/10.1007/s00354-017-0021-3

  • DOI: https://doi.org/10.1007/s00354-017-0021-3
