
Intelligent Twitter Data Analysis Based on Nonnegative Matrix Factorizations

  • Conference paper
  • Computational Science and Its Applications – ICCSA 2017 (ICCSA 2017)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 10404)


Abstract

In this paper we address the problem of intelligently analyzing Twitter data. We propose a novel workflow based on Nonnegative Matrix Factorization (NMF) to collect, organize and analyze Twitter data. The proposed workflow first fetches tweets from Twitter (according to given search criteria) and processes them using text mining techniques; it then extracts latent features from the tweets by using NMF, and finally it clusters the tweets and extracts human-interpretable topics. We report preliminary experiments demonstrating the effectiveness of the proposed workflow as a tool for Intelligent Data Analysis (IDA): it extracts and visualizes interpretable topics from newly collected Twitter datasets, and the tweets are automatically grouped according to these topics. Furthermore, we numerically investigate the influence of different initialization mechanisms for NMF algorithms on the factorization results when very sparse Twitter data are considered. The numerical comparisons confirm that NMF algorithms can be used as a clustering method in place of the well-known k-means algorithm.
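The core of the workflow (build a term-by-tweet matrix, factorize it with NMF, assign each tweet to a cluster, read off topics) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a hypothetical hardcoded list of tweets instead of data fetched from Twitter, a toy stopword list instead of full text-mining preprocessing, raw term counts as weights, Lee and Seung's multiplicative update rules for the Frobenius-norm objective as the NMF algorithm, and random initialization. The names `TWEETS`, `STOPWORDS`, `tokenize`, `term_document_matrix` and `nmf` are illustrative only. Each tweet is assigned to the latent feature with the largest coefficient in its column of `H`, and each topic is summarized by the highest-weighted terms in the corresponding column of `W`.

```python
import re
from collections import Counter

import numpy as np

# Hypothetical toy "tweets"; the paper instead fetches real tweets from Twitter.
TWEETS = [
    "great goal in the football match tonight",
    "football fans celebrate the match win",
    "new phone release with better camera",
    "the phone camera and battery review",
]
# Tiny illustrative stopword list (assumption, not the paper's preprocessing).
STOPWORDS = {"the", "in", "with", "and", "a", "of"}


def tokenize(text):
    """Lowercase, drop @mentions and URLs, keep alphabetic tokens, remove stopwords."""
    text = re.sub(r"@\w+|https?://\S+", " ", text.lower())
    return [t for t in re.findall(r"[a-z]+", text) if t not in STOPWORDS]


def term_document_matrix(docs):
    """Build a nonnegative terms x documents count matrix A and its vocabulary."""
    tokens = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokens for t in toks})
    index = {t: i for i, t in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, toks in enumerate(tokens):
        for term, count in Counter(toks).items():
            A[index[term], j] = count
    return A, vocab


def nmf(A, k, iters=500, seed=0, eps=1e-9):
    """Approximate A ~ W @ H with Lee-Seung multiplicative updates (Frobenius norm)."""
    rng = np.random.default_rng(seed)  # random initialization of both factors
    m, n = A.shape
    W = rng.random((m, k))
    H = rng.random((k, n))
    for _ in range(iters):
        # eps guards against division by zero; updates keep W and H nonnegative
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H


A, vocab = term_document_matrix(TWEETS)
W, H = nmf(A, k=2)
# Cluster of each tweet = index of its dominant latent feature.
labels = H.argmax(axis=0)
# Topic r = the three terms with the largest weights in column r of W.
topics = [[vocab[i] for i in np.argsort(W[:, r])[::-1][:3]] for r in range(W.shape[1])]
print(labels, topics)
```

Changing `seed` yields a different random initialization; because the NMF problem is nonconvex, the resulting factors (and hence the clusters and topics) can differ between runs, which is the kind of sensitivity that motivates comparing initialization mechanisms.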


Notes

  1. twitter.com.
  2. According to Twitter CEO Dick Costolo in October 2012. (http://www.telegraph.co.uk/technology/twitter/9945505/Twitter-in-numbers.html).
  3. https://dev.twitter.com/apps.
  4. Tweets that a user received in her stream and shared to her followers.
  5. Text beginning with the symbol ‘@’ followed by any unique user name.
  6. http://docs.tweepy.org/en/v3.5.0/.
  7. http://www.nltk.org/py-modindex.html.
  8. http://scikitlearn.org.
  9. http://nimfa.biolab.si/.
  10. Experiments have been run on a machine equipped with an Intel i5-480M 2.6 GHz CPU with 8 GiB of RAM.


Acknowledgements

This work has been supported in part by the GNCS (Gruppo Nazionale per il Calcolo Scientifico) of Istituto Nazionale di Alta Matematica Francesco Severi, P.le Aldo Moro, Roma, Italy.

Author information

Correspondence to Gabriella Casalino.


Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Casalino, G., Castiello, C., Del Buono, N., Mencar, C. (2017). Intelligent Twitter Data Analysis Based on Nonnegative Matrix Factorizations. In: Gervasi, O., et al. Computational Science and Its Applications – ICCSA 2017. ICCSA 2017. Lecture Notes in Computer Science, vol 10404. Springer, Cham. https://doi.org/10.1007/978-3-319-62392-4_14


  • DOI: https://doi.org/10.1007/978-3-319-62392-4_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-62391-7

  • Online ISBN: 978-3-319-62392-4

  • eBook Packages: Computer Science, Computer Science (R0)
