Abstract
In this paper we address the problem of intelligently analyzing Twitter data. We propose a novel workflow based on Nonnegative Matrix Factorization (NMF) to collect, organize and analyze Twitter data. The proposed workflow first fetches tweets from Twitter (according to some search criteria) and processes them using text mining techniques; it then extracts latent features from the tweets by using NMF, and finally it clusters the tweets and extracts human-interpretable topics. We report some preliminary experiments demonstrating the effectiveness of the proposed workflow as a tool for Intelligent Data Analysis (IDA): it is able to extract and visualize interpretable topics from some newly collected Twitter datasets, whose tweets are automatically grouped together according to these topics. Furthermore, we numerically investigate the influence of different initialization mechanisms for NMF algorithms on the factorization results when very sparse Twitter data are considered. The numerical comparisons confirm that NMF algorithms can be used as a clustering method in place of the well-known k-means.
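The steps named in the abstract (preprocess, factorize, cluster, extract topics) can be illustrated with a minimal sketch. It is not the authors' implementation: the tweets, the stop-word list, the helper names (tokenize, doc_term_matrix, nmf) and the choice of the Lee-Seung multiplicative updates with random initialization are all assumptions made for illustration, and the fetching step is replaced by a hard-coded list.

```python
import random
import re

# Hypothetical stand-in for tweets fetched from Twitter by a search query.
TWEETS = [
    "The team scores a goal in the football match",
    "Football match: the team plays",
    "A late goal for the football team",
    "The president wins the election vote",
    "Vote in the election for president",
    "Election debate: the president asks for a vote",
]
STOPWORDS = {"the", "a", "in", "for"}  # illustrative list, an assumption


def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def doc_term_matrix(docs):
    """Build a documents-by-terms count matrix and its vocabulary."""
    vocab = sorted({w for d in docs for w in tokenize(d)})
    index = {w: j for j, w in enumerate(vocab)}
    X = [[0.0] * len(vocab) for _ in docs]
    for i, d in enumerate(docs):
        for w in tokenize(d):
            X[i][index[w]] += 1.0
    return X, vocab


def transpose(A):
    return [list(r) for r in zip(*A)]


def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def nmf(X, k, iters=200, seed=0, eps=1e-9):
    """Approximate X (n x m) by W (n x k) times H (k x m), both nonnegative,
    using multiplicative updates for the Frobenius norm and a random start."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    W = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        Wt = transpose(W)
        num, den = matmul(Wt, X), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        Ht = transpose(H)
        num, den = matmul(X, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return W, H


X, vocab = doc_term_matrix(TWEETS)
W, H = nmf(X, k=2)

# Cluster each tweet by its dominant latent feature (largest entry in its row of W).
labels = [max(range(len(row)), key=row.__getitem__) for row in W]

# Describe each latent feature by its three heaviest terms (row of H).
topics = [[vocab[j] for j in sorted(range(len(vocab)), key=lambda j: -h[j])[:3]]
          for h in H]

print(labels)
print(topics)
```

Changing the seed argument gives a different random starting point, which is the simplest way to observe how initialization affects the factors on a sparse matrix; structured initializations would replace the two lines that build the initial W and H.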
Notes
- 1.
- 2. According to Twitter CEO Dick Costolo in October 2012 (http://www.telegraph.co.uk/technology/twitter/9945505/Twitter-in-numbers.html).
- 3.
- 4. Tweets that a user received in her stream and shared with her followers.
- 5. Text beginning with the symbol ‘@’ followed by any unique user name.
- 6.
- 7.
- 8.
- 9.
- 10. Experiments have been run on a machine equipped with an Intel i5-480M 2.6 GHz CPU with 8 GiB of RAM.
Acknowledgements
This work has been supported in part by the GNCS (Gruppo Nazionale per il Calcolo Scientifico) of Istituto Nazionale di Alta Matematica Francesco Severi, P.le Aldo Moro, Roma, Italy.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Casalino, G., Castiello, C., Del Buono, N., Mencar, C. (2017). Intelligent Twitter Data Analysis Based on Nonnegative Matrix Factorizations. In: Gervasi, O., et al. Computational Science and Its Applications – ICCSA 2017. ICCSA 2017. Lecture Notes in Computer Science, vol 10404. Springer, Cham. https://doi.org/10.1007/978-3-319-62392-4_14
DOI: https://doi.org/10.1007/978-3-319-62392-4_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-62391-7
Online ISBN: 978-3-319-62392-4
eBook Packages: Computer Science (R0)