Intelligent Twitter Data Analysis Based on Nonnegative Matrix Factorizations

Casalino, Gabriella; Castiello, Ciro; Del Buono, Nicoletta; Mencar, Corrado

doi:10.1007/978-3-319-62392-4_14

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10404)

Included in the following conference series:

International Conference on Computational Science and Its Applications

Abstract

In this paper we address the problem of intelligently analyzing Twitter data. We propose a novel workflow based on Nonnegative Matrix Factorization (NMF) to collect, organize and analyze Twitter data. The proposed workflow first fetches tweets from Twitter (according to some search criteria) and processes them using text mining techniques; it then extracts latent features from the tweets by using NMF, and finally it clusters the tweets and extracts human-interpretable topics. We report some preliminary experiments demonstrating the effectiveness of the proposed workflow as a tool for Intelligent Data Analysis (IDA): it is able to extract and visualize interpretable topics from some newly collected Twitter datasets, whose tweets are automatically grouped according to these topics. Furthermore, we numerically investigate the influence of different initialization mechanisms for NMF algorithms on the factorization results when very sparse Twitter data are considered. The numerical comparisons confirm that NMF algorithms can be used as a clustering method in place of the well-known k-means.

Notes

1. twitter.com.
2. According to Twitter CEO Dick Costolo in October 2012. (http://www.telegraph.co.uk/technology/twitter/9945505/Twitter-in-numbers.html).
3. https://dev.twitter.com/apps.
4. Tweets that a user received in her stream and shared to her followers.
5. Text beginning with the symbol ‘@’ followed by any unique user name.
6. http://docs.tweepy.org/en/v3.5.0/.
7. http://www.nltk.org/py-modindex.html.
8. http://scikitlearn.org.
9. http://nimfa.biolab.si/.
10. Experiments have been run on a machine equipped with an Intel i5-480M 2.6 GHz CPU with 8 GiB of RAM.

Acknowledgements

This work has been supported in part by the GNCS (Gruppo Nazionale per il Calcolo Scientifico) of Istituto Nazionale di Alta Matematica Francesco Severi, P.le Aldo Moro, Roma, Italy.

Author information

Authors and Affiliations

Department of Informatics, University of Bari Aldo Moro, 70125, Bari, Italy
Gabriella Casalino, Ciro Castiello & Corrado Mencar
Department of Mathematics, University of Bari Aldo Moro, 70125, Bari, Italy
Nicoletta Del Buono

Corresponding author

Correspondence to Gabriella Casalino.

Editor information

Editors and Affiliations

University of Perugia, Perugia, Italy
Osvaldo Gervasi
University of Basilicata, Potenza, Italy
Beniamino Murgante
Covenant University, Ota, Nigeria
Sanjay Misra
University of Trieste, Trieste, Italy
Giuseppe Borruso
Polytechnic University of Bari, Bari, Italy
Carmelo M. Torre
University of Minho, Braga, Portugal
Ana Maria A.C. Rocha
Monash University, Clayton, Victoria, Australia
David Taniar
Kyushu Sangyo University, Fukuoka, Japan
Bernady O. Apduhan
Saint Petersburg State University, Saint Petersburg, Russia
Elena Stankova
University of Trieste, Trieste, Italy
Alfredo Cuzzocrea

About this paper

Cite this paper

Casalino, G., Castiello, C., Del Buono, N., Mencar, C. (2017). Intelligent Twitter Data Analysis Based on Nonnegative Matrix Factorizations. In: Gervasi, O., et al. Computational Science and Its Applications – ICCSA 2017. ICCSA 2017. Lecture Notes in Computer Science, vol 10404. Springer, Cham. https://doi.org/10.1007/978-3-319-62392-4_14

DOI: https://doi.org/10.1007/978-3-319-62392-4_14
Published: 06 July 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-62391-7
Online ISBN: 978-3-319-62392-4
eBook Packages: Computer Science, Computer Science (R0)
