Abstract
In this work, we propose a novel approach to estimating the home location of Twitter users. Given a list of Twitter users, we extract their timelines (up to 3,200 tweets per user) using the Twitter Application Programming Interface (API). We use Google Trends to obtain a list of cities in which the nouns used by a specific Twitter user are most popular. Then, based on word popularity, we sample geographical coordinates (latitude, longitude) over the entire surface of the world. Finally, the Gaussian Mixture Model and DBSCAN clustering algorithms are applied to estimate each user's geographic coordinates. The results are evaluated using the mean and median error computed on the Haversine distance. Competitive results are achieved when compared with a baseline approach that estimates the user's location from the Google Trends city mode.
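The evaluation metric named in the abstract can be illustrated with a short sketch. The code below is not taken from the paper: the function names `haversine_km` and `error_summary`, the choice of kilometres as the unit, and the Earth radius constant are all assumptions made for illustration. It computes the Haversine (great-circle) distance between a predicted and a true coordinate pair, then summarises the per-user errors by their mean and median.

```python
import math
from statistics import mean, median

# Assumed value: mean Earth radius in kilometres (the paper does not state one).
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot near antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def error_summary(predicted, actual):
    """Return (mean error, median error) in km over paired (lat, lon) tuples."""
    errors = [haversine_km(p[0], p[1], t[0], t[1])
              for p, t in zip(predicted, actual)]
    return mean(errors), median(errors)
```

Reporting the median alongside the mean is useful here because a few users placed on the wrong continent can inflate the mean considerably while leaving the median almost unchanged.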
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Zola, P., Cortez, P., Tesconi, M. (2020). Using Google Trends, Gaussian Mixture Models and DBSCAN for the Estimation of Twitter User Home Location. In: Gervasi, O., et al. Computational Science and Its Applications – ICCSA 2020. ICCSA 2020. Lecture Notes in Computer Science, vol 12253. Springer, Cham. https://doi.org/10.1007/978-3-030-58814-4_38
DOI: https://doi.org/10.1007/978-3-030-58814-4_38
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58813-7
Online ISBN: 978-3-030-58814-4
eBook Packages: Computer Science, Computer Science (R0)