Using Google Trends, Gaussian Mixture Models and DBSCAN for the Estimation of Twitter User Home Location

  • Conference paper
Computational Science and Its Applications – ICCSA 2020 (ICCSA 2020)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12253)

Abstract

In this work we propose a novel approach to estimating the home location of Twitter users. Given a list of Twitter users, we extract their timelines (up to 3,200 tweets each) using the Twitter Application Programming Interface (API). We use Google Trends to obtain a list of cities in which the nouns used by a given Twitter user are most popular. Then, based on word popularity, we sample geographical coordinates (latitude, longitude) across the entire world surface. Finally, the Gaussian Mixture Model and DBSCAN clustering algorithms are applied to estimate the users’ geographic coordinates. The results are evaluated using the mean and median error, computed from the Haversine distance. The approach achieves competitive results when compared with a baseline that estimates each user’s location as the mode of the Google Trends cities.
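The evaluation step named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `haversine_km` and `location_errors`, the Earth radius of 6371 km, and the coordinate layout as `(latitude, longitude)` tuples in degrees are all assumptions made for illustration.

```python
import math
from statistics import mean, median

# Mean Earth radius in kilometres (an assumed constant, not taken from the paper).
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def location_errors(predicted, actual):
    """Return (mean, median) Haversine error in km over paired coordinates.

    `predicted` and `actual` are equal-length sequences of (lat, lon) tuples
    (hypothetical input format chosen for this sketch).
    """
    distances = [
        haversine_km(p[0], p[1], a[0], a[1])
        for p, a in zip(predicted, actual)
    ]
    return mean(distances), median(distances)


if __name__ == "__main__":
    # Toy coordinates for illustration only; these are not results from the paper.
    predicted = [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0)]
    actual = [(0.0, 0.0), (0.0, 90.0), (0.0, 180.0)]
    mean_err, median_err = location_errors(predicted, actual)
    print(f"mean error: {mean_err:.1f} km, median error: {median_err:.1f} km")
```

Reporting the median alongside the mean is useful here because a few users placed on the wrong continent can dominate the mean error while leaving the median largely unchanged.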

Notes

  1. http://www.openstreetmap.org

  2. https://www.wikidata.org/wiki/Wikidata:Main_Page

  3. https://github.com/CostRagno/geopolygon

  4. https://github.com/paolazola/Twitter-country-geolocation

Author information

Corresponding author

Correspondence to Paola Zola.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Zola, P., Cortez, P., Tesconi, M. (2020). Using Google Trends, Gaussian Mixture Models and DBSCAN for the Estimation of Twitter User Home Location. In: Gervasi, O., et al. Computational Science and Its Applications – ICCSA 2020. ICCSA 2020. Lecture Notes in Computer Science, vol 12253. Springer, Cham. https://doi.org/10.1007/978-3-030-58814-4_38

  • DOI: https://doi.org/10.1007/978-3-030-58814-4_38

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58813-7

  • Online ISBN: 978-3-030-58814-4

  • eBook Packages: Computer Science, Computer Science (R0)
