DOI: 10.1145/3018896.3018969
research-article

A machine learning based approach to identify geo-location of Twitter users

Published: 22 March 2017

Abstract

Twitter, a popular microblogging platform, has attracted great attention. Twitter enables people from all over the world to interact in an extremely personal way. An immense quantity of user-generated text messages is available on Twitter, and these messages could serve as an important source of information for researchers and practitioners. The information available on Twitter may be utilized for many purposes, such as event detection, public health and crisis management. To coordinate such activities effectively, identifying the geo-locations of Twitter users is extremely important. Although online social networks can provide some geo-location information based on GPS coordinates, Twitter suffers from a geo-location sparseness problem: such GPS-based information is available for only a small share of messages. Identifying Twitter users' geo-locations from the content of the messages they send therefore becomes extremely important. In this regard, this paper presents a machine learning based approach to the problem. In this study, the corpus is represented with word vectors. To obtain a classification scheme with high predictive performance, five classification algorithms, three ensemble methods and two feature selection methods are evaluated. Among the compared configurations, the highest result (84.85%) is achieved by the AdaBoost ensemble of Random Forest when the feature set is selected with the consistency-based feature selection method in conjunction with best-first search.
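The feature-selection step named in the abstract (a consistency-based measure driven by best-first search over word-vector features) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes binary word-presence vectors with naive whitespace tokenisation, and it scores a feature subset as one minus its inconsistency rate: instances that agree on every selected feature form a pattern group, and every instance outside its group's majority class counts as inconsistent. The function names, the `max_stale=5` stopping rule and the four toy tweets are hypothetical. The classifier stage (AdaBoost over Random Forest) is not shown.

```python
from collections import Counter, defaultdict
import heapq


def word_vectors(docs, vocabulary):
    """Binary bag-of-words vectors (assumed representation): 1 if the word occurs."""
    vectors = []
    for doc in docs:
        tokens = set(doc.lower().split())  # naive whitespace tokenisation (assumption)
        vectors.append(tuple(int(word in tokens) for word in vocabulary))
    return vectors


def consistency(X, y, subset):
    """1 - inconsistency rate of the feature subset (higher is better).

    Instances that agree on every feature in `subset` form a pattern group;
    within a group, each instance outside the majority class is inconsistent.
    """
    groups = defaultdict(Counter)
    for row, label in zip(X, y):
        groups[tuple(row[i] for i in subset)][label] += 1
    inconsistent = sum(sum(c.values()) - max(c.values()) for c in groups.values())
    return 1.0 - inconsistent / len(y)


def best_first_search(X, y, max_stale=5):
    """Forward best-first search over feature subsets, scored by `consistency`.

    Stops after `max_stale` consecutive expansions that fail to beat the best
    merit so far (hypothetical stopping rule); ties keep the subset found first.
    """
    n_features = len(X[0])
    best_subset = frozenset()
    best_merit = consistency(X, y, [])
    # Max-heap via negated merit; the counter breaks ties so sets are never compared.
    open_list = [(-best_merit, 0, best_subset)]
    visited = {best_subset}
    counter, stale = 1, 0
    while open_list and stale < max_stale:
        _, _, subset = heapq.heappop(open_list)
        improved = False
        for f in range(n_features):
            child = subset | {f}
            if child in visited:  # also skips f already in subset
                continue
            visited.add(child)
            merit = consistency(X, y, sorted(child))
            heapq.heappush(open_list, (-merit, counter, child))
            counter += 1
            if merit > best_merit:
                best_subset, best_merit, improved = child, merit, True
        stale = 0 if improved else stale + 1
    return sorted(best_subset), best_merit


# Hypothetical toy data: four tweets labelled with a city.
tweets = [
    "go phillies game tonight",
    "phillies win again",
    "subway delays tonight",
    "took the subway to the game",
]
labels = ["PHL", "PHL", "NYC", "NYC"]
vocabulary = ["phillies", "subway", "game", "tonight"]

X = word_vectors(tweets, vocabulary)
selected, merit = best_first_search(X, labels)
print([vocabulary[i] for i in selected], merit)  # ['phillies'] 1.0
```

Adding a feature can only split pattern groups, so consistency never decreases as the subset grows; requiring a strict improvement is what keeps the search from drifting toward larger subsets. In the toy data, `phillies` alone separates the two cities, while `game` and `tonight` each leave the classes mixed.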

Cited By

  • (2023) Predicting Election Results Via Social Media: A Case Study for 2018 Turkish Presidential Election. IEEE Transactions on Computational Social Systems 10(5), 2362--2373. DOI: 10.1109/TCSS.2022.3178052. Online publication date: Oct-2023.
  • (2022) Efficient and Reliable Geocoding of German Twitter Data to Enable Spatial Data Linkage to Official Statistics and Other Data Sources. Frontiers in Sociology 7. DOI: 10.3389/fsoc.2022.910111. Online publication date: 9-Jun-2022.
  • (2021) Extraction and Analysis of Regionally Specific Behavioral Facilitation Information in the Event of a Large-scale Disaster. IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, 538--543. DOI: 10.1145/3486622.3493991. Online publication date: 14-Dec-2021.


Published In

ACM Other conferences
ICC '17: Proceedings of the Second International Conference on Internet of things, Data and Cloud Computing
March 2017
1349 pages
ISBN: 9781450347747
DOI: 10.1145/3018896

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

1. geo-location identification
2. location-based estimation
3. machine learning
4. text mining


Conference

ICC '17

Acceptance Rates

ICC '17 Paper Acceptance Rate: 213 of 590 submissions, 36%
Overall Acceptance Rate: 213 of 590 submissions, 36%


Citations

  • (2019) The Use of Artificial Intelligence in Disaster Management - A Systematic Literature Review. 2019 International Conference on Information and Communication Technologies for Disaster Management (ICT-DM), 1--8. DOI: 10.1109/ICT-DM47966.2019.9032935. Online publication date: Dec-2019.
  • (2018) Sentiment Analysis on Twitter Based on Ensemble of Psychological and Linguistic Feature Sets. Balkan Journal of Electrical and Computer Engineering 6(2), 69--77. DOI: 10.17694/bajece.419538. Online publication date: 30-Apr-2018.
  • (2017) Sarcasm Identification on Twitter: A Machine Learning Approach. Artificial Intelligence Trends in Intelligent Systems, 374--383. DOI: 10.1007/978-3-319-57261-1_37. Online publication date: 7-Apr-2017.
