A machine learning based approach to identify geo-location of Twitter users

Author:

Aytuğ Onan

ICC '17: Proceedings of the Second International Conference on Internet of things, Data and Cloud Computing

Article No.: 69, Pages 1 - 6

https://doi.org/10.1145/3018896.3018969

Published: 22 March 2017

Abstract

Twitter, a popular microblogging platform, has attracted great attention. It enables people from all over the world to interact in an extremely personal way. An immense quantity of user-generated text messages has become available on Twitter and could serve as an important source of information for researchers and practitioners. This information may be used for many purposes, such as event detection, public health and crisis management. To coordinate such activities effectively, identifying the geo-locations of Twitter users is extremely important. Although online social networks can provide some geo-location information based on GPS coordinates, Twitter suffers from a geo-location sparseness problem. Identifying a user's geo-location from the content of the messages they send therefore becomes extremely important. In this regard, this paper presents a machine learning based approach to the problem. In this study, the corpus is represented as word vectors. To obtain a classification scheme with high predictive performance, five classification algorithms, three ensemble methods and two feature selection methods are evaluated. Among the compared configurations, the highest result (84.85%) is achieved by an AdaBoost ensemble of Random Forest when the feature set is selected with the consistency-based feature selection method in conjunction with best-first search.
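As a rough illustration of two steps named in the abstract (the word-vector representation and consistency-based feature selection), the following Python sketch may help. It is not the paper's implementation. It assumes binary term-presence weights, which the abstract does not specify, and it substitutes a simple greedy forward search for the best-first search used in the paper. The four example tweets, their labels, and all function names are hypothetical. The classification stage (AdaBoost over Random Forest) is left out.

```python
from collections import Counter, defaultdict


def build_vocabulary(docs):
    """Return the sorted list of distinct lowercase tokens in the corpus."""
    return sorted({tok for doc in docs for tok in doc.lower().split()})


def to_word_vector(doc, vocab):
    """Binary word vector: 1 if the term occurs in the document, else 0.

    Binary weighting is an assumption; the abstract does not say which
    term weighting was used.
    """
    tokens = set(doc.lower().split())
    return [1 if term in tokens else 0 for term in vocab]


def consistency(X, y, subset):
    """Return 1 minus the inconsistency rate of a feature subset.

    Rows that agree on every feature in `subset` form one pattern. Within
    a pattern, every row outside the majority class counts as inconsistent.
    """
    groups = defaultdict(Counter)
    for row, label in zip(X, y):
        groups[tuple(row[i] for i in subset)][label] += 1
    inconsistent = sum(sum(c.values()) - max(c.values()) for c in groups.values())
    return 1.0 - inconsistent / len(y)


def greedy_forward_select(X, y):
    """Greedy forward search (a simplified stand-in for best-first search).

    Repeatedly adds the single feature that raises consistency the most
    and stops as soon as no remaining feature improves the score.
    """
    selected = []
    best = consistency(X, y, selected)
    remaining = list(range(len(X[0])))
    while remaining:
        scored = [(consistency(X, y, selected + [f]), f) for f in remaining]
        top_score = max(s for s, _ in scored)
        if top_score <= best:
            break
        feat = next(f for s, f in scored if s == top_score)
        selected.append(feat)
        remaining.remove(feat)
        best = top_score
    return selected, best


# Hypothetical toy corpus: two tweets per city.
docs = [
    "rainy morning on the tube in london",
    "tube delays again",
    "subway delays in brooklyn",
    "pizza near the subway",
]
labels = ["london", "london", "nyc", "nyc"]

vocab = build_vocabulary(docs)
X = [to_word_vector(d, vocab) for d in docs]
selected, score = greedy_forward_select(X, labels)

print("selected terms:", [vocab[i] for i in selected])
print("consistency:", score)
```

On this toy corpus a single term already separates the two cities, so the search stops after one step. On real data the selected subset would then be passed to the classifier being evaluated.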

Index Terms

A machine learning based approach to identify geo-location of Twitter users
1. Information systems
  1. Information systems applications
    1. Data mining

Published In

ICC '17: Proceedings of the Second International Conference on Internet of things, Data and Cloud Computing

March 2017

1349 pages

ISBN:9781450347747

DOI:10.1145/3018896

General Chair: Hani Hamdan, University of Paris-Saclay, Paris, France

Program Chairs: Homero Toral-Cruz, University of Quintana Roo, Mexico; Sedat Akleylek, Ondokuz Mayis University, Turkey; Hamid Mcheick, Université du Québec, Canada

Publications Chair: Djallel Eddine Boubiche, University of Batna, Algeria

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

ICC '17

ICC '17: Second International Conference on Internet of Things, Data and Cloud Computing

March 22 - 23, 2017

Cambridge, United Kingdom

Acceptance Rates

ICC '17 Paper Acceptance Rate: 213 of 590 submissions, 36%

Overall Acceptance Rate: 213 of 590 submissions, 36%

Bibliometrics

Total Citations: 6

Total Downloads: 134

Downloads (Last 12 months): 6

Downloads (Last 6 weeks): 0

Reflects downloads up to 14 Feb 2025

Cited By

Bayrak C, Kutlu M (2023) Predicting Election Results Via Social Media: A Case Study for 2018 Turkish Presidential Election. IEEE Transactions on Computational Social Systems 10:5 (2362-2373). Online publication date: Oct-2023.
https://doi.org/10.1109/TCSS.2022.3178052
Nguyen H, Tsolak D, Karmann A, Knauff S, Kühne S (2022) Efficient and Reliable Geocoding of German Twitter Data to Enable Spatial Data Linkage to Official Statistics and Other Data Sources. Frontiers in Sociology 7. Online publication date: 9-Jun-2022.
https://doi.org/10.3389/fsoc.2022.910111
Yamamoto F, Suzuki Y, Nadamoto A (2021) Extraction and Analysis of Regionally Specific Behavioral Facilitation Information in the Event of a Large-scale Disaster. IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology (538-543). Online publication date: 14-Dec-2021.
https://dl.acm.org/doi/10.1145/3486622.3493991
Nunavath V, Goodwin M (2019) The Use of Artificial Intelligence in Disaster Management - A Systematic Literature Review. 2019 International Conference on Information and Communication Technologies for Disaster Management (ICT-DM) (1-8). Online publication date: Dec-2019.
https://doi.org/10.1109/ICT-DM47966.2019.9032935
Onan A (2018) Sentiment Analysis on Twitter Based on Ensemble of Psychological and Linguistic Feature Sets. Balkan Journal of Electrical and Computer Engineering 6:2 (69-77). Online publication date: 30-Apr-2018.
https://doi.org/10.17694/bajece.419538
Onan A (2017) Sarcasm Identification on Twitter: A Machine Learning Approach. Artificial Intelligence Trends in Intelligent Systems (374-383). Online publication date: 7-Apr-2017.
https://doi.org/10.1007/978-3-319-57261-1_37
