Abstract
Social media embed rich but noisy signals of their users’ physical locations. Accurately inferring a user’s location can significantly improve the user’s experience on social media and enable the development of new location-based applications. This paper adopts a supervised learning model, the generalized additive model (GAM), to find the best community in a user’s online neighborhood for predicting the user’s physical location. It proposes using geographical proximity, structural proximity, and generic attribute metrics to characterize the goodness of the communities in a user’s ego-net, and applies variable selection techniques to identify the community metrics that matter most for user location inference. Evaluating the GAM on real social media data, we find that it chooses better communities for location prediction than selection based on any single metric, and that it identifies median haversine distance, triangle participation ratio, and internal density as the three most significant metrics for community selection.
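As a rough illustration of the three metrics named in the abstract, the following Python sketch computes them for a single community. The definitions used here are assumptions based on common usage, not taken from the paper's body: median haversine distance is taken as the median of pairwise great-circle distances between member locations, internal density as the fraction of possible internal edges that exist, and triangle participation ratio as the fraction of members that belong to at least one triangle inside the community. All function names, and the Earth radius constant, are hypothetical choices made for this sketch.

```python
import math
from itertools import combinations
from statistics import median

EARTH_RADIUS_MILES = 3958.8  # assumed mean Earth radius, in miles


def haversine_miles(p, q):
    """Great-circle distance in miles between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(min(1.0, a)))


def median_haversine(locations):
    """Median pairwise haversine distance among member locations (assumed definition)."""
    dists = [haversine_miles(p, q) for p, q in combinations(locations, 2)]
    return median(dists) if dists else 0.0


def internal_density(nodes, edges):
    """Internal edges divided by the number of possible node pairs."""
    n = len(nodes)
    if n < 2:
        return 0.0
    return len(edges) / (n * (n - 1) / 2)


def triangle_participation_ratio(nodes, edges):
    """Fraction of nodes that sit in at least one triangle within the community."""
    if not nodes:
        return 0.0
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    in_triangle = set()
    for u, v in edges:
        common = adj[u] & adj[v]  # shared neighbors close a triangle with (u, v)
        if common:
            in_triangle |= {u, v} | common
    return len(in_triangle) / len(nodes)


# Toy community: undirected internal edges, no self-loops or duplicates (assumed input format).
nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
locations = [(0.0, 0.0), (0.0, 90.0), (0.0, 180.0)]

tpr = triangle_participation_ratio(nodes, edges)
density = internal_density(nodes, edges)
med_dist = median_haversine(locations)
```

In this toy community, nodes a, b, and c form a triangle while d hangs off c, so the sketch separates tightly knit members from peripheral ones; a small median distance would indicate that the members are geographically concentrated.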










Notes
In general, a GAM can take a generalized linear form that accommodates categorical responses. Because the response in our scenario is continuous (a distance), we use the linear form.
We consider community closeness only at 50 miles, denoted CC50, because Wagenseller et al. (2019) found 50 miles to be the best choice of distance.
We did not include the number of reciprocal contacts in the model because community size is linearly dependent on the numbers of friends, followers, and reciprocal contacts. Specifically, the sum of friends, followers, and reciprocal contacts equals community size, so any three of the four metrics determine the fourth and only three can enter the model. Here, we omit reciprocal contacts.
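The linear dependence described in the last note can be checked on toy data. The sketch below is a minimal illustration with made-up counts; the `matrix_rank` helper and the numbers are hypothetical and are not the authors' data or code. It shows that a design matrix containing all four metrics has rank three, while dropping reciprocal contacts leaves three independent columns.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a matrix (given as a list of rows) via exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below the current pivot row with a nonzero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


# Made-up per-community counts (illustrative only).
friends = [3, 5, 2, 8, 1]
followers = [4, 1, 6, 2, 7]
reciprocal = [2, 2, 1, 5, 3]
# Community size is the sum of the three counts, as stated in the note.
size = [f + g + r for f, g, r in zip(friends, followers, reciprocal)]

X_all = [list(row) for row in zip(friends, followers, reciprocal, size)]
X_reduced = [list(row) for row in zip(friends, followers, size)]

rank_all = matrix_rank(X_all)          # four columns, one of them redundant
rank_reduced = matrix_rank(X_reduced)  # three columns, none redundant
```

Because the four-column matrix is rank deficient, a regression on all four metrics would not have uniquely identifiable coefficients, which is why one metric has to be left out.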
References
Abrol S, Khan L (2010) Tweethood: Agglomerative clustering on fuzzy k-closest friends with variable depth for location mining. In: 2010 IEEE second international conference on social computing. IEEE, pp 153–160
Abrol S, Khan L, Thuraisingham B (2012) Tweeque: Spatio-temporal analysis of social networks for location mining using graph partitioning. In: 2012 international conference on social informatics. IEEE, pp 145–148
Backstrom L, Sun E, Marlow C (2010) Find me if you can: improving geographical prediction with social and spatial proximity. In: Proceedings of the 19th international conference on world wide web. ACM, pp 61–70
Cheng Z, Caverlee J, Lee K (2010) You are where you tweet: a content-based approach to geo-locating twitter users. In: Proceedings of the 19th ACM international conference on Information and knowledge management. ACM, pp 759–768
Chon J, Raymond R, Wang H, Wang F (2015) Modeling flu trends with real-time geo-tagged twitter data streams. In: International conference on wireless algorithms, systems, and applications. Springer, pp 60–69
Cresci S, Cimino A, Dell’Orletta F, Tesconi M (2015) Crisis mapping during natural disasters via text analysis of social media messages. In: International conference on web information systems engineering. Springer, pp 250–258
Dredze M, Paul M, Bergsma S, Tran H (2013) Carmen: A twitter geolocation system with applications to public health. In: Workshops at the twenty-seventh AAAI conference on artificial intelligence
Dunbar RI (2016) Do online social media cut through the constraints that limit the size of offline social networks? R Soc Open Sci 3(1):150292
Ghaffari M, Srinivasan A, Liu X (2019) High-resolution home location prediction from tweets using deep learning with dynamic structure. arXiv preprint arXiv:1902.03111
Hastie TJ (2017) Generalized additive models. In: Statistical models in S. Routledge, pp 249–307
Hecht B, Hong L, Suh B, Chi EH (2011) Tweets from Justin Bieber’s heart: the dynamics of the location field in user profiles. In: Proceedings of the ACM SIGCHI conference on human factors in computing systems, pp 237–246
Jurgens D (2013) That’s what friends are for: inferring location in online social media platforms based on social relationships. In: Seventh international AAAI conference on weblogs and social media
Jurgens D, Finethy T, McCorriston J, Xu YT, Ruths D (2015) Geolocation prediction in twitter using social networks: a critical analysis and review of current practice. In: Ninth international AAAI conference on web and social media
Kinsella S, Murdock V, O’Hare N (2011) I’m eating a sandwich in glasgow: modeling locations with tweets. In: Proceedings of the 3rd international workshop on Search and mining user-generated contents. ACM, pp 61–68
Kumar A, Singh JP (2019) Location reference identification from tweets during emergencies: a deep learning approach. Int J Disaster Risk Reduct 33:365–375
Leetaru K, Wang S, Cao G, Padmanabhan A, Shook E (2013) Mapping the global twitter heartbeat: the geography of twitter. First Monday. https://doi.org/10.5210/fm.v18i5.4366.
Leskovec J, Lang KJ, Mahoney M (2010) Empirical comparison of algorithms for network community detection. In: Proceedings of the 19th international conference on World wide web. ACM, pp 631–640
Li R, Wang S, Deng H, Wang R, Chang KCC (2012) Towards social user profiling: unified and discriminative influence model for inferring home locations. In: Proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining, pp 1023–1031
McGee J, Caverlee J, Cheng Z (2013) Location prediction in social media based on tie strength. In: Proceedings of the 22nd ACM international conference on information and knowledge management, pp 459–468
Rosvall M, Bergstrom CT (2008) Maps of random walks on complex networks reveal community structure. Proc Natl Acad Sci 105(4):1118–1123
Wagenseller P, Wang F, Wu W (2018) Size matters: a comparative analysis of community detection algorithms. IEEE Trans Comput Soc Syst 5(4):951–960
Wagenseller P, Avram A, Jiang E, Wang F, Zhao Y (2019) Location prediction with communities in user ego-net in social media. In: IEEE international conference on communications (ICC), pp 1–67
Weiszfeld E, Plastria F (2009) On the point for which the sum of the distances to n given points is minimum. Ann Oper Res 167(1):7–41
Xu C, Li J, Luo X, Pei J, Li C, Ji D (2019) Dlocrl: A deep learning pipeline for fine-grained location recognition and linking in tweets. In: The world wide web conference. ACM, pp 3391–3397
Yang J, Leskovec J (2015) Defining and evaluating network communities based on ground-truth. Knowl Inf Syst 42(1):181–213
Young JG, Hébert-Dufresne L, Allard A, Dubé LJ (2016) Growing networks of overlapping communities with internal structure. Phys Rev E 94(2):022317
Zheng X, Han J, Sun A (2018) A survey of location prediction on twitter. IEEE Trans Knowl Data Eng 30(9):1652–1671
Acknowledgements
This project is supported by NSF Grant ATD No. 1737861.
Cite this article
Wagenseller, P., Zhao, Y., Wang, F. et al. Community-based location inference in social media using supervised learning approach. Soc. Netw. Anal. Min. 11, 64 (2021). https://doi.org/10.1007/s13278-021-00769-5
DOI: https://doi.org/10.1007/s13278-021-00769-5