Abstract
Today, millions of people share messages via online social networks, and some of these messages probably contain sensitive information. An adversary can collect these freely available messages and analyze them specifically for privacy leaks, such as the users' locations. Unlike other approaches, which try to detect such leaks using complete message streams, we put forward a rule-based approach that detects location leaks in single, very short messages. We evaluated our approach on 2817 tweets from the Tweets2011 data set. At detecting whether a message reveals the user's location, it scores significantly better (accuracy = 84.95 %) than a machine-learning baseline and three heuristic-based extensions. A further advantage of our approach is that it is not limited to online social network messages: it can be extended to other areas (such as email, military, and health) and to other languages.
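The abstract does not spell out the rules themselves. The minimal Python sketch below only illustrates the general shape of a rule-based detector that judges a single short message on its own, without the user's message history. The place list, the cue words, and the four rules (R1 to R4) are hypothetical stand-ins chosen for illustration, not the authors' rule set, and the sample messages are invented rather than drawn from Tweets2011. The `accuracy` helper computes the metric the abstract reports, i.e. the fraction of messages whose predicted label matches the gold label.

```python
import re

# Hypothetical toy gazetteer (assumption): a practical system would more
# likely use a named-entity recognizer than a fixed list of place names.
PLACES = ("new york", "tokyo", "paris", "london", "starbucks")

# Hypothetical cue words (assumption), for illustration only.
FIRST_PERSON = {"i", "im", "i'm", "we", "we're", "my", "our"}
LOCATIVE = {"at", "in", "near", "arrived", "landed", "visiting", "leaving"}
IRREALIS = {"wish", "want", "hope", "will", "gonna", "someday"}

# Alternation of locative cues, sorted so the pattern is deterministic.
LOCATIVE_ALT = "|".join(sorted(LOCATIVE))


def tokenize(text: str) -> list[str]:
    """Lowercase the message and split it into word tokens (keeps apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def leaks_location(text: str) -> bool:
    """Return True if the message appears to reveal the author's current location.

    Toy rules (assumptions, not the paper's rules):
      R1: the message mentions a known place name;
      R2: the author refers to themselves (first-person cue);
      R3: the place name is directly preceded by a locative cue word;
      R4: no irrealis cue (wish, plan, future) appears in the message.
    """
    tokens = tokenize(text)
    joined = " ".join(tokens)
    if not FIRST_PERSON.intersection(tokens):  # R2 fails
        return False
    if IRREALIS.intersection(tokens):  # R4 fails
        return False
    for place in PLACES:  # R1
        # R3: "<locative cue> <place>" as whole words, e.g. "at starbucks".
        pattern = r"\b(" + LOCATIVE_ALT + r") " + re.escape(place) + r"\b"
        if re.search(pattern, joined):
            return True
    return False


def accuracy(predicted: list[bool], gold: list[bool]) -> float:
    """Fraction of messages whose predicted label equals the gold label."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


if __name__ == "__main__":
    # Invented examples: (message, does it reveal the current location?)
    samples = [
        ("I'm at Starbucks with my sister", True),
        ("Just landed in Tokyo, we are exhausted", True),
        ("I wish I was in Paris right now", False),
        ("Paris is lovely in spring", False),
    ]
    preds = [leaks_location(text) for text, _ in samples]
    print(preds)
    print(accuracy(preds, [label for _, label in samples]))
```

The fixed place list only keeps the sketch self-contained. Each extra rule makes the detector more conservative: R4, for instance, also suppresses a genuine leak such as "I'm at Starbucks, will leave soon", which is the kind of trade-off a real rule set has to balance.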
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Nguyen-Son, HQ., Tran, MT., Yoshiura, H., Sonehara, N., Echizen, I. (2015). A Rule-Based Approach for Detecting Location Leaks of Short Text Messages. In: Abramowicz, W. (ed.) Business Information Systems Workshops. BIS 2015. Lecture Notes in Business Information Processing, vol 228. Springer, Cham. https://doi.org/10.1007/978-3-319-26762-3_18
DOI: https://doi.org/10.1007/978-3-319-26762-3_18
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26761-6
Online ISBN: 978-3-319-26762-3
eBook Packages: Computer Science, Computer Science (R0)