A Rule-Based Approach for Detecting Location Leaks of Short Text Messages

Nguyen-Son, Hoang-Quoc; Tran, Minh-Triet; Yoshiura, Hiroshi; Sonehara, Noboru; Echizen, Isao

doi:10.1007/978-3-319-26762-3_18

Conference paper
First Online: 02 December 2015

Part of the book series: Lecture Notes in Business Information Processing (LNBIP, volume 228)

Abstract

Millions of people share messages via online social networks, and some of these messages probably contain sensitive information. An adversary can collect these freely available messages and analyze them for privacy leaks, such as the users’ location. Unlike other approaches, which try to detect these leaks using complete message streams, we put forward a rule-based approach that detects location leaks in single, very short messages. We evaluated our approach on 2817 tweets from the Tweets2011 data set. In detecting whether a message reveals the user’s location, it scores significantly better (accuracy = 84.95%) than a baseline using machine learning and three extensions using heuristics. The approach applies not only to online social network messages but can also be extended to other areas (such as email, military, and health) and to other languages.
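
The abstract does not state the rules themselves, so the following is only an illustrative sketch of what a rule-based check on a single short message could look like. The two patterns (`CUE`, `PLACE`) and the function `reveals_location` are hypothetical and are not taken from the paper; they simply require a first-person presence cue followed by a preposition and a capitalized, place-like phrase.

```python
import re

# Hypothetical rule 1: a first-person cue suggesting the author is somewhere now.
CUE = re.compile(r"\b(i am|i'm|we are|we're|just arrived|just landed)\b",
                 re.IGNORECASE)

# Hypothetical rule 2: a location preposition followed by a capitalized phrase.
# This pattern is case-sensitive on purpose, so "in love" does not match.
PLACE = re.compile(r"\b(?:at|in)\s+(?:the\s+)?([A-Z][a-z]+(?:\s+[A-Z][a-z]+)*)")


def reveals_location(message):
    """Return the place-like phrase if both toy rules fire, else None."""
    cue = CUE.search(message)
    if cue is None:
        return None
    # Only look for a place after the presence cue.
    place = PLACE.search(message, cue.end())
    return place.group(1) if place else None


if __name__ == "__main__":
    for text in ["I'm at Tokyo Station with friends",
                 "Tokyo is a big city",
                 "I am in love with this song"]:
        print(repr(text), "->", reveals_location(text))
```

A check of this kind needs only the one message, not the user's whole stream, which is the property the abstract emphasizes. Real tweets would also need normalization (for example, typographic apostrophes and missing capitalization), which this sketch ignores.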

Notes

1. http://blog.rjmetrics.com/2010/01/26/new-data-on-twitters-users-and-engagement/

Author information

Authors and Affiliations

SOKENDAI (The Graduate University for Advanced Studies), Tokyo, Kanagawa, Japan
Hoang-Quoc Nguyen-Son & Isao Echizen
University of Science, VNU-HCM, Hochiminh, Vietnam
Minh-Triet Tran
University of Electro-Communications, Tokyo, Japan
Hiroshi Yoshiura
National Institute of Informatics, Tokyo, Japan
Noboru Sonehara & Isao Echizen

Corresponding author

Correspondence to Hoang-Quoc Nguyen-Son.

Editor information

Editors and Affiliations

Poznań University of Economics and Business, Poznań, Poland
Witold Abramowicz

About this paper

Cite this paper

Nguyen-Son, HQ., Tran, MT., Yoshiura, H., Sonehara, N., Echizen, I. (2015). A Rule-Based Approach for Detecting Location Leaks of Short Text Messages. In: Abramowicz, W. (eds) Business Information Systems Workshops. BIS 2015. Lecture Notes in Business Information Processing, vol 228. Springer, Cham. https://doi.org/10.1007/978-3-319-26762-3_18

DOI: https://doi.org/10.1007/978-3-319-26762-3_18
Published: 02 December 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26761-6
Online ISBN: 978-3-319-26762-3
eBook Packages: Computer Science, Computer Science (R0)
