A Rule-Based Approach for Detecting Location Leaks of Short Text Messages

  • Conference paper

Part of the book series: Lecture Notes in Business Information Processing (LNBIP, volume 228)

Abstract

Millions of people share messages via online social networks, and some of these messages likely contain sensitive information. An adversary can collect these freely available messages and analyze them specifically for privacy leaks, such as the users’ locations. Unlike other approaches, which try to detect these leaks from complete message streams, we put forward a rule-based approach that detects location leaks in single, very short messages. We evaluated our approach on 2817 tweets from the Tweets2011 data set. In detecting whether a message reveals the user’s location, it scores significantly better (accuracy = 84.95 %) than a baseline using machine learning and three extensions using heuristics. Our approach applies not only to online social network messages but can also be extended to other areas (such as email, military, and health) and to other languages.
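
The abstract does not list the rules themselves, so the following is only a minimal sketch of what a rule applied to a single short message could look like, together with the accuracy measure used in the evaluation. The cue patterns and the names `leaks_location` and `accuracy` are hypothetical and invented for illustration; they are not the rules or code of the paper. The toy rule is deliberately naive and would, for instance, also flag "I am in a good mood".

```python
import re

# Hypothetical cue patterns (assumptions for illustration, not the paper's rules).
FIRST_PERSON = r"\b(?:i am|i'm|im|we are|we're)\b"
PLACE_PREP = r"\b(?:at|in)\b"

# Toy rule: a first-person cue, up to three intervening words,
# a place preposition, and then at least one following word.
LEAK_RULE = re.compile(
    FIRST_PERSON + r"(?:\s+\w+){0,3}?\s+" + PLACE_PREP + r"\s+(?:the\s+)?\w+",
    re.IGNORECASE,
)


def leaks_location(message: str) -> bool:
    """Return True if the toy rule fires on a single message."""
    return LEAK_RULE.search(message) is not None


def accuracy(predictions, labels):
    """Fraction of messages whose predicted label equals the annotated label."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


if __name__ == "__main__":
    messages = [
        "I'm at the airport waiting for my flight",
        "Paris is lovely in spring",
        "we're having dinner in Tokyo",
    ]
    print([leaks_location(m) for m in messages])
```

As a consistency check on the reported figures (an inference, not a number stated in the abstract): with 2817 tweets, an accuracy of 84.95 % corresponds to 2393 correct decisions, since 2393 / 2817 ≈ 0.8495.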

Corresponding author

Correspondence to Hoang-Quoc Nguyen-Son.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Nguyen-Son, HQ., Tran, MT., Yoshiura, H., Sonehara, N., Echizen, I. (2015). A Rule-Based Approach for Detecting Location Leaks of Short Text Messages. In: Abramowicz, W. (eds) Business Information Systems Workshops. BIS 2015. Lecture Notes in Business Information Processing, vol 228. Springer, Cham. https://doi.org/10.1007/978-3-319-26762-3_18

  • DOI: https://doi.org/10.1007/978-3-319-26762-3_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-26761-6

  • Online ISBN: 978-3-319-26762-3

  • eBook Packages: Computer Science (R0)
