Abstract
Hackers regularly steal data sets containing user identifiers and passwords, and these data sets often become publicly available. The most prominent and important leaks involve poor password protection mechanisms, e.g. unsalted password hashes, despite long-standing recommendations. The accumulation of leaked password data sets allows the research community to study password strength estimation and password breaking, and to conduct usability and usage studies. The impact of these leaks on privacy, however, has not been studied.
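To illustrate why unsalted hashes matter for privacy, the following minimal Python sketch hashes the same password for two users, once without and once with a per-user random salt. It is an illustration only, not the protection mechanism of any particular leaked service. The function names are hypothetical, and SHA-1 is used only to keep the example short; actual password storage should rely on a slow, salted password hashing function such as those listed in the OWASP Password Storage Cheat Sheet.

```python
import hashlib
import os


def unsalted_digest(password: str) -> str:
    # Unsalted: the digest depends only on the password itself.
    return hashlib.sha1(password.encode("utf-8")).hexdigest()


def salted_digest(password: str, salt: bytes) -> str:
    # Salted: a per-user random salt is mixed in before hashing.
    return hashlib.sha1(salt + password.encode("utf-8")).hexdigest()


# Two different users who happen to choose the same password.
alice_unsalted = unsalted_digest("correct horse battery staple")
bob_unsalted = unsalted_digest("correct horse battery staple")
print(alice_unsalted == bob_unsalted)  # prints True: equality is visible without breaking the password

alice_salted = salted_digest("correct horse battery staple", os.urandom(16))
bob_salted = salted_digest("correct horse battery staple", os.urandom(16))
print(alice_salted == bob_salted)  # False unless the two random salts happen to coincide
```

Without a salt, anyone holding the leaked digests can tell which accounts share a password, even though no password has been recovered.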
In this paper, we consider attackers who try to break the privacy of users without breaking a single password. We consider attacks revealing that distinct identifiers are in fact used by the same physical person. We evaluate large-scale linkability attacks based on properties of, and relations between, identifiers and password information. With these attacks, stronger passwords lead to better predictions. Using a leaked and publicly available data set containing \(130 \times 10^{6}\) encrypted passwords, we show that a privacy attacker is able to build a database containing the multiple identifiers of people, including their secret identifiers. We illustrate potential consequences by showing that a privacy attacker is capable of deanonymizing (potentially embarrassing) secret identifiers by intersecting several leaked password databases.
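The intuition that stronger passwords lead to better predictions can be sketched as follows. Assume, hypothetically, records of the form (identifier, password ciphertext) in which equal passwords yield equal ciphertexts, as with deterministic encryption or unsalted hashing. Identifiers sharing a rare ciphertext are then likely to belong to the same person, whereas a popular ciphertext (a weak, common password) says little. The function `link_candidates`, its `max_group_size` threshold and the sample records are illustrative assumptions; the attacks evaluated in the paper use further properties and relations between identifiers and password information, which this sketch reduces to ciphertext equality.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical leaked records: (identifier, password_ciphertext).
records = [
    ("alice@example.org", "c1"),
    ("a.smith@example.net", "c1"),
    ("bob@example.org", "c2"),
    ("carol@example.org", "c2"),
    ("dave@example.org", "c2"),
    ("erin@example.org", "c3"),
]


def link_candidates(records, max_group_size=2):
    """Return pairs of identifiers that share a rare password ciphertext.

    A ciphertext shared by few identifiers (an uncommon password) is taken
    as evidence of a single owner; larger groups are ignored as likely
    popular passwords chosen independently by different people.
    """
    groups = defaultdict(list)
    for identifier, ciphertext in records:
        groups[ciphertext].append(identifier)
    pairs = []
    for ids in groups.values():
        if 2 <= len(ids) <= max_group_size:
            pairs.extend(combinations(sorted(ids), 2))
    return pairs


print(link_candidates(records))
# prints [('a.smith@example.net', 'alice@example.org')]
```

A fixed size threshold is the simplest choice; a finer variant could weight each candidate pair by the inverse frequency of the shared ciphertext.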
Notes
- 1.
As recalled in the OWASP Password Storage Cheat Sheet.
- 2.
See game http://zed0.co.uk/crossword and picture http://xkcd.com/1286.
- 3.
- 4.
The \(uid\) of D increases monotonically with the creation time of the identifier. This allows a timeline to be reconstructed, e.g. by using the known creation dates of some identifiers or by searching the \(name\) and \(hint\) fields for events of worldwide notoriety.
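As a sketch of how a monotonically increasing \(uid\) could support timeline reconstruction, the Python below estimates the creation date of an identifier by linear interpolation between anchor identifiers whose creation dates are known. The anchor values, the helper name `estimate_creation_date` and the assumption of roughly linear growth between anchors are all hypothetical; real account creation rates vary, so the estimates are only as good as the density of anchors.

```python
from bisect import bisect_left
from datetime import date

# Hypothetical anchors: (uid, known creation date), sorted by uid.
anchors = [
    (1_000, date(2005, 1, 1)),
    (50_000, date(2008, 6, 1)),
    (200_000, date(2012, 3, 15)),
]


def estimate_creation_date(uid, anchors):
    """Interpolate a creation date for uid between the surrounding anchors.

    Relies on uid being monotone in creation time; uids outside the
    anchor range are clamped to the first or last anchor date.
    """
    uids = [u for u, _ in anchors]
    i = bisect_left(uids, uid)
    if i == 0:
        return anchors[0][1]
    if i == len(anchors):
        return anchors[-1][1]
    (u0, d0), (u1, d1) = anchors[i - 1], anchors[i]
    fraction = (uid - u0) / (u1 - u0)
    offset_days = round(fraction * (d1 - d0).days)
    return date.fromordinal(d0.toordinal() + offset_days)


print(estimate_creation_date(25_500, anchors))
```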
Acknowledgements
We thank the Program Committee and reviewers for the many valuable comments that significantly improved the final version of this paper.
A Appendix
A.1 Terms for ‘as usual’
always, usual, the rest, for all, normal, same as, standard, regular, costumbres, siempre, sempre, wie immer, toujours, habit, d’hab, comme dab, altijd.
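One plausible use of this list, assumed here for illustration rather than taken from the paper, is to flag password hints indicating that the user reuses a habitual password. The sketch below matches the terms as whole words after Unicode normalization, lowercasing and unifying the typographic apostrophe; `is_as_usual_hint` is a hypothetical helper.

```python
import re
import unicodedata

# Terms from Appendix A.1 (multilingual variants of 'as usual').
AS_USUAL_TERMS = [
    "always", "usual", "the rest", "for all", "normal", "same as",
    "standard", "regular", "costumbres", "siempre", "sempre",
    "wie immer", "toujours", "habit", "d’hab", "comme dab", "altijd",
]


def normalize(text: str) -> str:
    # NFKC folds compatibility characters; lowercasing and replacing the
    # typographic apostrophe (U+2019) let "D'hab" match "d’hab".
    return unicodedata.normalize("NFKC", text).lower().replace("\u2019", "'")


# Whole-word patterns, so that "usual" does not fire on "unusual".
_PATTERNS = [re.compile(r"\b" + re.escape(normalize(t)) + r"\b") for t in AS_USUAL_TERMS]


def is_as_usual_hint(hint: str) -> bool:
    h = normalize(hint)
    return any(p.search(h) for p in _PATTERNS)


print(is_as_usual_hint("comme d'hab"))
```

Whole-word matching trades some recall (e.g. "habitual" is not flagged) for fewer false positives.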
A.2 List of generic email addresses
abuse admin administrator contact design email info intern it legal kontakt mail marketing no-reply office post press print printer sales security service spam support sysadmin test web webmaster webmestre.
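Such local parts typically denote role or shared mailboxes rather than a single physical person. Assuming, for illustration, that the list serves to exclude these addresses before identifiers are linked, a minimal filter could look as follows; `is_generic_address` is a hypothetical helper, not code from the paper.

```python
# Local parts from Appendix A.2.
GENERIC_LOCAL_PARTS = frozenset("""
abuse admin administrator contact design email info intern it legal kontakt
mail marketing no-reply office post press print printer sales security
service spam support sysadmin test web webmaster webmestre
""".split())


def is_generic_address(email: str) -> bool:
    # Split on the last '@'; strings without '@' are not treated as generic.
    local, sep, _domain = email.rpartition("@")
    if not sep:
        return False
    return local.lower() in GENERIC_LOCAL_PARTS


print(is_generic_address("Info@example.org"))
```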
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Heen, O., Neumann, C. (2017). On the Privacy Impacts of Publicly Leaked Password Databases. In: Polychronakis, M., Meier, M. (eds) Detection of Intrusions and Malware, and Vulnerability Assessment. DIMVA 2017. Lecture Notes in Computer Science, vol 10327. Springer, Cham. https://doi.org/10.1007/978-3-319-60876-1_16
DOI: https://doi.org/10.1007/978-3-319-60876-1_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-60875-4
Online ISBN: 978-3-319-60876-1
eBook Packages: Computer Science, Computer Science (R0)