Elsevier

Computers & Security

Volume 61, August 2016, Pages 130-141
User practice in password security: An empirical study of real-life passwords in the wild

https://doi.org/10.1016/j.cose.2016.05.007

Abstract

Given the public's growing security awareness about passwords and the little attention paid so far to the characteristics of real-life passwords, it is natural to examine the current state of those characteristics, and to explore how password characteristics change over time and how earlier password practice is understood in the current context. In this work, we attempt to present an in-depth and comprehensive understanding of user practice in real-life passwords, and to see whether previous observations can be confirmed or reversed, based on large-scale measurements rather than anecdotal knowledge or user surveys. Specifically, we measure password characteristics on over 6 million passwords, in terms of password length, password composition, and password selection. We then make informed comparisons between the findings of our investigation and previously reported results. Our general findings include: (1) the average password length is at least 12% longer than in previous results, and 75% of our passwords are between 8 and 10 characters long; (2) there is a significant increase in the use of only numbers as passwords, and easy-to-reach symbols are always the first choice when users add symbols to passwords; (3) there is a remarkable increase (about 40%) in the use of combo-meaningful data as passwords, and a striking proportion of users choose the most common passwords or their login names as passwords. Our investigation also includes collecting statistics about the use of symbols, letter case, and meaningful details, which presents a systematic analysis of password usage. The comparative results indicate that the password characteristics and password practice in this massive password data set are somewhat inconsistent with those from anecdotal knowledge and user surveys, and exhibit a substantial change over time in some ways. Further research needs to build upon this understanding to gain insight into how password security can be improved.

Introduction

Despite a growing number of graphical and biometric authentication mechanisms, passwords remain the dominant method of authentication (Herley, Van Oorschot, 2012, Uellenbeck, Durmuth, 2013). According to NIST specifications, text-based passwords are popular in typical web users' experience since they are conceptually simple, inexpensive to administer, and user-friendly (Burr et al., 2011). Since passwords are commonly used to protect accounts with valuable assets (e.g., bank or email accounts), they have increasingly been subjected to attacks that mainly exploit users' tendency to choose simple and poor passwords (e.g., dictionary words, names, and personal information). User-selected passwords have always been easily guessable and predictable (Adams, Sasse, 1999, Bonneau, 2012, Bunnell et al, 1997, Dell'Amico et al, 2010, Grampp, Morris, 1984, Ji et al, 2015, Mazurek et al, 2013, Morris, Thompson, 1979, Ur et al, 2012), a weakness now compounded by the emerging problem of users unwittingly divulging their passwords to the public [e.g., the recent password-leakage events in Facebook, LinkedIn, and Google (Bogart, 2013, Goodin, 2012, Pagliery, 2014)].

About 37 years ago, Morris and Thompson (1979) found that users had poor practice with their passwords and did not afford sufficient attention to safeguarding their secrets. Since then, we have seen studies on password characteristics (Adams, Sasse, 1999, Bryant, Campbell, 2005, Bryant, Campbell, 2006, Campbell, Bryant, 2004, Grampp, Morris, 1984, Zviran, Haga, 1999), which provide some understanding of user practice in password usage and security. But – has this understanding been applied in practice? Has anything changed over time? Moreover, most previous studies favored using passwords from user surveys and anecdotal knowledge, and little attention has been given to real-life passwords and their practical usage, which Dourish et al. (2004) have called “security in the wild”.

In this work, we attempt to provide an in-depth and comprehensive understanding of user practice in real-life passwords, and to see whether previously observed facts can be confirmed or reversed based on a real-life, large-scale measurement rather than anecdotal knowledge or user surveys. We measure the characteristics of real-life passwords over a large population in terms of password length, password composition, and password selection, and make informed comparisons between our investigation and previous studies. Among our interesting findings are how password characteristics change over time and how earlier password practice is understood in the current context: the average password length is 9.46 characters, which is longer than what has been found in the literature, and most of our passwords are between 8 and 10 characters long; passwords are still dominated by simple structures, there is a significant increase in the use of only numbers as passwords, and easy-to-reach symbols are always the first choice when users add symbols to passwords; users prefer to use meaningful data in passwords, there is a remarkable increase in selecting combinations of multiple meaningful data as passwords, and a striking proportion of users choose the most common passwords or their login names as passwords. Our investigation also includes collecting statistics about the use of symbols, letter case, and meaningful details, which presents a systematic analysis of password usage. These comparative results indicate that the password characteristics and password practice in this massive password data set are somewhat inconsistent with past anecdotal knowledge and password surveys, and exhibit a substantial change over time in some ways.

This paper is organized as follows. Section 2 describes previous work. Section 3 develops the problem and approach of this study. Section 4 introduces the source of our passwords. Sections 5, 6, and 7 (Study 1: password length; Study 2: password composition; Study 3: password selection) present our descriptive findings and comparative results between our study and previous studies. Section 8 offers discussions and concludes.

Section snippets

Background and related work

The use of passwords for authentication has been analyzed at length in the security literature. However, there have been few studies on the characteristics of real-life passwords and on the application of this understanding in practice, both of which need to be examined in the concrete reality of daily usage.

An early study to notice this problem was by Morris and Thompson in 1979 (Morris and Thompson, 1979), in which they examined users' password habit when no constraint was put on their

Problem and approach

Because of the public's increasing security awareness about passwords, it is natural to examine the current state of password characteristics and their practical usage, and to explore whether earlier findings have changed over time. In addition, current password characteristics and usage in practice, and how they compare with previous studies, have rarely been quantified or measured.

In this work, we intend to provide an in-depth and comprehensive understanding of user practice in

Source of data

The source of data for this study was originally obtained by hackers via SQL injections against the “Chinese Software Developer Network” website (CSDN), which is the biggest network for Chinese software developers, and provides web forums, blogs, IT news, and other services. The news of the hack quickly spread to popular websites where criminals could exploit the passwords to facilitate further attacks. Moreover, since many users re-use the same passwords for their online accounts (Bailey et

Study 1: password length

The purpose of Study 1 is to answer the questions: What is the range of password lengths? How does the password length change with time? The following section reports our findings regarding the characteristics of password length, and compares them with previously reported results. We first describe our experimental methodology. Then we answer a few questions on the password length, and offer a discussion and comparison.
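
As an illustration of the kind of measurement Study 1 describes, the following is a minimal sketch and not the authors' method. It computes the range, mean, and distribution of password lengths, plus the share of passwords falling in a length band. The function name `length_stats`, its `lo`/`hi` parameters, and the sample passwords are all hypothetical, chosen only for this example.

```python
from collections import Counter
from statistics import mean


def length_stats(passwords, lo=8, hi=10):
    """Summarize password lengths (illustrative helper, not from the paper).

    lo and hi bound the length band whose share is reported; the defaults
    mirror the 8-10 character band mentioned in the abstract.
    """
    if not passwords:
        raise ValueError("at least one password is required")
    lengths = [len(p) for p in passwords]
    histogram = Counter(lengths)
    in_band = sum(1 for n in lengths if lo <= n <= hi)
    return {
        "min": min(lengths),
        "max": max(lengths),
        "mean": mean(lengths),
        "share_in_band": in_band / len(lengths),
        # Sorted by length so the distribution reads left to right.
        "histogram": dict(sorted(histogram.items())),
    }


# Hypothetical sample; a real study would read a large password corpus.
sample = ["123456789", "password", "abc123", "qwertyuiop", "dearbook2011"]
stats = length_stats(sample)
print(stats)
```

Running the same function on data sets from different years would give one simple way to compare how the length distribution shifts over time.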

Study 2: password composition

The purpose of Study 2 is to answer the questions: What combinations of alphabetic, numeric, and symbol characters are used? Have the composition and structure of passwords changed over time? The following section presents our findings with regard to password composition, and compares them with previous observations. We first introduce our methodology to compute the characteristics of password composition. Then we present our results on common questions and provide a comparison.
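
One way to picture the composition question is to label each password by the character classes it contains and then tabulate the labels. The sketch below is an assumption-laden illustration, not the paper's methodology: the names `composition_class` and `composition_shares`, the label strings, and the sample passwords are hypothetical, and any non-alphanumeric character (including a space) is treated as a symbol.

```python
from collections import Counter


def composition_class(password):
    """Return a label such as 'alpha+numeric' for the classes present.

    Uses Python's Unicode-aware str methods, so non-ASCII letters and
    digits also count as alphabetic or numeric characters.
    """
    parts = []
    if any(c.isalpha() for c in password):
        parts.append("alpha")
    if any(c.isdigit() for c in password):
        parts.append("numeric")
    if any(not c.isalnum() for c in password):
        parts.append("symbol")
    return "+".join(parts) or "empty"


def composition_shares(passwords):
    """Fraction of passwords falling in each composition class."""
    counts = Counter(composition_class(p) for p in passwords)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Hypothetical sample; a real study would read a large password corpus.
sample = ["123456", "password", "abc123", "p@ssw0rd", "!!!"]
shares = composition_shares(sample)
print(shares)
```

A finer-grained variant could also record letter case or the specific symbols used, which the paper mentions collecting statistics on.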

Study 3: password selection

The purpose of Study 3 is to answer the questions: Is the password (or its parts) meaningful? Is the password made up of pronounceable words? Is the password composed of random characters? Do previously reported selection methods remain the same? The following section reports our findings regarding password selection, and provides comparative results among extant studies. We first describe our methodology and findings, and then offer comparative results with respect to password meaningfulness,

Discussion and conclusion

It is important to understand the current state of characteristics of the passwords in the wild, and to make informed comparisons of the findings between our investigation and previous studies, based on a real-life and large-scale measurement rather than anecdotal knowledge or user surveys. Among our interesting findings are how password characteristics change with time and how previously observed password practice is understood in current context. Here we take away some message and the

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (61403301, 61221063), the China Postdoctoral Science Foundation (2014M560783), the Special Foundation of China Postdoctoral Science (2015T81032), the Natural Science Foundation of Shaanxi Province (2015JQ6216), the Application Foundation Research Program of SuZhou (SYG201444), and the Fundamental Research Funds for the Central Universities (xjj2015115).

Chao Shen is currently an Associate Professor in the School of Electronic and Information Engineering, Xi'an Jiaotong University of China. His research interests include insider/intrusion detection, behavioral biometric, and measurement and experimental methodology.

References (58)

  • J. Bunnell et al. Cognitive, associative and conventional passwords: recall and guessing rates. Comput Secur (1997)
  • I.S. Herschberg. The hackers' comfort. Comput Secur (1987)
  • B.L. Riddle et al. Passwords in use in a university timesharing environment. Comput Secur (1989)
  • A. Adams et al. Users are not the enemy. Commun ACM (1999)
  • D. Bailey et al. Statistics on password re-use and adaptive strength for financial accounts (2014)
  • N. Bogart. Over 2 million stolen Facebook, LinkedIn and Google passwords leaked online
  • J. Bonneau. The science of guessing: analyzing an anonymized corpus of 70 million passwords (2012)
  • J. Bonneau et al. Towards reliable storage of 56-bit secrets in human memory (2014)
  • J. Bonneau et al. The quest to replace passwords: a framework for comparative evaluation of web authentication schemes (2012)
  • K. Bryant et al. An empirical study of user practice in password security and management (2005)
  • K. Bryant et al. User behaviours associated with password security and management. Australas J Inf Syst (2006)
  • W.E. Burr et al. SP 800-63-1. Electronic authentication guideline (2011)
  • J. Campbell et al. Password composition and security: an exploratory study of user practice (2004)
  • J.A. Cazier et al. Password security: an empirical investigation into e-commerce passwords and their crack times. Inf Syst Secur (2006)
  • ChinaAutoWeb.com
  • CSDN. Chinese Software Developer Network [Online]
  • M. Dell'Amico et al. Password strength: an empirical analysis (2010)
  • P. Dourish et al. Security in the wild: user strategies for managing security as an everyday, practical problem. Pers Ubiquit Comput (2004)
  • Englishclub.com
  • D.C. Feldmeier et al. Unix password security – 10 years later. Lect Notes Comput Sci (1990)
  • D. Florêncio et al. A large-scale study of web password habits (2007)
  • D. Florêncio et al. Sex, lies and cyber-crime surveys
  • D. Florêncio et al. Do strong web passwords accomplish anything (2007)
  • D. Florêncio et al. An administrator's guide to internet password research (2014)
  • S. Gaw et al. Password management strategies for online accounts (2006)
  • D. Goodin. 8 million leaked passwords connected to LinkedIn, dating website. Arstechnica.com [Online]
  • F.T. Grampp et al. The UNIX system: UNIX operating system security. AT&T Bell Lab Tech J (1984)
  • C. Herley et al. A research agenda acknowledging the persistence of passwords. IEEE Secur Priv (2012)
  • P.G. Inglesant et al. The true cost of unusable password policies: password use in the wild (2010)

Tianwen Yu is currently a graduate student in the School of Electronic and Information Engineering, Xi'an Jiaotong University of China. Her research interests include authentication, machine learning, and mobile security.

Haodi Xu is currently an undergraduate student in the School of Software Engineering, Xi'an Jiaotong University of China. Her research interests include mobile security, machine learning, and intrusion detection.

Gengshan Yang is currently an undergraduate student in the School of Software Engineering, Xi'an Jiaotong University of China. His research interests include mobile security, machine learning, and intrusion detection.

Xiaohong Guan received the B.S. and M.S. degrees in automatic control from Tsinghua University, Beijing, China, in 1982 and 1985, respectively, and the Ph.D. degree in electrical engineering from the University of Connecticut, Storrs, in 1993. He is currently a Cheung Kong Professor of Systems Engineering and the Dean of School of Electronic and Information Engineering in Xi'an Jiaotong University. His research interests include allocation and scheduling of complex networked resources, network security, and sensor networks.
