1 Introduction

Password-based authentication plays a critical role in information access, controlling everything from bank accounts to web forums. Unfortunately, passwords are easy targets and are constantly under attack from many cracking methods. The consequences of these attacks range from minor annoyances, such as having to reset a password, to severe harm when an attacker gains access to personal data or financial information. These cracking attempts are made easier by the fact that an overwhelming proportion of users create passwords containing only lowercase letters when no other character types are required [1]. Though many password policies do require users to create passwords of a certain length that contain multiple character types, such requirements introduce usability concerns such as password creation difficulty and reduced memorability.

When creating a password, the user’s ultimate goal is to create a text string that is both memorable and sufficiently secure. However, the additional creation criteria can drastically slow down the generation process as the user needs to ponder what items they can include to satisfy the requirements while still making the password easy to recall [2]. What is needed is an examination of what actions can be taken to alleviate some of the usability issues that arise from stringent password requirements. Here we present lab-based user-generated data and examine the differences in password generation performance when users are faced with different requirements and instruction formats, as well as character distribution patterns in user-generated passwords.

2 Background

As IT-based technologies become more and more integrated into our lives, the number of accounts and passwords a person must keep track of increases. The average person has multiple accounts, ranging from email and banking to the more recent areas of social media and mobile applications. Weak passwords for these accounts could result in increased security risks, including unauthorized access to personal information and finances, activity monitoring, and the attacker posing as the legitimate user in online interactions. Consequently, a large body of research has been dedicated to the area, analyzing not only the security of passwords (e.g., [3–6]), but also users’ password selection behaviors (e.g., [7–9]).

Of the three stages in the password management lifecycle [10], our paper focuses on the first: the password generation stage. In this stage, users need to comprehend the password rules presented, explore options of characters to use, and finally compose a text string that satisfies the rules. It is important to understand what factors are at play here, as the subsequent maintenance and authentication stages rely on the generation stage being both secure and usable. Several methods of facilitating password generation have been proposed, including mnemonics, passphrases, and various forms of graphical authentication. Though research in these areas shows varying levels of promise (e.g., [5, 11, 12]), their real-world application is limited.

One of the more commonly implemented methods of regulating the password generation stage is dynamic compliance checking. This approach programmatically checks for adherence of the character-based passwords created by a user to pre-defined rules. These rules often include minimum and maximum string lengths, mandatory inclusion/exclusion of certain character types, and restricting the use of certain words. These password rules are to ensure that users create passwords that fall within a range of acceptable security levels, as users tend to rarely use special characters (non-letter and non-number) unless explicitly required to do so (e.g., [1, 5]).
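The kind of rule check described above can be sketched in a few lines of Python. This is a minimal illustration, not the program used in the study: the function name `check_compliance`, its parameters, and the violation messages are hypothetical, and "special" is taken to mean any character that is neither a letter nor a number, as defined in the text (so a space would also count).

```python
def check_compliance(password, min_length=6, max_length=None,
                     require_lower=False, require_upper=False,
                     require_digit=False, require_special=False,
                     banned_words=()):
    """Return a list of violated rules; an empty list means compliant.

    All names and rule values here are illustrative assumptions.
    """
    violations = []
    if len(password) < min_length:
        violations.append("too short")
    if max_length is not None and len(password) > max_length:
        violations.append("too long")
    if require_lower and not any(c.islower() for c in password):
        violations.append("missing lowercase letter")
    if require_upper and not any(c.isupper() for c in password):
        violations.append("missing uppercase letter")
    if require_digit and not any(c.isdigit() for c in password):
        violations.append("missing number")
    # Special character: neither a letter nor a number.
    if require_special and not any(not c.isalnum() for c in password):
        violations.append("missing special character")
    lowered = password.lower()
    for word in banned_words:
        if word.lower() in lowered:
            violations.append("contains restricted word: " + word)
    return violations


# Hypothetical strict rule set: length plus all four character types.
strict = dict(min_length=12, require_lower=True, require_upper=True,
              require_digit=True, require_special=True)
print(check_compliance("sunshine"))                 # length-only rule
print(check_compliance("Sunshine2024", **strict))   # no special character
print(check_compliance("Sunshine2024!", **strict))  # satisfies every rule
```

Returning the list of violated rules, rather than a single true/false value, lets an interface tell the user which requirement is still unmet while they type.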

This study has two objectives. The first is to investigate the password generation space in relation to the length and complexity of password rules. Examining how these rules affect the makeup of passwords, such as character distribution and placement patterns, will help us better understand how password requirements constrain human-generated passwords. The second is to explore the effects the presentation of the password rules may have on users’ password generation performance. Understanding and quantifying the cognitive processes and strategies used during password generation will support the ultimate goal of finding a combination of length requirements, complexity requirements, and presentation style that balances security and usability.

Past research exploring the password generation space asked users to create a limited number of passwords (e.g., [5]) or instructed users to create passwords for specific accounts (e.g., [2, 3]). In contrast, this study examines password composition and creation behavior when users are given a longer period of time to generate passwords, with only rule complexity and presentation style as factors. Giving participants more time to create passwords allows for an in-depth investigation into generation patterns, while not tying passwords to specific accounts avoids potential changes in creation behavior due to preconceived notions that certain accounts require more secure passwords. A limitation of the study is that users were asked to generate multiple passwords at one time in a lab setting.

Research has found that formatted text can facilitate online reading such as improving comprehension and reading efficiency, compared to block text (e.g., [13, 14]). To understand the potential effects of how password rules are presented, we formed the following hypothesis: users with password requirements presented in a formatted manner will have better password generation performance than users with password requirements presented in an unformatted manner.

3 Method

3.1 Participants

Eighty-one participants were recruited from the Washington, D.C., metropolitan area in the United States. The participants ranged in age from 18 to 69 years old (Mean = 35.1). Approximately 47 % were male and 53 % were female, and participants represented diverse educational and occupational backgrounds. Qualified participants had to be familiar with typing on a standard keyboard.

3.2 Apparatus

An experimental program was developed in Python version 3.3.2 for data collection. The program ran on a desktop computer (Windows 7 Enterprise, Intel® Core i7-3770 CPU @ 3.40 GHz, with 16.0 GB RAM) with a 24 in. LCD monitor, a standard keyboard, and a 2-button USB optical mouse with scroll wheel.

3.3 Experimental Design

To investigate the password generation space, we gave each participant two sets of password rules and asked them to generate as many passwords as possible within set time limits. The password rule sets had two levels of complexity. The simple rule set required only a minimum length of 6 characters. For the complex rule set, we chose stricter rules of the kind commonly used by organizations to control employee access, or for personal accounts that protect more sensitive data, such as banking or credit card accounts. The complex rules included minimum length, mixed-case letters, numbers, and special characters (Table 1).

Table 1. Experimental design

To test the proposed hypothesis, the password rules were presented in different styles: formatted and unformatted. There are many existing guidelines on text formatting for online reading and comprehension. As a starting point for exploring the effects of password requirement presentation styles, we employed only minimal formatting differences, turning unformatted text into bullets and adding line breaks. This condition was between-subjects, i.e., 40 participants were presented with formatted password rules and 41 participants were presented with unformatted password rules. To eliminate potential order effects, the sequence of the two rule sets was counterbalanced across the between-subjects conditions (formatted or unformatted), i.e., about half of the participants (41) started with the complex set, followed by the simple set; the other half (40) started with the simple set, followed by the complex set.

Detailed data were logged programmatically including: number of passwords generated, time spent on password generation, and key presses. All timing data were measured in milliseconds and reported in seconds (s). The final experimental design with different password rule presentation styles is in Table 1.

3.4 Procedure

Participants performed the study individually. Upon arriving at the study facility, the participant was greeted and briefed about the study by the researcher. Each participant was assigned an identification number and randomly assigned to a condition (formatted or unformatted). The researcher started the experimental program, left the testing room, observed the session in an adjacent control room via video feeds, and communicated with the participant using microphones and speakers if necessary.

The experimental program presented the first password rule set and instructed the participant to generate as many passwords as possible according to the requirements within a pre-determined time limit (12 min for the complex rules and 8 min for the simple rules). Participants were informed that they did not have to memorize the passwords generated. Repeated passwords were rejected. Upon finishing the first rule set, the participant received a second rule set and performed the generation task.

After the password generation tasks, participants completed a questionnaire regarding their perceptions of the difficulty of the password generation tasks and of the strength of the password rule sets.

4 Results and Discussion

4.1 Descriptive Statistics

The 81 participants created 8,165 compliant passwords in total (3,138 complex; 5,027 simple), averaging 100.8 passwords per participant (SD = 57.04). On average, a participant generated 38.74 complex passwords and 62.05 simple passwords. Detailed performance metrics are summarized in Table 2.

Table 2. Password generation performance

The demanding nature of the complex rule set made participants take longer to reach milestones such as pressing the first key or creating their first compliant password. On average, it took participants 82.65 s to create their first compliant complex password. Breaking down the steps taken in these 82.65 s, it took users 23.98 s on average to make their first key press after being presented with the complex rules. It then took an additional 33.35 s to attempt their first password, and another 25.32 s to create their first compliant password. In contrast, when faced with the simple rules, participants took an average of 14.35 s to press the first key, an additional 7.82 s to make their first password attempt, and just 0.11 s more to complete their first compliant password. Overall, it took participants 17.71 s longer to generate a compliant complex password (29.25 s) than to generate a compliant simple password (11.54 s). Finally, due to the differences in length requirements (at least 12 characters for complex; at least 6 characters for simple), the passwords generated from the complex rules averaged 14.23 characters in length, while passwords generated from the simple rules averaged 9.15 characters.
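The milestone arithmetic above can be checked with a short Python sketch. The step values are the means reported in the text; the variable names are our own, and the total for the simple rule set is derived here by summing the reported steps rather than taken from the paper.

```python
# Mean durations in seconds, copied from the text above.
complex_steps = {
    "rules shown -> first key press": 23.98,
    "first key press -> first attempt": 33.35,
    "first attempt -> first compliant password": 25.32,
}
simple_steps = {
    "rules shown -> first key press": 14.35,
    "first key press -> first attempt": 7.82,
    "first attempt -> first compliant password": 0.11,
}

# Sum the steps; rounding guards against floating-point noise.
complex_total = round(sum(complex_steps.values()), 2)
simple_total = round(sum(simple_steps.values()), 2)  # derived, not reported

# Difference in mean generation time per compliant password.
per_password_gap = round(29.25 - 11.54, 2)

print(f"complex, time to first compliant password: {complex_total:.2f} s")
print(f"simple, time to first compliant password:  {simple_total:.2f} s")
print(f"extra time per compliant complex password: {per_password_gap:.2f} s")
```

The complex steps sum to the reported 82.65 s, and the per-password gap matches the reported 17.71 s.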

During the password generation tasks, the experimental program provided instantaneous visual feedback on the compliance of the text string being typed. The text entry field started with a red background (i.e., non-compliant) and changed to a green background (i.e., compliant) at the moment the password string adhered to the rule set. Once minimum compliance was met, participants had the option to submit the string or keep typing until they were satisfied. Because of this real-time dynamic compliance checking feature, few non-compliant passwords (i.e., errors) were submitted. Twenty-six participants did not generate any non-compliant passwords, and the other fifty-five participants generated at least one. We also recorded the number of retry attempts submitted after an error occurred until a compliant password was generated. The results from those fifty-five participants are summarized in Table 3. On average, participants made about twice as many errors with the complex rule set and took three more attempts to recover from the errors, compared with their performance with the simple rule set.

Table 3. Errors and retry attempts

4.2 Password Generation Space

4.2.1 Character Distribution

To understand the content of the user-generated passwords, we split all characters into four types: lowercase letters, uppercase letters, numbers, and special characters. Table 4 shows the character distribution of the 3,138 complex passwords and the 5,027 simple passwords.

Table 4. Character type distribution

Lowercase letters far outstrip all other character types in both rule sets, representing 56.38 % of characters in complex passwords and 69.39 % of characters in simple passwords. The large proportion of lowercase letters in simple passwords is likely due to the rule set only requiring at least six characters of any type.
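A distribution like the one in Table 4 could be computed along the following lines. This is a sketch only: the helper names `char_type` and `type_distribution` are hypothetical, and the two sample passwords are invented for illustration and are not drawn from the study data.

```python
from collections import Counter

TYPES = ("lowercase", "uppercase", "number", "special")


def char_type(c):
    """Classify one character into the four types used in the text."""
    if c.islower():
        return "lowercase"
    if c.isupper():
        return "uppercase"
    if c.isdigit():
        return "number"
    return "special"  # anything that is neither a letter nor a number


def type_distribution(passwords):
    """Return the percentage of all characters that fall in each type."""
    counts = Counter(char_type(c) for pw in passwords for c in pw)
    total = sum(counts.values())
    if total == 0:
        return {t: 0.0 for t in TYPES}
    return {t: 100.0 * counts[t] / total for t in TYPES}


# Invented examples, not study data.
sample = ["Sunshine2024!", "blue42"]
for name, pct in type_distribution(sample).items():
    print(f"{name:10s} {pct:6.2f} %")
```

Percentages are taken over all characters pooled across passwords, so longer passwords contribute more characters to the totals.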

Previous research has reported that if character type use is not enforced, users are much more likely to stick to lowercase letters [1]. This rise of lowercase letters in simple passwords does not affect character type frequency ranking, as both datasets have lowercase letters as the most common character type, followed by numbers, then by uppercase letters and special characters. Further, due to the lack of character type quotas in the simple rule set, the occurrences of numbers, uppercase letters, and special characters are all lower than those in complex passwords. More interesting are the results pertaining to the complex dataset, as the rules closely mimic many real world generation guidelines and thus the results are more relevant in today’s password creation landscape.

After splitting up the character distribution by character type, we further explored the data by examining the most common characters from each category, as seen in Table 5. We compared letter frequencies to their occurrences in continuous English text to see if the password generation environment had any effect. Nine of the top ten lowercase letters (e, a, o, s, r, n, t, l, and h) in complex passwords appear in the top ten most common letters in the English language (e, t, a, o, n, i, r, s, and h) [15]. The top ten uppercase letters in Table 5 do not fare quite as well, with only S, L, T, and A matching up. They do match much more closely with the top ten most common starting letters in the English language (t, o, a, w, b, c, d, s, f, and m) [15], with eight matches in total. A possible explanation for this difference is that, during the study sessions, we observed that many participants used English-like words in their passwords. With the need for an uppercase letter in a valid complex password, many participants capitalized the first letter of these English-like words to fulfill the requirement.

Table 5. Ten most common characters in complex passwords, based on character types

The top three numbers are 1, 2, and 3, following the natural numerical ordering, followed by 0, which is the last digit of the number row on the keyboard. Special characters follow a similar distribution, with ! (SHIFT-1), @ (SHIFT-2), and # (SHIFT-3) appearing in the top four in Table 5.

4.2.2 Complex Password Patterns

In addition to the character distribution, we examined character type positioning to determine if the generated passwords followed any particular placement pattern. We again focused our analysis on compliant complex passwords. Figure 1 displays the overall character type distribution relative to their position for password lengths of 12 through 18. This range accounts for 92 % of all complex passwords created.

Fig. 1. Character type distribution for specific string positions

Uppercase letters dominate the first position of the password string, accounting for 66 % of all characters in that position. This is consistent with the earlier observation that many participants capitalized the English-like words in their passwords, which were often the first portion of the string. However, the rate drops sharply to 11 % at the second position and slowly decreases toward the last position. Lowercase letters start at a much more modest 19 % before rising to 71 % in position 2. This rate holds steady for four positions (2 to 5) before it begins to decline at position 6, at an average rate of 5 % per position, finally ending at 8.6 %. Numbers and special characters each begin at about 7 %, but the percentage of numbers begins to increase at position 6, whereas special characters stay relatively steady until position 12, where they begin to rise. Numbers are the predominant character type from position 13 until the last position, in which special characters make up half of the character distribution.

This pattern of uppercase, lowercase, numbers, and special characters positioning was found consistently when examining the data from specific password lengths. We observed that, when generating passwords, participants would exceed the minimum 12-character requirement. Thus, any particular generation pattern used would hold steady regardless of password length. In addition, we found that this pattern closely follows the rule sequence as presented in the complex password requirements in Table 1. It is of great interest to investigate in future research whether the character positioning changes if the rules are presented in different orders.
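The position-by-position analysis can be illustrated with a small Python sketch. It is an assumption-laden example rather than the analysis code behind Fig. 1: the function names are hypothetical, the three sample passwords are invented, and percentages at each position are computed over the passwords long enough to reach that position.

```python
from collections import Counter, defaultdict

TYPES = ("uppercase", "lowercase", "number", "special")


def char_type(c):
    """Classify one character into the four types used in the text."""
    if c.isupper():
        return "uppercase"
    if c.islower():
        return "lowercase"
    if c.isdigit():
        return "number"
    return "special"


def positional_distribution(passwords):
    """Map each 1-based position to the percentage of each character type."""
    counts = defaultdict(Counter)
    for pw in passwords:
        for pos, c in enumerate(pw, start=1):
            counts[pos][char_type(c)] += 1
    result = {}
    for pos in sorted(counts):
        n = sum(counts[pos].values())  # passwords that reach this position
        result[pos] = {t: 100.0 * counts[pos][t] / n for t in TYPES}
    return result


def last_position_distribution(passwords):
    """Character type percentages for the final character of each password."""
    counts = Counter(char_type(pw[-1]) for pw in passwords if pw)
    n = sum(counts.values())
    if n == 0:
        return {t: 0.0 for t in TYPES}
    return {t: 100.0 * counts[t] / n for t in TYPES}


# Invented examples, not study data.
sample = ["Sunshine2024!", "Bluebird1999#", "coffeeTime77$"]
by_pos = positional_distribution(sample)
print("position 1:", by_pos[1])
print("last position:", last_position_distribution(sample))
```

Treating the last position separately matters because passwords differ in length, so "last" does not correspond to a single fixed index.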

4.3 Hypothesis Testing

We set the α of all tests for statistical significance to 0.05 for testing the hypothesis on whether users with formatted rule presentation have better password generation performance over users with unformatted presentation. First, we performed a check on all data against the assumptions for parametric statistical tests. Since all of the data violate the normality and equal-variance assumptions, non-parametric tests were used. Table 6 summarizes the performance for each condition group.

Table 6. Impacts of presentation styles on password generation performance

We performed the Mann-Whitney Independent Samples U test to examine the impacts of presentation styles on participants’ performance. The hypothesis is partially supported with significant differences found on three performance variables: Time to first key press for complex passwords (U = 584.0, z = -2.229, effect size (r) = -0.25), Number of simple passwords generated (U = 612.5, z = -1.96, effect size (r) = -0.22), and Average generation time of simple passwords (U = 612.5, z = -1.96, effect size (r) = -0.22). The results show that formatted presentation has positive effects on simple password generation, i.e. more passwords and shorter generation time.
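The reported z values and effect sizes can be related to the U statistics with a short Python sketch. The paper does not state how z and r were computed; the sketch assumes the standard normal approximation to U without tie or continuity correction, the effect size r = z / √N, and group sizes of 40 and 41 taken from Sect. 3.3. The function names are our own.

```python
import math


def mann_whitney_z(u, n1, n2):
    """Normal approximation to U (assumed: no tie or continuity correction)."""
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    return (u - mean_u) / sd_u


def effect_size_r(z, n_total):
    """Effect size r = z / sqrt(N), with N the total sample size."""
    return z / math.sqrt(n_total)


n_formatted, n_unformatted = 40, 41  # group sizes from Sect. 3.3
n_total = n_formatted + n_unformatted

for label, u in [("time to first key press (complex)", 584.0),
                 ("simple password measures", 612.5)]:
    z = mann_whitney_z(u, n_formatted, n_unformatted)
    r = effect_size_r(z, n_total)
    print(f"{label}: U = {u}, z = {z:.3f}, r = {r:.2f}")
```

Under these assumptions the sketch reproduces the values reported above (z ≈ -2.229 with r ≈ -0.25, and z ≈ -1.96 with r ≈ -0.22), which suggests, but does not confirm, that this is how they were obtained.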

Also, when participants faced the stringent complex password requirements, it took them 33 % longer to start the password generation activity (i.e., time to first key press) with the unformatted presentation. This suggests that the formatted presentation helped reduce participants’ cognitive load in reading the password rules and facilitated their comprehension.

4.4 Perceptions on Password Rule Strength and Generation Difficulty

Participants were asked to rate, on a 5-point semantic distance scale, the strength of the password rules in protecting their accounts (1 – Very Weak and 5 – Very Strong) and the difficulty of password generation (1 – Very Difficult and 5 – Very Easy) for each password rule set. The results are summarized in Table 7.

Table 7. Perceptions on password rule strength and generation difficulty

We used the Wilcoxon Signed Ranks Test for within-subject comparisons to examine whether each participant had different perceptions of the two password rule sets. There were significant differences in the perceived strength of the password rules and in the perceived difficulty of the password generation tasks. In general, participants understood that the complex password rules provide stronger protection for their accounts (Mdn = 4) than the simple password rules do (Mdn = 3). However, they found it more difficult to generate passwords that comply with the stringent complex rules (Mdn = 2).

Participants with the formatted presentation style tended to perceive password generation as an easier task than participants with the unformatted style did, for both the complex rules and the simple rules, as summarized in Table 8. The Mann-Whitney U test was used to examine the impacts of presentation styles on participants’ perceptions. Only the difference in the perceived strength of the simple password requirements was statistically significant. Interestingly, participants with the unformatted simple rules perceived the rules as stronger than participants with the formatted rules did. While more research is needed, a plausible explanation is that the unformatted presentation makes it harder for participants to separate each requirement from the others, which then triggers a false perception of added complexity and strength in the rules.

Table 8. Impacts of presentation styles on participants’ perceptions

5 Conclusion

Given the near-universal reliance on password-based authentication methods, our study aimed to better understand the human-generated password space as it relates to password requirements and formats. Users’ password generation performance with the complex rule set was consistently lower than with the simple rule set, e.g., longer times for rule comprehension, longer times for password generation, and fewer passwords generated. Additionally, participants made twice as many errors when generating complex passwords, and took three times the number of retries to reach a valid password. Close examination of the passwords from the complex rule set makes it clear that the stringent nature of the rules does not expand the password generation space much beyond the letters commonly used in the English language.

This study explored the potential impacts of password rule presentation styles on users’ password generation performance. Although the hypothesis was only partially supported, the results show general trends of better performance, e.g., shorter times and more passwords generated, with formatted rule presentation. Given that the formatting manipulation in this study only added some organization (such as bullets and line breaks) to an unformatted block of text, it would be of great interest to investigate the impacts on password generation performance of more elaborate formatting manipulations, such as changing phrasing, using plain language, re-ordering rules, and changing text styles (e.g., font family, font size, bolding).

This paper provides findings from our preliminary analyses of the data collected in the study. We intend to perform more in-depth analyses to fully understand how participants approached the password generation tasks when faced with different password rules. It is also of great interest to investigate whether there are relationships between demographic data (e.g., age, education, self-reported computer proficiency) and participants’ password generation performance. We expect the research to inform the development of password policies and help ease the difficulty of balancing security and usability.