
Improving Deep Learning Based Password Guessing Models Using Pre-processing

  • Conference paper
Information and Communications Security (ICICS 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13407)


Abstract

Passwords are the most widely used authentication method and play an important role in users’ digital lives. Password guessing models are generally used to understand password security, yet statistics-based password models (such as Markov models and probabilistic context-free grammars (PCFG)) are subject to the inherent limitations of overfitting and sparsity. With the improvement of computing power, deep-learning based models with higher crack rates are emerging. Since neural networks are generally used as black boxes for learning password features, a key challenge for deep-learning based password guessing models is choosing appropriate preprocessing methods to learn more effective features.

To fill the gap, this paper explores three new preprocessing methods and applies them to two promising deep-learning networks, i.e., Long Short-Term Memory (LSTM) neural networks and Generative Adversarial Networks (GAN). First, we propose a character-feature based encoding method to replace the canonical one-hot encoding. Second, we add the most comprehensive recognition rules to date (words, keyboard patterns, years, and website names) into the basic PCFG, and find that the frequency distribution of the extracted segments follows Zipf’s law. Third, we adopt Xu et al.’s PCFG improvement with chunk segmentation at CCS’21, and study the performance of the Chunk+PCFG preprocessing method when applied to LSTM and GAN.
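The first preprocessing method, replacing one-hot encoding with a character-feature encoding, can be illustrated with a minimal sketch. The concrete feature set below (four character-class flags plus a normalized within-class index) is a hypothetical stand-in, not the paper's exact design:

```python
import string

# The 94 printable ASCII characters typically allowed in passwords.
PRINTABLE = (string.ascii_lowercase + string.ascii_uppercase
             + string.digits + string.punctuation)

def one_hot(ch):
    """Canonical one-hot encoding: a sparse 94-dim vector with a single 1."""
    vec = [0] * len(PRINTABLE)
    vec[PRINTABLE.index(ch)] = 1
    return vec

def char_features(ch):
    """Hypothetical dense character-feature encoding:
    [is_lower, is_upper, is_digit, is_symbol, normalized index within its class]."""
    for flag_idx, cls in enumerate((string.ascii_lowercase,
                                    string.ascii_uppercase,
                                    string.digits,
                                    string.punctuation)):
        if ch in cls:
            flags = [0, 0, 0, 0]
            flags[flag_idx] = 1
            return flags + [cls.index(ch) / (len(cls) - 1)]
    raise ValueError(f"unsupported character: {ch!r}")

# A password becomes a sequence of 5-dim vectors instead of 94-dim one-hots.
encoded = [char_features(c) for c in "a1!"]
```

The intuition is that a dense encoding lets the network share what it learns across characters of the same class (e.g., all digits), whereas one-hot vectors treat every character as unrelated to every other.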

Extensive experiments on six large real-world password datasets show the effectiveness of our preprocessing methods. Results show that within 50 million guesses: 1) When we apply the PCFG preprocessing method to PassGAN (a GAN-based password model proposed by Hitaj et al. at ACNS’19), 13.83%–38.81% (26.79% on average) more passwords can be cracked; 2) Our LSTM based model using PCFG for preprocessing (PL for short) outperforms Wang et al.’s original PL model by 0.35%–3.94% (1.36% on average). Overall, our preprocessing methods improve the attacking rates in four of the seven tested cases. We believe this work provides new feasible directions for guessing optimization, and contributes to a better understanding of deep-learning based models.


References

  1. Blocki, J., Harsha, B., Zhou, S.: On the economics of offline password cracking. In: Proceedings of IEEE S&P 2018, pp. 853–871 (2018)

  2. Bonneau, J., Herley, C., Van Oorschot, P.C., Stajano, F.: The quest to replace passwords: a framework for comparative evaluation of web authentication schemes. In: Proceedings of IEEE S&P 2012, pp. 553–567 (2012)

  3. Bonneau, J., Herley, C., Van Oorschot, P.C., Stajano, F.: Passwords and the evolution of imperfect authentication. Commun. ACM 58(7), 78–87 (2015)

  4. Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., Courville, A.C.: Improved training of Wasserstein GANs. In: Proceedings of NIPS 2017, pp. 5769–5779 (2017)

  5. Hitaj, B., Gasti, P., Ateniese, G., Perez-Cruz, F.: PassGAN: a deep learning approach for password guessing. In: Proceedings of ACNS 2019 (2019)

  6. Houshmand, S., Aggarwal, S., Flood, R.: Next gen PCFG password cracking. IEEE Trans. Inf. Forensics Secur. 10(8), 1776–1791 (2015)

  7. Li, Z., Han, W., Xu, W.: A large-scale empirical analysis of Chinese web passwords. In: Proceedings of USENIX Security 2014, pp. 559–574 (2014)

  8. Lipton, Z.C., Berkowitz, J., Elkan, C.: A critical review of recurrent neural networks for sequence learning. arXiv preprint arXiv:1506.00019 (2015)

  9. Liu, Y., et al.: GENPass: a general deep learning model for password guessing with PCFG rules and adversarial generation. In: Proceedings of ICC 2018, pp. 1–6 (2018)

  10. Ma, J., Yang, W., Luo, M., Li, N.: A study of probabilistic password models. In: Proceedings of IEEE S&P 2014, pp. 689–704 (2014)

  11. Melicher, W., Ur, B., Komanduri, S., Bauer, L., Christin, N., Cranor, L.F.: Fast, lean and accurate: modeling password guessability using neural networks. In: Proceedings of USENIX Security 2017, pp. 1–17 (2017)

  12. Narayanan, A., Shmatikov, V.: Fast dictionary attacks on passwords using time-space tradeoff. In: Proceedings of ACM CCS 2005, pp. 364–372 (2005)

  13. Rodríguez, P., Bautista, M.A., Gonzàlez, J., Escalera, S.: Beyond one-hot encoding: lower dimensional target embedding. Image Vis. Comput. 75, 21–31 (2018)

  14. Wang, D., Cheng, H., Wang, P., Huang, X., Jian, G.: Zipf’s law in passwords. IEEE Trans. Inf. Forensics Secur. 12(11), 2776–2791 (2017)

  15. Wang, D., Wang, P., He, D., Tian, Y.: Birthday, name and bifacial-security: understanding passwords of Chinese web users. In: Proceedings of USENIX Security 2019 (2019)

  16. Wang, D., Zhang, Z., Wang, P., Yan, J., Huang, X.: Targeted online password guessing: an underestimated threat. In: Proceedings of ACM CCS 2016, pp. 1242–1254 (2016)

  17. Wang, D., Zou, Y., Tao, Y., Wang, B.: Password guessing based on recurrent neural networks and generative adversarial networks. Chin. J. Comput. 1519–1534 (2021)

  18. Weir, M., Aggarwal, S., de Medeiros, B., Glodek, B.: Password cracking using probabilistic context-free grammars. In: Proceedings of IEEE S&P 2009, pp. 391–405 (2009)

  19. Xie, Z., Zhang, M., Yin, A., Li, Z.: A new targeted password guessing model. In: Liu, J.K., Cui, H. (eds.) ACISP 2020. LNCS, vol. 12248, pp. 350–368. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-55304-3_18

  20. Xu, M., Wang, C., Yu, J., Zhang, J., Zhang, K., Han, W.: Chunk-level password guessing: towards modeling refined password composition representations. In: Proceedings of ACM CCS 2021, pp. 5–20 (2021)

  21. Yang, K., Hu, X., Zhang, Q., Wei, J., Liu, W.: Studies of keyboard patterns in passwords: recognition, characteristics and strength evolution. In: Gao, D., Li, Q., Guan, X., Liao, X. (eds.) ICICS 2021. LNCS, vol. 12918, pp. 153–168. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-86890-1_9


Acknowledgment

The authors are grateful to the anonymous reviewers for their invaluable comments. Ding Wang is the corresponding author. This research was in part supported by the National Natural Science Foundation of China under Grant No. 62172240, and by the Natural Science Foundation of Tianjin, China under Grant No. 21JCZDJC00190. There are no competing interests.


Corresponding author

Correspondence to Ding Wang.


Appendices

Appendix 1 Some Statistics About User-Chosen Passwords

The length distribution of each dataset is shown in Table 7. Most passwords are between six and nine characters long (73.81% on average). The length distribution is affected by the password policy. For example, the CSDN dataset has far fewer passwords shorter than eight characters than the other datasets, likely because the CSDN website changed its password policy to a stricter one. The character composition information is summarized in Table 8. Chinese users prefer digits in passwords, while English users prefer letters. This may be caused by cultural differences: most Chinese users use digits more than English words in their daily lives. In addition, English users prefer lowercase letters to uppercase letters. The top-10 passwords are shown in Table 9. The password “123456” is the most common password in every dataset except CSDN (due to its password policy). It is also interesting that the top-10 passwords in the Chinese datasets are almost all pure digits.

Table 7. Length distribution information of each web service.
Table 8. Character composition information of each web service\(*\).
Table 9. Top-10 password information of each web service.
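The statistics summarized in Tables 7–9 can be reproduced from a raw password list with a few lines of code. The sketch below uses a toy five-password sample purely for illustration; the measurements above are over leaked datasets with millions of entries:

```python
from collections import Counter

def password_stats(passwords):
    """Summarize a password list along the lines of Tables 7-9:
    length distribution, digit-only and letter-only shares, and top-10 passwords."""
    n = len(passwords)
    lengths = Counter(len(pw) for pw in passwords)          # Table 7 style
    digits_only = sum(pw.isdigit() for pw in passwords) / n  # Table 8 style
    letters_only = sum(pw.isalpha() for pw in passwords) / n
    top10 = Counter(passwords).most_common(10)               # Table 9 style
    return lengths, digits_only, letters_only, top10

# Toy sample; real studies run this over full leaked datasets.
sample = ["123456", "123456", "password", "wang1987", "qwerty"]
lengths, digit_rate, letter_rate, top10 = password_stats(sample)
```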

Appendix 2 Exploratory Experiments

As described in Sect. 4.3, probabilistic context-free grammars (PCFG) [10, 18] can be used for data preprocessing when integrated with neural networks. Our refined PCFG is based on the basic PCFG with four additional recognition rules: keyboard patterns, words, websites, and years. The experimental results in Sect. 5.2 have already shown that our refined PCFG improves performance by 1.36% on average over the basic PCFG when integrated with Long Short-Term Memory (LSTM) neural networks [17]. To explore the impact of each recognition rule on the results, we evaluate the performance of LSTM based models using PCFG for preprocessing, adding only one recognition rule to the basic PCFG at a time.
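As a rough illustration of this preprocessing step, the sketch below segments a password into basic PCFG classes (L for letters, D for digits, S for symbols) and adds one toy recognition rule that relabels plausible years. The refined PCFG described above also recognizes words, keyboard patterns, and website names; those rules are omitted here for brevity:

```python
import re

# Maximal runs of letters, digits, or symbols (basic PCFG segmentation).
SEGMENT_RE = re.compile(r"[a-zA-Z]+|[0-9]+|[^a-zA-Z0-9]+")
YEAR_RE = re.compile(r"(19|20)\d{2}")  # toy year rule: 1900-2099

def pcfg_structure(password):
    """Segment a password into (tag, segment) pairs, e.g. L4 = 4 letters.
    Four-digit segments that look like years are relabeled Y instead of D."""
    segments = []
    for m in SEGMENT_RE.finditer(password):
        seg = m.group()
        if seg.isalpha():
            tag = "L"
        elif seg.isdigit():
            tag = "Y" if len(seg) == 4 and YEAR_RE.fullmatch(seg) else "D"
        else:
            tag = "S"
        segments.append((f"{tag}{len(seg)}", seg))
    return segments

# "wang1987!" yields structure L4 Y4 S1 under this toy grammar.
```

Each extracted segment class then becomes a training unit for the downstream neural network instead of raw characters.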

The results in Table 10 show that, compared to the LSTM based model with basic PCFG for preprocessing: (1) using PCFG with additional word recognition for preprocessing yields a 0.26% improvement on average; (2) using PCFG with additional keyboard recognition yields a 0.06% improvement on average; (3) the remaining recognition rules (i.e., website and year) improve the results little (less than 0.01% on average). In general, adding one recognition rule to the basic PCFG [10] alone is not as effective as adding all the rules (i.e., our refined PCFG) when integrated with LSTM. The poor performance of the year recognition rule can be attributed to two reasons. First, years are part of birthdays, and birthdays vary widely among users, so year recognition has little effect on trawling password guessing attacks. Second, individual year segments can be replaced by digit segments. Moreover, the extent to which each recognition rule improves cracking reflects the patterns users tend to follow when creating passwords.

Table 10. Cracking results of LSTM based models using PCFG based preprocessing methods (Guess number = \(5*10^7\))\(\dagger \)


Copyright information

© 2022 Springer Nature Switzerland AG

About this paper


Cite this paper

Wu, Y., Wang, D., Zou, Y., Huang, Z. (2022). Improving Deep Learning Based Password Guessing Models Using Pre-processing. In: Alcaraz, C., Chen, L., Li, S., Samarati, P. (eds) Information and Communications Security. ICICS 2022. Lecture Notes in Computer Science, vol 13407. Springer, Cham. https://doi.org/10.1007/978-3-031-15777-6_10


  • DOI: https://doi.org/10.1007/978-3-031-15777-6_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-15776-9

  • Online ISBN: 978-3-031-15777-6

