PassGAN: A Deep Learning Approach for Password Guessing
Due to space limitations, an extended version of the paper can be found here: https://arxiv.org/abs/1709.00440
A preliminary version of this paper appeared in NeurIPS 2018 Workshop on Security
and Machine Learning (SecML’18) [24].
2 Hitaj et al.
1 Introduction
Passwords are the most popular authentication method, mainly because they are
easy to implement, require no special hardware or software, and are familiar to
users and developers [27]. Unfortunately, multiple password database leaks have
shown that users tend to choose easy-to-guess passwords [10,14,36], primarily
composed of common strings (e.g., password, 123456, iloveyou), and variants
thereof.
Password guessing tools are valuable for identifying weak passwords when they are stored in hashed form [49,53]. The effectiveness of password guessing software relies on the ability to quickly test a large number of highly likely passwords against each password hash. Instead of exhaustively trying all possible character combinations, password guessing tools use words from dictionaries and previous password leaks as candidate passwords. State-of-the-art
password guessing tools, such as John the Ripper [55] and HashCat [22], take this
approach one step further by defining heuristics for password transformations,
which include combinations of multiple words (e.g., iloveyou123456), mixed
letter case (e.g., iLoVeyOu), and leet speak (e.g., il0v3you). These heuristics,
in conjunction with Markov models, allow John the Ripper and HashCat to
generate a large number of new highly likely passwords.
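These transformation heuristics can be illustrated with a short sketch (ours, not HashCat's or John the Ripper's actual rule engines; the suffix list is an arbitrary example):

```python
# Illustrative sketch of dictionary-based transformation heuristics
# (not HashCat's or John the Ripper's actual rule engines).

LEET = str.maketrans({"a": "4", "e": "3", "i": "1", "o": "0", "s": "5"})

def candidates(word, suffixes=("123456", "1", "!")):
    """Yield transformed guesses derived from a single dictionary word."""
    yield word
    yield word.capitalize()
    yield word.upper()
    yield word.translate(LEET)   # leet speak: iloveyou -> 1l0v3y0u
    for s in suffixes:           # word + common suffix: iloveyou123456
        yield word + s

print(list(candidates("iloveyou"))[:4])
```

Real rule sets chain dozens of such transformations; the point is that each rule encodes a hand-written intuition about how users modify base words.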
While these heuristics are reasonably successful in practice, they are ad-hoc
and based on intuitions on how users choose passwords, rather than being con-
structed from a principled analysis of large password datasets. For this reason,
each technique is ultimately limited to capturing a specific subset of the pass-
word space which depends upon the intuition behind that technique. Further,
developing and testing new rules and heuristics is a time-consuming task that
requires specialized expertise, and therefore has limited scalability.
1.2 Contributions
1. We show that a GAN can generate high-quality password guesses. Our GAN
is trained on a portion of the RockYou dataset [57], and tested on two dif-
ferent datasets: (1) another (distinct) subset of the RockYou dataset; and
(2) a dataset of leaked passwords from LinkedIn [35]. In our experiments,
we were able to match 1,350,178 (43.6%) unique passwords out of 3,094,199
passwords from the RockYou dataset, and 10,478,322 (24.2%) unique pass-
words out of 43,354,871 passwords from the LinkedIn dataset. To quantify
the ability of PassGAN to generate new passwords, we removed from the test-
ing set all passwords that were present also in the training set. This resulted
in testing sets of size 1,978,367 and 40,593,536 for RockYou and LinkedIn,
respectively. In this setting, PassGAN was able to match 676,439 (34.6%)
samples in the RockYou testing set and 8,878,284 (34.2%) samples in the
LinkedIn set. Moreover, the overwhelming majority of passwords generated
by PassGAN that did not match the testing sets still “looked like” human-
generated passwords, and thus could potentially match real user accounts
not considered in our experiments.
2. We show that PassGAN is competitive with state-of-the-art password gen-
eration rules. Even though these rules were specially tuned for the datasets
used in our evaluation, the quality of PassGAN’s output was comparable to
that of password rules.
3. With password generation rules, the number of unique passwords that can
be generated is defined by the number of rules and by the size of the pass-
word dataset used to instantiate them. In contrast, PassGAN can output
a practically unbounded number of password guesses. Crucially, our exper-
iments show that with PassGAN the number of matches increases steadily
with the number of passwords generated, Table 1. This is important because
it shows that the output of PassGAN is not restricted to a small subset of
the password space.
4. PassGAN is competitive with state-of-the-art password guessing algorithms based on deep neural networks, matching the performance of the model by Melicher et al. [38] (indicated as FLA in the rest of the paper).
5. We show that PassGAN can be effectively used to augment password gen-
eration rules. In our experiments, PassGAN matched passwords that were
not generated by any password rule. When we combined the output of Pass-
GAN with the output of HashCat, we were able to guess between 51% (case
of RockYou) and 73% (case of LinkedIn) additional unique passwords com-
pared to HashCat alone.
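The bookkeeping behind this last measurement can be sketched with plain set operations (the password sets below are toy stand-ins for the real tool outputs):

```python
# Sketch of scoring the combination of two guess lists against a test set.
# All password sets here are toy stand-ins for the real tool outputs.

def extra_matches(baseline_guesses, extra_guesses, test_set):
    """Unique test-set passwords hit by extra_guesses but not by baseline_guesses."""
    base_hits = baseline_guesses & test_set
    combined_hits = (baseline_guesses | extra_guesses) & test_set
    return combined_hits - base_hits

hashcat = {"password1", "iloveyou", "qwerty12"}
passgan = {"iloveyou", "vamperiosa", "goku476"}
test = {"iloveyou", "vamperiosa", "letmein9"}

print(extra_matches(hashcat, passgan, test))  # passwords only PassGAN matched
```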
We consider this work as the first step toward a fully automated generation
of high-quality password guesses. We argue that this work is relevant, important,
and timely. Relevant, because despite numerous alternatives [50,63,16,13,71], we
see little evidence that passwords will be replaced any time soon. Important,
because establishing the limits of password guessing—and better understanding
how guessable real-world passwords are—will help make password-based systems
more secure. And timely, because recent leaks containing hundreds of millions of
passwords [15] provide a formidable source of data for attackers to compromise
systems, and for system administrators to re-evaluate password policies.
1.3 Organization
3 Experiment Setup
– Batch size, which represents the number of passwords from the training
set that propagate through the GAN at each step of the optimizer. We
instantiated our model with a batch size of 64.
– Number of iterations, which indicates how many times the GAN invokes
its forward step and its back-propagation step [61,32,33]. In each iteration,
the GAN runs one generator iteration and one or more discriminator iterations. We trained the GAN using various numbers of iterations and eventually settled on 199,000 iterations, as further iterations provided diminishing returns in the number of matches.
– Number of discriminator iterations per generator iteration, which
indicates how many iterations the discriminator performs in each GAN it-
eration. The number of discriminator iterations per generator iteration was set to 10, which is the default value used by IWGAN.
– Model dimensionality, which represents the number of dimensions for
each convolutional layer. We experimented using 5 residual layers for both
PassGAN: A Deep Learning Approach for Password Guessing 7
the generator and the discriminator, with each of the layers in both deep
neural networks having 128 dimensions.
– Gradient penalty coefficient (λ), which specifies the penalty applied to
the norm of the gradient of the discriminator with respect to its input [20].
Increasing this parameter leads to a more stable training of the GAN [20].
In our experiments, we set the value of the gradient penalty to 10.
– Output sequence length, which indicates the maximum length of the
strings generated by the generator (G). We modified the length of the se-
quence generated by the GAN from 32 characters (default length for IW-
GAN) to 10 characters, to match the maximum length of passwords used
during training. We padded passwords shorter than 10 characters using ac-
cent symbol (i.e., “`”); we then removed it from the output of PassGAN.
– Size of the input noise vector (seed), which determines how many
random numbers from a normal distribution are fed as input to G to generate
samples. We set this size to 128 floating point numbers.
– Maximum number of examples, which represents the maximum number
of training items (passwords, in the case of PassGAN) to load. The maximum
number of examples loaded by the GAN was set to the size of the entire
training dataset.
– Adam optimizer’s hyper-parameters:
• Learning rate, i.e., how quickly the weights of the model are adjusted
• Coefficient β1 , which specifies the decaying rate of the running average
of the gradient.
• Coefficient β2 , which indicates the decaying rate of the running average
of the square of the gradient.
Coefficients β1 and β2 of the Adam optimizer were set to 0.5 and 0.9, respectively, while the learning rate was 10⁻⁴. These parameters are the default values used by Gulrajani et al. [20].
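The hyper-parameters listed above can be collected in one place as follows; the values come from the text, while the dataclass structure is our own sketch and not the authors' code:

```python
# PassGAN training hyper-parameters as reported in the text.
# The dataclass container is ours; only the values come from the paper.
from dataclasses import dataclass

@dataclass(frozen=True)
class PassGANConfig:
    batch_size: int = 64
    iterations: int = 199_000
    disc_iters_per_gen_iter: int = 10   # IWGAN default
    layer_dim: int = 128                # 5 residual layers of this width
    gradient_penalty: float = 10.0      # lambda in WGAN-GP
    seq_length: int = 10                # max password length (padded with "`")
    noise_dim: int = 128                # size of the input seed fed to G
    learning_rate: float = 1e-4         # Adam
    beta1: float = 0.5
    beta2: float = 0.9

print(PassGANConfig())
```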
We used the portion of the RockYou dataset selected for training (see Section 4.1) as the input dataset to HashCat Best64, HashCat gen2, JTR SpiderLab rules, Markov Model, PCFG, and FLA, and generated passwords as follows:
– We instantiated HashCat and JTR's rules using passwords from the training set sorted by frequency in descending order (as in [38]). HashCat Best64 generated 754,315,842 passwords, out of which 361,728,683 were unique and of length 10 characters or less. Note that this was the maximum number of samples produced by the Best64 rule-set for the given input set, i.e., the RockYou training set. With HashCat gen2 and JTR SpiderLab we uniformly sampled a random subset of size 10⁹ from their output. This subset was composed of passwords of length 10 characters or less.
– For FLA, we set up the code from [31] according to the instructions provided in [17]. We trained a model containing 2 hidden layers and 1 dense layer of size 512. We did not perform any transformation (e.g., removing symbols, or transforming all characters to lowercase) on the training set, for the sake of consistency with the other tools. Once trained, FLA enumerates a subset of its output space defined by a probability threshold p: a password belongs to FLA's output if and only if its estimated probability is at least p. In our experiments, we set p = 10⁻¹⁰. This resulted in a total of 747,542,984 passwords of length 10 characters or less. Before using these passwords in our evaluation, we sorted them by probability in descending order.
– We generated 494,369,794 unique passwords of length 10 or less using the 3-gram Markov model. We ran this model using its standard configuration [12].
– We generated 10⁹ unique passwords of length 10 or less using the PCFG implementation of Weir et al. [67].
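The FLA enumeration step described above (keep every password whose estimated probability is at least p, most likely first) can be sketched as follows; the guesses and probabilities are invented for illustration:

```python
# Sketch of the probability-threshold enumeration used for FLA's output:
# keep guesses whose estimated probability is at least p, most likely first.
# The example passwords and probabilities are made up for illustration.

def enumerate_output(scored_guesses, p=1e-10):
    kept = [(pw, prob) for pw, prob in scored_guesses if prob >= p]
    kept.sort(key=lambda t: t[1], reverse=True)   # descending probability
    return [pw for pw, _ in kept]

scored = [("123456", 1e-2), ("pookypooky", 1e-33), ("iloveyou", 1e-3)]
print(enumerate_output(scored))   # ['123456', 'iloveyou']
```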
4 Evaluation
LinkedIn Dataset We also tested each tool on passwords from the LinkedIn dataset [35], of length up to 10 characters, and that were not present in the training set. The LinkedIn dataset consists of 60,065,486 total unique passwords.⁴
⁴ We consider the use of publicly available password datasets to be ethical, and consistent with security research best practices (see, e.g., [10,38,6]).
[Figure 1: Number of passwords generated at each training checkpoint that match the RockYou testing set, plotted against the checkpoint number (y-axis: 0–140,000 matches).]
However, increasing the number of steps also increases the probability of over-
fitting [18,69].
To evaluate this tradeoff on password data, we stored intermediate training checkpoints and generated 10⁸ passwords at each checkpoint. Figure 1 shows how
many of these passwords match with the content of the RockYou testing set. In
general, the number of matches increases with the number of iterations. This
increase tapers off around 125,000-135,000 iterations, and then again around
190,000-195,000 iterations, where we stopped training the GAN. This indicates
that further increasing the number of iterations will likely lead to overfitting,
thus reducing the ability of the GAN to generate a wide variety of highly likely
passwords. Therefore, we consider this range of iterations adequate for the Rock-
You training set.
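The checkpoint evaluation described above amounts to sampling guesses at each checkpoint and intersecting them with the testing set; a minimal sketch, with a hypothetical `sample_guesses` standing in for running the trained generator:

```python
# Sketch of the checkpoint evaluation: at each saved checkpoint, sample
# guesses from the generator and count unique matches with the test set.
# `sample_guesses` is a hypothetical stand-in for the trained generator G.

def match_rate(sample_guesses, checkpoint, test_set, n=10**8):
    guesses = set(sample_guesses(checkpoint, n))
    return len(guesses & test_set) / len(test_set)

def sample_guesses(checkpoint, n):
    # toy stand-in: a fixed guess list instead of sampling from G
    return ["123456", "iloveyou", "qwerty"][:n]

rate = match_rate(sample_guesses, checkpoint=195_000,
                  test_set={"iloveyou", "letmein"})
print(f"{rate:.0%}")   # 50%
```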
Our results show that, for each of the tools, PassGAN was able to generate
at least the same number of matches. Additionally, to achieve this result, Pass-
GAN needed to generate a number of passwords that was within one order of
magnitude of each of the other tools. This holds for both the RockYou and the
LinkedIn testing sets. This is not unexpected, because while other tools rely on
prior knowledge on passwords for guessing, PassGAN does not. Table 2 summa-
rizes our findings for the RockYou testing set, while Table 3 shows our results
for the LinkedIn test set.
Our results also show that PassGAN has an advantage with respect to rule-
based password matching when guessing passwords from a dataset different from
the one it was trained on. In particular, PassGAN was able to match more
passwords than HashCat within a smaller number of attempts (2.1·10⁹ – 3.6·10⁹ for LinkedIn, compared to 4.8·10⁹ – 5.06·10⁹ for RockYou).
from the “new” LinkedIn dataset. This confirms that combining rules with ma-
chine learning password guessing is an effective strategy. Moreover, it confirms
that PassGAN can capture portions of the password space not covered by rule-
based approaches. With this in mind, a recent version of HashCat [23] introduced
a generic password candidate interface called “slow candidates”, enabling the use
of tools such as PCFGs [68], OMEN [14], PassGAN, and more with HashCat.
dataset, being roughly one out of every 66 passwords. Similarly, the most common passwords produced by FLA start with “123” or use the word “love”. In contrast, PassGAN’s most commonly generated passwords tend to show more variability, with samples composed of names, combinations of names and numbers, and
more. When compared with the RockYou training set, the most likely samples
from PassGAN exhibit closer resemblance to the training set and its probabilities
than FLA does. We argue that due to the Markovian structure of the password
generation process in FLA, any password characteristic that is not captured
within the scope of an n-gram might not be encoded by FLA. For instance, if a
meaningful subset of 10-character passwords is constructed as the concatenation
of two words (e.g., MusicMusic), any Markov process with n ≤ 5 will not be
able to capture this behavior properly. On the other hand, given enough exam-
ples, the neural network used in PassGAN will be able to learn this property.
As a result, while the password pookypooky was assigned a probability p ≈ 10⁻³³ by FLA (with an estimated number of guessing attempts of about 10²⁹), it was guessed after roughly 10⁸ attempts by PassGAN.
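The n-gram limitation can be made concrete by listing the contexts a character-level model actually conditions on; with n = 5, every character of MusicMusic is predicted from at most the four preceding characters:

```python
# Why a character n-gram model misses long-range structure: with n = 5 the
# model predicts each character from only the previous 4 characters, so when
# generating "MusicMusic" it never sees that the first "Music" is repeating.

def contexts(password, n=5):
    """(context, next_char) pairs an n-gram model actually conditions on."""
    pad = "^" * (n - 1) + password        # "^" marks start-of-string
    return [(pad[i:i + n - 1], pad[i + n - 1]) for i in range(len(password))]

for ctx, ch in contexts("MusicMusic"):
    print(f"{ctx!r} -> {ch!r}")
# the second 'M' is predicted from 'usic' alone, never from the full 'Music'
```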
To investigate further on the differences between PassGAN and FLA, we
computed the number of passwords in the RockYou testing set for which FLA re-
quired at least 1010 attempts and that PassGAN was able to guess within its
first 7 · 109 samples. These are the passwords to which FLA assigns low proba-
bilities, despite being chosen by some users. Because PassGAN can model them,
we conclude that the probabilities assigned by FLA to these passwords are in-
correct. Figure 2 presents our result as the ratio between the passwords matched
by FLA at a particular number of guessing attempts, and by PassGAN within its first 7·10⁹ attempts. Our results show that PassGAN can model a number of
passwords more correctly than FLA. However, this advantage decreased as the
number of attempts required for FLA to guess a password increased, i.e., as the
estimated probability of that password decreased. This shows that, in general,
the two tools agree on assigning probabilities to passwords.
[Figure 2: Percentage of low-probability passwords guessed by PassGAN within its first 7·10⁹ attempts, as a function of the number of attempts required by FLA (x-axis: 10¹⁰–10¹⁴; y-axis: 0%–16%).]
Table 4: Sample of passwords generated by PassGAN that did not match the
testing sets.
love42743 ilovey2b93 paolo9630 italyit
sadgross usa2598 s13trumpy trumpart3
ttybaby5 dark1106 vamperiosa ~dracula
saddracula luvengland albania. bananabake
paleyoung @crepess emily1015 enemy20
goku476 coolarse18 iscoolin serious003
nyc1234 thepotus12 greatrun babybad528
santazone apple8487 1loveyoung bitchin706
toshibaod tweet1997b 103tears 1holys01
We inspected a list of passwords generated by PassGAN that did not match any
of the testing sets and determined that many of these passwords are reasonable
candidates for human-generated passwords. As such, we speculate that a possibly
large number of passwords generated by PassGAN, that did not match our test
sets, might still match user accounts from services other than RockYou and
LinkedIn. We list a small sample of these passwords in Table 4.
5 Remarks
In this section, we summarize the findings from our experiments, and discuss
their relevance in the context of password guessing.
Character-level GANs are well suited for generating password guesses. In our
experiments, PassGAN was able to match 34.2% of the passwords in a testing
set extracted from the RockYou password dataset, when trained on a different
subset of RockYou. Further, we were able to match 21.9% of the passwords in
the LinkedIn dataset when PassGAN was trained on the RockYou password set.
This is remarkable because PassGAN was able to achieve these results with no
additional information on the passwords that are present only in the testing
dataset. In other words, PassGAN was able to correctly guess a large number
of passwords that it did not observe given access to nothing more than a set of
samples.
Current rule-based password guessing is very efficient but limited. In our ex-
periments, rule-based systems were able to match or outperform other password
guessing tools when the number of allowed guesses was small. This is a tes-
tament to the ability of skilled security experts to encode rules that generate
correct matches with high probability. However, our experiments also confirmed
that the main downside of rule-based password guessing is that rules can gener-
ate only a finite, relatively small set of passwords. In contrast, PassGAN was able
to eventually surpass the number of matches achieved using password generation
rules.
6 Conclusion
References
1. Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghe-
mawat, S., Irving, G., Isard, M., et al.: Tensorflow: a system for large-scale machine
learning. In: OSDI. vol. 16, pp. 265–283 (2016)
2. Arjovsky, M., Chintala, S., Bottou, L.: Wasserstein GAN. CoRR abs/1701.07875
(2017), http://arxiv.org/abs/1701.07875
3. Berthelot, D., Schumm, T., Metz, L.: BEGAN: Boundary equilibrium generative
adversarial networks. arXiv preprint arXiv:1703.10717 (2017)
4. Binkowski, M., Sutherland, D., Arbel, M., Gretton, A.: Demystifying MMD GANs.
International Conference on Learning Representations (ICLR) (2018)
5. Cao, Y., Ding, G.W., Lui, Y.C., Huang, R.: Improving GAN training via binarized
representation entropy (BRE) regularization. International Conference on Learning
Representations (ICLR) (2018)
6. Castelluccia, C., Dürmuth, M., Perito, D.: Adaptive password-strength meters from
markov models. In: NDSS (2012)
7. Chen, X., Duan, Y., Houthooft, R., Schulman, J., Sutskever, I., Abbeel, P.: Infogan:
Interpretable representation learning by information maximizing generative adver-
sarial nets. In: Advances in Neural Information Processing Systems. pp. 2172–2180
(2016)
8. Ciaramella, A., D’Arco, P., De Santis, A., Galdi, C., Tagliaferri, R.: Neural network
techniques for proactive password checking. IEEE Transactions on Dependable and
Secure Computing 3(4), 327–339 (2006)
9. Daskalakis, C., Ilyas, A., Syrgkanis, V., Zeng, H.: Training GANs with optimism.
International Conference on Learning Representations (ICLR) (2018)
10. Dell’Amico, M., Michiardi, P., Roudier, Y.: Password strength: An empirical anal-
ysis. In: Proceedings IEEE INFOCOM. pp. 1–9. IEEE (2010)
11. Denton, E.L., Chintala, S., Fergus, R., et al.: Deep generative image models using
a laplacian pyramid of adversarial networks. In: Advances in neural information
processing systems. pp. 1486–1494 (2015)
31. CUPS Lab: Fast, lean, and accurate: Modeling password guessability using neural networks (source code). https://github.com/cupslab/neural_network_cracking (2016)
32. LeCun, Y., Boser, B., Denker, J.S., Henderson, D., Howard, R.E., Hubbard, W.,
Jackel, L.D.: Backpropagation applied to handwritten zip code recognition. Neural
computation 1(4), 541–551 (1989)
33. LeCun, Y., Boser, B.E., Denker, J.S., Henderson, D., Howard, R.E., Hubbard,
W.E., Jackel, L.D.: Handwritten digit recognition with a back-propagation net-
work. In: Advances in neural information processing systems. pp. 396–404 (1990)
34. Li, Y., Swersky, K., Zemel, R.: Generative moment matching networks. In: Inter-
national Conference on Machine Learning. pp. 1718–1727 (2015)
35. LinkedIn password dataset, https://hashes.org/public.php
36. Ma, J., Yang, W., Luo, M., Li, N.: A study of probabilistic password models. In:
IEEE Symposium on Security and Privacy (SP). pp. 689–704. IEEE (2014)
37. HashCat Per-Position Markov Chains: https://www.trustwave.com/Resources/SpiderLabs-Blog/Hashcat-Per-Position-Markov-Chains/ (2017)
38. Melicher, W., Ur, B., Segreti, S.M., Komanduri, S., Bauer, L., Christin, N., Cra-
nor, L.F.: Fast, lean, and accurate: Modeling password guessability using neural
networks. In: USENIX Security Symposium. pp. 175–191 (2016)
39. Mirza, M., Osindero, S.: Conditional generative adversarial nets. arXiv preprint
arXiv:1411.1784 (2014)
40. Miyato, T., Koyama, M.: cGANs with projection discriminator. International Con-
ference on Learning Representations (ICLR) (2018)
41. Morris, R., Thompson, K.: Password security: A case history. Communications of
the ACM 22(11), 594–597 (1979)
42. Mroueh, Y., Li, C.L., Sercu, T., Raj, A., Cheng, Y.: Sobolev GAN. International
Conference on Learning Representations (ICLR) (2018)
43. Mroueh, Y., Sercu, T., Goel, V.: Mcgan: Mean and covariance feature matching
GAN. In: Proceedings of the 34th International Conference on Machine Learning,
ICML 2017, Sydney, NSW, Australia, 6-11 August 2017. pp. 2527–2535 (2017),
http://proceedings.mlr.press/v70/mroueh17a.html
44. Murphy, K.P.: Handbook of Information Security, Information Warfare, Social,
Legal, and International Issues and Security Foundations. John Wiley & Sons
(2006)
45. Murphy, K.P.: Machine Learning: A Probabilistic Perspective. MIT Press (2012)
46. Nagarajan, V., Kolter, J.Z.: Gradient descent GAN optimization is locally stable.
In: Advances in Neural Information Processing Systems. pp. 5585–5595 (2017)
47. Narayanan, A., Shmatikov, V.: Fast dictionary attacks on passwords using time-
space tradeoff. In: Proceedings of the 12th ACM conference on Computer and
communications security. pp. 364–372. ACM (2005)
48. Nowozin, S., Cseke, B., Tomioka, R.: f-GAN: Training generative neural samplers
using variational divergence minimization. In: Advances in Neural Information
Processing Systems. pp. 271–279 (2016)
49. Percival, C., Josefsson, S.: The scrypt password-based key derivation function.
Tech. rep. (2016)
50. Perez, S.: Google plans to bring password-free logins to android
apps by year-end (2017), https://techcrunch.com/2016/05/23/
google-plans-to-bring-password-free-logins-to-android-apps-by-year-end/
51. Petzka, H., Fischer, A., Lukovnikov, D.: On the regularization of wasserstein GANs.
International Conference on Learning Representations (ICLR) (2018)