Code for our 2018 NDSS paper "A Security Analysis of Honeywords" and 2022 IEEE S&P paper "How to Attack and Generate Honeywords"
If you decide to use our code, please cite our papers 🤝.
Honeywords are decoy passwords associated with each user account, and they offer a promising approach to detecting password leakage. This approach has been covered by hundreds of media outlets and has also been adopted in various research domains. The idea of honeywords looks deceptively simple, but automatically generating honeywords that are hard to distinguish from real passwords is a deep and sophisticated challenge. In Juels-Rivest's work, four main honeyword-generation methods are suggested, but they are justified only by heuristic security arguments.
In this work, we develop, for the first time, a series of experiments using 10 large-scale password datasets (a total of 104 million real-world passwords) to evaluate the security that these four methods can provide. Our results reveal that all of them fail to provide the expected security: with just one guess (when each user account is associated with 19 honeywords, as recommended), our basic trawling-guessing attacker distinguishes real passwords with a success rate of 29.29%∼32.62%, rather than the claimed 5%. This figure reaches 34.21%∼49.02% under advanced trawling-guessing attackers who use various state-of-the-art probabilistic password models. We further evaluate the security of Juels-Rivest's methods under a targeted-guessing attacker who can exploit the victim's personal information, and the results are even more alarming: 56.81%∼67.98%. Overall, our work resolves three open problems in honeyword research, as defined by Juels and Rivest.
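The claimed 5% is the success rate of a purely random guess: with 19 honeywords per account, each account has k = 20 sweetwords (the real password plus 19 honeywords), and an attacker who picks one of them uniformly at random succeeds with probability

```latex
\Pr[\text{success with one guess}] = \frac{1}{k} = \frac{1}{19 + 1} = 5\%
```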
As shown in experiments_for_basic_trawling_attacks.pdf, we further evaluate Juels-Rivest's four honeyword-generation methods on three datasets: Tianya, Rockyou and 000webhost.
Here, we take the model syntax method and the datasets data/test.txt and data/train.txt as an example.
- Build the programs with the makefile:
$ make
- Generate the honeywords file and the checker file. Then you can see two files, test_honeywords.txt and test_checker.txt, in the folder data/.
$ ./gen ./data/test.txt
- Calculate the probabilities of the honeywords. calc takes two parameters: the honeywords file and the training set. It generates the probability file test_pr.txt in the folder data/.
$ ./calc ./data/test_honeywords.txt ./data/train.txt
- Attack. atk takes five parameters: the probability file, the honeywords file, the checker file, the threshold on the number of guesses for a single account, and the threshold on the number of guesses for the whole website. It then writes two files to the folder data/: test_crack_num.txt, in which each line gives the number of cracked accounts, recorded each time a guess is wrong; and test_result.txt, in which each line shows the hit count, the line index, the position in that line, and the probability of the honeyword.
$ ./atk ./data/test_pr.txt ./data/test_honeywords.txt ./data/test_checker.txt 3 100
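The parameters of atk suggest a greedy trawling attack that submits (account, sweetword) pairs in descending order of probability until the per-account and website-wide budgets run out. The Python sketch below is a simplified, hypothetical model of that procedure, not the repository's implementation. The function name trawling_attack, the in-memory inputs probs and real_index (standing in for test_pr.txt and test_checker.txt), and the choice to count only wrong guesses against the website-wide budget are all assumptions.

```python
def trawling_attack(probs, real_index, per_account_limit, site_limit):
    """Greedy trawling attack over all (account, sweetword) pairs (hypothetical sketch).

    probs[a][j]    -- estimated probability of sweetword j of account a
                      (plays the role of test_pr.txt; assumed format)
    real_index[a]  -- position of the real password of account a
                      (plays the role of test_checker.txt; assumed format)
    Returns (crack_num, cracked): crack_num[i] is the number of accounts
    already cracked when the (i+1)-th wrong guess happens.
    """
    # Rank every sweetword of every account by probability, highest first.
    candidates = sorted(
        ((p, a, j) for a, row in enumerate(probs) for j, p in enumerate(row)),
        key=lambda t: t[0],
        reverse=True,
    )
    cracked = set()
    wrong_per_account = [0] * len(probs)
    wrong_total = 0
    crack_num = []
    for p, a, j in candidates:
        if wrong_total >= site_limit:
            break  # assumed: website-wide budget counts wrong guesses (alarms)
        if a in cracked or wrong_per_account[a] >= per_account_limit:
            continue  # account already cracked or out of guesses
        if j == real_index[a]:
            cracked.add(a)  # the real password was picked
        else:
            # a honeyword was submitted, which would raise an alarm
            wrong_per_account[a] += 1
            wrong_total += 1
            crack_num.append(len(cracked))
    return crack_num, len(cracked)


probs = [[0.5, 0.1, 0.2], [0.05, 0.3, 0.4]]
print(trawling_attack(probs, [0, 1], 1, 10))  # ([1], 1)
print(trawling_attack(probs, [0, 1], 2, 10))  # ([1], 2)
```

With one guess per account, the second account is lost after its most probable sweetword turns out to be a honeyword; allowing two guesses lets the attacker recover it.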
To achieve perfect honeyword generation, Imran Erguler proposed the honeyindex system. To generate honeywords for a given user, honeyindex directly uses other users' passwords as this user's honeywords. As a result, the honeywords follow the same distribution as the real passwords.
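The sketch below illustrates the core idea in Python under simplifying assumptions of our own, not the exact selection procedure of the original honeyindex proposal: user u's own password is assumed to sit at row u of a shared (sweetindex, password) table, and each user receives their own sweetindex plus k - 1 sweetindexes borrowed uniformly at random from other users. The function name assign_sweetindexes is hypothetical.

```python
import random


def assign_sweetindexes(n_users, k, rng):
    """Simplified honeyindex-style assignment (hypothetical layout).

    Row u of the shared table is assumed to hold user u's own password.
    Each user gets k sweetindexes: their own plus k - 1 rows borrowed
    from other users, so every honeyword is someone's real password.
    Returns (lists, checker): lists[u] is u's shuffled sweetindex list and
    checker[u] is the position of u's real sweetindex in that list.
    """
    if k > n_users:
        raise ValueError("need at least k users to borrow k - 1 honeywords")
    lists, checker = [], []
    for u in range(n_users):
        others = rng.sample([i for i in range(n_users) if i != u], k - 1)
        sweet = others + [u]
        rng.shuffle(sweet)
        lists.append(sweet)
        checker.append(sweet.index(u))  # what the honeychecker would store
    return lists, checker


lists, checker = assign_sweetindexes(n_users=6, k=3, rng=random.Random(1))
```

Because honeywords are drawn from the password population itself, a model-based attacker gains nothing from password probabilities alone, which is the appeal of the design.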
There is a serious flaw in the honeyindex system. If some sweetindex
- Step 1: Find an isolated sweetindex, one that appears in only one user's sweetindex list. The password paired with this isolated sweetindex must be this user's real password, because a password borrowed as a honeyword appears in at least two lists: its owner's and the borrower's.
- Step 2: Delete all the incoming sweetindexes pairing with this user's password. As a result, new isolated sweetindexes may appear.
- Step 3: Go to Step 1 until no isolated sweetindex exists.
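The three steps above can be sketched in Python as a peeling loop. This is a hypothetical illustration, not the repository's code: the input format (a dict from user to sweetindexes) and the function name are assumptions, and Step 2 is modelled under one reading of it, namely removing the resolved user's whole list, whose other entries are now known to be honeywords.

```python
from collections import Counter


def peel_isolated_sweetindexes(lists):
    """Iteratively identify real passwords from isolated sweetindexes.

    lists maps each user to the sweetindexes in their list. A sweetindex
    held by exactly one remaining user is taken as that user's real one.
    Step 2 is modelled (one possible reading) as discarding the resolved
    user's whole list, which may create new isolated sweetindexes.
    """
    active = {user: set(idxs) for user, idxs in lists.items()}
    found = {}
    while True:
        # Step 1: count how many remaining lists contain each sweetindex.
        counts = Counter(i for idxs in active.values() for i in idxs)
        resolved = None
        for user, idxs in active.items():
            isolated = [i for i in idxs if counts[i] == 1]
            if len(isolated) == 1:
                resolved = (user, isolated[0])
                break
        if resolved is None:
            return found  # Step 3: no isolated sweetindex left
        user, real = resolved
        found[user] = real
        del active[user]  # Step 2: discard this user's list


print(peel_isolated_sweetindexes({"A": [0, 1], "B": [1, 2], "C": [2, 3], "D": [3, 2]}))
# {'A': 0, 'B': 1}
```

In this toy example, sweetindex 0 is isolated, so A is resolved; removing A's list leaves sweetindex 1 isolated in B's list, and the loop stops once C and D share all their remaining sweetindexes.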
Note that the sweetindex of user
Imran realized that "passwords of newly created accounts would not be used as honeywords", so he suggested regenerating sweetindexes periodically for all users. But this raises another problem: most users do not change their passwords periodically. If a distinguishing attacker obtains two password files leaked at different points in time, the attacker only needs to compare a user's two sweetword lists and can be highly confident that the sweetword contained in both lists is the real password. Many websites do not realize that they have been compromised until several years later, so the attacker has enough time to steal the password file several times before the website notices.
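The intersection attack described above fits in a few lines of Python. This is a minimal sketch that assumes the attacker can compare sweetwords (or their stored hashes) across the two leaked files; the function name and input format are hypothetical.

```python
def intersect_snapshots(old, new):
    """Guess real passwords by intersecting two leaked sweetword lists.

    old and new map each user to the sweetwords (or password hashes)
    seen in two password files leaked at different times. If exactly one
    sweetword survives the periodic regeneration, it is guessed as real.
    """
    guesses = {}
    for user, words in old.items():
        if user not in new:
            continue  # user only present in one leak
        common = set(words) & set(new[user])
        if len(common) == 1:
            guesses[user] = next(iter(common))
    return guesses


old = {"alice": ["123456", "iloveyou", "qwerty"], "bob": ["dragon", "monkey", "letmein"]}
new = {"alice": ["sunshine", "123456", "princess"], "bob": ["dragon", "monkey", "shadow"]}
print(intersect_snapshots(old, new))  # {'alice': '123456'}
```

Bob is left undecided here because two sweetwords happen to survive regeneration; with more leaked snapshots, the attacker can keep intersecting until one remains.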
There is another, more serious problem in the honeyindex system. The passwords are stored as salted hashes in the (sweetindex, password) table, so it is nearly impossible to keep one user's sweetwords distinct from each other. If one of a user's honeywords is the same as his real password, the position of that honeyword in the sweetword list will be sent to the honeychecker, which causes a false alarm. To overcome this problem, honeywords should be generated to differ from the real password. This can be done when the user registers, but it is nearly impossible when the sweetindexes are regenerated periodically: the website does not know the plaintext of the user's password, because all passwords have been salted and hashed, and since each entry has its own salt, identical passwords do not even produce identical hashes.
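The following Python sketch shows why the server cannot detect duplicate passwords once each table row has its own salt. It assumes PBKDF2-HMAC-SHA256 with a fresh 16-byte salt per row as a stand-in for whatever salted hash a deployment actually uses; the iteration count and function names are illustrative choices, not values from the honeyindex proposal.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor, not a recommendation


def make_entry(password):
    """Store a password as (salt, PBKDF2-HMAC-SHA256 hash) with a fresh salt."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify(entry, candidate):
    """Recompute the hash with the entry's own salt and compare in constant time."""
    salt, digest = entry
    trial = hashlib.pbkdf2_hmac("sha256", candidate.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(trial, digest)


# Two table rows that happen to hold the same password.
row_a = make_entry("123456")
row_b = make_entry("123456")
print(row_a[1] == row_b[1])  # False (with overwhelming probability): the hashes differ
print(verify(row_a, "123456"), verify(row_b, "123456"))  # True True
```

Only someone who already knows the plaintext can confirm that both rows match it, which is exactly the information the server lacks when it regenerates sweetindexes.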
The storage cost of the honeyindex system is lower than that of the honeyword system, because honeyindex stores only one hash per user, while the honeyword system needs to store
The honeyword system uses the same salt for all the sweetwords of one user. When a user logs in with password