In 2010, Coda Hale wrote How To Safely Store A Password which began with the repeated phrase, “Use bcrypt”, where the word bcrypt was linked to a different implementation for various programming languages.
This had two effects on the technology blogosphere at the time:
At the time, it was great advice!
In 2010, bcrypt was the only clearly good answer for password hashing in most programming languages.
In the intervening almost fifteen years, we’ve learned a lot more about passwords, password cracking, authentication mechanism beyond passwords, and password-based cryptography.
If you haven’t already read my previous post about password-based cryptography, you may want to give that one a once-over before you continue.
But we’ve also learned a lot more about bcrypt, its limitations, the various footguns involved with using it in practice, and even some cool shit you can build with it.
In light of a recent discussion about switching WordPress’s password hashing algorithm from PHPass (which is based on MD5) to bcrypt, I feel now is the perfect time to dive into this algorithm and its implications on real-world cryptography.
Bcrypt is a password hashing function, but not a password KDF or general-purpose cryptographic hash function.
If you’re using a sane password storage API, such as PHP’s password API, you don’t even need to think about salting your passwords, securely verifying passwords, or handling weird error conditions. Instead, you only need concern yourself with the “cost” factor, which exponentially increases the runtime of the algorithm.
There’s just one problem: bcrypt silently truncates after 72 characters. (Demo.)
<?php $example1 = str_repeat('A', 72); $example2 = $example1 . 'B'; $hash = password_hash($example1, PASSWORD_BCRYPT); var_dump(password_verify($example2, $hash));
This may sound ludicrous (“who uses 72 character passwords anyway?”) until you see security advisories like this recent one from Okta.
The Bcrypt algorithm was used to generate the cache key where we hash a combined string of userId + username + password. Under a specific set of conditions, listed below, this could allow users to authenticate by providing the username with the stored cache key of a previous successful authentication.
(…)
- The username is 52 characters or longer
The other thing to consider is that many people use passphrases, such as those generated from Diceware, which produce longer strings with less entropy per character.
If you use bcrypt as-is, you will inevitably run into this truncation at some point.
In response to this limitation, many developers will suggest pre-hashing the password with a general purpose cryptographic hash function, such as SHA-256.
And so, in pursuit of a way to avoid one footgun, developers introduced two more.
If you use the raw binary output of a hash function as your password hash, be aware that bcrypt will truncate on NUL (0x00
) bytes.
With respect to the WordPress issue linked above, the default for PHP’s hashing API is to output hexadecimal characters.
This is a bit wasteful. Base64 is preferable, although any isomorphism of the raw hash output that doesn’t include a 0x00
byte is safe from truncation.
When a system performs a migration from a cryptographic hash function (e.g., MD5) to bcrypt, they typically choose to re-hash the existing hash with bcrypt.
Because users typically reuse passwords, you can often take the fast, unsalted hashes from another breach and use it as your password dictionary for bcrypt.
If then you succeed in verifying the bcrypt password for a fast hash, you can then plug the fast hash into software like Hashcat, and then crack the actual password at a much faster rate (tens of billions of candidates/second, versus thousands per second).
This technique is called hash shucking (YouTube link).
You can avoid hash shucking by using HMAC with a static key–either universal for all deployments of your software, or unique per application.
It doesn’t really matter which you choose; all you really need from it is domain separation from naked hashes.
I frequently see this referred to as “peppering”, but the term “pepper” isn’t rigidly defined anywhere.
One benefit of using a per-application HMAC secret does make your hashes harder to crack if you don’t know this secret.
For balance, one downside is that your hashes are no longer portable across applications without managing this static key.
Altogether, it’s quite straightforward to avoid bcrypt’s footguns, as I had recommended to WordPress last week.
Easy, straightforward, and uncontroversial. Right?
The linked discussion was tedious, so I will briefly describe the objections raised to my suggestion.
When you develop a popular CMS, library, or framework, you cannot possibly know all the ways that your software will be used by others. It’s almost always better to be misuse-resistant.
There were some other weird arguments (such as “Bcrypt is approved by NIST for FIPS”, which is just plain false).
If you have a random secret key, HMAC-SHA-512 is a secure pseudorandom function that you can treat as a Random Oracle.
Because it’s HMAC, you don’t have to worry about Length Extension Attacks at all. Therefore, the best known attack strategy is to produce a collision.
The raw binary output of SHA-512 is 64 characters, but may contain NUL characters (which would truncate the hash). To avoid this, we base64-encode the output.
When you base64-encode a SHA-512 hash, the output is 88 characters (due to base64 padding). This is longer than the 72 characters supported by bcrypt, so it will truncate silently after 72 characters.
This is still secure, but to prove this, I need to use math.
There are 64 possible characters in the base64 alphabet. That’s tautology, after all.
If you have a string of length 72, for which each character can be one of 64 values, you can represent the total probability space of possible strings as .
If you know that , you can do a little bit of arithmetic and discover this quantity equal to .
As I discussed in my deep dive on the birthday bound, you can take the cube root of this number to find what I call the Optimal Birthday Bound.
This works out to samples in order to find a probability of a single collision.
This simply isn’t going to happen.
Argon2 and scrypt don’t have the bcrypt footguns. You can hash passwords of arbitrary length and not care about NUL characters. They’re great algorithms.
Several people involved in the Password Hashing Competition (that selected Argon2 as its winner) have since lamented the emphasis on memory-hardness over cache-hardness. Cache-hardness is more important for short run-times (i.e., password-based authentication), while memory-hardness is more important for longer run-times (i.e., key derivation).
Ironically, bcrypt is cache-hard, while scrypt and the flavors of Argon2 that most people use are not.
Most of my peers just care that you use a password hashing algorithm at all. They don’t particularly care which. The bigger, and more common, vulnerability is not using one of them in the first place.
I’m mostly in agreement with them, but I would prefer that anyone that chooses bcrypt takes steps to disarm its footguns.
Earlier, I noted that bcrypt is not a KDF. That doesn’t mean you can’t make one out of bcrypt. Ryan Castellucci is an amazing hacker; they managed to do just that.
To understand why this is difficult, and why Ryan’s hack works, you need to understand what bcrypt actually is.
Bcrypt is a relatively simple algorithm at its heart:
"OrpheanBeholderScryDoubt"
64 times in ECB mode using the expanded key from step 1.Most of the heavy work in bcrypt is actually done in the key schedule; the encryption of three blocks (remember, Blowfish is a 64-bit block cipher) just ensures you need the correct resultant key from the key schedule.
“So how do you get an encryption key out of bcrypt?”
It’s simple: we, uh, hash the S-box.
static void BF_kwk(struct BF_data *data, uint8_t kwk[BLAKE2B_KEYBYTES]) { BF_word *S = (BF_word *)data->ctx.S; BF_htobe(S, 4*256); // it should not be possible for this to fail... int ret = blake2b_simple(kwk, BLAKE2B_KEYBYTES, S, sizeof(BF_word)*4*256); assert(ret == 0); BF_betoh(S, 4*256); }
Using BLAKE2b to hash the S-box from the final Blowfish key expansion yields a key-wrapping key that can be used to encrypt whatever data is being protected.
The only feasible way to recover this key is to provide the correct password and salt to arrive at the same key schedule.
Any attack against the selection of S implies a cryptographic weakness in bcrypt, too. (I’ve already recommended domain separation in a GitHub issue.)
Although bcrypt is still an excellent cache-hard password hashing function suitable for interactive logins, it does have corner cases that sometimes cause vulnerabilities in applications that misuse it.
If you’re going to use bcrypt, make sure you use bcrypt in line with my recommendations to WordPress: HMAC-SHA-512, base64 encode, then bcrypt.
Here’s a quick proof-of-concept for PHP software:
<?php declare(strict_types=1); class SafeBcryptWrapperPoC { private $staticKey; private $cost = 12; public function __construct( #[\SensitiveParameter] string $staticKey, int $cost = 12 ) { $this->staticKey = $staticKey; $this->cost = $cost; } /** * Generate password hashes here */ public function hash( #[\SensitiveParameter] string $password ): string { return \password_hash( $this->prehash($password), PASSWORD_BCRYPT, ['cost' => $this->cost] ); } /** * Verify password here */ public function verify( #[\SensitiveParameter] string $password, #[\SensitiveParameter] string $hash ): bool { return \password_verify( $this->prehash($password), $hash ); } /** * Pre-hashing with HMAC-SHA-512 here * * Note that this prefers the libsodium base64 code, since * it's implemented in constant-time */ private function prehash( #[\SensitiveParameter] string $password ): string { return \sodium_bin2base64( \hash_hmac('sha512', $password, $this->staticKey, true), \SODIUM_BASE64_VARIANT_ORIGINAL_NO_PADDING ); } }
You can see a modified version of this proof-of-concept on 3v4l, which includes the same demo from the top of this blog post to demonstrate the 72-character truncation bug.
Header image: Art by Jim and CMYKat; a collage of some DEFCON photos, as well as Creative Commons photos of Bruce Schneier (inventor of the Blowfish block cipher) and Niels Provos (co-designer of bcrypt, which is based on Blowfish).