What We Do in the /etc/shadow – Cryptography with Passwords
2022-12-29 23:44:25 Author: soatok.blog(查看原文) 阅读量:9 收藏

Ever since the famous “Open Sesame” line from One Thousand and One Nights, humanity was doomed to suffer from the scourge of passwords.

Something you know. Password.
Something you have. RSA token.
Something you are. Fingerprint.

Something you pretend to be. Happy.
Courtesy of SwiftOnSecurity

Even in a world where we use hardware tokens with asymmetric cryptography to obviate the need for passwords in modern authentication protocols, we’ll still need to include “something you know” for legal and political reasons.

In the United States, we have the Fifth Amendment to the US Constitution, which prevents anyone from being forced to testify against oneself. This sometimes includes revealing a password, so it is imperative that passwords remain a necessary component of some encryption technologies to prevent prosecutors from side-stepping one’s Fifth Amendment rights. (Other countries have different laws about passwords. I’m not a lawyer.)

If you’ve ever read anything from the EFF, you’re probably keenly aware of this, but the US government loves to side-step citizens’ privacy rights in a court of law. They do this not out of direct malice towards civil rights (generally), but rather because it makes their job easier. Incentives rule everything around me.

Thus, even in a passwordless future, anyone who truly cares about civil liberties will not want to dispense with them entirely. Furthermore, we’ll want these password-based technologies to pass the Mud Puddle test, so that we aren’t relying on large tech companies to protect us from government backdoors.

Given all this, most security professionals will agree that passwords suck. Not only are humans bad sources of randomness, but we’re awful at memorizing a sequence of characters or words for every application, website, or operating system that demands a password.

None of what I’ve written so far is the least bit novel or interesting. But here’s a spicy take:

The only thing that sucks worse than passwords is the messaging around password-based cryptographic functions.

Password-Based Cryptographic Functions

That isn’t a term of the art, by the way. I’m kind of coining it as a superset that includes both Password-Based Key Derivation Functions and Password Hashing Functions.

You might be wondering, “Aren’t those two the same thing?” No, they are not. I will explain in a moment.

The intersection of human-memorable secrets (passwords) and cryptographic protocols manifests in a lot of needless complexity and edge-cases that, in order to sufficiently explain anything conversationally, will require me to sound either drunk or like a blue deck player in Magic: The Gathering.

To that end, what I’m calling Password-Based Cryptographic Functions (PBCFs) includes all of the following:

  • Password-Hashing Functions
    • Bcrypt
  • Password-Based Key Derivation Functions
  • Balanced Password Authenticated Key Exchanges
    • CPace
  • Augmented Password Authenticated Key Exchanges
  • Doubly-Augmented Password Authenticated Key Exchanges

If you tried to approach categorization starting with simply “Key Derivation Functions”, you’d quickly run into a problem: What about HKDF? Or any of the NIST SP 800-108 KDFs for that matter?

If you tighten it to “Password-Based KDFs” or “Key-stretching functions”, that doesn’t include bcrypt. Bcrypt is a password hashing function, but it’s not suitable for deriving secret keys. I’ve discussed these nuances previously.

And then you have some password KDF and/or hashing algorithms that are memory-hard, some that are cache-hard.

And then some Password Authenticated Key Exchanges (PAKEs) are quantum-annoying, while others are not.

To make heads or tails of the different algorithms, their properties, and when and where you should use them, you’d either need to secure postdoc funding for cryptography research or spend a lot of time reading and watching passwords-focused conference talks.

So let’s talk about these different algorithms, why they exist to begin with, and some of their technical details.

Why Does Any of This Matter?

The intersection of passwords and cryptography is one of the foundations of modern society, and one that most Internet users experience everywhere they go.

You’ll know you have succeeded in designing a cryptographic algorithm when the best way to attack it is to try every possible key. This is called an exhaustive search, or a brute-force attack, depending on the disposition of the author.

Remember earlier when I said passwords suck because humans are bad at creating or remembering strong, unique passwords for every website they visit?

Well, it turns out, that some passwords are extremely common. And even worse, people who choose common passwords tend to reuse them everywhere.

This presented a challenge to early web applications: In order to have authenticated users, you need to collect a password from the user and check it on subsequent visits. If you ever get hacked, then it’s likely that all of your users will be impacted on other websites they used the same passwords for.

Including their bank accounts.

So the challenge for any password-based scheme is simply stated: How can you safely authenticate users based on a secret they know?

Enter: Hashing

Storing a hash of a password is an obvious solution to this predicament. Early solutions involved using the user’s password as an encryption key, rather than building atop cryptographic hash functions.

However, it’s important not to call it “password encryption”.

The reason isn’t merely pedantic; it’s meant to prevent a disaster from repeating: Adobe famously encrypted passwords with Triple-DES instead of hashing them.

In early UNIX-based operating systems, your password hashes were stored in a file called /etc/passwd, along with other user account data. These days, your actual password hashes are kept in a second file, /etc/shadow, which is only readable by root.

Unfortunately, the tool used to create password hashes was called crypt, so the “encryption” lingo stuck.

When Hashing Isn’t Enough

Although encrypting passwords is bad, using a fast cryptographic hash function (e.g. SHA256 or BLAKE2) is also bad.

LinkedIn learned this lesson the hard way in 2012, when an attacker leaked 117 million users’ hashed passwords. It turns out, they used SHA1 as their password hashing function.

Hash functions are deterministic: If you feed them the same input, you will always get the same output.

When your password hashing strategy is simply “just use SHA1”, two users with the same password will have an identical password hash.

People are bad at creating, or remembering, unpredictable passwords.

When you combine these observations, there are some extremely obvious candidates (123456, password, etc.) to prioritize when guessing.

Additionally, you could precompute these hashes once and build a reverse index of the passwords that hash to a specific output. This is called a rainbow table.

Unlike most of symmetric cryptography, where the goal ends at forcing the attacker to perform an exhaustive brute-force search of every possible key, password-based cryptography has to go the extra mile to prevent weak secrets from being compromised; a very tall order for anyone to contend with.

Towards Modern Password Hashing

None of this was a surprise to security experts in 2012. There were a couple generally known best practices since I first learned programming in the 2000s:

  • Salt your passwords (to mitigate rainbow tables)
  • Use an iterated hashing scheme, rather than a fast hash (to make each step of an exhaustive search slower)

PBKDF2, the first widely used password hashing and key derivation function, did exactly that:

  • PBKDF2 was designed to support randomly-generated per-user salts.
  • It used HMAC in a loop to force attackers to spend extra CPU cycles per guess. If an attacker could guess 10 million passwords per second against SHA-256, then using 100,000 iterations of PBKDF2 slowed them down to less than 100 guesses per second (due to HMAC calling the hash function twice).

And that might sound like the end of the story (“PBKDF2 wins”), but the password cracking community is extremely clever, so it’s only just begun.

Parallel Attacks and GPUs

Since your CPU can only guess a few hashes per second, PBKDF2 is a thorny problem to contend with.

However, there’s an easy way for attackers to use commodity hardware to bypass this limitation: Graphics cards are specially designed to perform a lot of low-memory calculations in parallel.

If you can write your password cracking code to leverage the benefits of GPUs instead of CPUs, then PBKDF2 becomes easy potatoes.

The other secure password hashing algorithm in use at the time, bcrypt, had only a linear improvement over PBKDF2’s security against GPU attacks.

Memory-Hard Password Hashing Algorithms

In 2009, Colin Percival proposed scrypt to mitigate this risk. Scrypt (pronounced “ess crypt”) builds atop PBKDF2-SHA256 by hashing psuedorandom regions of memory with Salsa20/8 (hence the S in Scrypt) in a way that makes it difficult to precompute.

Scrypt hashes require large amounts of memory to compute (at least 16 megabytes, in the recommended minimum configuration, versus the few kilobytes of incumbent password hashes).

While GPUs (and FPGAs, and ASICs, and …) are great for cracking password hashes that use CPU-bounded algorithms, algorithms that access large amounts of memory are difficult to attack effectively.

It’s still possible for clever programmers to work around this memory/bandwidth limitation, but to do so, you must compute more operations, which makes it not worthwhile.

Cryptographers and the password-cracking community call this expensive time-memory trade-off memory hardness. In 2016, it was determined that scrypt is maximally memory-hard.

Password Hashing Competition

The Password Hashing Competition (PHC) was kicked off by JP Aumasson in 2012 to develop a new standard for password hashing and password-based key derivation.

In 2015, they selected Argon2 as its recommendation, with four finalists receiving special recognition (Catena, Lyra2, Makwa, and yescrypt–which is based on scrypt).

Cache-Hardness

In the years since the PHC concluded, the advances in password cracking techniques have not slowed down.

One of the PHC judges recently proposed a new algorithm called bscrypt which purports a property called “cache hard” rather than “memory hard”. The reasoning is that, for shorter run times, memory bandwidth and small CPU caches is a tighter bottleneck than overall memory requirements.

In fact, there’s an Argon2 variant (Argon2ds) which offers cache-hardness.

Two Hard Choices

So which of the two should you care about, as a defender?

If you’re validating password hashes in an online service, a cache-hard algorithm might make more sense. You’re incentivized to have shorter server-side run times to avoid DoS attacks, so the benefits of cache-hardness seem more impactful than memory-hardness.

If you’re deriving a sensitive cryptography key from a password and can afford to take several seconds of wall-clock time, a memory-hard algorithm is likely to be a better fit for your threat model.

(Or, like, run both a cache-hard and memory-hard algorithm and use a fast KDF, such as HKDF, to mix the two into your final cryptography key.)

Of course, the story doesn’t end here. Password Hashing and Password-Based Key Derivation continues to mature as the password cracking community discovers new attacks and engineers defenses against them.

A Strange Game; The Only Winning Move is to PAKE

Password hashing is a defense-in-depth against a system compromise that enables attackers to perform offline attacks against a cryptographic output to determine a user’s password.

However, for many applications, the bigger question is, “Why even play this game in the first place?”

Password-Authenticated Key Exchanges

A password authenticated key exchange (PAKE) algorithm is an interactive protocol to establish keys based on at least one party’s knowledge of a password.

PAKEs are what enable you to log into encrypted WiFi connections and encrypt your traffic. PAKEs prevent you from having to reveal the password (or a password-equivalent value, such as a password hash) to the other party.

Although there are a lot of proposed PAKE algorithms, the one that most people implemented was SRP (“Secure Remote Password”), which was intuitively a lot like Diffie-Hellman but also not Diffie-Hellman (since SRP used addition).

For a good teardown on SRP, Matthew Green’s blog is highly recommended.

There are a few real-world cryptography problems with using SRP:

  1. You need to use a special kind of prime number for your protocol. The standard Diffie-Hellman moduli are not suitable for SRP; you want a safe prime (i.e. a number of the form N = 2q + 1, where q is also prime).
  2. One of the steps in SRP, client-side, is to compute A = g^a (mod N). Here, A is a public value while a is a secret (usually the SHA1 hash of the username and pasword).

    If your software implementation of modular exponentiation (a^x mod P) isn’t constant-time, you leak a, which is a password-equivalent value.

Additionally, it’s not possible to use SRP with elliptic curve groups. (Matthew Green’s post I linked above covers this in detail!)

Thus, SRP is generally considered to be deprecated by security experts, in favor of other PAKEs.

IETF, CFRG, PAKE Selection

As early as IETF 104, the Crypto Forum Research Group (CFRG) began exploring different PAKE algorithms to select for standardization.

One of the motivators for this work is that the WiFi alliance had shoehorned a new PAKE called Dragonfly into their WPA3, which turned out to be badly designed.

The CFRG’s candidate algorithms and reviews were publicly tracked on GitHub, if you’re curious.

TL;DR – They settled on recommending OPAQUE as an augmented PAKE and CPace as a balanced PAKE.

Sc00bz has done multiple talks at security conferences over the years to talk about PAKEs:

Quantum Annoyance

The PAKEs we’ve discussed involved a cyclic group (and the newer ones involved an elliptic curve group). The security of these schemes is dependent on the hardness of a discrete logarithm problem.

If a cryptography relevant quantum computer (CRQC) is developed, discrete logarithm problems will cease to be hard.

Some PAKEs fall apart the moment a discrete log is solvable. This is par for the course for Diffie-Hellman and elliptic curve cryptography.

Others require an attacker to solve a discrete log for every guess in an offline attack (after capturing a successful exchange). This makes them annoying for quantum attackers.

While they don’t provide absolute security like post-quantum cryptography, they do make attackers armed with a CRQC work harder.

OPAQUE isn’t quantum annoying. Simply observe all of the packets from making an online guess, solve the DLP offline, and you’re back to the realm of classical offline guessing.

The State of the Art

At this point, you may feel somewhat overwhelmed by all of this information. It’s very tempting to simplify all of this historical context and technical detail.

You might be telling yourself, “Okay, Scrypt, Argon2, CPace, OPAQUE. Got it. That’s all I need to remember. Everything else is questionable at best. Ask a cryptographer. I’m bailing out, yo!”

But the story gets worse on two fronts: Real-world operational requirements, and compliance.

Your Hands Are Tied, Now Swim

If you’re selling a product or service to the US government that uses cryptography, you need to first clear a bare minimum bar called FIPS 140.

FIPS 140 is opinionated about which algorithms you use. For password hashing and password-based key derivation, FIPS defers to NIST SP 800-209, which means you’re only permitted to use PBKDF2 in any FIPS modules (with a SHA2- family hash function). (Update: See below.)

So all of the information about memory-hard and cache-hard password hashing algorithms? This is useless for you if you ever try to sell to the US government.

An open question is whether Scrypt is FIPSable due to it building atop PBKDF2. To be blunt: I’m not sure. Ask a NIST Cryptographic Module Validation Program lab for their opinion.

Update (2022-01-07): A Brief Correction

Thanks to FreakLegion for pointing this out.

According to NIST SP 800-63B, memory-hard hash functions are recommended.

Examples of suitable key derivation functions include Password-based Key Derivation Function 2 (PBKDF2) [SP 800-132] and Balloon [BALLOON]. A memory-hard function SHOULD be used because it increases the cost of an attack.

NIST SP 800-63B

Don’t use Balloon, though. It’s under-specified.

FreakLegion indicates in their Hacker News comment that scrypt and yescrypt are both acceptable.

End Update

In the realm of PAKEs, both CPace and OPAQUE are frameworks that can be used with multiple elliptic curve groups.

Both frameworks support NIST P-256, but CPace supports X25519 and OPAQUE supports Ristretto255; these are currently not FIPSable curves.

“I don’t care about FIPS. Argon2 all the things!”

JavaScript runtimes don’t provide a built-in Argon2 implementation. This poses several challenges.

  • Do you care about iOS users? Lockdown mode prevents you from using WebAssembly, even in a browser extension.
  • Trying to polyfill scrypt or Argon2 in a scripting language is a miserable experience. We’re talking quadratic performance penalties here. Several minutes of CPU time to calculate a hash that is far too weak to be acceptable in any context.

Consequently, not a single password manager I’ve evaluated that provides a browser extension uses a memory-hard algorithm for deriving encryption keys.

Update: I’ve been told Dashlane uses Argon2.

This Is Frustrating

When I write about cryptography, my goal is always to be clear, concise, and helpful. I want whoever reads my writing to, regardless of their level of expertise, walk away more comfortable with the subject matter.

At minimum, I want you to be able to intuit when you need to ask an expert for help, and have a slightly better understanding of how to word your questions to get the answer you need. In an ideal world, you’d even be able to spot the same kind of weaknesses I do in cryptography designs and implementations after reading enough of this blog.

I can’t do that with password-based cryptography. The use-cases and threat models are too varied to make a clear-cut recommendation, and it’s too complicated to parcel out in a way that doesn’t feel reductive or slightly contradictory. This leads to too many caveats or corner cases.

Passwords suck.

If you ask a password cracking expert and describe your use case, you’ll likely get an opinionated and definitive answer. But every time I’ve asked generally about this subject, without a specific use case in mind, I got an opinionated answer followed by a long chain of caveats and alternatives.

With that in mind, here’s a few vague guidelines that are up-to-date as of the end of 2022.

Recommendations for Password-Based Cryptography in 2023

Password Hashing

If you’re authenticating users in an online service (storing the hashes in a database), use any of the following algorithms:

  • bcrypt with a cost factor appropriate to take about 100 milliseconds to compute on your server hardware (typically at least 12)
  • scrypt with N = 32768, r = 8, p = 1 and a random salt at least 128 bits in length (256 preferred)
  • Argon2id with a memory cost of 64 MiB, ops cost of 3, and parallelism of 1

If you’re forced to use PBKDF2 for whatever dumb reason, the parameters you want are at least:

  • SHA-512 with 210,000 rounds (preferred)
  • SHA-256 with 600,000 rounds
  • SHA-1 (if you must) with 1,300,000 rounds

These numbers are based on constraining attackers to less than 10,000 hashes per second, per GPU.

I’m not currently recommending any algorithms deliberately designed for cache-hardness because they need further analysis from other cryptographers. (Bcrypt is minimally cache-hard, but that wasn’t a stated design goal.)

I didn’t evaluate the other PHC finalists nor other designs (Balloon hashing, Ball and Chain, etc.). Ask your cryptographer if those are appropriate instead.

Password-Based Key Derivation

If you’re deriving an encryption key from a password, use any of the following algorithms:

  • scrypt with N = 1048576, r = 8, p = 1 and a random salt at least 128 bits in length (256 preferred)
  • Argon2id with a memory cost of 1024 MiB, ops cost of 4, and parallelism of 1

If you’re forced to use PBKDF2 for whatever dumb reason, the parameters you want should target between 1 and 10 seconds on the defender’s hardware. If this is somehow slower than the numbers given for password hashing above, go with the password hashing benchmarks instead.

Here’s a dumb PHP script you can use to quickly find some target numbers.

<?php
/* Dumb PBKDF2 benchmarking script by Soatok. 2022-12-29 */

$salt = random_bytes(32);
$password = 'correct horse battery staple';
$c = '';
$start = $end = microtime(true);

foreach(['sha1', 'sha256', 'sha512'] as $alg) {
	foreach ([
		1 << 14,
		1 << 15,
		1 << 16,
		1 << 17,
		1 << 18,
		1 << 19,
		1 << 20,
		1200000,
		1 << 21,
		2500000,
		3000000,
		3500000,
		1 << 22,
		5000000,
		1 << 23,
		1 << 24,
		1 << 25,
	] as $n) {
	   $start = microtime(true);
	   $c ^= hash_pbkdf2($alg, $password, $salt, $n, 32, true);
	   $end = microtime(true);
	   
	   $time = $end - $start;
	   echo str_pad($n, 12, ' ', STR_PAD_LEFT), 
           " iterations ({$alg}) -> \t",
           $time, 
           PHP_EOL;
	   
	   if ($time > 5.5) {
		   echo PHP_EOL;
		   break;
	   }
	}
}

On my laptop, targeting 5.0 seconds means at least 4,000,000 rounds of PBKDF2 with SHA1, SHA256, or SHA512 (with SHA256 coming closer to 5,000,000).

If you’re aiming for less than 1,000 hashes per second per GPU, against an attacker armed with an RTX 4090:

  • SHA-512 with 3,120,900 iterations (preferred)
  • SHA256 with 8,900,000 iterations
  • SHA-1 with 20,000,000 iterations

Sc00bz tells me that this generation of GPU has better performance/cost than previous generations (which had seen diminishing returns), but requires 1.5x the power, 1.78x the price, and 2x the physical space. Sc00bz refers to this as “1.5 GPUs”.

If you adjust for these factors, your final PBKDF2 target becomes:

  • SHA-512 with 2,100,000 iterations (preferred)
  • SHA-256 with 6,000,000 iterations
  • SHA-1 with 13,000,000 iterations

Password Authenticated Key Exchanges

In a client-server model, where the server isn’t meant to see password-equivalent data, you want an augmented PAKE (or perhaps a doubly-augmented PAKE).

You’re overwhelmingly more likely to find OPAQUE support in the immediate future, due to the direction the IETF CFRG is moving today.

In a more peer-to-peer model, you want a balanced PAKE. CPace is the best game in town.

Don’t use SRP if you can help it.

If you can’t help it, make sure you use one of the groups defined in RFC 5054, rather than a generic Diffie-Hellman group. You’ll also want an expert to review your implementation to make sure your BigInt library isn’t leaking your users’ private exponents.

Also, feel free to deviate from SHA1 for calculating the private exponent from their username and password. SHA1 sucks. Nothing is stopping you from using a password KDF (or, at worst, SHA256) here, as long as you do so consistently.

(But as always, ask your cryptography expert first.)

TL;DR

Password-Based Cryptography is a mess.

If you’re not aware of the history of the field and the technical details that went into the various cryptographic algorithms that experts recommend today, it’s very easy to say something wrong that sounds correct to an untrained ear.

At the end of the day, the biggest risk you face with password-based cryptography is not using a suitable algorithm in the first place, rather than using a less optimal algorithm.

Experts have good reasons to hate PBDKF2 (slow for defenders, fast for attackers) and SRP (we have better options today, which also come with actual security proofs), but they certainly hate unsalted SHA1 and Triple-DES more.

My only real contribution to this topic is an effort to make it easier to talk about to non-experts by offering a new term for the superset of all of these password-based algorithms: Password-Based Cryptographic Functions.

Finally, the password cracking community is great. There are a lot of smart and friendly people involved, and they’re all working to make this problem easier to get right. If you’re not sure where to start, check out PasswordsCon.


文章来源: https://soatok.blog/2022/12/29/what-we-do-in-the-etc-shadow-cryptography-with-passwords/
如有侵权请联系:admin#unsafe.sh