CCS Paper Part #2: Password Entropy
2010-10-31 05:27 | Author: reusablesec.blogspot.com

Round Peg, Square Hole

This is part #2 in a (mumble, cough, mumble) part series of posts discussing the results published in the paper I co-authored on the effectiveness of password security metrics. Part #1 can be found here.

I’ve received a lot of insightful comments on the paper since my last post, (one of the benefits of having a slow update schedule), and one thing that stands out is that people really like the idea of password entropy. Here’s a good example:

“As to entropy, I think it would actually be a good measure of password complexity, but unfortunately there's no way to compute it directly. We would need a password database comparable in size (or preferably much larger than) the entire password space in order to be able to do that. Since we can't possibly have that (there are not that many passwords in the world), we can't compute the entropy - we can only try to estimate it in various ways (likely poor)”

First of all I want to thank everyone for their input and support as I really appreciate it. This is one of the few cases though where I’m going to have to disagree with most of you. In fact, as conceited as it sounds, my main takeaway has been that I've done a poor job of making my argument, (or I’m wrong which is always a possibility). So the end result is another post on the exciting topic of password entropy ;)

When I first started writing this post, I began with a long description on the history of Shannon Entropy, how it’s used, and what it measures. I then proceeded to delete what I had written since it was really long, boring, and quite honestly not that helpful. All you need to know is:

  1. Claude Shannon was a smart dude.
  2. No seriously, he was amazing; he literally wrote the first book on modern code-breaking techniques.
  3. Shannon entropy is a very powerful tool used to measure information entropy / information leakage.
  4. Another way of describing Shannon entropy is that it attempts to quantify how much information is unknown about a random variable.
  5. It’s been effectively used for many different tasks; from proving one time pads secure, to estimating the limits of data compression.
  6. Despite the similar sounding names, information entropy and guessing entropy are not the same thing.
  7. Yes, I’m actually saying that knowing how random a variable is doesn’t tell you how likely it is for someone to guess it in N guesses, (with the exception of the boundary cases where the variable is always known, aka the coin always lands heads, or where the variable has an even distribution, aka a perfectly fair coin flip).

Ok, I’ll add one more completely unnecessary side note about Shannon Entropy. Ask a crypto guy, (or gal), if the Shannon entropy of a message encrypted with a truly random and properly applied one time pad is equal to the size of the key. If they say “yes”, point and laugh at them. The entropy is equal to that of the original message, silly!

Hey, do you know how hard it is to make an entropy related joke? I’m trying here…

Anyways, to calculate the entropy of a variable you need to have a fairly accurate estimate of the underlying probability of each possible outcome. For example, a trick coin may land heads 70% of the time, and tails the other 30%. The resulting Shannon entropy is just a summation of the probability of each event multiplied by the log2 of its probability, (and then multiplied by -1 to make it a positive value). Aka:

H(X) = -Σ p(x) · log2( p(x) )

So the Shannon entropy of the above trick coin would be -(.7 x log2(.7) + .3 x log2(.3)) which is equal to 0.8812 bits. A completely fair coin flip’s entropy would be equal to 1.0. In addition, the total entropy of different independent variables is additive. This means the entropy of flipping the trick coin and then the fair coin would be .8812 + 1.0 = 1.8812 bits worth of entropy.
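For anyone who wants to play along at home, here’s a minimal Python sketch (mine, not something from the paper) that does the same arithmetic:

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in bits: -sum(p * log2(p)) over outcomes with p > 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

trick_coin = [0.7, 0.3]  # lands heads 70% of the time
fair_coin = [0.5, 0.5]

print(shannon_entropy(trick_coin))  # ~0.88 bits
print(shannon_entropy(fair_coin))   # 1.0 bit
# Entropy of independent variables is additive:
print(shannon_entropy(trick_coin) + shannon_entropy(fair_coin))  # ~1.88 bits
```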

I probably should have put a disclaimer above to say that you can live a perfectly happy life without understanding how entropy is calculated.

The problem is that while the Shannon entropy of a system is determined using the probability of the different outcomes, the final entropy measurement does not tell you about the underlying probability distribution. People try to pretend it does though, which is where they get into trouble. Here is a picture, (and a gratuitous South Park reference), that I used in my CCS presentation to describe NIST’s approach to using Shannon entropy in the SP800-63 document:

Basically they take a Shannon entropy value, assume the underlying probability distribution is even, and go from there. The reason this is an issue is that when it comes to human-generated passwords, the underlying probability distribution is most assuredly not even. People really like picking “password1”, but there is always that one joker out there who picks a password like “WN%)vA0pnwe**”. That’s what I’m trying to say when I show this graph:

The problem is not that the Shannon value is wrong. It’s that an even probability distribution is assumed. To put it another way, unless you can figure out a method to model the success of a realistic password cracking session using just a straight line, you’re in trouble.
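To be concrete about what that straight line is: if a password set really did have H bits of entropy spread over an even distribution, then an attacker making G guesses would be expected to crack roughly G/2^H of the accounts. Here’s a tiny sketch of that model (my illustration, not anything taken from SP800-63):

```python
def uniform_model_success(entropy_bits: float, guesses: int) -> float:
    """Fraction of accounts expected to fall after `guesses` attempts,
    *if* the passwords were spread evenly over 2^entropy_bits values."""
    return min(1.0, guesses / (2 ** entropy_bits))

# "30 bits of entropy" under the even-distribution assumption:
print(uniform_model_success(30, 1_000_000))  # ~0.0009, i.e. less than 0.1% cracked
```

Real password sets don’t behave anything like that early in a cracking session, which is exactly what the graphs are showing.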

Let me make this point another way. A lot of people get hung up on the fact that calculating the underlying probability distribution of a password set is a hard problem. So I want to take a step back and show that the problem holds true even when that is not the case.

For an experiment, I went ahead and designed a variable that has 100 possible values occurring at various probabilities, (thanks Excel). This means I know exactly what the underlying probability distribution is. This also means I’m able to calculate the exact Shannon entropy as well. The graph below shows the expected guessing success rate against one such variable, compared to the expected guessing success rate generated by assuming the underlying Shannon entropy came from an even distribution.
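Here’s a rough Python re-creation of that style of experiment. The skewed distribution below is one I just made up for illustration; it is not the one from my Excel sheet, but it behaves the same way: a handful of popular values plus a long tail of rare ones.

```python
import math

# A made-up skewed distribution over 100 possible values:
weights = [50, 20, 10, 5, 3] + [1] * 95
total = sum(weights)
probs = sorted((w / total for w in weights), reverse=True)

entropy = -sum(p * math.log2(p) for p in probs)
print(f"Shannon entropy: {entropy:.2f} bits")

# An attacker guesses the most likely values first; compare the real
# cumulative success rate to what the even-distribution model predicts.
cracked = 0.0
for guesses, p in enumerate(probs, start=1):
    cracked += p
    if guesses in (1, 5, 10, 25):
        predicted = min(1.0, guesses / (2 ** entropy))
        print(f"{guesses:>3} guesses: actual {cracked:.0%}, even-distribution model {predicted:.0%}")
```

With this toy distribution the attacker gets over a quarter of the “accounts” on the very first guess, while the even-distribution model predicts only a few percent.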

Now tell me again: what useful information does the Shannon entropy value give the defender about the resistance of this variable to a guessing attack? What’s worse is the graph below, which shows 3 different probability distributions that have approximately the same entropy, (I didn’t feel like playing around with Excel for a couple of extra hours to generate the EXACT same entropy; this is a blog and not a research paper after all).

These three variables have very different resistance to cracking attacks, even though their entropy values are essentially the same. If I want to get really fancy, I can even design the variables in such a way that the variable with a higher Shannon entropy value is actually MORE vulnerable to a shorter cracking session.
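If you want to see how that’s possible, here’s a small made-up example (mine, not one of the distributions from the graphs). Variable A has the higher Shannon entropy, yet a single guess succeeds against it half the time:

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Variable A: half the time it's one specific value; the other half it's
# drawn evenly from 2^20 rare values.
rare = 2 ** 20
a = [0.5] + [0.5 / rare] * rare

# Variable B: drawn evenly from 256 values.
b = [1 / 256] * 256

print(entropy(a))  # 11.0 bits
print(entropy(b))  #  8.0 bits

# Resistance to a one-guess attack:
print(max(a))  # 0.5    -> 50% cracked with a single guess
print(max(b))  # ~0.004 -> about 0.4% cracked with a single guess
```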

This all comes back to my original point that Shannon entropy doesn’t provide “actionable” information to a defender when it comes to selecting a password policy. Even if you were able to perfectly calculate the Shannon entropy of a password set, the resulting value still wouldn’t tell you how secure you were against a password cracking session. What you really want to know as a defender is the underlying probability distribution of those passwords instead. That’s something I’ve been working on, but I’ll leave my group’s attempts to calculate that for another post, (hint: most password cracking rule-sets attempt to model the underlying probability distribution because they want to crack passwords as quickly as possible).


Source: https://reusablesec.blogspot.com/2010/10/ccs-paper-part-2-password-entropy.html