Password Encryption, Hashing, Salting Explained

Nov 10, 2009

Definitions

First, I'm going to explain a few terms and their definitions.

Hash/Signature/Digest – A hash (also called a digest, and sometimes loosely a signature) is a fixed-length value computed from data by a one-way algorithm. It is not encryption: there is no key, and you cannot mathematically recover the original data from it, although you will see ways around this below. Two common algorithms for this are MD5 and SHA1.
Salting – If you run the same string through MD5 (or SHA1), you will always get the same result, e.g. MD5(“password”) = “5f4dcc3b5aa765d61d8327deb882cf99”. A salt is a string that you add to the input before hashing so that the result changes, e.g. MD5(“password-salted”) = “0f538766ee062336f22a75bd73efddcb”.
Reversible Encryption – This is when you apply an algorithm to a string and get a different string, which can later be reversed to get the original string back. To be secure, these require some sort of key (think password) to recover the original value. Examples include the encryption used by SSH (which relies on keys) and, at the trivial end, the rot13 algorithm (which simply shifts each letter 13 positions and needs no key, so it offers no real security).
Collision – Two different input strings having the same Digest.
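To make these definitions concrete, here is a minimal Python sketch using only the standard library (`hashlib` and `codecs`). It shows that MD5 gives the same digest every time for the same input, that appending a salt changes the digest, and that rot13 can be undone while a hash cannot. The helper name `md5_hex` is hypothetical.

```python
import codecs
import hashlib


def md5_hex(text: str) -> str:
    """Return the MD5 digest of a string as 32 hex characters."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Hashing is deterministic: the same input always gives the same digest.
print(md5_hex("password"))  # 5f4dcc3b5aa765d61d8327deb882cf99

# Salting: adding a salt before hashing gives a completely different digest.
print(md5_hex("password-salted"))

# Reversible "encryption": rot13 shifts each letter 13 places,
# so applying it a second time returns the original string.
scrambled = codecs.encode("password", "rot_13")
print(scrambled)                            # cnffjbeq
print(codecs.decode(scrambled, "rot_13"))  # password
```

There is no equivalent "decode" step for `md5_hex`; the only way back from a digest is to guess inputs and compare.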

Concepts

Now that you know some of the terms, let's dig in and start explaining some things. Ever wonder why most websites can't send you a copy of your original password, but make you “reset” it instead? Most applications (at least good, secure ones) do not store your actual password in their database. Instead, they store a hash of your password, and when you log in they hash whatever you typed and compare the two hashes.
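A minimal sketch of that login check, assuming a hypothetical in-memory `users` table and plain SHA1 for simplicity (salting comes later). Only the hash is ever stored, which is exactly why the site has nothing to send back to you and can only offer a reset.

```python
import hashlib
import hmac


def hash_password(password: str) -> str:
    """Return the SHA1 hex digest that gets stored instead of the password."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest()


# Hypothetical users table: username -> stored hash (never the password itself).
users = {"alice": hash_password("correct horse")}


def check_login(username: str, attempt: str) -> bool:
    """Hash the attempted password and compare it to the stored hash."""
    stored = users.get(username)
    if stored is None:
        return False
    # compare_digest avoids leaking timing information during the comparison.
    return hmac.compare_digest(stored, hash_password(attempt))


print(check_login("alice", "correct horse"))  # True
print(check_login("alice", "wrong guess"))    # False
```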

Technically, a hash cannot be reversed. Let's say, for example, we use a very simplistic hashing algorithm: it adds up the digits of the input number, and keeps doing this until only a single digit is left. Look at the following example:

orig:        16341
1+6+3+4+1 =  15
1+5 =        6
hash:        6
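The toy algorithm fits in a few lines of Python. The function name is hypothetical, and the sketch assumes the input is a non-negative integer:

```python
def digit_sum_hash(number: int) -> int:
    """Toy hash from the example: sum the digits until one digit remains."""
    while number >= 10:
        number = sum(int(digit) for digit in str(number))
    return number


print(digit_sum_hash(16341))  # 1+6+3+4+1 = 15, then 1+5 = 6, so this prints 6
```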

Obviously, it is impossible to take the output “6” and get back to the input “16341”, since there are an infinite number of inputs that produce “6”. Take this same concept, apply a far more complex algorithm, and you get something like an MD5 hash.

As you can also tell, this is a very poor algorithm. For any positive number our hash has only nine possible outputs (1 through 9; 0 only comes from the input 0), so a random guess has roughly a one-in-nine chance of being accepted. For example, if we store the value 6, then someone who authenticates with “16350” or “15” will pass the test. (By the way, digit sums like this are the mental trick for checking whether a number is divisible by 9, and close cousins of them are used in check digits such as those on UPC codes :p) Still, it does illustrate a very simplistic hashing algorithm.
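A quick experiment makes the weakness visible. This sketch redefines the toy hash so it stands alone, shows several different inputs that all “pass” against a stored 6, and then counts how the inputs 1 to 999 spread over the possible outputs:

```python
from collections import Counter


def digit_sum_hash(number: int) -> int:
    """Same toy hash: repeatedly sum the digits until one digit remains."""
    while number >= 10:
        number = sum(int(digit) for digit in str(number))
    return number


stored = digit_sum_hash(16341)  # the "password" 16341 is stored as 6

# Each of these different inputs collides with the stored value.
for attempt in (16341, 15, 16350, 24):
    print(attempt, digit_sum_hash(attempt) == stored)  # True every time

# How the inputs 1..999 are distributed over the outputs.
counts = Counter(digit_sum_hash(n) for n in range(1, 1000))
print(sorted(counts.items()))  # every digit 1..9 appears 111 times
```

With only nine buckets, collisions are not a rare accident but the normal case, which is why real hash functions use outputs of hundreds of bits.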

Attacks / Password Cracking

Dictionary Attacks

Hashes are not a perfect way to protect passwords, however. If you give me the MD5 hash “72b302bf297a228a75730123efef7c41”, I can tell you that your password is most likely “banana”. How did I know? I performed a dictionary attack (aka I pasted it into Google). You see, people build large databases of common strings and their hashes, so looking up the hash of a common word is trivial.
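A dictionary attack is essentially a reverse lookup table. Here is a minimal Python sketch, assuming a tiny hypothetical word list in place of the millions of entries real attackers use:

```python
import hashlib
from typing import Optional


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# A tiny stand-in for the huge word lists attackers actually use.
wordlist = ["password", "banana", "letmein", "snowball", "123456"]

# Precompute digest -> word once; each later lookup is a single dict access.
lookup = {md5_hex(word): word for word in wordlist}


def dictionary_attack(digest: str) -> Optional[str]:
    """Return the word whose MD5 matches the digest, or None if it is not listed."""
    return lookup.get(digest)


print(dictionary_attack(md5_hex("banana")))    # banana
print(dictionary_attack(md5_hex("]0sH)#:m")))  # None
```

The expensive part (hashing every word) is done once, up front; after that, cracking any leaked hash of a listed word is instant.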

However, if you give me the hash value “0ee12ddebf5d0232655a1b9ff6ded348”, I won't have a clue what it is from a dictionary attack. The password behind it is “]0sH)#:m”, which obviously isn't a common word and isn't likely to be in any of our dictionaries (aka Google). It can, however, be broken using a brute force attack.

There are two ways to reduce the chance of a dictionary attack succeeding if someone gets your password hash. The first is to use a complex password with letters and numbers, as long as you can still remember. The second is to have the application that stores the password use a salt. If your salt is “this-is-a-very-long-salt” and you hash “mypa55w0rd-this-is-a-very-long-salt”, the odds of that string being in a dictionary drop to practically zero.
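Here is the salting idea as a Python sketch, assuming SHA1 and the example salt joined to the password with a hyphen (the helper names are hypothetical). Even though “mypa55w0rd” is in the attacker's word list, its salted digest is not in their precomputed table:

```python
import hashlib

SALT = "this-is-a-very-long-salt"  # application-wide salt from the example


def sha1_hex(text: str) -> str:
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def salted_hash(password: str, salt: str = SALT) -> str:
    """Append the salt (joined with '-') before hashing, as in the example."""
    return sha1_hex(f"{password}-{salt}")


# The attacker's precomputed table only knows unsalted hashes of common words.
wordlist = ["password", "mypa55w0rd", "banana"]
lookup = {sha1_hex(word): word for word in wordlist}

stored = salted_hash("mypa55w0rd")
print(lookup.get(stored))  # None: the salted digest is not in the table
```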

Brute Force Attacks

To do a brute force attack, you simply hash every possible string until you find the password: you compute MD5(a), MD5(b), … MD5(zzzzzzzz) until one of the generated hashes equals the stolen hash. This can take an extremely long time, depending on which characters may be used (letters, numbers, symbols, etc.) and on the length of the password. Each extra character multiplies the number of candidates by the size of the character set, so the time required grows exponentially with length (although the PlayStation 3 is supposed to do a nice job). You could also run through every possibility once and store the results as a very accurate dictionary, but the storage requirements become huge (each MD5 digest is 16 bytes, or 32 hex characters; each SHA1 digest is 20 bytes, or 40 hex characters).
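A brute force search is just a loop over every candidate string. This Python sketch assumes lowercase letters only and a small, hypothetical length limit so it finishes quickly; it hashes candidates shortest-first until one matches:

```python
import hashlib
import itertools
import string
from typing import Optional


def brute_force_md5(target: str, alphabet: str = string.ascii_lowercase,
                    max_len: int = 4) -> Optional[str]:
    """Try every string over `alphabet` up to `max_len` characters long,
    shortest first, and return the one whose MD5 digest equals `target`."""
    for length in range(1, max_len + 1):
        for chars in itertools.product(alphabet, repeat=length):
            candidate = "".join(chars)
            if hashlib.md5(candidate.encode("utf-8")).hexdigest() == target:
                return candidate
    return None  # not found within this search space


target = hashlib.md5(b"cat").hexdigest()
print(brute_force_md5(target))  # cat
```

Each extra character multiplies the work by `len(alphabet)`, and a password that uses characters outside `alphabet` is never found at all.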

The best way to slow down brute force attacks is to use as many character types as possible. Many password brute forcers don't try symbols by default, and those that do take much longer to run. A long salt containing symbols makes brute force attacks nearly impossible, unless the attacker also has your salt string.
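The effect of character types is easy to quantify: for an alphabet of N characters and a length of L, an attacker may have to try N^L strings. This Python sketch compares eight-character passwords; the alphabet sizes come from Python's `string` constants, and the length of 8 is just an illustrative choice:

```python
import string


def search_space(alphabet_size: int, length: int) -> int:
    """Number of possible passwords of exactly `length` characters."""
    return alphabet_size ** length


alphabets = {
    "lowercase": len(string.ascii_lowercase),                      # 26
    "letters + digits": len(string.ascii_letters + string.digits),  # 62
    "letters + digits + symbols": len(
        string.ascii_letters + string.digits + string.punctuation),  # 94
}

for name, size in alphabets.items():
    print(f"{name:>28}: {search_space(size, 8):,} eight-character passwords")
```

Going from lowercase only to the full 94-character set makes the eight-character search space roughly 29,000 times larger.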

Different Algorithms

Storing passwords as plain MD5 hashes is not recommended. MD5 is an older algorithm with a smaller output (128 bits, written as 32 hex characters), and researchers have already shown practical ways to produce MD5 collisions. SHA1 is a lot better: it is a newer algorithm with a larger output (160 bits, written as 40 hex characters) and, at the time of writing, no known practical collisions.
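The output sizes are easy to check with Python's `hashlib`. The digest length in bytes is half the number of hex characters you usually see printed:

```python
import hashlib

sizes = {}
for name in ("md5", "sha1"):
    digest = hashlib.new(name, b"password")
    # digest_size is measured in bytes; hex encoding uses two characters per byte.
    sizes[name] = (digest.digest_size, len(digest.hexdigest()))
    print(f"{name}: {sizes[name][0]} bytes, {sizes[name][1]} hex characters")
```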

Depending on your environment, you may be limited to the MD5 and SHA1 hashing algorithms. Other common algorithms you will run into include AES and DES, but those are reversible encryption using a key, not hashes (MySQL users, check this out).

Best Practice

Application Developers

As a best practice for storing passwords, enforce the following rules in your systems:

Require users to use letters, numbers, and symbols in their passwords
Salt your passwords using your application
Use SHA1 or something better instead of MD5
Do not use reversible encryption, only use hashes
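As one example of “something better”, Python's standard library ships PBKDF2 (`hashlib.pbkdf2_hmac`), which combines the password with a salt and hashes it many times, so every guess costs an attacker far more work. This is a swapped-in technique rather than the method described above; the random per-user salt, the iteration count, and the helper names are assumptions for illustration:

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 100_000  # illustrative work factor; tune for your own hardware


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest) using PBKDF2-HMAC-SHA256.

    A fresh random 16-byte salt is generated when none is given; the salt
    must be stored with the digest so the same one is used when verifying.
    """
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-hash the attempt with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


salt, stored = hash_password("c0rrect-H0rse!")
print(verify_password("c0rrect-H0rse!", salt, stored))  # True
print(verify_password("wrong", salt, stored))           # False
```

The design follows the rules above: the application salts the password, uses a stronger hash than MD5, and stores only a one-way digest, never anything that can be decrypted.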

Some things not to do:

Do not enforce password expiration (otherwise users will use simpler and simpler passwords)
Never, ever store plaintext passwords!

Make sure you use a widely used algorithm. If you use an obscure algorithm, A) an attacker can usually deduce it from the hashes anyway, and B) you may not be able to implement it in a new system when you decide to scale your application.

Never use reversible encryption for your passwords, even if you aren't worried about the damage if someone compromises your user database (e.g. for silly web applications). Users commonly use one password for everything, and it is not unheard of for attackers to compromise a user database, then brute force PayPal logins using your table of user emails and passwords as a dictionary.

End Users

Use letters, numbers, and symbols, and make the password as long as possible. I know you really want to use your cat's name as a password, but just do a quick Google search for SHA1(snowball) = cbf41f5b461cea4e1e261d2918d5334bee8c6a06, and you'll see it already shows up in six different dictionaries. Besides, once you've typed a password ten times or so, you're going to memorize it.

Try to use a different password for everything, in case someone steals a password for one of the websites you use.

Never sign up for a website using your email address and the same password you use for that email account. There are a lot of shady application developers out there who will log in to your email account with that password (ever wonder why MySpace accounts were being hacked all the time?).

Tags: #security

Thomas has contributed to dozens of enterprise Node.js services and has worked for a company dedicated to securing Node.js. He has spoken at several conferences on Node.js and JavaScript and is an O'Reilly published author.