Frank talks about Storing Passwords in MySQL. He does, however, miss something that’s really, really important. I’m talking about the salting of passwords.
If I want to find out what 5d41402abc4b2a76b9719d911017c592 or 015f28b9df1bdd36427dd976fb73b29d MD5s mean, the first thing I’m going to try is a dictionary attack (especially if i’ve seen a table with only user and password columns). Guess what? A list of words and their MD5SUMS can be used to very quickly find what these hashes represent.
I’ll probably have this dictionary in a MySQL database with an index as well. Try it yourself – you’ll probably find a dictionary with the words “hello” and “fire” in it to help. In fact, do this:
mysql> create table words (word varchar(100));
Query OK, 0 rows affected (0.13 sec)
mysql> load data local infile ‘/usr/share/dict/words’ into table words;
Query OK, 98326 rows affected (0.85 sec)
Records: 98326Â Deleted: 0Â Skipped: 0Â Warnings: 0
mysql> alter table words add column md5hash char(32);
Query OK, 98326 rows affected (0.39 sec)
Records: 98326Â Duplicates: 0Â Warnings: 0
mysql> update words set md5hash=md5(word);
Query OK, 98326 rows affected (3.19 sec)
Rows matched: 98326Â Changed: 98326Â Warnings: 0
mysql> alter table words add index md5_idx (md5hash);
Query OK, 98326 rows affected (2.86 sec)
Records: 98326Â Duplicates: 0Â Warnings: 0
mysql> select * from words where md5hash=’5d41402abc4b2a76b9719d911017c592′;
+——-+———————————-+
| word | md5hash                         |
+——-+———————————-+
| hello | 5d41402abc4b2a76b9719d911017c592 |
+——-+———————————-+
1 row in set (0.11 sec)
mysql> select * from words where md5hash=’015f28b9df1bdd36427dd976fb73b29d’;
+——+———————————-+
| word | md5hash                         |
+——+———————————-+
| fire | 015f28b9df1bdd36427dd976fb73b29d |
+——+———————————-+
1 row in set (0.00 sec)
$EXCLAMATION I hear you go.
Yes, this is not a good way to “secure” passwords. Oddly enough, people have known about this for a long time and there’s a real easy solution. It’s called salting.
Salting is prepending a random string to the start of the password when you store it (and when you check it).
So, let’s look at how our new password table may look:
mysql> select * from passwords;
+——+——–+———————————-+
| user | salt  | md5pass                         |
+——+——–+———————————-+
| u1Â Â | ntuk24 | ce6ac665c753714cb3df2aa525943a12 |
| u2Â Â | drc,3Â | 7f573abbb9e086ccc4a85d8b66731ac8 |
+——+——–+———————————-+
2 rows in set (0.00 sec)
As you can see, the MD5s are different than before. If we search these up in our dictionary, we won’t find a match.
mysql> select * from words where md5hash=’ce6ac665c753714cb3df2aa525943a12′;
Empty set (0.01 sec)
instead, we’d have to get the salt and do an md5 of the salt and the dictionary word and see if the md5 matches. Guess what, no index for that! and with all the possible values for salt, we’ve substantially increased the problem space to construct a dictionary (i won’t go into the maths here).
mysql> create view v as select word, md5(CONCAT(‘ntuk24′,word)) as salted from words;
Query OK, 0 rows affected (0.05 sec)
mysql> select * from v where salted=’ce6ac665c753714cb3df2aa525943a12’;
+——-+———————————-+
| word | salted                          |
+——-+———————————-+
| hello | ce6ac665c753714cb3df2aa525943a12 |
+——-+———————————-+
1 row in set (2.04 sec)
mysql> create or replace view v as select word, md5(CONCAT(‘drc,3′,word)) as salted from words;
Query OK, 0 rows affected (0.00 sec)
mysql> select * from v where salted=’7f573abbb9e086ccc4a85d8b66731ac8’; +——+———————————-+
| word | salted                          |
+——+———————————-+
| fire | 7f573abbb9e086ccc4a85d8b66731ac8 |
+——+———————————-+
1 row in set (2.12 sec)
So we’ve gone from essentially instantaneous retreival, to now taking about 2 seconds. Even if I assume that one of your users is going to be stupid enough to have a dictionary password, It’s going to take me 2 seconds to check each user – as the salt is different for each user! So it could take me hours just to find that user. Think about how many users are in your user table – with 1000 users, it’s over 1/2hr. For larger systems, it’s going to be hours.