Saturday, February 25, 2012

MAC used as search index for Encrypted data: how secure?

I have read recommendations about searching encrypted data. Typically, they involve creating a MAC (message authentication code) table. One of the elements of that table is a hash of the encrypted data (plus a MAC key) that is used as an index for searching. Is that hash as secure as the encrypted data itself, or is this approach less secure? If it is less secure, may I assume that this approach is the only feasible way to search data encrypted by nondeterministic algorithms?

TIA,

Barkingdog

Is the hash as secure as the encrypted data? This is a difficult question, because the answer depends on what you are trying to secure against, what algorithms you are using, and what vulnerabilities they develop over time. One thing that a hash would disclose, and that encryption would normally not disclose, is data identity. That is, if you encrypt the same piece of data twice, you won't be able to determine from the resulting blobs whether they correspond to identical data, but if you hash the same data twice, the results will be identical.

Strictly speaking, the only way to search encrypted data (and I mean non-deterministic, although I should not need to specify this because encryption is always intended to be a non-deterministic operation) is to decrypt all data and search through the decrypted text. Using hashes is a workaround that allows you to search hashes instead of the encrypted data, to answer an equality search. You can probably devise other alternative search schemes, depending on what searches you want to allow and how much information you are willing to give away.
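To make the equality-search workaround concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the names `INDEX_KEY`, `index_tag`, `build_index` and `find_rows` are hypothetical, the sample values are made up, and HMAC-SHA256 is one possible choice of keyed hash, not something prescribed in this thread.

```python
import hashlib
import hmac
import os
from collections import defaultdict
from typing import Dict, List

# Hypothetical secret index key (assumption: stored apart from the data it indexes).
INDEX_KEY = os.urandom(32)

def index_tag(value: str, key: bytes) -> bytes:
    """Deterministic keyed tag for a plaintext value (HMAC-SHA256)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()

def build_index(plaintexts: Dict[int, str], key: bytes) -> Dict[bytes, List[int]]:
    """Map each tag to the row ids whose plaintext produced it."""
    index = defaultdict(list)
    for row_id, value in plaintexts.items():
        index[index_tag(value, key)].append(row_id)
    return dict(index)

def find_rows(index: Dict[bytes, List[int]], key: bytes, query: str) -> List[int]:
    """Equality search: tag the query and look it up, without decrypting anything."""
    return index.get(index_tag(query, key), [])

# Made-up sample values standing in for the data that would be stored encrypted.
plaintexts = {1: "alice@example.com", 2: "bob@example.com", 3: "alice@example.com"}
index = build_index(plaintexts, INDEX_KEY)

print(find_rows(index, INDEX_KEY, "alice@example.com"))  # [1, 3]
```

Rows 1 and 3 end up under the same tag, which is exactly the "data identity" disclosure mentioned above: anyone who can read the index learns which rows hold equal values, even without learning the values themselves.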

Thanks
Laurentiu

|||

>>>Using hashes is a workaround

Yes, I agree with that. I'm just concerned that the creation of hashes (out of practical necessity) gives a workable solution only at a cost in security, because the hash codes may be easier to "crack" than the encrypted data itself. Maybe "cracking" the hash can lead to cracking the encrypted data?

Barkingdog.

|||

Strictly speaking, yes, it is possible to use a rainbow table attack against a pure hash of the data we are trying to protect, especially if the domain of the plaintext is finite and well defined (e.g. SSN, CCN, etc.). For example, I can easily create an offline rainbow table with the hashes (SHA1, MD5, etc.) of every possible SSN; once I find your hashed value, I just need to look up the corresponding plaintext in the table.
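A rough sketch of that attack in Python, assuming a deliberately tiny domain so it runs instantly. Note that this builds a plain precomputed lookup table rather than a true rainbow table (which uses hash chains to save memory); the effect on an unkeyed hash over a small domain is the same. The helper names and the sample value are hypothetical.

```python
import hashlib

def sha1_hex(value: str) -> str:
    """Unkeyed SHA-1 of a value, as a pure-hash index might store it."""
    return hashlib.sha1(value.encode("ascii")).hexdigest()

def build_lookup(domain):
    """Precompute hash -> plaintext for every value in the domain."""
    return {sha1_hex(v): v for v in domain}

# Toy domain (assumption): 10,000 SSN-shaped values sharing one prefix.
domain = [f"123-45-{serial:04d}" for serial in range(10_000)]
lookup = build_lookup(domain)

# What an attacker might read from the hash column.
leaked_hash = sha1_hex("123-45-6789")

recovered = lookup.get(leaked_hash)
print(recovered)  # 123-45-6789
```

The table is built once, offline, and then inverts any hash from that domain with a single lookup.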

This is the reason why I was suggesting an HMAC: the domain of the hash input is different, and creating a dictionary is far more expensive than in the example above. You are still giving away some information to a potential attacker, but now a random bag of bits of arbitrary length (the HMAC key) is combined with the data we want to protect, so the hash input no longer belongs to the same domain as the original plaintext.

Assuming that the key is truly random, large enough, and well protected, the previous attack is rendered useless. To the best of our knowledge, the attacker would need to brute force every combination of plaintext and key, i.e. hash( ‘111-111-111’ + 0x00…001 ), hash( ‘111-111-111’ + 0x00…002 ) … hash( ‘999-999-999’ + 0xFF…FFF ). As you can see, this is far more expensive, and the larger the key, the better.
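The effect of the key can be sketched as follows. This is an illustration under assumptions, not a statement of how any particular product does it: it uses Python's standard `hmac` module (the nested HMAC construction rather than the plain concatenation written above), a 256-bit key, and the same toy domain of SSN-shaped values. All names are hypothetical.

```python
import hashlib
import hmac
import os

def sha256_hex(value: str) -> str:
    """Unkeyed hash, which an attacker can precompute offline."""
    return hashlib.sha256(value.encode("ascii")).hexdigest()

def hmac_hex(key: bytes, value: str) -> str:
    """Keyed hash (HMAC-SHA256) of the same value."""
    return hmac.new(key, value.encode("ascii"), hashlib.sha256).hexdigest()

domain = [f"123-45-{serial:04d}" for serial in range(10_000)]

# Attacker's offline table, built without knowledge of the key.
precomputed = {sha256_hex(v): v for v in domain}

key = os.urandom(32)  # 256-bit secret, assumed random and well protected
tag = hmac_hex(key, "123-45-6789")

# The precomputed table no longer matches anything in the index.
print(precomputed.get(tag))  # None

# If the key leaks, the table can simply be rebuilt, so key protection matters.
rebuilt = {hmac_hex(key, v): v for v in domain}
print(rebuilt.get(tag))  # 123-45-6789

# Rough work factor for guessing value and key together:
# about 10^9 nine-digit values times 2^256 possible keys.
work = 10**9 * 2 ** (8 * len(key))
print(work.bit_length())  # 286
```

In other words, the security of the index rests on the key: without it the search space is the plaintext domain multiplied by the key space, and with it the attack is as cheap as before.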

Is this better or worse than trying to brute force a symmetric key? That is a really difficult question, as Laurentiu mentioned. Hashes are computationally less expensive than decryptions, but there are too many factors to consider (domain of your plaintext, domain of the key, reusing keys, new attack methods being discovered, etc.).

The best option if you want to create an index over the encrypted data may be to create a completely new identifier for it, e.g. a customer ID that is completely unrelated to the data being protected, but we understand this option is not always possible for business reasons.
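As a small sketch of that idea, assuming a random 128-bit identifier is acceptable as the lookup key: the function name `new_customer_id` and the in-memory `records` mapping are hypothetical, and the stored bytes are only a placeholder for real ciphertext.

```python
import secrets

def new_customer_id() -> str:
    """Random identifier that is not derived from the protected data."""
    return secrets.token_hex(16)  # 16 random bytes as 32 hex characters

# Hypothetical store: surrogate ID -> encrypted record (placeholder bytes here).
records = {}

cid = new_customer_id()
records[cid] = b"<ciphertext placeholder>"

# Lookups go through the surrogate ID, so the index reveals nothing about the data.
print(cid in records)  # True
```

Because the identifier is generated independently of the plaintext, there is nothing to precompute or invert; the trade-off is that the caller must already know the ID.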

I would recommend defining what assets you are trying to protect and against what kinds of threats. From there, you need to evaluate what options you have to protect against those threats, what mechanisms (preferably defense in depth) you have to prevent, detect, and halt a possible attack, and what steps to follow once the situation is back under control.

Thanks a lot,

-Raul Garcia

SDE/T

SQL Server Engine
