One friend called me yesterday that he can’t believe what’s happening to him: he is saving various strings to the table and to check if the string already exists he is generating hashes using the SHA2_256 algorithm. But today he has seen that for the same string stored in two different rows SHA2_256 is giving him different hash values. He is thinking there must be some “magic” around this particular string.
A simple demonstration of the situation:
DECLARE @String1 NVARCHAR(200) DECLARE @String2 NVARCHAR(200) SET @String1 = '<tsql_stack><frame database_name = ''Custom_PL''/></tsql_stack>' SET @String2 = '<tsql_stack><frame database_name = ''Custom_Pl''/></tsql_stack>' SELECT @String1 AS StringValue UNION SELECT @String2 SELECT HASHBYTES('SHA2_256', @String1) HashValue UNION SELECT HASHBYTES('SHA2_256', @String2) GO
We have declared two string variables and set the value of both to the “magical” string. Then we are comparing these two strings using the UNION operator and the result is as expected: only one row is returned. But when we do the same with hashes we can see two different values instead of a single one like with the previous UNION:
What’s going wrong?
Oki. Let’s stop joking. I will give you small help:
SELECT CONVERT (varchar, DATABASEPROPERTYEX(DB_NAME(),'collation')) Collation
Any idea? I have one: don’t trust what you see: double-check our strings:
Do you see it? There is the same letter “L” one time as capital and other times the lower case.
If you will check the collation then it’s obvious: CI = case insensitive which means that the UNION statement will compare these two strings as equal. Bit this isn’t true for the SHA2_256 algorithm which is a binary function knowing zero about case sensitivity.
Don’t trust what you see:)