One friends called me yesterday that he can’t believe what’s happening to him: he is saving various strings to table and to check if the string already exists he is generating hashes using SHA2_256 algorithm. But today he has seen that for the same string stored in two different rows SHA2_256 is giving him different hash values. He is thinking there must be some “magic” around this particular string.
Simple demonstration of the situatio:
DECLARE @String1 NVARCHAR(200)
DECLARE @String2 NVARCHAR(200)
SET @String1 = '<tsql_stack><frame database_name = ''Custom_PL''/></tsql_stack>'
SET @String2 = '<tsql_stack><frame database_name = ''Custom_Pl''/></tsql_stack>'
SELECT @String1 AS StringValue
SELECT HASHBYTES('SHA2_256', @String1) HashValue
SELECT HASHBYTES('SHA2_256', @String2)
We have declared two string variables and set value of both to the “magical” string. Then we are comparing these two strings using UNION operator and the result is as expected: only one row is returned. But when we do the same with hashes we can see two different values instead of single one like with previous UNION:
What’s going wrong?
Oki. Let’s stop to joke. I will give you small help:
SELECT CONVERT (varchar, DATABASEPROPERTYEX(DB_NAME(),'collation')) Collation
Any idea? I have one: don’t trust to what you see: double check our strings:
Do you see it? There is the same letter “L” one times as capital and other times lower case.
If you will check the collation then it’s obvious: CI = case insensitive which means that UNION statement will compare these two strings as equal. Bit this isn’t true for SHA2_256 algorithm which is binary function knowing zero about case sensitivity.
Don’t trust to what you see:)