SQL Server

Why it matters if COLLATE is used in WHERE

There is a lot of things we shouldn’t do or use in WHERE clause, i.e scalar functions or data types conversion. They can lead to a suboptimal execution plan and cause serious performance degradation. A similar situation exists for the COLLATE statement. If it is used in WHERE it prevents query optimizer to use Index Seek operator and it replaces it with Index Scan which is significantly slower on large tables and a small subset of rows to be selected.

Let’s demonstrate it.

We will prepare a table with sample values first. The table has [String] columns with Slovak_CI_AS collation.

-- create sample table
CREATE TABLE [dbo].[SampleTable]
(
    [Id] INT IDENTITY(1,1) PRIMARY KEY,
    [String] VARCHAR(200) COLLATE Slovak_CI_AS NULL
)
GO

-- populate test data
INSERT [dbo].[SampleTable]
    SELECT TOP 2000
        [o1].[name] + LEFT(CAST(NEWID() AS VARCHAR(36)), 6)
    FROM [sys].[objects] [o1]
        CROSS JOIN [sys].[objects] [o2]
GO

-- create simple index on search argument column
CREATE NONCLUSTERED INDEX [IX_String] ON [dbo].[SampleTable] ( [String] )
GO

We can run simple query and check its execution plan now:

SELECT *
FROM [dbo].[SampleTable]
WHERE [String] LIKE 'sysrow%'
GO

CollateAndExecutionPlan1

It has returned about 4 rows from a total of 2 thousand rows and Index Seek operator was selected by the optimizer.

Let’s modify the query and use COLLATE statement to convert it Czech_CI_AS collation first. This can be useful if we need to search for some language-specific letters like ř, š, or č.

-- execute query 2 and check execution plan
SELECT *
FROM dbo.SampleTable
WHERE String LIKE 'sysrow%' COLLATE Czech_CI_AS
GO

CollateAndExecutionPlan2

Index Seek operator was replaced by Index Scan operator which is much less effective for this particular query. You can see that it takes 75% from the total batch, 3x more than the one with Index Seek.

In the latest versions of SQL Server, there is an improvement in query plans explorer which will warn us about such a situation. There is and warning sign displayed for SELECT operator:

If we will move the mouse over it more detailed explanation pops up:

The warning is the same as for standard type conversion using CAST() or CONVERT() functions. When COLLATE is used, then SQL Server converts the data in columns to different collations internally.  That means that COLLATE should be avoided in WHERE clause and if this isn’t possible then its use should be rethought twice and performance impact considered.

Leave a Reply

Your email address will not be published. Required fields are marked *