exists optimizations

by mark wilkinson, 2015-02-24 estimated read time: 4-5 minutes

tag(s): performance, development, m82labs archive

If you've done any amount of SQL development, you've probably seen an EXISTS clause. It might have been part of an IF statement, or a WHERE clause, but I'm sure you've seen it, and maybe used it yourself. One thing you'll notice is that all developers have their own way of writing it. Some developers swear by ... EXISTS( SELECT 1 ..., another might use ... EXISTS( SELECT TOP(1) * .... So what's the deal? Which method is best?

The Setup

First things first, lets see some of the different variations of the EXISTS clause, then we'll take a look at the execution plans and see what's going on behind the scenes.

For these demo scripts we'll be using the AdventureWorks database

Example 1: SELECT *

SELECT
    a.FirstName,
    a.LastName
FROM Person.Person AS a
WHERE EXISTS (
    SELECT  *
    FROM    HumanResources.Employee AS b
    WHERE   a.BusinessEntityID = b.BusinessEntityID
            AND a.LastName = 'Johnson'
    );

Example 2: SELECT TOP(1) *

SELECT
    a.FirstName,
    a.LastName
FROM Person.Person AS a
WHERE EXISTS (
    SELECT  TOP(1) *
    FROM    HumanResources.Employee AS b
    WHERE   a.BusinessEntityID = b.BusinessEntityID
            AND a.LastName = 'Johnson'
    );

Example 3: SELECT TOP(1) with defined column

SELECT
    a.FirstName,
    a.LastName
FROM Person.Person AS a
WHERE EXISTS (
    SELECT  TOP(1) b.BusinessEntityID
    FROM    HumanResources.Employee AS b
    WHERE   a.BusinessEntityID = b.BusinessEntityID
            AND a.LastName = 'Johnson'
    );

Example 4: SELECT 1

SELECT
    a.FirstName,
    a.LastName
FROM Person.Person AS a
WHERE EXISTS (
    SELECT  1
    FROM    HumanResources.Employee AS b
    WHERE   a.BusinessEntityID = b.BusinessEntityID
            AND a.LastName = 'Johnson'
    );

I'm sure there are more examples, but this will be enough for our purposes here. Let's take a look at the plans and io stats for these queries now and see how each performed.

The Plans

Below is the plan for the first query in our list. Nothing too exciting here:

Query Plan

IO Stats

Scan Count	Logical Reads	Physical Reads
2	6	0

The interesting thing happens when we take a look at the other query plans. Because of optimizations to the EXISTS clause, all 4 of these queries have an identical execution plan (other that the statement text included in the plan XML of course). From these tests it would seem that the optimizer doesn't care what you select, it's just looking at the query predicates to determine if something exists.

That sounds fine and good, but what if we make things a little more complicated? Here we are going to create a function that does some pointless work:

ALTER FUNCTION fnWaitForSomeTime( @num INT )
RETURNS INT
AS
BEGIN
    DECLARE @count_num INT;
    DECLARE @loop_i INT = 0;

    WHILE ( @loop_i < @num )
    BEGIN
        SET @count_num = (SELECT COUNT(*) FROM sys.columns);
        SET @loop_i += 1;
    END
    RETURN ABS(CHECKSUM(@count_num))
END

To test this function I ran the following:

SELECT TOP(1000)
    dbo.fnWaitForSomeTime(10000)
FROM
    HumanResources.Employee AS b

On my system this ran for 60 seconds before I stopped it. So what happens if we add this to our EXISTS? What if we also add a CPU heavy ORDER BY?

SELECT
    a.FirstName,
    a.LastName
FROM Person.Person AS a
WHERE EXISTS (
    SELECT TOP(1000)
        dbo.fnWaitForSomeTime(10000)
    FROM
        HumanResources.Employee AS b
    WHERE
        a.BusinessEntityID = b.BusinessEntityID
        AND a.LastName = 'Johnson'
    ORDER BY
        HASHBYTES('MD5',CAST(CHECKSUM(NEWID()) AS varbinary))
    );

Lets see the plan:

Query Plan

Notice anything familiar? Again we have the exact same plan. If you run this script yourself you'll see that it also takes the same amount of time to execute as the rest.

To take this a little further, lets try something we know shouldn't work (Thanks Kenneth Fischer for this great example!):

SELECT
    a.FirstName,
    a.LastName
FROM Person.Person AS a
WHERE EXISTS (
    SELECT  1/0 -- <---- What?!
    FROM    HumanResources.Employee AS b
    WHERE   a.BusinessEntityID = b.BusinessEntityID
            AND a.LastName = 'Johnson'
    );

Again, same plan, same execution time, and no error. This example just further illustrates that the optimizer doesn't care what you are selecting in an EXISTS clause, it only cares about the predicate.

Conclusion

We looked at a few variations of the EXISTS clause, and then saw how SQL Server handles them. Due to some nice optimizations SQL seems to do the least amount of work possible to check if something exists. This means that it is really up to you and your team to decide which is the most readable.

Personally I like to avoid * whenever I can, even if it doesn't cause a performance hit, so I tend to use the syntax in example 4 ... EXISTS( SELECT 1 ....