MS-SQL provides a MERGE command that lets you perform three different operations that depend on two tables. The three conditions are MATCHED, NOT MATCHED BY TARGET, and NOT MATCHED BY SOURCE.
The syntax is:
MERGE <target> USING <source> ON <condition> WHEN MATCHED THEN <operation> WHEN NOT MATCHED BY TARGET THEN <operation> WHEN NOT MATCHED BY SOURCE THEN <operation>
To be a good SQL coder requires that one understands the implications of the code that is written, and the best way to do this is to study execution plans. Also by trying to implement a MERGE without the MERGE command we can recreate the conditions that developers would have struggled with before its adoption.
Lets look at some setup code for our test:
IF OBJECT_ID('mTarget') IS NOT NULL DROP TABLE mTarget; IF OBJECT_ID('mSource') IS NOT NULL DROP TABLE mSource;
CREATE TABLE mTarget ( ID int, a nvarchar(10) );
CREATE TABLE mSource ( ID int, a nvarchar(10) );
INSERT INTO mTarget (ID, a) VALUES (0, 'apple'), (2, 'banana'), (3, 'orange'), (5, 'avacado');
INSERT INTO mSource (ID, a) VALUES (0, 'grape'), (1, 'peach'), (4, 'orange');
CREATE INDEX ndx_ID ON mTarget (ID); CREATE INDEX ndx_ID ON mSource (ID);
SET STATISTICS IO ON SET STATISTICS TIME ON
Let us check out the merge command to get a baseline of a sturdy execution plan:
MERGE <- compute scalar(0) <- table merge(44) <- compute scalar(0) <- compute scalar(0) <- assert(0) <- [C1] [C1] <- Sequence Project (Compute Scalar) (0) <- Segment (0) <- Filter (0) <- Sequence Project (Compute Scalar) (0) <- [C2] [C2] <- Segment (0) <- Sort (0) <- Compute Scalar (0) <- Compute Scalar (0) <- concatenation (0) <- [C3] [C3] <- [C4] <- Compute Scalar (0) <- [C5] [C4] Nested Loops (Left Outer Join) (0) <- Compute Scalar (0) <- Table Scan [mSource] [s] (7) <- Compute Scalar (0) <- Table Scan [mTarget] [t] (8) [C5] Nested Loops (Left Anti Semi Join) (0) <- Compute Scalar (0) <- Table Scan [mTarget] [t] (7) <- Top (0) <- Table Scan [mSource] [s] (8)
The execution plans can be confusing because they are not always a direct translation of your code because of optimization, however let us start by understanding [C5] and the Left Anti Semi Join. The Left Anti Semi Join has two inputs. This is the mSource and mTarget tables. The execution plan omits the condition which is mTarget.ID = mSource.ID. If the top input, which is mSource has no match in the bottom input mTarget then the row is returned.
We can get the same effect with:
SELECT ID, a FROM mSource WHERE NOT EXISTS ( SELECT TOP(1) ID FROM mTarget WHERE mTarget.ID = mSource.ID )
mTarget 0 apple 2 banana 3 orange 5 avacado
mSource 0 grape 1 peach 4 orange
The next major process is [C4].