In this article, we will look at the UNION and UNION ALL operators in SQL Server and compare them in terms of syntax, behavior, and performance.
What is the UNION Operator
The UNION operator in SQL Server combines the result sets of two or more SELECT queries into a single result set. Every SELECT statement must return the same number of columns, in the same order, with compatible data types. There does not need to be any logical relationship between the queries or their data. UNION returns only distinct rows, so duplicate rows are removed from the combined result. To find those duplicates, SQL Server usually has to sort or hash the combined rows, which is why a sort-like operator often shows up in the execution plan. However, the order of the final result is not guaranteed unless you add an ORDER BY clause.
Here we have two tables. Let's run a separate SELECT query on each of them.
Stephen Watson appears in both tables.
When we combine the two queries with UNION, the results of both SELECT statements are merged and the duplicate record is removed, so Stephen Watson appears only once.
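A minimal sketch of the same idea follows. It assumes Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for SQL Server. The table names TableA and TableB, their columns, and every row except Stephen Watson are hypothetical. The UNION syntax is standard SQL and is written the same way in T-SQL.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")

# Hypothetical sample tables; only Stephen Watson comes from the article.
conn.executescript("""
    CREATE TABLE TableA (FirstName TEXT, LastName TEXT);
    CREATE TABLE TableB (FirstName TEXT, LastName TEXT);
    INSERT INTO TableA VALUES ('Stephen', 'Watson'), ('Maria', 'Lopez');
    INSERT INTO TableB VALUES ('Stephen', 'Watson'), ('Ken', 'Adams');
""")

# UNION merges both result sets and removes the duplicate Stephen Watson row.
union_rows = conn.execute("""
    SELECT FirstName, LastName FROM TableA
    UNION
    SELECT FirstName, LastName FROM TableB
""").fetchall()

print(len(union_rows))                          # 3
print(union_rows.count(("Stephen", "Watson")))  # 1
```

Four input rows produce three output rows because the shared row is kept only once. The order of `union_rows` is not guaranteed, which is why the sketch counts rows instead of comparing positions.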
What is the UNION ALL Operator
The UNION ALL operator in SQL Server does the same job. It is placed between multiple SELECT statements that have the same column structure, and it combines all of their results. However, it does not remove duplicate rows. As a result, UNION ALL always returns at least as many rows as UNION, and more whenever the combined results contain duplicates.
Let's run UNION ALL on the same two SELECT statements.
The results of both SELECT statements are combined, but this time the query returns more rows than the UNION version. As mentioned above, UNION ALL does not remove duplicate rows, so Stephen Watson is listed once for each table.
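The sketch below uses the same hypothetical setup (SQLite through Python's sqlite3 as a stand-in for SQL Server, with hypothetical tables TableA and TableB). The setup is repeated so the snippet runs on its own. It runs UNION ALL and compares its row count with the UNION result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Same hypothetical sample data as the UNION example.
conn.executescript("""
    CREATE TABLE TableA (FirstName TEXT, LastName TEXT);
    CREATE TABLE TableB (FirstName TEXT, LastName TEXT);
    INSERT INTO TableA VALUES ('Stephen', 'Watson'), ('Maria', 'Lopez');
    INSERT INTO TableB VALUES ('Stephen', 'Watson'), ('Ken', 'Adams');
""")

# UNION ALL appends the second result set to the first and keeps duplicates.
union_all_rows = conn.execute("""
    SELECT FirstName, LastName FROM TableA
    UNION ALL
    SELECT FirstName, LastName FROM TableB
""").fetchall()

# Row count of the equivalent UNION query, for comparison.
union_count = conn.execute("""
    SELECT COUNT(*) FROM (
        SELECT FirstName, LastName FROM TableA
        UNION
        SELECT FirstName, LastName FROM TableB
    )
""").fetchone()[0]

print(len(union_all_rows), union_count)             # 4 3
print(union_all_rows.count(("Stephen", "Watson")))  # 2
```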
Importance of Column Order and Data Types
It is very important to keep the columns in the same order in every SELECT statement that is combined with UNION or UNION ALL.
The operator does not look for any logical relationship between the result sets. It simply matches columns by their position in each SELECT list, as long as the column structure is compatible.
We have to take care of three things when using it:
- Keep the columns in the same order in every statement
- Use the same number of columns in every statement
- Use compatible data types, in the same order, in every statement
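The following is a minimal sketch of why column order matters. It uses the same assumptions as before: SQLite through Python's sqlite3 stands in for SQL Server, and TableA and TableB are hypothetical. Swapping the columns in the second SELECT is accepted without complaint because columns are matched by position, not by name. A different column count is rejected outright. SQLite is loosely typed, so this sketch cannot show a data-type mismatch. In SQL Server, incompatible column types make the UNION fail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (FirstName TEXT, LastName TEXT);
    CREATE TABLE TableB (FirstName TEXT, LastName TEXT);
    INSERT INTO TableA VALUES ('Stephen', 'Watson');
    INSERT INTO TableB VALUES ('Stephen', 'Watson');
""")

# Columns are matched by position: the second SELECT silently puts last names
# under FirstName, so the "same" person is no longer seen as a duplicate.
swapped = conn.execute("""
    SELECT FirstName, LastName FROM TableA
    UNION
    SELECT LastName, FirstName FROM TableB
""").fetchall()
print(sorted(swapped))  # [('Stephen', 'Watson'), ('Watson', 'Stephen')]

# A different number of columns is an error, not a silent mismatch.
mismatch_error = None
try:
    conn.execute(
        "SELECT FirstName, LastName FROM TableA UNION SELECT FirstName FROM TableB"
    )
except sqlite3.Error as exc:
    mismatch_error = str(exc)
print(mismatch_error)
```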
Performance: UNION vs UNION ALL
As discussed above, UNION has to perform a distinct operation to remove duplicate rows, and SQL Server commonly does this by sorting the data. The execution plans of the two queries (shown in the images) differ in exactly one place. The UNION plan contains an extra operator that performs the duplicate removal. In this plan it is a Merge Join, and depending on the data it can also appear as a Distinct Sort or a Hash Match. In this example, that operator accounts for 46% of the query's estimated cost, which is a significant overhead. UNION ALL skips this step entirely.
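SQLite can show the same difference through its own plan output. This is a minimal sketch, assuming SQLite's EXPLAIN QUERY PLAN as a stand-in for SQL Server's graphical execution plan, with hypothetical tables TableA and TableB. The exact wording of the plan lines varies between SQLite versions. The point is that UNION needs a temporary B-tree to discard duplicates, while UNION ALL does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (FirstName TEXT, LastName TEXT);
    CREATE TABLE TableB (FirstName TEXT, LastName TEXT);
""")

def plan_details(sql):
    """Return the human-readable 'detail' text of each EXPLAIN QUERY PLAN row."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

SELECT_A = "SELECT FirstName, LastName FROM TableA"
SELECT_B = "SELECT FirstName, LastName FROM TableB"

union_plan = plan_details(f"{SELECT_A} UNION {SELECT_B}")
union_all_plan = plan_details(f"{SELECT_A} UNION ALL {SELECT_B}")

# The UNION plan includes a temp B-tree step for duplicate removal;
# the UNION ALL plan just scans both tables and appends the rows.
print("UNION:    ", union_plan)
print("UNION ALL:", union_all_plan)
```

The extra temp B-tree step plays the same role here as the extra duplicate-removal operator in the SQL Server plan described above.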
UNION With ORDER BY
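We can also use ORDER BY with the UNION operator, but keep in mind that the ORDER BY clause may appear only once, at the end of the last SELECT statement. It then sorts the entire combined result, not just the rows returned by the last query.
Do not rely on UNION to return sorted data. Even though its duplicate removal often involves a sort, SQL Server does not guarantee any row order without ORDER BY. Use ORDER BY whenever you need the rows in a specific order, on whichever columns your requirement calls for. Here is an example of ORDER BY with UNION.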
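The following is a minimal sketch of ORDER BY placement. It uses the same assumptions as the earlier examples: SQLite through Python's sqlite3 as a stand-in for SQL Server, and hypothetical tables TableA and TableB. The ORDER BY placed after the last SELECT sorts the whole combined result. Placing it inside the first SELECT is rejected as a syntax error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (FirstName TEXT, LastName TEXT);
    CREATE TABLE TableB (FirstName TEXT, LastName TEXT);
    INSERT INTO TableA VALUES ('Stephen', 'Watson'), ('Maria', 'Lopez');
    INSERT INTO TableB VALUES ('Stephen', 'Watson'), ('Ken', 'Adams');
""")

# ORDER BY after the last SELECT applies to the entire UNION result.
rows = conn.execute("""
    SELECT FirstName, LastName FROM TableA
    UNION
    SELECT FirstName, LastName FROM TableB
    ORDER BY LastName DESC
""").fetchall()
print(rows)  # [('Stephen', 'Watson'), ('Maria', 'Lopez'), ('Ken', 'Adams')]

# ORDER BY inside the first SELECT is not allowed.
misplaced_error = None
try:
    conn.execute("""
        SELECT FirstName, LastName FROM TableA ORDER BY LastName
        UNION
        SELECT FirstName, LastName FROM TableB
    """)
except sqlite3.Error as exc:
    misplaced_error = str(exc)
print(misplaced_error)
```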
Difference between UNION and JOIN
UNION combines the result sets of two or more SELECT queries into a single result set that contains the rows from all the queries, whereas a JOIN retrieves data from two or more tables based on a logical relationship between them.
In short, UNION combines rows from two or more queries, while JOIN combines columns from two or more tables that are related in some way.
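The contrast can be sketched with two small hypothetical tables. This assumes SQLite through Python's sqlite3 as a stand-in for SQL Server. The table names, the Id/Name/City columns, and the city values are illustrative only. The JOIN pairs rows on the related Id column and produces wider rows. The UNION stacks the rows of both queries and produces more rows of the same width, even though the second column mixes names and cities. This also shows that UNION does not check whether the columns are logically related.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (Id INTEGER, Name TEXT);
    CREATE TABLE TableB (Id INTEGER, City TEXT);
    INSERT INTO TableA VALUES (1, 'Stephen Watson'), (2, 'Maria Lopez');
    INSERT INTO TableB VALUES (1, 'Leeds'), (2, 'Madrid');
""")

# JOIN: matches rows on Id and places columns side by side (wider rows).
joined = conn.execute("""
    SELECT a.Id, a.Name, b.City
    FROM TableA AS a
    JOIN TableB AS b ON b.Id = a.Id
    ORDER BY a.Id
""").fetchall()
print(joined)  # [(1, 'Stephen Watson', 'Leeds'), (2, 'Maria Lopez', 'Madrid')]

# UNION: stacks rows from both queries (more rows, same width).
stacked = conn.execute("""
    SELECT Id, Name FROM TableA
    UNION
    SELECT Id, City FROM TableB
""").fetchall()
print(len(stacked), len(stacked[0]))  # 4 2
```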