In this article we describe a situation where some care is required in writing the SQL to perform an outer aggregate SUM calculation around an inner union. This is a situation which we illustrate using Access, but applies equally well to SQL Server.
Consider performing a simple sum over a single table of data as shown below.
Then we use a very simple aggregate Sum function.
Giving the following correct answer.
Now introduce a second table as shown below.
We now want to now perform the aggregate over all records in both tables, so we use a nested query, where the inner part of the query is a UNION of the data in both tables.
Giving the following unexpected result!
In order to investigate this in more detail, it is worth viewing the result of running the inner part of the nested query without the aggregate.
To simplify things, let us start by looking at the union in more detail; this also acts as a nice example of how to break down a problem in SQL into simpler steps.
Which displays only 2 records.
Now we can see that the problem is that the default UNION clause has eliminated any duplicate records, although this is often a desirable behaviour, in this situation it is the source of our problems.
Changing the SQL to perform a UNION ALL including any duplicate records works as follows.
This gives the following result.
Now returning to our original aggregate in our nested query we can modify the SQL as follows :-
Which gives the following result.
In many circumstances the power of a UNION to exclude duplicate records is very useful, for example when bringing together a list of all customers from active and historical tables into a single list; in which case eliminating duplicates gives the desired results.
However, when working with aggregate functions such as a SUM over a UNION we normally need to ensure that a UNION ALL is used; in the above example it is only because some of the transaction dates and amounts are the same, that duplicates get removed. This is an example of a subtle problem which very much depends on the data values.
by Andrew Couch Office Access MVP, Director ASC Associates
andy@ascassociates.biz