How to Use NOT IN With Subquery in PostgreSQL
-
Use the
NOT IN
Operator With Subquery in PostgreSQL -
Use the
NOT EXISTS
Operator as a Better Alternative
The NOT
in the NOT IN
reverses what would result from simply using the IN
operator. The right-hand side of the NOT IN
operator has to have a subquery where more than one column is returned to check if the expression matches the data or not.
NOT IN
tends to return true if the expression is not found in the subquery data returned.
Let us go ahead and try to understand some common problems and issues faced by users when using the standard SQL NOT IN
in PostgreSQL.
Use the NOT IN
Operator With Subquery in PostgreSQL
In PostgreSQL, if you use NOT IN
to ensure that none of your expressions matches a particular set of data, you must ensure no NULL value is in the subquery data returned.
What does that mean exactly?
Let’s go ahead and try to understand this with the help of an example. We will be creating a table HORSE
with two columns; ID
and Color
.
CREATE TABLE horse (
ID int PRIMARY KEY,
Colour TEXT
);
Now let us insert some values as well.
INSERT INTO horse values (1, 'black'), (2, 'brown'), (3, 'white');
Let us go ahead and CREATE
another table for RIDER
.
CREATE TABLE rider (
ID int PRIMARY KEY,
horse_id int
);
You can run any of the codes given above in PGADMIN
and PSQL
.
Let us insert some values in both tables:
INSERT INTO horse values (1, 'black'), (2, 'brown'), (3, 'white');
INSERT INTO rider values (1, 1), (2, 2), (3, 4)
Here, you can see that the id 3
in the RIDER
has the horse 4
, which does not exist in the HORSE
table. We have done this to ensure the use of NOT IN
in our example.
Suppose we want to trim out this RIDER 3
from our RIDER table. How do we do that?
select * from rider
where horse_id not in (select id from horse)
Output:
Now, what if, instead of the clean values in the HORSE
table, we also had some nulls. Let us modify our INSERT
statement for the HORSE
table.
INSERT INTO horse values (1, 'black'), (2, 'brown'), (3, 'white'), (NULL, NULL);
On a side note, when INSERTING NULL values in the PRIMARY KEY
column, remove the PRIMARY KEY
constraint from the table to allow NULL insertion.
Here, our NULL HORSE
has a NULL Color
, so when we run the query for NOT IN
as above, we get the following:
Output:
So what just happened? Wasn’t it supposed to return the ID 4
as that doesn’t exist in the HORSE
table?
Well, let’s understand how the NOT IN
works. The NOT IN
operator works using the AND
operator. If all the rows to be searched return true, it will return true.
So something like this would substitute the NOT IN
:
NOT IN (ROW 1) AND NOT IN (ROW 2) AND NOT IN (ROW 3) .....
In our case, where the NOT IN
returns true for all the first three data sets, it won’t return any value for the NULL column as the PostgreSQL documentation quotes:
If all the per-row results are either unequal or null, with at least one null, then the result of `NOT IN` is null
Having a NULL return from the NOT IN
would render all other true’s false, so our table returns nothing.
How do we solve this?
The first methodology would be to prevent any NULL insertions into the table. Still, that is useless if we already have tables created in the database and now want to run queries to get data.
Hence, we have to look at other solutions to solve this effectively.
Use the NOT EXISTS
Operator as a Better Alternative
Let’s use the following query:
select horse_id from rider r
where not exists
(select id from horse h
where h.id = r.horse_id)
This tends to finally return the value 4
to us even though nulls are present and is an effective strategy. Let’s see how it works.
The EXISTS
clause returns true if the subquery returns anything, meaning any single row, and false otherwise. So when we want to find the HORSE
that is missing, we tend to return the values from the HORSE
table equal to the IDs
in the RIDER
table.
The subquery returns more than one row, and EXISTS
becomes true, making NOT EXISTS FALSE
.
Eventually, our final query selects the HORSE_ID
from the RIDER
that is not equal to the IDs
provided from the HORSE
table. In our case, that is 4
, and thus our query works perfectly.
However, NOT EXISTS
has caused performance losses when used against the NOT IN
operator.
select horse_id from rider
full join horse on rider.horse_id = horse.ID
where horse.ID is null
Output:
So it returns a NULL and our value that is not found; 4
. How?
When we do a full join on the condition that both IDs
are similar, it also tends to return the rows that aren’t similar. It will return the set (NULL, NULL)
from HORSE
and (3, 4)
from RIDER
as they both are unmatched.
Hence we can use this to our benefit and write the NULL condition at the end to return these unmatched rows.
When we write that the Horse.ID is NULL
, it will select the HORSE_ID
from the rider that is NULL. In this case, the first set (NULL, NULL)
is included; so is the set (3,4)
. Why?
This set does not contain a NULL but is also unmatched. So our FULL JOIN
also sets NULL values for this in its returning table.
Hence, we get this value output to us as a result.
Today we looked at implementing the NOT IN
operator for values that contain NULL. We hope that this helps you and expands your knowledge base.
Hello, I am Bilal, a research enthusiast who tends to break and make code from scratch. I dwell deep into the latest issues faced by the developer community and provide answers and different solutions. Apart from that, I am just another normal developer with a laptop, a mug of coffee, some biscuits and a thick spectacle!
GitHub