How to Insert on Duplicate Update in PostgreSQL
-
the
ON CONFLICT
Clause in PostgreSQL -
Use Either a Custom Made
Trigger
orFunction
to Implement Upsert in PostgreSQL - Use Sequential Statements (Less - Recommended) to Implement Upsert in PostgreSQL
-
the
COMMON TABLE EXPRESSIONS (CTE)
to ImplementUPSERT
in PostgreSQL
This article will introduce the INSERT ON DUPLICATE UPDATE
in PostgreSQL. What does INSERT ON DUPLICATE UPDATE
mean? And why is it used?
Whenever you INSERT
records into a table, you add a data set with a unique PRIMARY KEY
. It could be an ID or an auto-generated number that is different for each data set.
Let’s suppose you insert a record with a PRIMARY KEY
that conflicts with a data set already present with that PRIMARY KEY
.
In that case, you have two options. Either update that row with your new data set or leave the update and keep the original.
What you tend to choose depends on how you want to sort your data and keep it in your database.
the ON CONFLICT
Clause in PostgreSQL
Upsert
, the short term for UPDATE ON INSERT VIOLATION
, can be implemented in PostgreSQL in the following query. First, let’s suppose we make a table for CAR
and insert a row.
create table car(
id int PRIMARY KEY,
owner TEXT
);
insert into car values (1, 'John');
Now let’s go ahead and try inserting a value (1, 'Mark')
into this table. What do you think will happen?
The first thing that will happen is that there will be a VIOLATION
error like the one given below.
Output:
ERROR: duplicate key value violates unique constraint "car_pkey" DETAIL: Key (id)=(1) already exists.
Now, do you understand why the PRIMARY KEY
violation happens? Because Mark
also has key 1
already present for John
.
To avoid this, we’ll use the ON CONFLICT
clause.
insert into car values (1, 'Mark')
on conflict (id) do update
set id = excluded.id,
owner = excluded.owner;
Now, whenever our query finds a conflict, it will UPDATE
the row present in the table for that conflicting PRIMARY KEY
and set the ID
and the OWNER
of the pre-existing data to these new keys.
But, what is that EXCLUDED
table after all? The EXCLUDED
table represents the rows proposed for insertion, as specified by the PostgreSQL documentation.
Thus, we use the EXCLUDED
table to get the rows pushed from the INSERT
query and then use them.
Suppose you don’t want to do anything about the violation and go about keeping the original. You are better off using this syntax.
insert into car values (1, 'Mark')
on conflict (id) do nothing;
The DO NOTHING
ignores any changes possible.
To perform these same queries in psql
, use the above statements but capitalize the KEYWORDS
. Never write INSERT
as insert
, or you’ll run into an error.
Use Either a Custom Made Trigger
or Function
to Implement Upsert in PostgreSQL
Here, we will explain how to make a FUNCTION
that will fire off whenever INSERT
or UPDATE
is called for implementing UPSERT
. Similarly, you can make a TRIGGER
by changing the return type to TRIGGER
.
create or replace function upsert_imp (idt int, ownert TEXT) returns void as
$$
Begin
loop
update car set owner = ownert where id = idt;
if found then
return;
end if;
begin
insert into car values(ownert, idt);
return;
end;
end loop;
end;
$$ language plpgsql;
The function is pretty simple, with a loop that runs for checking the ID
for each row; if it matches the parameter given, then it returns the table after UPDATE
; else, it INSERTS
.
You can call the above like this.
select * from upsert_imp(1, 'Mark');
While making a TRIGGER
, make sure to replace LOOP
with a FOR
or a valid IF
check so that it does not spin indefinitely in violation of a condition.
Use Sequential Statements (Less - Recommended) to Implement Upsert in PostgreSQL
You can use the UPDATE
call but combine it with the following INSERT
statement to work.
insert into car values(4, 'Toyota Supra') on conflict do nothing;
You may even use a NOT EXISTS
clause, but this easy statement works fine. It will either INSERT
the row if there is no duplication or skip entirely.
insert into car select 2, 'Toyota Supra' where not exists (Select 1 from CAR where id=2);
However, there’s a condition. In many systems, there could be a RACE
condition.
If someone DELETES
a row while you INSERT
a row, your row will be lost. For that, you can use a TRANSACTION
.
Enclose the INSERT
statement with a BEGIN
and COMMIT
to ensure it is now a TRANSACTION
.
begin;
insert into car select 2, 'Toyota Supra' where not exists (Select 1 from CAR where id=2);
commit;
However, PostgreSQL already has implicit BEGIN
and COMMIT
added to each statement, so there is no need for explicit declarations.
the COMMON TABLE EXPRESSIONS (CTE)
to Implement UPSERT
in PostgreSQL
First of all, what are COMMON TABLE EXPRESSIONS
?
The CTE
works as a temporary table inside a QUERY
to store values that may be used later on. It behaves like the NEW
table used in TRIGGER
.
The query can be written as follows. We first define WITH
, which creates a temporary table with the values 2, Jackson
and 5, Maria
.
Then these values are passed into the UPSERT
query where it creates NV
from the NEW_VALUES
table, and if they turn out to be already present in the CAR
table, it updates the values accordingly.
WITH new_values (id, owner) as (
values
(2, 'Jackson'),
(5, 'Maria')
),
upsert as
(
update car
set id = nv.id,
owner = nv.owner
FROM new_values nv
WHERE car.id = nv.id
RETURNING car.*
)
This returns a CAR.*
, meaning all the rows from the CAR
table back.
To call the above:
INSERT INTO car (id, owner)
SELECT id, owner
FROM new_values
WHERE NOT EXISTS (SELECT 1
FROM upsert up
WHERE up.id = new_values.id);
Here, we check if the table created in the UPSERT
when it returns the CAR.*
contains the value already; if it doesn’t, the INSERT INTO
works. However, if it does, the UPSERT
already handles the modification inside.
a Note on ON CONFLICT DO UPDATE
in PostgreSQL
As with each solution, some problems tend to interfere either with performance or space or the coding efficiency in general.
In our queries, whenever we call the ON CONFLICT DO UPDATE
, we can assume that behind this, in our system, there can be multiple users accessing the database and issuing the commands at the same time.
So if USER 1
issues INSERT DO UPDATE
simultaneously as USER 2
, there will be a VIOLATION
error again. Why?
Because when USER 2
calls the INSERT
, it does not know whether the row exists or not at that time; hence it comes into a conflict. Both users issue the queries simultaneously, and the database runs them concurrently.
This sort of issue is called a RACE CONDITION
, and you can read up on them later if you like.
Along with the RACE CONDITION
come different methods to solve this issue. People argue that a TRANSACTION
could be a possible answer.
But even if the TRANSACTION
guarantees separating the query, it won’t ensure that the query is safe from VIOLATION
. Why?
If USER 1
TRANSACTS
is its query, you can’t guarantee that once the TRANSACTION
runs, it will lock the query and won’t let the others proceed. Because TRANSACTIONS
only prevent the changes from being shown to other users and just the option of abandoning them and rolling back.
Even if you use SERIAL TRANSACTIONS
, which claims to run QUERIES
in order, there can be errors such as the one below.
commit failed: ERROR: could not serialize access due to read/write dependencies among transactions
Hence, you are better off using ADVISORY LOCKS
and SAVE POINTS
. ADVISORY LOCKS
tend to lock, preventing your queries from interfering and working effectively.
Even though they might reproduce errors, but still have a very low tendency to do so.
SAVE POINTS
, on the other hand, are contained inside TRANSACTIONS
. You can use these SAVEPOINTS
to revert to your changes whenever you want to abandon a change during a TRANSACTION
and rollback.
So in case an INSERT
fails, for example, you just don’t run past the error, you can now roll back to the SAVE POINT
, perform the UPDATE
, and then go about working as usual.
We hope you’re now clear on the basics of UPSERT
and have an in-depth understanding of how it works and the possible CONSTRAINTS
.
Our team works effectively to deliver the best of both worlds to you in terms of knowledge and efficient coding algorithms. We believe that the better you learn the nooks and crannies of something, the more effective you will be in delivering solutions and working around them.
Hello, I am Bilal, a research enthusiast who tends to break and make code from scratch. I dwell deep into the latest issues faced by the developer community and provide answers and different solutions. Apart from that, I am just another normal developer with a laptop, a mug of coffee, some biscuits and a thick spectacle!
GitHub