How to Create a Pivot Table in PostgreSQL
A pivot table tends to aggregate some values given in an original table already, used to summarize large data flows. In PostgreSQL, it is a table returned with custom N
value columns, which is the data type of the row summarized in our result.
Today we will be learning how to create a pivot table in PostgreSQL using a predefined table with existing values.
Use crosstab()
to Create a Pivot Table in PostgreSQL
You can find crosstab()
under the TABLE_FUNC
heading in the PostgreSQL documentation. It is a function that returns a table with multiple rows.
Syntax:
crosstab ( sql text ) ? setof record
This query produces a pivot table containing row names plus N
value columns, where N
is determined by the row type specified in the calling query.
crosstabN ( sql text ) , setof table_crosstab_N
This query produces a pivot table containing row names plus N
value columns. crosstab2
, crosstab3
, and crosstab4
are predefined.
Now, let’s see how we can use it. We will create a simple table called APARTMENT
with columns; ID
, UNIT
, PRICE
, AREA
.
Example:
CREATE TABLE apartment (
ID int PRIMARY KEY,
UNIT int,
PRICE int,
AREA int
)
Now, let’s add a few values to our table:
INSERT INTO apartment VALUES (1, 20, 200, 10) , (2, 20, 200, 9), (3, 50, 190, 8);
So, if we look at our table now, it would be something as follows.
Output:
id unit price area
1 20 200 10
2 20 200 9
3 50 190 8
And in our pivot table, this table would be represented as:
unit ..180 190 200 210...
20 - - 19/2 = 9.5 -
50 - 8 - -
So here, we are taking the average of the AREA
for each UNIT
at their specific PRICES
. We want to see the average AREA
that we get on the PRICE
mentioned.
So if you find the average AREA
for a PRICE
of 200
for a UNIT
of 20
, you’ll get the average as (10 + 9) / 2 = 8
. In this way, you find the pivot table with the aggregates.
Now, you have understood the concept, so let us go ahead and implement it. We can write a query as follows:
Select UNIT, PRICE, avg(area)
from apartment
group by UNIT, PRICE
This query will implement the basic functionality of what we want. It uses the GROUP BY
clause to take those columns in the table against which we wish to aggregate.
We wish to find the average of the AREA
for a specific UNIT
and PRICE
; hence, we group using these two columns. A table is returned as follows.
Output:
unit price avg
50 190 8.0000000000000000
20 200 9.5000000000000000
Now, this table works properly as well, but let’s go ahead and use the crosstab()
. You have to look at how it works and the constraints.
To use crosstab()
, we have to ensure two crucial points:
crosstab()
needs to have distinct values for each row.crosstab()
needs to have the same data types for each column.
So, make sure that no column has a different data type. Now, to pivot our results from this GROUP BY
query, let’s go ahead and write something as follows:
select *
from crosstab
(
'Select UNIT::float, PRICE::float, avg(AREA)::float
from apartment
group by UNIT, PRICE') as ct(
UNIT float,
avge float
);
There are some critical points to consider here. We select the columns from the crosstab
returned as the result table CT
as defined by us with two columns: UNIT
and avge
.
After grouping, we get the columns from the APARTMENT
and find the average. This query table is then pivoted for our final result.
Also, notice that we ensure to cast each column to the same data type. Of course, an average will either be a double-precision or float.
Hence, it is better to cast the int columns to float, even if unnecessary. If you run the query without casting, it will return an error as below.
Output:
ERROR: return and sql tuple descriptions are incompatible
SQL state: 42601
Hence to avoid this, we make sure to cast to float or create a new table with values inserted into it with the type float. The difference in data types is what makes the tuple incompatible.
Output:
unit avg
50 8
20 9.5
In other cases, there is no need to use crosstab()
. Even using the GROUP BY
is effective as it returns us a table with unique values for getting an average.
Without using the crosstab()
function, we still would have gotten a result as follows:
unit price avg
50 190 8
20 200 9.5
You can notice in crosstab()
that using a return list with three columns as we would want still returns us just values that contain the UNIT
and average of the area
. Why?
The SQL statement takes a set of categories and values, and in our case, the category becomes the UNIT
, and the values tend to be the average column. Hence, only two columns are kept in our return table.
We can use it now that we know how crosstab()
works for our needs. If we had UNITS
with different PRICES
and then took the AVG()
, we would have an N
number of columns, depending on our data.
Let’s change the values in our table to be as follows:
id unit price area
1 20 200 10
2 20 170 9
3 50 190 8
Now, running the crosstab()
query on this will return something as follows.
Output:
unit avge avge1
50 8 [NULL]
20 10 9
You can notice that changing the PRICE
tends to spread out the AREA
values into different averages. This happens because we also group our average by prices.
Same PRICE
values will have an average of different values if present for all of them, but other PRICE
values will only consist of averages of AREAS
respective to each PRICE
. Hence the table is sorted in this way.
Hello, I am Bilal, a research enthusiast who tends to break and make code from scratch. I dwell deep into the latest issues faced by the developer community and provide answers and different solutions. Apart from that, I am just another normal developer with a laptop, a mug of coffee, some biscuits and a thick spectacle!
GitHub