PostgreSQL Distinct Count of Field Values
- Get the Distinct Count of Field Values in PostgreSQL
- Get the Distinct Count of Field Values Based on Another Field in PostgreSQL
Getting the distinct values in a field is an important query to know. This tutorial discusses how to get the distinct count of values in a field.
Get the Distinct Count of Field Values in PostgreSQL
Consider a quiz_score
table that keeps track of the score of each participant in a quiz game.
id | player_id | score |
---|---|---|
1 | 1 | 10 |
2 | 2 | 10 |
3 | 3 | 18 |
4 | 4 | 69 |
5 | 5 | 24 |
6 | 6 | 67 |
7 | 7 | 94 |
8 | 8 | 68 |
9 | 9 | 33 |
10 | 10 | 5 |
Here is the CREATE
statement for the table:
CREATE TABLE quiz_score
(
id integer NOT NULL GENERATED ALWAYS AS IDENTITY,
player_id integer NOT NULL,
score integer NOT NULL,
CONSTRAINT quiz_score_pkey PRIMARY KEY (id)
);
And here is the INSERT
statement to populate the table with the data:
INSERT INTO quiz_score (player_id, score)
SELECT i, floor(random()*(100-0+1))
FROM generate_series(1,10000000) i;
We have inserted ten million random scores into our table. Let us find out how many distinct scores there are in our table by running this query:
SELECT COUNT(DISTINCT score) FROM quiz_score;
This query took 3 seconds and 391 milliseconds with a count of 101 distinct values.
We can also run this other query to find the number of distinct scores in the table:
SELECT COUNT(*) FROM (SELECT DISTINCT score FROM quiz_score) AS DistinctScores;
This new query took 1 second and 572 milliseconds with a count of 101 distinct values.
As we can see, the second query was faster. Either query works fine, but the second query was faster in this case.
Get the Distinct Count of Field Values Based on Another Field in PostgreSQL
We will now introduce a new column called Expertise
. This field is populated based on the player’s score.
The Expertise
values are: Beginner
, Intermediary
, Expert
, and Master
. The expertise of a player is determined by a player’s score, as shown here:
Expertise | score |
---|---|
Beginner | 0 - 50 |
Intermediary | 51 - 80 |
Expert | 81 - 90 |
Master | 91 - 100 |
The newly updated quiz_score
table is:
id | player_id | score | player_rank |
---|---|---|---|
1 | 1 | 10 | Beginner |
2 | 2 | 10 | Beginner |
3 | 3 | 18 | Beginner |
4 | 4 | 69 | Intermediary |
5 | 5 | 24 | Beginner |
6 | 6 | 67 | Intermediary |
7 | 7 | 94 | Master |
8 | 8 | 68 | Intermediary |
9 | 9 | 33 | Beginner |
34 | 34 | 89 | Expert |
Here is the ALTER TABLE
statement to add the new column to the table:
ALTER TABLE quiz_score ADD COLUMN expertise text;
And here is the UPDATE
statement to populate the expertise field:
UPDATE quiz_score
SET expertise =
(CASE
WHEN score >= 0
AND score <= 50 THEN 'Beginner'
WHEN score > 50
AND score <= 80 THEN 'Intermediary'
WHEN score > 80
AND score <= 90 THEN 'Expert'
WHEN score > 90 THEN 'Master'
END);
In the previous example, we looked at distinct scores in the table. In this example, let us find out how many distinct scores there are for each expertise in our table by running this query:
SELECT expertise, COUNT(DISTINCT score)
FROM quiz_score
GROUP BY expertise
Here is the result:
expertise | count
--------------+-------
Beginner | 51
Intermediary | 10
Expert | 30
Master | 10
From this query, we can tell that the Beginner
expertise has 51 distinct scores, Intermediary
has 10 distinct scores, Expert
has 30 distinct scores, and Master
has 10 distinct scores. The query took 14 seconds and 515 milliseconds.
We can also run this other query to find the number of distinct scores in the table:
SELECT
DISTINCT ON (expertise) expertise,
COUNT(DISTINCT score)
FROM
quiz_score
GROUP BY expertise
This new query took 12 seconds and 165 milliseconds. In the second example, the second query was faster, but either query works fine.
In this tutorial, we have discussed how to get distinct values in a field and how to get distinct values in a field based on distinct values of another field.