MySQL DISTINCT
This article describes how to remove duplicate rows in the result using MySQL DISTINCT
keyword.
Sometimes, there are some duplicate rows in the result set returned by the SELECT
statement. For example, there are many repeated last names in the actor
table. If you want a unique result set that has no duplicate rows, you need to use DISTINCT
keywords.
DISTINCT syntax
In the SELECT
statement, you can specify on or more columns which you are expected unique after DISTINCT
keyword . The syntax of the DISTINCT
clause is as follows:
SELECT DISTINCT
columns_list
FROM
table_name
Here:
- The
DISTINCT
keyword is between theSELECT
keyword and columns list. - The
columns_list
specifies one or multiple columns which will be unique in the result set. - The
columns_list
can also be*
.
DISTINCT examples
In the following examples, we use actor
table from Sakila sample database as a demonstration.
DISTINCT one column
There are many duplicate last names in the actor
table. Let’s take a look at them:
SELECT last_name FROM actor;
+--------------+
| last_name |
+--------------+
| AKROYD |
| AKROYD |
| AKROYD |
| ALLEN |
| ALLEN |
...
| ZELLWEGER |
| ZELLWEGER |
+--------------+
200 rows in set (0.01 sec)
Now, we use the DISTINCT
keywords delete duplicate data:
SELECT DISTINCT last_name FROM actor;
+--------------+
| last_name |
+--------------+
| AKROYD |
| ALLEN |
| ASTAIRE |
| BACALL |
| BAILEY |
...
| ZELLWEGER |
+--------------+
121 rows in set (0.00 sec)
We can see that there is no duplicate last names in the output result set. The number of rows in the result set has also changed from 200 rows to 121 rows.
DISTINCT multiple columns
You can also clear duplicate values in multiple columns using DISTINCT
. When multiple column values are specified after DISTINCT
keywords, a combination value of multiple columns is used to determine the uniqueness of the row.
In reality world, there are not only many duplicate last names, but also many duplicate first names. Let us retrieve first names and last names from the actor
table:
SELECT last_name, first_name FROM actor;
+--------------+-------------+
| last_name | first_name |
+--------------+-------------+
| GUINESS | PENELOPE |
| WAHLBERG | NICK |
| CHASE | ED |
| DAVIS | JENNIFER |
| LOLLOBRIGIDA | JOHNNY |
| NICHOLSON | BETTE |
...
| TEMPLE | THORA |
+--------------+-------------+
200 rows in set (0.01 sec)
There are 200 rows in the actor
table.
Now, let us use the DISTINCT
keywords remove duplicate rows:
SELECT DISTINCT last_name, first_name FROM actor;
+--------------+-------------+
| last_name | first_name |
+--------------+-------------+
| GUINESS | PENELOPE |
| WAHLBERG | NICK |
| CHASE | ED |
| DAVIS | JENNIFER |
| LOLLOBRIGIDA | JOHNNY |
...
| TEMPLE | THORA |
+--------------+-------------+
199 rows in set (0.01 sec)
Now, there are 199 rows in the actor
table.
We can see that the number of rows in the output result set has also changed from 200 rows to 199 rows. This shows that there is a duplicate row in the actor
table. We can also prove this using the following SELECT
statement with group.
SELECT last_name, first_name, COUNT(*)
FROM actor
GROUP BY last_name , first_name
HAVING COUNT(*) > 1;
+-----------+------------+----------+
| last_name | first_name | COUNT(*) |
+-----------+------------+----------+
| DAVIS | SUSAN | 2 |
+-----------+------------+----------+
1 row in set (0.00 sec)
DISTINCT and NULL
When DISTINCT
meets NULL
value, only one NULL
value will be kept. Because DISTINCT
thinks that all NULL
values are the same, regardless of the data type of the column.
For example, the following SQL return multiple rows using the NULL
record:
SELECT *
FROM (
SELECT NULL
UNION ALL
SELECT NULL
UNION ALL
SELECT NULL
) t;
+------------+
| NULL |
+------------+
| NULL |
| NULL |
| NULL |
+------------+
3 rows in set (0.00 sec)
Let us use DISTINCT
:
SELECT DISTINCT *
FROM (
SELECT NULL
UNION ALL
SELECT NULL
UNION ALL
SELECT NULL
) t;
+------------+
| NULL |
+------------+
| NULL |
+------------+
1 row in set (0.00 sec)
In this example, we use UNION
keyword to create a temprorary table that includes many NULL
values.
Conclusion
In this article, you leared how to remove duplicate rows in the result set using MySQL DISTINCT
clause. The main points of the DISTINCT
clause are as follows:
DISTINCT
can be used to remove duplicate rows in the result set.- You can specify one or multiple columns after
DISTINCT
keywords. You can alse use*
; DISTINCT
thinks that allNULL
values are the same.