How to Get Pandas Unique Values in Column and Sort Them
- Get Unique Values in Pandas DataFrame Column With unique Method
- Get Unique Values in Pandas DataFrame Column With drop_duplicates Method
- Sort a Column in Pandas DataFrame
This article introduces how to get the unique values in a Pandas DataFrame column.
For example, suppose we have a DataFrame of individuals and their professions, and we want to know how many distinct professions it contains. We cannot simply use the total row count, because many people can have the same job. For such situations, we can use the unique() and drop_duplicates() methods provided by the Pandas library.
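The professions scenario can be sketched as follows. The names and professions below are invented purely for illustration, and the example also uses the Series method nunique(), which the article does not otherwise cover, to count the distinct values directly.

```python
import pandas as pd

# Hypothetical data: these names and professions are made up for illustration.
people = pd.DataFrame(
    {
        "name": ["Ann", "Bob", "Cara", "Dev", "Eli"],
        "profession": ["nurse", "teacher", "nurse", "engineer", "teacher"],
    }
)

row_count = len(people)                          # total number of rows
distinct_count = people["profession"].nunique()  # number of distinct professions
distinct_values = people["profession"].unique()  # values in order of first appearance

print(row_count, distinct_count)  # 5 3
print(list(distinct_values))      # ['nurse', 'teacher', 'engineer']
```

The row count (5) overstates the number of professions (3), which is why the duplicates have to be removed before counting.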
It’s also important to know how to sort your DataFrame, since sorted data is easier to read and understand. Python's built-in sorted() function and the Pandas sort_values() method can help achieve this.
We will remove duplicates from and sort the following DataFrame in this tutorial.
import pandas as pd
import numpy as np
df = pd.DataFrame({"A": [7, 1, 5, 4, 2, 1, 4, 4, 8], "B": [1, 2, 8, 5, 3, 4, 2, 6, 8]})
print(df)
Output:
A B
0 7 1
1 1 2
2 5 8
3 4 5
4 2 3
5 1 4
6 4 2
7 4 6
8 8 8
Get Unique Values in Pandas DataFrame Column With unique Method
The Pandas Series unique() method works on a single column of a DataFrame and returns all unique elements of that column in the order they first appear. For a numeric column like the one below, the result is a NumPy array.
Example:
import pandas as pd
import numpy as np
df = pd.DataFrame({"A": [7, 1, 5, 4, 2, 1, 4, 4, 8], "B": [1, 2, 8, 5, 3, 4, 2, 6, 8]})
print(df["A"].unique())
print(type(df["A"].unique()))
Output:
[7 1 5 4 2 8]
<class 'numpy.ndarray'>
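Because unique() keeps the order of first appearance, the result is not sorted. One way to get sorted unique values while staying in NumPy is np.sort(), shown here as a minimal sketch on the same DataFrame (the variable names are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [7, 1, 5, 4, 2, 1, 4, 4, 8], "B": [1, 2, 8, 5, 3, 4, 2, 6, 8]})

unique_a = df["A"].unique()          # order of first appearance
sorted_unique_a = np.sort(unique_a)  # ascending copy; unique_a is left unchanged

print(unique_a.tolist())         # [7, 1, 5, 4, 2, 8]
print(sorted_unique_a.tolist())  # [1, 2, 4, 5, 7, 8]
```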
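Get Unique Values in Pandas DataFrame Column With drop_duplicates Method
drop_duplicates() can be applied to the whole DataFrame or to a subset of its columns (through the subset parameter), and it returns a DataFrame rather than an array. By default it keeps the first row for each duplicated value, so the other columns of that row are preserved. It is sometimes described as a faster way to remove duplicates from very large data sets, but this depends on the data, so it is worth measuring on your own workload.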
Example:
import pandas as pd
import numpy as np
df = pd.DataFrame({"A": [7, 1, 5, 4, 2, 1, 4, 4, 8], "B": [1, 2, 8, 5, 3, 4, 2, 6, 8]})
print(df.drop_duplicates(subset="A"))
print(type(df.drop_duplicates(subset="A")))
Output:
A B
0 7 1
1 1 2
2 5 8
3 4 5
4 2 3
8 8 8
<class 'pandas.core.frame.DataFrame'>
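Two variations may be useful here, sketched on the same DataFrame: the keep parameter controls which of the duplicated rows survives, and calling drop_duplicates() on a single column returns a Series instead of a DataFrame. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [7, 1, 5, 4, 2, 1, 4, 4, 8], "B": [1, 2, 8, 5, 3, 4, 2, 6, 8]})

# Keep the last row for each value of A instead of the first (the default).
last_rows = df.drop_duplicates(subset="A", keep="last")

# On a single column, the result is a Series that keeps the original index labels.
unique_series = df["A"].drop_duplicates()

print(last_rows)
print(unique_series.tolist())      # [7, 1, 5, 4, 2, 8]
print(type(unique_series).__name__)  # Series
```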
Sort a Column in Pandas DataFrame
We can use Python's built-in sorted() function to sort a column, but it returns the result as a list. We can also sort the column values in descending order by setting the reverse parameter to True.
The following example removes the duplicate values and sorts the column in ascending order:
import pandas as pd
import numpy as np
df = pd.DataFrame({"A": [7, 1, 5, 4, 2, 1, 4, 4, 8], "B": [1, 2, 8, 5, 3, 4, 2, 6, 8]})
df_new = df.drop_duplicates(subset="A")
print(sorted(df_new["A"]))
print(type(sorted(df_new["A"])))
Output:
[1, 2, 4, 5, 7, 8]
<class 'list'>
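For the descending case, a minimal sketch passes reverse=True to sorted(). Here the unique values are first converted to a plain Python list with tolist(), an optional step chosen so that the printed result shows ordinary integers:

```python
import pandas as pd

df = pd.DataFrame({"A": [7, 1, 5, 4, 2, 1, 4, 4, 8], "B": [1, 2, 8, 5, 3, 4, 2, 6, 8]})

# Unique values of A as a plain Python list, then sorted from largest to smallest.
unique_values = df["A"].unique().tolist()
descending = sorted(unique_values, reverse=True)

print(descending)  # [8, 7, 5, 4, 2, 1]
```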
sort_values() is another flexible option for sorting a DataFrame. We specify the column to sort on with the by parameter and choose ascending or descending order with the ascending parameter. The result is still a Pandas DataFrame.
The following example removes the duplicate values and sorts the column in descending order:
import pandas as pd
import numpy as np
df = pd.DataFrame({"A": [7, 1, 5, 4, 2, 1, 4, 4, 8], "B": [1, 2, 8, 5, 3, 4, 2, 6, 8]})
df_new = df.drop_duplicates(subset="A")
print(df_new.sort_values(by="A", ascending=False))
print(type(df_new.sort_values(by="A", ascending=False)))
Output:
A B
8 8 8
0 7 1
2 5 8
3 4 5
4 2 3
1 1 2
<class 'pandas.core.frame.DataFrame'>
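The sorted output above keeps the original row labels (8, 0, 2, ...). If a fresh 0-based index is preferred, the two steps can be chained and followed by reset_index(drop=True). This is a sketch of one possible approach, not the only one, and the variable name is illustrative:

```python
import pandas as pd

df = pd.DataFrame({"A": [7, 1, 5, 4, 2, 1, 4, 4, 8], "B": [1, 2, 8, 5, 3, 4, 2, 6, 8]})

# Drop duplicate values of A, sort descending, then renumber the rows from 0.
result = (
    df.drop_duplicates(subset="A")
    .sort_values(by="A", ascending=False)
    .reset_index(drop=True)
)

print(result)
print(result["A"].tolist())  # [8, 7, 5, 4, 2, 1]
```

Each method returns a new DataFrame, so the original df is left unchanged.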
Manav is an IT professional with extensive experience as a core developer on many live projects. He is an avid learner who enjoys sharing his findings whenever possible.