How to Count the Frequency a Value Occurs in Pandas Dataframe
Sometimes when you are working with dataframe you might want to count how many times a value occurs in the column or in other words to calculate the frequency. Majorly three methods are used for this purpose. Two out of them are from the DataFrame.groupby()
methods. Let us take a look at them one by one.
df.groupby().count()
Series.value_counts()
df.groupby().size()
We will use the same DataFrame
in the next sections as follows,
import pandas as pd
df = pd.DataFrame(
{
"A": ["jim", "jim", "jim", "jim", "sal", "tom", "tom", "sal", "sal"],
"B": ["a", "b", "a", "b", "b", "b", "a", "a", "b"],
}
)
df.groupby().count()
Method
If you want to calculate the frequency over a single column then this method is best.
import pandas as pd
df = pd.DataFrame(
{
"A": ["jim", "jim", "jim", "jim", "sal", "tom", "tom", "sal", "sal"],
"B": ["a", "b", "a", "b", "b", "b", "a", "a", "b"],
}
)
freq = df.groupby(["A"]).count()
print(freq)
freq = df.groupby(["B"]).count()
print(freq)
The following will be output.
B
A
jim 4
sal 3
tom 2
A
B
a 4
b 5
Series.value_counts()
Method
As every dataframe object is a collection of Series
objects, this method is best used for pandas.Series
object.
Now use Series.values_counts()
function
import pandas as pd
df = pd.DataFrame(
{
"A": ["jim", "jim", "jim", "jim", "sal", "tom", "tom", "sal", "sal"],
"B": ["a", "b", "a", "b", "b", "b", "a", "a", "b"],
}
)
freq = df["A"].value_counts()
print(freq)
freq = df["B"].value_counts()
print(freq)
The following will be output.
jim 4
sal 3
tom 2
Name: A, dtype: int64
b 5
a 4
Name: B, dtype: int64
df.groupby().size()
Method
The above two methods cannot be used to count the frequency of multiple columns but we can use df.groupby().size()
for multiple columns at the same time.
import pandas as pd
df = pd.DataFrame(
{
"A": ["jim", "jim", "jim", "jim", "sal", "tom", "tom", "sal", "sal"],
"B": ["a", "b", "a", "b", "b", "b", "a", "a", "b"],
}
)
freq = df.groupby(["A", "B"]).size()
print(freq)
The following will be output.
A B
jim a 2
b 2
sal a 1
b 2
tom a 1
b 1
dtype: int64