Pandas DataFrame.describe() Function
Minahil Noor
Jan 30, 2023
- Syntax of pandas.DataFrame.describe()
- Example Codes: DataFrame.describe() Method to Find the Statistics of a Data Frame
- Example Codes: DataFrame.describe() Method to Find the Statistics of Each Column
- Example Codes: DataFrame.describe() Method to Find the Statistics of Numeric Columns
The pandas DataFrame.describe() method returns summary statistics of a DataFrame, such as the count, mean, standard deviation, minimum, percentiles, and maximum of its numeric columns.
Syntax of pandas.DataFrame.describe():
DataFrame.describe(
percentiles=None, include=None, exclude=None, datetime_is_numeric=False
)
Parameters
percentiles | The percentiles to include in the output. All values should be between 0 and 1. The default is [.25, .5, .75], which returns the 25th, 50th, and 75th percentiles.
include | The data types to include in the output. It has three options. 'all': all columns of the input are included in the output. A list-like of data types: limits the results to the provided data types. None (the default): the result includes all numeric columns.
exclude | The data types to exclude from the output. It has two options. A list-like of data types: excludes the provided data types from the result. None (the default): the result excludes nothing.
datetime_is_numeric | A boolean parameter that tells whether to treat datetime data types as numeric. It was removed in pandas 2.0, where datetime data is always summarized as numeric data.
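The percentiles parameter is not exercised in the examples on this page, so the following is a minimal sketch of how it might be used. The small DataFrame and the names df and stats are illustrative assumptions.

```python
import pandas as pd

# Illustrative data: two numeric columns only
df = pd.DataFrame({
    "Attendance": [60, 100, 80, 78, 95],
    "Obtained Marks": [90, 75, 82, 64, 45],
})

# Request the 10th and 90th percentiles instead of the default quartiles
stats = df.describe(percentiles=[0.1, 0.9])
print(stats)

# Each requested percentile becomes a row labelled like "10%"
print(stats.loc["10%", "Attendance"])
print(stats.loc["90%", "Attendance"])
```

The percentile rows are computed with linear interpolation between neighbouring values, so they need not be values that actually occur in the column.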
Return
It returns a Series or DataFrame that holds the summary statistics of the Series or DataFrame it is called on.
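To illustrate the return type, here is a small sketch that calls describe() on a Series and on a one-column DataFrame built from it. The names marks, series_stats, and frame_stats are made up for this illustration.

```python
import pandas as pd

# A single column of marks as a Series (illustrative data)
marks = pd.Series([90, 75, 82, 64, 45], name="Obtained Marks")

# Describing a Series gives a Series indexed by statistic name
series_stats = marks.describe()

# Describing a DataFrame gives a DataFrame with one column per described column
frame_stats = marks.to_frame().describe()

print(type(series_stats).__name__)
print(type(frame_stats).__name__)
print(series_stats["mean"])
```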
Example Codes: DataFrame.describe() Method to Find the Statistics of a Data Frame
import pandas as pd
# Build a DataFrame with two numeric columns and one text column
dataframe = pd.DataFrame({'Attendance': {0: 60, 1: 100, 2: 80, 3: 78, 4: 95},
                          'Name': {0: 'Olivia', 1: 'John', 2: 'Laura', 3: 'Ben', 4: 'Kevin'},
                          'Obtained Marks': {0: 90, 1: 75, 2: 82, 3: 64, 4: 45}})
print("The Original Data frame is: \n")
print(dataframe)
# With no arguments, describe() uses its default values
dataframe1 = dataframe.describe()
print("Statistics are: \n")
print(dataframe1)
Output:
The Original Data frame is:
   Attendance    Name  Obtained Marks
0          60  Olivia              90
1         100    John              75
2          80   Laura              82
3          78     Ben              64
4          95   Kevin              45
Statistics are:
       Attendance  Obtained Marks
count    5.000000        5.000000
mean    82.600000       71.200000
std     15.773395       17.484279
min     60.000000       45.000000
25%     78.000000       64.000000
50%     80.000000       75.000000
75%     95.000000       82.000000
max    100.000000       90.000000
The function has returned the summary statistics of the DataFrame. We passed no parameters, so the function used all its default values: only the numeric columns, Attendance and Obtained Marks, are summarized, and the text column Name is left out.
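Because the result is itself a DataFrame, individual statistics can be read from it by row and column label. The sketch below assumes the same data as the example above; the names df, stats, mean_attendance, and max_marks are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Attendance": [60, 100, 80, 78, 95],
    "Name": ["Olivia", "John", "Laura", "Ben", "Kevin"],
    "Obtained Marks": [90, 75, 82, 64, 45],
})

stats = df.describe()

# Only the numeric columns appear in the default summary
print(list(stats.columns))

# Rows are statistic names, columns are the described columns
mean_attendance = stats.loc["mean", "Attendance"]
max_marks = stats.loc["max", "Obtained Marks"]
print(mean_attendance, max_marks)
```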
Example Codes: DataFrame.describe() Method to Find the Statistics of Each Column
We will find the statistics of all columns using the include parameter.
import pandas as pd
# Build a DataFrame with two numeric columns and one text column
dataframe = pd.DataFrame({'Attendance': {0: 60, 1: 100, 2: 80, 3: 78, 4: 95},
                          'Name': {0: 'Olivia', 1: 'John', 2: 'Laura', 3: 'Ben', 4: 'Kevin'},
                          'Obtained Marks': {0: 90, 1: 75, 2: 82, 3: 64, 4: 45}})
print("The Original Data frame is: \n")
print(dataframe)
# include='all' summarizes every column, numeric or not
dataframe1 = dataframe.describe(include='all')
print("Statistics are: \n")
print(dataframe1)
Output:
The Original Data frame is:
   Attendance    Name  Obtained Marks
0          60  Olivia              90
1         100    John              75
2          80   Laura              82
3          78     Ben              64
4          95   Kevin              45
Statistics are:
        Attendance   Name  Obtained Marks
count     5.000000      5        5.000000
unique         NaN      5             NaN
top            NaN  Kevin             NaN
freq           NaN      1             NaN
mean     82.600000    NaN       71.200000
std      15.773395    NaN       17.484279
min      60.000000    NaN       45.000000
25%      78.000000    NaN       64.000000
50%      80.000000    NaN       75.000000
75%      95.000000    NaN       82.000000
max     100.000000    NaN       90.000000
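The function has returned the summary statistics of all columns of the DataFrame. Statistics that do not apply to a column are shown as NaN: the text column Name gets count, unique, top, and freq, while the numeric columns get mean, std, min, the percentiles, and max. Because every name occurs exactly once, the value reported as top is an arbitrary choice among the tied names and may differ between pandas versions.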
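The include parameter also accepts a list-like of data types. As a hedged sketch, the snippet below passes the 'number' selector, which pandas accepts for selecting numeric dtypes, and checks that the result matches the default summary. The names df and numeric_only are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Attendance": [60, 100, 80, 78, 95],
    "Name": ["Olivia", "John", "Laura", "Ben", "Kevin"],
    "Obtained Marks": [90, 75, 82, 64, 45],
})

# Limit the summary to numeric columns by listing the dtype selector
numeric_only = df.describe(include=["number"])
print(numeric_only)

# For this data, that is the same summary the defaults produce
print(numeric_only.equals(df.describe()))
```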
Example Codes: DataFrame.describe() Method to Find the Statistics of Numeric Columns
Now we will find the statistics of numeric columns only using the exclude parameter.
import pandas as pd
# Build a DataFrame with two numeric columns and one text column
dataframe = pd.DataFrame({'Attendance': {0: 60, 1: 100, 2: 80, 3: 78, 4: 95},
                          'Name': {0: 'Olivia', 1: 'John', 2: 'Laura', 3: 'Ben', 4: 'Kevin'},
                          'Obtained Marks': {0: 90, 1: 75, 2: 82, 3: 64, 4: 45}})
print("The Original Data frame is: \n")
print(dataframe)
# Exclude columns of the object data type (here, the text column Name)
dataframe1 = dataframe.describe(exclude=[object])
print("Statistics are: \n")
print(dataframe1)
Output:
The Original Data frame is:
   Attendance    Name  Obtained Marks
0          60  Olivia              90
1         100    John              75
2          80   Laura              82
3          78     Ben              64
4          95   Kevin              45
Statistics are:
       Attendance  Obtained Marks
count    5.000000        5.000000
mean    82.600000       71.200000
std     15.773395       17.484279
min     60.000000       45.000000
25%     78.000000       64.000000
50%     80.000000       75.000000
75%     95.000000       82.000000
max    100.000000       90.000000
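We have excluded the data type object, so the text column Name is dropped and only the numeric columns Attendance and Obtained Marks are summarized.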
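The exclude parameter can also be used the other way round. As a minimal sketch, excluding the 'number' selector leaves only the non-numeric column, which is summarized with count, unique, top, and freq. The names df and text_stats are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({
    "Attendance": [60, 100, 80, 78, 95],
    "Name": ["Olivia", "John", "Laura", "Ben", "Kevin"],
    "Obtained Marks": [90, 75, 82, 64, 45],
})

# Drop every numeric column from the summary; only Name remains
text_stats = df.describe(exclude=["number"])
print(text_stats)

# Non-numeric columns get a different set of statistics
print(list(text_stats.index))
```

Since all five names are distinct, count and unique are both 5 and freq is 1; which name appears as top is not something to rely on.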