Pandas pandas.melt() Function
The pandas.melt() function reshapes an existing DataFrame. It changes the DataFrame from a wide format, with one column per variable, to a long format, with one row per variable and value pair.
Syntax of pandas.melt()
pandas.melt(dataframe, id_vars, value_vars, var_name, value_name, col_level)
Parameters
Parameter | Required | Description |
dataframe | mandatory | The DataFrame that we wish to change into the long format. In the pandas signature this parameter is named frame. |
id_vars | optional | A scalar, tuple, list, or ndarray. It names the column or columns used as identifier variables. You can select more than one identifier column. |
value_vars | optional | A scalar, tuple, list, or ndarray. It names the columns to unpivot. By default, every column not specified as an identifier variable is a value variable. |
var_name | optional | A scalar. It is the name of the output column that holds the original column names. By default, it is variable, or the name of the columns index if that index has a name. |
value_name | optional | A scalar. It is the name of the output column that holds the values. By default, it is value. |
col_level | optional | An integer or a string. If the columns are a MultiIndex, this parameter selects the column level to melt. |
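As a rough illustration of col_level, the sketch below melts a DataFrame whose columns form a two-level MultiIndex. The top-level labels Info and Score and the shortened column name Marks are made up for this example; they are not part of the demo data used in the rest of this article.

```python
import pandas as pd

# Hypothetical two-level column header: (group, column name)
columns = pd.MultiIndex.from_tuples(
    [("Info", "Name"), ("Score", "Attendance"), ("Score", "Marks")]
)
wide = pd.DataFrame(
    [["Olivia", 60, 90], ["John", 100, 75]],
    columns=columns,
)

# col_level=1 tells melt to work with the lower level of the header,
# so id_vars can refer to "Name" directly
long_df = pd.melt(wide, col_level=1, id_vars=["Name"])
print(long_df)
```

The labels from the level that is not selected (Info and Score here) do not appear in the result.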
Return
It returns a transformed DataFrame that contains zero or more identifier columns and two non-identifier columns, named variable and value by default.
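The shape of the returned DataFrame can be checked with a small sketch. The three-row frame and the column name Marks below are illustrative assumptions, not the article's demo data.

```python
import pandas as pd

wide = pd.DataFrame(
    {
        "Name": ["Olivia", "John", "Laura"],
        "Attendance": [60, 100, 80],
        "Marks": [90, 75, 82],
    }
)
long_df = pd.melt(wide, id_vars=["Name"])

# 3 original rows x 2 melted columns = 6 rows;
# 1 identifier column + variable + value = 3 columns
print(long_df.shape)  # (6, 3)
print(list(long_df.columns))  # ['Name', 'variable', 'value']
```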
Example Codes: pandas.melt()
First, we call the function with only the mandatory parameter, i.e., the DataFrame.
import pandas as pd
dataframe = pd.DataFrame(
    {
        "Attendance": {0: 60, 1: 100, 2: 80, 3: 78, 4: 95},
        "Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
        "Obtained Marks": {0: "90%", 1: "75%", 2: "82%", 3: "64%", 4: "45%"},
    }
)
# With no id_vars, every column is melted into variable/value pairs
dataframe1 = pd.melt(dataframe)
print(dataframe1)
The example DataFrame is shown below.
   Attendance    Name Obtained Marks
0          60  Olivia            90%
1         100    John            75%
2          80   Laura            82%
3          78     Ben            64%
4          95   Kevin            45%
Output:
          variable   value
0       Attendance      60
1       Attendance     100
2       Attendance      80
3       Attendance      78
4       Attendance      95
5             Name  Olivia
6             Name    John
7             Name   Laura
8             Name     Ben
9             Name   Kevin
10  Obtained Marks     90%
11  Obtained Marks     75%
12  Obtained Marks     82%
13  Obtained Marks     64%
14  Obtained Marks     45%
Here, the output has no identifier column, only the two non-identifier columns. Each value of the original DataFrame is now a row in the output DataFrame, labeled with the name of the column it came from.
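One side effect of stacking every column into a single value column is that the column has to hold all the original types at once. The sketch below, using small made-up frames, compares the dtype of value when the melted columns are mixed with the dtype when they are all integers.

```python
import pandas as pd

mixed = pd.DataFrame({"Attendance": [60, 100], "Name": ["Olivia", "John"]})
numeric = pd.DataFrame({"Attendance": [60, 100], "Marks": [90, 75]})

mixed_long = pd.melt(mixed)
numeric_long = pd.melt(numeric)

# Integers and strings share one column, so pandas falls back to object
print(mixed_long["value"].dtype)
# All melted columns are integers, so the value column stays an integer dtype
print(numeric_long["value"].dtype)
```

This is one reason to keep text columns such as Name in id_vars rather than melting them together with numeric columns.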
Now we will pass the optional parameters and check the results.
Example Codes: pandas.melt() With Single Column as id_vars
import pandas as pd
dataframe = pd.DataFrame(
    {
        "Attendance": {0: 60, 1: 100, 2: 80, 3: 78, 4: 95},
        "Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
        "Obtained Marks": {0: "90%", 1: "75%", 2: "82%", 3: "64%", 4: "45%"},
    }
)
# Keep Name as the identifier column and melt the remaining columns
dataframe1 = pd.melt(dataframe, id_vars=["Name"])
print(dataframe1)
Output:
     Name        variable value
0  Olivia      Attendance    60
1    John      Attendance   100
2   Laura      Attendance    80
3     Ben      Attendance    78
4   Kevin      Attendance    95
5  Olivia  Obtained Marks   90%
6    John  Obtained Marks   75%
7   Laura  Obtained Marks   82%
8     Ben  Obtained Marks   64%
9   Kevin  Obtained Marks   45%
The identifier column is specified as Name, and the variable and value columns are next to it, with the values extracted from the original DataFrame.
We can also set var_name and value_name to replace the default column names variable and value.
import pandas as pd
dataframe = pd.DataFrame(
    {
        "Attendance": {0: 60, 1: 100, 2: 80, 3: 78, 4: 95},
        "Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
        "Obtained Marks": {0: "90%", 1: "75%", 2: "82%", 3: "64%", 4: "45%"},
    }
)
# Rename the default variable and value columns
dataframe1 = pd.melt(
    dataframe, id_vars=["Name"], var_name="Performance", value_name="Success"
)
print(dataframe1)
Output:
     Name     Performance Success
0  Olivia      Attendance      60
1    John      Attendance     100
2   Laura      Attendance      80
3     Ben      Attendance      78
4   Kevin      Attendance      95
5  Olivia  Obtained Marks     90%
6    John  Obtained Marks     75%
7   Laura  Obtained Marks     82%
8     Ben  Obtained Marks     64%
9   Kevin  Obtained Marks     45%
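The names given through var_name and value_name are ordinary column labels in the result. As a minimal sketch of that, the example below melts a shortened two-row version of the demo data and then feeds the renamed columns to DataFrame.pivot() to get back to a wide layout. Using pivot() here is only an illustration of how the long format can be consumed; it is not part of pandas.melt() itself.

```python
import pandas as pd

wide = pd.DataFrame(
    {
        "Name": ["Olivia", "John"],
        "Attendance": [60, 100],
        "Obtained Marks": ["90%", "75%"],
    }
)
long_df = pd.melt(
    wide, id_vars=["Name"], var_name="Performance", value_name="Success"
)

# The custom names are plain column labels, so pivot() can refer to them
restored = long_df.pivot(index="Name", columns="Performance", values="Success")
print(restored)
```

The restored frame uses Name as its index, and its rows and columns come back sorted by label, so the layout is not identical to the original wide frame.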
Example Codes: pandas.melt() With Skipping Columns
If we want to check the attendance only, we need to specify value_vars. Columns that are listed in neither id_vars nor value_vars are left out of the result.
import pandas as pd
dataframe = pd.DataFrame(
    {
        "Attendance": {0: 60, 1: 100, 2: 80, 3: 78, 4: 95},
        "Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
        "Obtained Marks": {0: "90%", 1: "75%", 2: "82%", 3: "64%", 4: "45%"},
    }
)
# Melt only the Attendance column; Obtained Marks is skipped
dataframe1 = pd.melt(
    dataframe,
    id_vars=["Name"],
    value_vars="Attendance",
    var_name="Performance",
    value_name="Success",
)
print(dataframe1)
Output:
     Name Performance  Success
0  Olivia  Attendance       60
1    John  Attendance      100
2   Laura  Attendance       80
3     Ben  Attendance       78
4   Kevin  Attendance       95
It only shows the information of the Attendance column in the original DataFrame.
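The example above passes value_vars as a plain string while id_vars is a list. A short sketch, on a two-row version of the demo data, suggests that the scalar and list forms are interchangeable for a single column:

```python
import pandas as pd

dataframe = pd.DataFrame(
    {
        "Attendance": [60, 100],
        "Name": ["Olivia", "John"],
        "Obtained Marks": ["90%", "75%"],
    }
)

as_scalar = pd.melt(dataframe, id_vars="Name", value_vars="Attendance")
as_list = pd.melt(dataframe, id_vars=["Name"], value_vars=["Attendance"])

print(as_scalar.equals(as_list))  # True
# Only the selected column is melted; Obtained Marks is not in the result
print(as_list["variable"].unique().tolist())  # ['Attendance']
```

The list form is the one to use when more than one column should be melted.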
Example Codes: pandas.melt() With Multiple Columns
We add an extra column, ID, to the demo DataFrame and pass it together with Name as id_vars.
import pandas as pd
dataframe = pd.DataFrame(
    {
        "Attendance": {0: 60, 1: 100, 2: 80, 3: 78, 4: 95},
        "ID": {0: 1, 1: 2, 2: 3, 3: 4, 4: 5},
        "Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
        "Obtained Marks": {0: "90%", 1: "75%", 2: "82%", 3: "64%", 4: "45%"},
    }
)
# Use both ID and Name as identifier columns
dataframe1 = pd.melt(dataframe, id_vars=["ID", "Name"])
print(dataframe1)
Output:
   ID    Name        variable value
0   1  Olivia      Attendance    60
1   2    John      Attendance   100
2   3   Laura      Attendance    80
3   4     Ben      Attendance    78
4   5   Kevin      Attendance    95
5   1  Olivia  Obtained Marks   90%
6   2    John  Obtained Marks   75%
7   3   Laura  Obtained Marks   82%
8   4     Ben  Obtained Marks   64%
9   5   Kevin  Obtained Marks   45%
Both ID and Name columns are assigned as the identifier columns.
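The same call can also be written as a DataFrame method. The sketch below, on a two-row version of the demo data, compares pd.melt() with DataFrame.melt() and then sorts the result so that each student's rows sit together. The sorting step is an optional extra for readability, not something melt does on its own.

```python
import pandas as pd

dataframe = pd.DataFrame(
    {
        "Attendance": [60, 100],
        "ID": [1, 2],
        "Name": ["Olivia", "John"],
        "Obtained Marks": ["90%", "75%"],
    }
)

via_function = pd.melt(dataframe, id_vars=["ID", "Name"])
via_method = dataframe.melt(id_vars=["ID", "Name"])
print(via_function.equals(via_method))  # True

# Group each student's rows together instead of grouping by melted column
by_student = via_method.sort_values(["ID", "variable"]).reset_index(drop=True)
print(by_student)
```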