Pandas DataFrame.set_index() Function
The pandas.DataFrame.set_index() method sets one or more columns, or arrays whose length matches the number of rows, as the index of an existing DataFrame. The new index can either replace the existing index or be appended to it.
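Setting an array as the index is not covered by the examples further down, so here is a minimal sketch of that case. The product codes and the index name Code are hypothetical values made up for illustration; the array is wrapped in pd.Index only to give the new index a name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Name": ["Orange", "Mango"], "Price": [34, 24]})

# Hypothetical product codes, one per row (not part of the original data).
codes = np.array(["F001", "F002"])

# An array-like key works as long as its length equals the number of rows.
# Because the key is not a column of df, no column is removed.
df_by_code = df.set_index(pd.Index(codes, name="Code"))
print(df_by_code)
```

Note that a plain Python list passed as keys is treated as a list of column labels, which is why the sketch uses an array wrapped in pd.Index instead.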
Syntax of pandas.DataFrame.set_index() Method:
DataFrame.set_index(
keys, drop=True, append=False, inplace=False, verify_integrity=False
)
Parameters
keys | Column label, array of matching length, or a list of these, to be set as the index.
drop | Boolean. The default value is True, which removes the columns used as the new index from the DataFrame.
append | Boolean. The default value is False. If True, the new index is appended to the existing index instead of replacing it.
inplace | Boolean. The default value is False. If True, the caller DataFrame is modified in place.
verify_integrity | Boolean. The default value is False. If True, a ValueError is raised when the new index contains duplicates.
Return
If inplace is False (the default), it returns a new DataFrame with the modified index. If inplace is True, it modifies the caller DataFrame and returns None.
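The return value is easy to get wrong, so the following short sketch checks it directly: with the default inplace=False the call returns a new DataFrame and leaves the caller untouched, while with inplace=True the caller is changed and the call returns None. The variable names result and returned are chosen only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Orange", "Mango"], "Price": [34, 24]})

# Default (inplace=False): a new DataFrame comes back, df keeps its columns.
result = df.set_index("Name")
print(type(result).__name__)
print(list(df.columns))

# inplace=True: df itself is modified and the call returns None.
returned = df.set_index("Name", inplace=True)
print(returned)
print(list(df.columns))
```

A common mistake that follows from this is writing df = df.set_index("Name", inplace=True), which replaces df with None.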
Example Codes: Set Pandas DataFrame Index With Pandas DataFrame.set_index() Method
import pandas as pd

fruit_list = [
    ("Orange", 34, "Yes", "ABC"),
    ("Mango", 24, "No", "ABC"),
    ("banana", 14, "No", "ABC"),
    ("Apple", 44, "Yes", "XYZ"),
]
df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock", "Supplier"])
print(df)

# Use the Name column as the index; the original df is left unchanged
df_modified = df.set_index("Name")
print(df_modified)
Output:
     Name  Price In_Stock Supplier
0  Orange     34      Yes      ABC
1   Mango     24       No      ABC
2  banana     14       No      ABC
3   Apple     44      Yes      XYZ
        Price In_Stock Supplier
Name
Orange     34      Yes      ABC
Mango      24       No      ABC
banana     14       No      ABC
Apple      44      Yes      XYZ
The original DataFrame uses a range of integers as its default index. In df_modified, the Name column is set as the index using the set_index() method.
Example Codes: Set drop=False in Pandas DataFrame.set_index() Method
import pandas as pd

fruit_list = [
    ("Orange", 34, "Yes", "ABC"),
    ("Mango", 24, "No", "ABC"),
    ("banana", 14, "No", "ABC"),
    ("Apple", 44, "Yes", "XYZ"),
]
df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock", "Supplier"])
print(df)

# drop=False keeps Name as a regular column in addition to using it as the index
df_modified = df.set_index("Name", drop=False)
print(df_modified)
Output:
     Name  Price In_Stock Supplier
0  Orange     34      Yes      ABC
1   Mango     24       No      ABC
2  banana     14       No      ABC
3   Apple     44      Yes      XYZ
          Name  Price In_Stock Supplier
Name
Orange  Orange     34      Yes      ABC
Mango    Mango     24       No      ABC
banana  banana     14       No      ABC
Apple    Apple     44      Yes      XYZ
If we set drop=False in the set_index() method, the Name column remains a regular column in the DataFrame even after it is set as the index.
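To make the difference between the two drop settings explicit, this small sketch compares the resulting column lists side by side. The variable names dropped and kept are used only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Orange", "Mango"], "Price": [34, 24]})

dropped = df.set_index("Name")            # drop=True is the default
kept = df.set_index("Name", drop=False)   # Name stays available as a column

print("Name" in dropped.columns)  # False
print("Name" in kept.columns)     # True
```

Keeping the column can be convenient when later code still refers to df["Name"], at the cost of storing the same values twice.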
Example Codes: Set inplace=True in Pandas DataFrame.set_index() Method
import pandas as pd

fruit_list = [
    ("Orange", 34, "Yes", "ABC"),
    ("Mango", 24, "No", "ABC"),
    ("banana", 14, "No", "ABC"),
    ("Apple", 44, "Yes", "XYZ"),
]
df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock", "Supplier"])
print("Before Setting Index:")
print(df)

# inplace=True modifies df directly; the call itself returns None
df.set_index("Name", inplace=True)
print("After Setting Index:")
print(df)
Output:
Before Setting Index:
     Name  Price In_Stock Supplier
0  Orange     34      Yes      ABC
1   Mango     24       No      ABC
2  banana     14       No      ABC
3   Apple     44      Yes      XYZ
After Setting Index:
        Price In_Stock Supplier
Name
Orange     34      Yes      ABC
Mango      24       No      ABC
banana     14       No      ABC
Apple      44      Yes      XYZ
If we set inplace=True in the set_index() method, the caller DataFrame is modified in place, so there is no need to assign the result to a new variable.
Example Codes: Set Multiple Index Columns Using Pandas DataFrame.set_index() Method
import pandas as pd

fruit_list = [
    ("Orange", 34, "Yes", "ABC"),
    ("Mango", 24, "No", "ABC"),
    ("banana", 14, "No", "ABC"),
    ("Apple", 44, "Yes", "XYZ"),
]
df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock", "Supplier"])
print("Before Setting Index:")
print(df)

# append=True adds Name as a second index level next to the existing range index
df.set_index("Name", append=True, inplace=True, drop=False)
print("After Setting Index:")
print(df)
Output:
Before Setting Index:
     Name  Price In_Stock Supplier
0  Orange     34      Yes      ABC
1   Mango     24       No      ABC
2  banana     14       No      ABC
3   Apple     44      Yes      XYZ
After Setting Index:
            Name  Price In_Stock Supplier
  Name
0 Orange  Orange     34      Yes      ABC
1 Mango    Mango     24       No      ABC
2 banana  banana     14       No      ABC
3 Apple    Apple     44      Yes      XYZ
If we set append=True in the set_index() method, the new index column is appended to the existing index rather than replacing it, so the DataFrame ends up with a MultiIndex made of the original range index and the Name column.
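Another way to get several index columns is to pass a list of column labels as keys. The sketch below is an illustration under that assumption: it reuses the fruit data from the examples, picks Supplier and Name as the two levels, and looks up one row with a tuple key. The variable name df_multi is chosen only for this example.

```python
import pandas as pd

fruit_list = [
    ("Orange", 34, "Yes", "ABC"),
    ("Mango", 24, "No", "ABC"),
    ("banana", 14, "No", "ABC"),
    ("Apple", 44, "Yes", "XYZ"),
]
df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock", "Supplier"])

# A list of column labels builds a MultiIndex with one level per label.
df_multi = df.set_index(["Supplier", "Name"])
print(df_multi)

# A row is then addressed by a tuple holding one value per index level.
apple_price = df_multi.loc[("XYZ", "Apple"), "Price"]
print(apple_price)
```

Unlike append=True, this replaces the original range index entirely, and both key columns are removed from the columns because drop defaults to True.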
Example Codes: Pandas DataFrame.set_index() Behavior When verify_integrity Is True
import pandas as pd

fruit_list = [
    ("Orange", 34, "Yes", "ABC"),
    ("Mango", 24, "No", "ABC"),
    ("Apple", 14, "No", "ABC"),
    ("Apple", 44, "Yes", "XYZ"),
]
df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock", "Supplier"])
df_modified = df.set_index("Name", verify_integrity=True)
print(df_modified)
Output:
Traceback (most recent call last):
.....line 3920, in set_index
dup=duplicates))
ValueError: Index has duplicate keys: Index(['Apple'], dtype='object', name='Name')
It raises a ValueError because the Name column, which is being set as the index, contains the value Apple twice. When verify_integrity is True, the set_index() method checks the new index for duplicates and raises an error if it finds any.
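In practice the error can either be avoided by checking for duplicates first or be caught when it occurs. The following is a minimal sketch of both options, using a shortened version of the fruit data; the fallback of setting df_indexed to None is just one possible way to handle the failure and is an assumption of this example.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Orange", "Apple", "Apple"], "Price": [34, 14, 44]})

# Option 1: check the column for duplicate values before setting the index.
has_duplicates = bool(df["Name"].duplicated().any())
print(has_duplicates)

# Option 2: let set_index() perform the check and handle the ValueError.
try:
    df_indexed = df.set_index("Name", verify_integrity=True)
except ValueError as err:
    df_indexed = None  # fallback chosen for this sketch only
    print(f"Could not set index: {err}")
```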
Suraj Joshi is a backend software engineer at Matrice.ai.