Pandas DataFrame.set_index() Function
The pandas.DataFrame.set_index() method sets one or more columns, or arrays of matching length, as the index of an existing DataFrame. The new index can either replace the existing index or be appended to it.
Syntax of pandas.DataFrame.set_index() Method:
DataFrame.set_index(
keys, drop=True, append=False, inplace=False, verify_integrity=False
)
Parameters
| Parameter | Description |
| --- | --- |
| keys | Column label, array of the same length as the DataFrame, or list of these, to be set as the index. |
| drop | Boolean. The default value is True, which removes the columns used as the new index from the DataFrame. |
| append | Boolean. The default value is False. If True, the new index columns are appended to the existing index instead of replacing it. |
| inplace | Boolean. The default value is False. If True, the caller DataFrame is modified in place. |
| verify_integrity | Boolean. The default value is False. If True, a ValueError is raised when the new index contains duplicates. |
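The keys argument does not have to name an existing column. As a minimal sketch, assuming a small made-up DataFrame and a hypothetical label array called codes (neither comes from the examples in this article), the following passes a pd.Index of matching length as the new index:

```python
import pandas as pd

# Illustrative data only; the values and the "Code" labels are made up.
df = pd.DataFrame({"Name": ["Orange", "Mango", "Apple"], "Price": [34, 24, 44]})

# An array-like key must have the same length as the DataFrame.
codes = pd.Index(["F1", "F2", "F3"], name="Code")

df_by_code = df.set_index(codes)
print(df_by_code)

# No column is consumed, because the index came from an external array.
print(list(df_by_code.columns))
```

A plain Python list is interpreted as a list of column labels, which is why the external values are wrapped in pd.Index here.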
Return
If inplace is False (the default), it returns a new DataFrame object with the modified index. If inplace is True, it modifies the caller DataFrame and returns None.
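The difference between the two return values is easy to overlook, so here is a short sketch of both cases. The two-row DataFrame and the variable names new_df and result are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Orange", "Mango"], "Price": [34, 24]})

# Default (inplace=False): a new DataFrame is returned and df is untouched.
new_df = df.set_index("Name")
print(type(new_df).__name__)
print(list(df.columns))

# inplace=True: df itself is modified and the call returns None.
result = df.set_index("Name", inplace=True)
print(result)
print(df.index.name)
```

Because of this, writing df = df.set_index("Name", inplace=True) would rebind df to None, which is a common source of confusion.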
Example Codes: Set Pandas DataFrame Index With Pandas DataFrame.set_index() Method
import pandas as pd

fruit_list = [
    ("Orange", 34, "Yes", "ABC"),
    ("Mango", 24, "No", "ABC"),
    ("banana", 14, "No", "ABC"),
    ("Apple", 44, "Yes", "XYZ"),
]
df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock", "Supplier"])
print(df)

# Use the Name column as the index; df itself is left unchanged.
df_modified = df.set_index("Name")
print(df_modified)
Output:
Name Price In_Stock Supplier
0 Orange 34 Yes ABC
1 Mango 24 No ABC
2 banana 14 No ABC
3 Apple 44 Yes XYZ
Price In_Stock Supplier
Name
Orange 34 Yes ABC
Mango 24 No ABC
banana 14 No ABC
Apple 44 Yes XYZ
The original DataFrame uses a default range of integers as its index. In df_modified, the Name column is set as the index using the set_index() method.
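One practical effect of setting Name as the index is that rows can be selected by label. The sketch below reuses the same fruit data; the variable mango_price is only illustrative:

```python
import pandas as pd

fruit_list = [
    ("Orange", 34, "Yes", "ABC"),
    ("Mango", 24, "No", "ABC"),
    ("banana", 14, "No", "ABC"),
    ("Apple", 44, "Yes", "XYZ"),
]
df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock", "Supplier"])
df_modified = df.set_index("Name")

# Rows can now be selected by fruit name rather than by position.
mango_price = df_modified.loc["Mango", "Price"]
print(mango_price)

# The original DataFrame still has Name as an ordinary column.
print("Name" in df.columns)
```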
Example Codes: Set drop=False in Pandas DataFrame.set_index() Method
import pandas as pd

fruit_list = [
    ("Orange", 34, "Yes", "ABC"),
    ("Mango", 24, "No", "ABC"),
    ("banana", 14, "No", "ABC"),
    ("Apple", 44, "Yes", "XYZ"),
]
df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock", "Supplier"])
print(df)

# drop=False keeps Name as a column in addition to using it as the index.
df_modified = df.set_index("Name", drop=False)
print(df_modified)
Output:
Name Price In_Stock Supplier
0 Orange 34 Yes ABC
1 Mango 24 No ABC
2 banana 14 No ABC
3 Apple 44 Yes XYZ
Name Price In_Stock Supplier
Name
Orange Orange 34 Yes ABC
Mango Mango 24 No ABC
banana banana 14 No ABC
Apple Apple 44 Yes XYZ
If we set drop=False in the set_index() method, the Name column remains a regular column in the DataFrame even after it is set as the index.
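To make the effect of drop concrete, the following sketch compares both settings on a small illustrative DataFrame. The names dropped and kept are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Orange", "Mango"], "Price": [34, 24]})

dropped = df.set_index("Name")           # drop=True is the default
kept = df.set_index("Name", drop=False)  # Name stays as a column

print("Name" in dropped.columns)
print("Name" in kept.columns)

# With drop=False the values are available both as index labels and as a column.
print(kept.loc["Mango", "Name"])
```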
Example Codes: Set inplace=True in Pandas DataFrame.set_index() Method
import pandas as pd

fruit_list = [
    ("Orange", 34, "Yes", "ABC"),
    ("Mango", 24, "No", "ABC"),
    ("banana", 14, "No", "ABC"),
    ("Apple", 44, "Yes", "XYZ"),
]
df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock", "Supplier"])
print("Before Setting Index:")
print(df)

# inplace=True modifies df directly; the call itself returns None.
df.set_index("Name", inplace=True)
print("After Setting Index:")
print(df)
Output:
Before Setting Index:
Name Price In_Stock Supplier
0 Orange 34 Yes ABC
1 Mango 24 No ABC
2 banana 14 No ABC
3 Apple 44 Yes XYZ
After Setting Index:
Price In_Stock Supplier
Name
Orange 34 Yes ABC
Mango 24 No ABC
banana 14 No ABC
Apple 44 Yes XYZ
If we set inplace=True in the set_index() method, the caller DataFrame is modified in place and the method returns None.
Example Codes: Set Multiple Index Column Using Pandas DataFrame.set_index() Method
import pandas as pd

fruit_list = [
    ("Orange", 34, "Yes", "ABC"),
    ("Mango", 24, "No", "ABC"),
    ("banana", 14, "No", "ABC"),
    ("Apple", 44, "Yes", "XYZ"),
]
df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock", "Supplier"])
print("Before Setting Index:")
print(df)

# append=True adds Name as a second index level next to the existing index.
df.set_index("Name", append=True, inplace=True, drop=False)
print("After Setting Index:")
print(df)
Output:
Before Setting Index:
Name Price In_Stock Supplier
0 Orange 34 Yes ABC
1 Mango 24 No ABC
2 banana 14 No ABC
3 Apple 44 Yes XYZ
After Setting Index:
Name Price In_Stock Supplier
Name
0 Orange Orange 34 Yes ABC
1 Mango Mango 24 No ABC
2 banana banana 14 No ABC
3 Apple Apple 44 Yes XYZ
If we set append=True in the set_index() method, the new index column is appended to the existing index instead of replacing it, so the DataFrame ends up with a MultiIndex made of the original integer index and Name.
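Another way to get multiple index columns is to pass a list of column names as keys. The sketch below, which reuses the fruit data and picks Supplier and Name as the two levels purely for illustration, builds a MultiIndex this way and looks up one row:

```python
import pandas as pd

fruit_list = [
    ("Orange", 34, "Yes", "ABC"),
    ("Mango", 24, "No", "ABC"),
    ("banana", 14, "No", "ABC"),
    ("Apple", 44, "Yes", "XYZ"),
]
df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock", "Supplier"])

# Passing a list of column names builds a MultiIndex from those columns.
df_multi = df.set_index(["Supplier", "Name"])
print(df_multi)
print(df_multi.index.nlevels)

# A row is addressed by a tuple with one label per index level.
apple_price = df_multi.loc[("XYZ", "Apple"), "Price"]
print(apple_price)
```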
Example Codes: Pandas DataFrame.set_index() Behavior When verify_integrity Is True
import pandas as pd
fruit_list = [
("Orange", 34, "Yes", "ABC"),
("Mango", 24, "No", "ABC"),
("Apple", 14, "No", "ABC"),
("Apple", 44, "Yes", "XYZ"),
]
df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock", "Supplier"])
df_modified = df.set_index("Name", verify_integrity=True)
print(df_modified)
Output:
Traceback (most recent call last):
.....line 3920, in set_index
dup=duplicates))
ValueError: Index has duplicate keys: Index(['Apple'], dtype='object', name='Name')
The call raises ValueError because the Name column, which is being set as the index, contains the value Apple twice. With verify_integrity=True, set_index() rejects any index that has duplicate keys.
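In practice, we may want to detect the duplicates before calling set_index(), or handle the error instead of letting the program stop. The following is one possible sketch using the same data; the variable names duplicate_names and raised are only illustrative:

```python
import pandas as pd

fruit_list = [
    ("Orange", 34, "Yes", "ABC"),
    ("Mango", 24, "No", "ABC"),
    ("Apple", 14, "No", "ABC"),
    ("Apple", 44, "Yes", "XYZ"),
]
df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock", "Supplier"])

# Option 1: check for repeated values up front.
duplicate_names = df.loc[df["Name"].duplicated(), "Name"].unique().tolist()
print(duplicate_names)

# Option 2: catch the ValueError raised by verify_integrity=True.
try:
    df.set_index("Name", verify_integrity=True)
    raised = False
except ValueError as exc:
    raised = True
    print(f"Could not set index: {exc}")
```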
Suraj Joshi is a backend software engineer at Matrice.ai.