Pandas DataFrame DataFrame.set_index() Function

Suraj Joshi Jan 30, 2023
  1. Syntax of pandas.DataFrame.set_index() Method:
  2. Example Codes: Set Pandas DataFrame Index With Pandas DataFrame.set_index() Method
  3. Example Codes: Set drop=False in Pandas DataFrame.set_index() Method
  4. Example Codes: Set inplace=True in Pandas DataFrame.set_index Method
  5. Example Codes: Set Multiple Index Column Using Pandas DataFrame.set_index() Method
  6. Example Codes: Pandas Dataframe.set_index() Behavior When verify_integrity Is True

The pandas.DataFrame.set_index() method sets one or more existing columns, or arrays whose length matches the DataFrame, as the index of an already-created DataFrame. The new index can either replace the existing index or be appended to it as an additional level.
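As a minimal sketch of the "arrays of appropriate length" case, the snippet below sets an index from a pd.Index object rather than from a column. The two-row DataFrame and the labels F1 and F2 are made up for illustration and do not come from the examples in this article.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Orange", "Mango"], "Price": [34, 24]})

# Hypothetical labels: an array-like with one entry per row of df.
codes = pd.Index(["F1", "F2"], name="Code")

# An array-like key supplies the index values directly; no column is consumed.
by_array = df.set_index(codes)

# A column label uses that column's values and, by default, removes the column.
by_column = df.set_index("Name")

print(by_array)
print(by_column)
```

Note that a plain Python list passed as keys is read as a list of column labels, so an array of values is wrapped in pd.Index (or a NumPy array or Series) here.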

Syntax of pandas.DataFrame.set_index() Method:

DataFrame.set_index(
    keys, drop=True, append=False, inplace=False, verify_integrity=False
)

Parameters

keys: Column label, array of matching length, or a list of these, to be set as the index.
drop: Boolean. The default value is True, which removes the columns used as the new index from the DataFrame's columns.
append: Boolean. The default value is False. If True, the new index is appended to the existing index instead of replacing it.
inplace: Boolean. The default value is False. If True, modify the caller DataFrame in-place.
verify_integrity: Boolean. The default value is False. If True, raise ValueError when the new index contains duplicates.

Return

If inplace is False (the default), it returns a new DataFrame with the modified index; if inplace is True, it modifies the caller and returns None.
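The return value can be checked directly. This is a small sketch on a made-up two-row DataFrame: with the default inplace=False the call hands back a new DataFrame and leaves the caller untouched, while inplace=True changes the caller and returns None.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Orange", "Mango"], "Price": [34, 24]})

# Default (inplace=False): a new DataFrame comes back, df keeps its columns.
new_df = df.set_index("Name")
print(type(new_df).__name__)  # DataFrame
print(list(df.columns))       # ['Name', 'Price']

# inplace=True: df itself is modified and the call returns None.
result = df.set_index("Name", inplace=True)
print(result)                 # None
print(list(df.columns))       # ['Price']
```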

Example Codes: Set Pandas DataFrame Index With Pandas DataFrame.set_index() Method

import pandas as pd

fruit_list = [
    ("Orange", 34, "Yes", "ABC"),
    ("Mango", 24, "No", "ABC"),
    ("banana", 14, "No", "ABC"),
    ("Apple", 44, "Yes", "XYZ"),
]

df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock", "Supplier"])
print(df)

# set_index() returns a new DataFrame; df itself is left unchanged
df_modified = df.set_index("Name")
print(df_modified)

Output:

     Name  Price In_Stock Supplier
0  Orange     34      Yes      ABC
1   Mango     24       No      ABC
2  banana     14       No      ABC
3   Apple     44      Yes      XYZ
        Price In_Stock Supplier
Name                           
Orange     34      Yes      ABC
Mango      24       No      ABC
banana     14       No      ABC
Apple      44      Yes      XYZ

The original DataFrame uses a default integer range (0, 1, 2, ...) as its index. In df_modified, the Name column becomes the index after calling set_index(), and it no longer appears among the regular columns because drop defaults to True.
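One practical reason to set a column as the index is label-based lookup. The sketch below, which keeps only the Name and Price columns for brevity, reads a value with .loc and then uses reset_index() to move the index back into a regular column.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Orange", "Mango", "banana", "Apple"], "Price": [34, 24, 14, 44]}
)
df_modified = df.set_index("Name")

# Rows can now be addressed by fruit name instead of by integer position.
mango_price = df_modified.loc["Mango", "Price"]
print(mango_price)  # 24

# reset_index() turns the index back into a column and restores 0..n-1 labels.
restored = df_modified.reset_index()
print(restored)
```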

Example Codes: Set drop=False in Pandas DataFrame.set_index() Method

import pandas as pd

fruit_list = [
    ("Orange", 34, "Yes", "ABC"),
    ("Mango", 24, "No", "ABC"),
    ("banana", 14, "No", "ABC"),
    ("Apple", 44, "Yes", "XYZ"),
]

df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock", "Supplier"])
print(df)

# drop=False keeps Name as a regular column in addition to using it as the index
df_modified = df.set_index("Name", drop=False)
print(df_modified)

Output:

     Name  Price In_Stock Supplier
0  Orange     34      Yes      ABC
1   Mango     24       No      ABC
2  banana     14       No      ABC
3   Apple     44      Yes      XYZ
          Name  Price In_Stock Supplier
Name                                   
Orange  Orange     34      Yes      ABC
Mango    Mango     24       No      ABC
banana  banana     14       No      ABC
Apple    Apple     44      Yes      XYZ

If we set drop=False in set_index(), the Name column remains a regular column of the DataFrame even after it is set as the index.
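The effect of drop can also be seen by inspecting the columns of each result. This is a short sketch on a made-up two-column DataFrame rather than the full fruit table.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Orange", "Mango"], "Price": [34, 24]})

dropped = df.set_index("Name")             # default drop=True
kept = df.set_index("Name", drop=False)    # Name stays as a column

print(list(dropped.columns))  # ['Price']
print(list(kept.columns))     # ['Name', 'Price']

# With drop=False the values are reachable both as index labels and as a column.
print(kept.index.tolist())    # ['Orange', 'Mango']
print(kept["Name"].tolist())  # ['Orange', 'Mango']
```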

Example Codes: Set inplace=True in Pandas DataFrame.set_index Method

import pandas as pd

fruit_list = [
    ("Orange", 34, "Yes", "ABC"),
    ("Mango", 24, "No", "ABC"),
    ("banana", 14, "No", "ABC"),
    ("Apple", 44, "Yes", "XYZ"),
]

df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock", "Supplier"])
print("Before Setting Index:")
print(df)

# inplace=True modifies df directly; nothing is assigned from the call
df.set_index("Name", inplace=True)
print("After Setting Index:")
print(df)

Output:

Before Setting Index:
     Name  Price In_Stock Supplier
0  Orange     34      Yes      ABC
1   Mango     24       No      ABC
2  banana     14       No      ABC
3   Apple     44      Yes      XYZ
After Setting Index:
        Price In_Stock Supplier
Name                           
Orange     34      Yes      ABC
Mango      24       No      ABC
banana     14       No      ABC
Apple      44      Yes      XYZ

If we set inplace=True in the set_index() method, the caller DataFrame is modified in-place and the method returns None, so the result is not assigned to a new variable.
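For comparison, the same result can be reached without inplace by assigning the returned DataFrame back to the same name. The sketch below uses a made-up two-row DataFrame and the hypothetical names df_inplace and df_assigned to show that both routes end in equal DataFrames.

```python
import pandas as pd

df_inplace = pd.DataFrame({"Name": ["Orange", "Mango"], "Price": [34, 24]})
df_assigned = df_inplace.copy()

# Route 1: modify the caller directly.
df_inplace.set_index("Name", inplace=True)

# Route 2: keep the default inplace=False and rebind the name to the result.
df_assigned = df_assigned.set_index("Name")

print(df_inplace.equals(df_assigned))  # True
```

Which route to use is largely a style choice; the reassignment form also works inside method chains, which inplace=True does not because it returns None.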

Example Codes: Set Multiple Index Column Using Pandas DataFrame.set_index() Method

import pandas as pd

fruit_list = [
    ("Orange", 34, "Yes", "ABC"),
    ("Mango", 24, "No", "ABC"),
    ("banana", 14, "No", "ABC"),
    ("Apple", 44, "Yes", "XYZ"),
]

df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock", "Supplier"])
print("Before Setting Index:")
print(df)

# append=True adds Name as a second index level next to the existing 0..3 index
df.set_index("Name", append=True, inplace=True, drop=False)
print("After Setting Index:")
print(df)

Output:

Before Setting Index:
     Name  Price In_Stock Supplier
0  Orange     34      Yes      ABC
1   Mango     24       No      ABC
2  banana     14       No      ABC
3   Apple     44      Yes      XYZ
After Setting Index:
            Name  Price In_Stock Supplier
  Name                                   
0 Orange  Orange     34      Yes      ABC
1 Mango    Mango     24       No      ABC
2 banana  banana     14       No      ABC
3 Apple    Apple     44      Yes      XYZ

If we set append=True in the set_index() method, the Name column is appended to the existing index as an additional level instead of replacing it, so the DataFrame ends up with a two-level index (a MultiIndex).
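A multi-level index can also be built by passing a list of column labels as keys, as the Parameters section describes. The sketch below reuses the fruit data with fewer columns; the choice of Supplier and Name as the two levels is only an illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Orange", "Mango", "banana", "Apple"],
        "Price": [34, 24, 14, 44],
        "Supplier": ["ABC", "ABC", "ABC", "XYZ"],
    }
)

# A list of column labels produces one index level per column.
by_two = df.set_index(["Supplier", "Name"])
print(by_two.index.nlevels)      # 2
print(list(by_two.index.names))  # ['Supplier', 'Name']

# A full (Supplier, Name) tuple selects a single row.
apple_price = by_two.loc[("XYZ", "Apple"), "Price"]
print(apple_price)               # 44

# append=True keeps the old unnamed integer index as the first level.
appended = df.set_index("Name", append=True)
print(list(appended.index.names))  # [None, 'Name']
```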

Example Codes: Pandas Dataframe.set_index() Behavior When verify_integrity Is True

import pandas as pd

fruit_list = [
    ("Orange", 34, "Yes", "ABC"),
    ("Mango", 24, "No", "ABC"),
    ("Apple", 14, "No", "ABC"),
    ("Apple", 44, "Yes", "XYZ"),
]

df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock", "Supplier"])

df_modified = df.set_index("Name", verify_integrity=True)
print(df_modified)

Output:

Traceback (most recent call last):
  .....line 3920, in set_index
    dup=duplicates))
ValueError: Index has duplicate keys: Index(['Apple'], dtype='object', name='Name')

The Name column contains Apple twice, so the resulting index would have duplicate keys. Because verify_integrity=True, set_index() raises ValueError instead of creating the index. With the default verify_integrity=False, the duplicate index is created without error.
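In a script, the error can be caught, or the duplicates can be found before calling set_index(). The following is a minimal sketch, assuming the same duplicated Apple entry as above but with only the Name and Price columns; the variable names raised and duplicated_names are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Orange", "Mango", "Apple", "Apple"], "Price": [34, 24, 14, 44]}
)

# Check up front: is_unique is False when any value repeats.
print(df["Name"].is_unique)  # False

# duplicated() marks every repeat after the first occurrence.
duplicated_names = df.loc[df["Name"].duplicated(), "Name"].tolist()
print(duplicated_names)      # ['Apple']

# Or let set_index() do the check and handle the ValueError.
raised = False
try:
    df.set_index("Name", verify_integrity=True)
except ValueError as err:
    raised = True
    print(f"Could not set index: {err}")

# Without verify_integrity the duplicate index is accepted silently.
loose = df.set_index("Name")
print(loose.index.is_unique)  # False
```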

Author: Suraj Joshi

Suraj Joshi is a backend software engineer at Matrice.ai.
