Pandas DataFrame.set_index() Function

Suraj Joshi, Jan 30, 2023

Syntax of pandas.DataFrame.set_index() Method:
Example Codes: Set Pandas DataFrame Index With Pandas DataFrame.set_index() Method
Example Codes: Set drop=False in Pandas DataFrame.set_index() Method
Example Codes: Set inplace=True in Pandas DataFrame.set_index() Method
Example Codes: Set Multiple Index Columns Using Pandas DataFrame.set_index() Method
Example Codes: Pandas DataFrame.set_index() Behavior When verify_integrity Is True

The pandas.DataFrame.set_index() method sets the index of a DataFrame from one or more existing columns, or from arrays whose length matches the number of rows, at any time after the DataFrame has been created. The new index can either replace the existing index or be appended to it as additional levels.
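To illustrate the array case mentioned above, here is a minimal sketch assuming a small three-row DataFrame. The index values F1 to F3 and the index name `code` are hypothetical, chosen only for illustration. The array is wrapped in `pd.Index` because a plain Python list passed to `keys` is read as a list of column labels rather than as index values.

```python
import pandas as pd

# A small DataFrame with the default RangeIndex (0, 1, 2)
df = pd.DataFrame({"Name": ["Orange", "Mango", "Apple"], "Price": [34, 24, 44]})

# Hypothetical index values; wrapped in pd.Index so set_index() treats them
# as an array of index values, not as column labels
codes = pd.Index(["F1", "F2", "F3"], name="code")
df_codes = df.set_index(codes)

# Only column keys are dropped, so Name and Price both remain as columns
print(df_codes)
```

Since no column label is passed, `drop` has nothing to remove and both original columns are kept.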

Syntax of `pandas.DataFrame.set_index()` Method:

DataFrame.set_index(
    keys, drop=True, append=False, inplace=False, verify_integrity=False
)

Parameters

`keys`	Column label or array of the same length as the DataFrame, or a list of column labels and/or arrays, to use as the new index.
`drop`	Boolean. If `True` (the default), remove the columns used as the new index from the DataFrame's columns.
`append`	Boolean. If `True`, append the new index levels to the existing index instead of replacing it. The default value is `False`.
`inplace`	Boolean. If `True`, modify the caller DataFrame in place and return `None`. The default value is `False`.
`verify_integrity`	Boolean. If `True`, raise a `ValueError` when the new index contains duplicate keys. The default value is `False`.

Return

A DataFrame with the new index if inplace is False (the default); None if inplace is True, in which case the caller DataFrame itself is modified.
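A short sketch of the two return behaviors described above, using a hypothetical two-row DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Orange", "Mango"], "Price": [34, 24]})

# Default (inplace=False): a new DataFrame is returned and df is unchanged
result = df.set_index("Name")
print(type(result).__name__)  # DataFrame
print(list(df.columns))       # ['Name', 'Price']

# inplace=True: df itself is modified and the call returns None
returned = df.set_index("Name", inplace=True)
print(returned)               # None
print(df.index.name)          # Name
```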

Example Codes: Set Pandas DataFrame Index With Pandas `DataFrame.set_index()` Method

import pandas as pd

fruit_list = [
    ("Orange", 34, "Yes", "ABC"),
    ("Mango", 24, "No", "ABC"),
    ("banana", 14, "No", "ABC"),
    ("Apple", 44, "Yes", "XYZ"),
    ("Pineapple", 64, "No", "XYZ"),
    ("Kiwi", 84, "Yes", "XYZ"),
]

df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock", "Supplier"])
print(df)

# Make Name the index; drop=True by default removes it from the columns
df_modified = df.set_index("Name")
print(df_modified)

Output:

        Name  Price In_Stock Supplier
0     Orange     34      Yes      ABC
1      Mango     24       No      ABC
2     banana     14       No      ABC
3      Apple     44      Yes      XYZ
4  Pineapple     64       No      XYZ
5       Kiwi     84      Yes      XYZ
           Price In_Stock Supplier
Name                              
Orange        34      Yes      ABC
Mango         24       No      ABC
banana        14       No      ABC
Apple         44      Yes      XYZ
Pineapple     64       No      XYZ
Kiwi          84      Yes      XYZ

The original DataFrame uses the default RangeIndex (0, 1, 2, ...) as its index. In df_modified, set_index() makes the Name column the index, and because drop defaults to True, Name no longer appears as a regular column.
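Once Name is the index, rows can be looked up by fruit name with `.loc`. The sketch below rebuilds a shortened four-row version of the fruit DataFrame and relies only on standard pandas label-based indexing:

```python
import pandas as pd

fruit_list = [
    ("Orange", 34, "Yes", "ABC"),
    ("Mango", 24, "No", "ABC"),
    ("banana", 14, "No", "ABC"),
    ("Apple", 44, "Yes", "XYZ"),
]
df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock", "Supplier"])
df_modified = df.set_index("Name")

# Select a single value by row label (fruit name) and column label
print(df_modified.loc["Mango", "Price"])  # 24

# Select several rows by label at once
print(df_modified.loc[["Orange", "Apple"], "Supplier"].tolist())  # ['ABC', 'XYZ']

# Name is now the index, not a column
print("Name" in df_modified.columns)  # False
```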

Example Codes: Set `drop=False` in Pandas `DataFrame.set_index()` Method

import pandas as pd

fruit_list = [
    ("Orange", 34, "Yes", "ABC"),
    ("Mango", 24, "No", "ABC"),
    ("banana", 14, "No", "ABC"),
    ("Apple", 44, "Yes", "XYZ"),
]

df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock", "Supplier"])
print(df)

# drop=False keeps Name as a regular column in addition to the index
df_modified = df.set_index("Name", drop=False)
print(df_modified)

Output:

     Name  Price In_Stock Supplier
0  Orange     34      Yes      ABC
1   Mango     24       No      ABC
2  banana     14       No      ABC
3   Apple     44      Yes      XYZ
          Name  Price In_Stock Supplier
Name                                   
Orange  Orange     34      Yes      ABC
Mango    Mango     24       No      ABC
banana  banana     14       No      ABC
Apple    Apple     44      Yes      XYZ

With drop=False, set_index() still uses Name as the index but also keeps it as a regular column, so the fruit names appear both as row labels and in the Name column.
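A minimal check of that behavior, assuming a hypothetical two-row DataFrame: with `drop=False` the values are reachable both through the index and as a column, while the default `drop=True` removes the column.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Orange", "Mango"], "Price": [34, 24]})
df_modified = df.set_index("Name", drop=False)

# The values are available both as the index and as a regular column
print(df_modified.index.tolist())    # ['Orange', 'Mango']
print(df_modified["Name"].tolist())  # ['Orange', 'Mango']

# With the default drop=True, the Name column is removed
print(df.set_index("Name").columns.tolist())  # ['Price']
```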

Example Codes: Set `inplace=True` in Pandas `DataFrame.set_index()` Method

import pandas as pd

fruit_list = [
    ("Orange", 34, "Yes", "ABC"),
    ("Mango", 24, "No", "ABC"),
    ("banana", 14, "No", "ABC"),
    ("Apple", 44, "Yes", "XYZ"),
]

df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock", "Supplier"])
print("Before Setting Index:")
print(df)

# inplace=True modifies df itself; the call returns None
df.set_index("Name", inplace=True)
print("After Setting Index:")
print(df)

Output:

Before Setting Index:
     Name  Price In_Stock Supplier
0  Orange     34      Yes      ABC
1   Mango     24       No      ABC
2  banana     14       No      ABC
3   Apple     44      Yes      XYZ
After Setting Index:
        Price In_Stock Supplier
Name                           
Orange     34      Yes      ABC
Mango      24       No      ABC
banana     14       No      ABC
Apple      44      Yes      XYZ

With inplace=True, set_index() modifies the caller DataFrame df directly and returns None, so the result should not be assigned back to a variable.
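Because an in-place call returns None, writing `df = df.set_index("Name", inplace=True)` would replace `df` with None. The sketch below uses a hypothetical helper, `make_df()`, to build two identical frames and shows that the in-place call and reassigning the returned DataFrame produce the same result:

```python
import pandas as pd


def make_df():
    # Hypothetical helper that builds a fresh two-row DataFrame each time
    return pd.DataFrame({"Name": ["Orange", "Mango"], "Price": [34, 24]})


# Style 1: modify in place (the call itself returns None)
df_a = make_df()
df_a.set_index("Name", inplace=True)

# Style 2: reassign the DataFrame returned by set_index()
df_b = make_df()
df_b = df_b.set_index("Name")

print(df_a.equals(df_b))  # True
```

Reassignment also works in method chains, which is one reason some code bases prefer it over `inplace=True`.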

Example Codes: Set Multiple Index Columns Using Pandas `DataFrame.set_index()` Method

import pandas as pd

fruit_list = [
    ("Orange", 34, "Yes", "ABC"),
    ("Mango", 24, "No", "ABC"),
    ("banana", 14, "No", "ABC"),
    ("Apple", 44, "Yes", "XYZ"),
]

df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock", "Supplier"])
print("Before Setting Index:")
print(df)

# append=True adds Name as a new level after the existing RangeIndex;
# drop=False keeps Name as a regular column too
df.set_index("Name", append=True, inplace=True, drop=False)
print("After Setting Index:")
print(df)

Output:

Before Setting Index:
     Name  Price In_Stock Supplier
0  Orange     34      Yes      ABC
1   Mango     24       No      ABC
2  banana     14       No      ABC
3   Apple     44      Yes      XYZ
After Setting Index:
            Name  Price In_Stock Supplier
  Name                                   
0 Orange  Orange     34      Yes      ABC
1 Mango    Mango     24       No      ABC
2 banana  banana     14       No      ABC
3 Apple    Apple     44      Yes      XYZ

With append=True, set_index() keeps the existing RangeIndex and adds Name as a second level, so the DataFrame ends up with a two-level MultiIndex. Because drop=False is also set, Name remains a regular column as well.
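The sketch below inspects the resulting index levels. It also shows a related option that the example above does not use: passing a list of column labels as `keys` builds a MultiIndex directly from those columns, without keeping the old RangeIndex. Using `Supplier` and `Name` as the levels is only an illustrative choice.

```python
import pandas as pd

fruit_list = [
    ("Orange", 34, "Yes", "ABC"),
    ("Mango", 24, "No", "ABC"),
    ("banana", 14, "No", "ABC"),
    ("Apple", 44, "Yes", "XYZ"),
]
df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock", "Supplier"])

# append=True keeps the old RangeIndex as the first (unnamed) level
appended = df.set_index("Name", append=True)
print(list(appended.index.names))  # [None, 'Name']

# A list of column labels builds a MultiIndex from those columns only
multi = df.set_index(["Supplier", "Name"])
print(list(multi.index.names))     # ['Supplier', 'Name']

# Selecting on the first level returns the rows for that supplier,
# indexed by the remaining Name level
print(multi.loc["XYZ"])
```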

Example Codes: Pandas `DataFrame.set_index()` Behavior When `verify_integrity` Is `True`

import pandas as pd

fruit_list = [
    ("Orange", 34, "Yes", "ABC"),
    ("Mango", 24, "No", "ABC"),
    ("Apple", 14, "No", "ABC"),
    ("Apple", 44, "Yes", "XYZ"),
]

df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock", "Supplier"])

df_modified = df.set_index("Name", verify_integrity=True)
print(df_modified)

Output:

Traceback (most recent call last):
  .....line 3920, in set_index
    dup=duplicates))
ValueError: Index has duplicate keys: Index(['Apple'], dtype='object', name='Name')

set_index() raises a ValueError because the Name column contains Apple twice, so the new index would have duplicate keys. With the default verify_integrity=False, the same call succeeds and the resulting index simply contains Apple twice.
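When you are not sure whether a column is unique, you can look for duplicates before calling set_index() or catch the ValueError. A minimal sketch using the same data; the variable names `dupes`, `raised`, and `df_loose` are illustrative only:

```python
import pandas as pd

fruit_list = [
    ("Orange", 34, "Yes", "ABC"),
    ("Mango", 24, "No", "ABC"),
    ("Apple", 14, "No", "ABC"),
    ("Apple", 44, "Yes", "XYZ"),
]
df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock", "Supplier"])

# keep=False marks every occurrence of a duplicated name, not just the repeats
dupes = df.loc[df["Name"].duplicated(keep=False), "Name"].unique().tolist()
print(dupes)  # ['Apple']

raised = False
try:
    df_modified = df.set_index("Name", verify_integrity=True)
except ValueError as err:
    raised = True
    print(f"Could not set index: {err}")

# Without verify_integrity, the duplicate keys are accepted
df_loose = df.set_index("Name")
print(df_loose.index.is_unique)  # False
```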

Author: Suraj Joshi

Suraj Joshi is a backend software engineer at Matrice.ai.
