How to Get the Substring of a Column in Pandas
- Get the Substring of a Column in Pandas
- Use the str.slice() Function to Get the Substring of a Column in Pandas
- Use Square Brackets to Get the Substring of a Column in Pandas
- Use the str.extract() Function to Get the Substring of a Column in Pandas
In this tutorial, we will learn how to extract a substring from each value of a column in Pandas.
Get the Substring of a Column in Pandas
This extraction can be helpful in many scenarios when working with data. For instance, consider a case where we want to create a username from the user’s first name.
We will use multiple approaches to do this.
To begin with, let us create a Pandas data frame that we will use throughout the tutorial. It has a Name column, and we will aim to derive a username from that column.
Code:
import pandas as pd

# Use a name other than dict so the Python built-in is not shadowed
data = {"Name": ["Shivesh Jha", "Sanay Shah", "Rutwik Sonawane"]}
df = pd.DataFrame.from_dict(data)
Let us have a look at our data frame.
print(df)
Output:
Name
0 Shivesh Jha
1 Sanay Shah
2 Rutwik Sonawane
Let us now go through the various ways to obtain a substring from the column.
Use the str.slice() Function to Get the Substring of a Column in Pandas
In this approach, we will use the str.slice() function to take the first three characters of each value in the Name column and use them as the username for that user. The slice() function takes the start and stop positions of the substring we want to extract; the character at the stop position itself is not included.
We will use the code below to do this.
df["UserName"] = df["Name"].str.slice(0, 3)
print(df)
Let us now look at our updated data frame, which has a new UserName column containing the first three characters of the Name column.
Output:
Name UserName
0 Shivesh Jha Shi
1 Sanay Shah San
2 Rutwik Sonawane Rut
We can see in the output that we have successfully extracted the first three characters from the Name column and stored them in the new UserName column.
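To make the start and stop arguments more concrete, the following minimal sketch rebuilds the same sample data and tries a few other slices. The variable names first_four, last_three, and every_other are illustrative and not part of the original tutorial. The optional third argument is a step, as in ordinary Python slicing.

```python
import pandas as pd

# Same sample data as in the tutorial
df = pd.DataFrame({"Name": ["Shivesh Jha", "Sanay Shah", "Rutwik Sonawane"]})

# stop is exclusive, so (0, 4) keeps the characters at positions 0 to 3
first_four = df["Name"].str.slice(0, 4)

# A negative start counts from the end of each string
last_three = df["Name"].str.slice(start=-3)

# step=2 keeps every second character of the first six
every_other = df["Name"].str.slice(0, 6, 2)

print(first_four.tolist())   # ['Shiv', 'Sana', 'Rutw']
print(last_three.tolist())   # ['Jha', 'hah', 'ane']
print(every_other.tolist())  # ['Sie', 'Sny', 'Rti']
```

Each call returns a new Series and leaves the data frame unchanged, so the result still has to be assigned to a column if we want to keep it.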
Use Square Brackets to Get the Substring of a Column in Pandas
In this approach, we index the str accessor with square brackets, using ordinary Python slice notation to pick the characters we wish to extract. We use the code below to do this.
df["UserName"] = df["Name"].str[:3]
print(df)
Output:
Name UserName
0 Shivesh Jha Shi
1 Sanay Shah San
2 Rutwik Sonawane Rut
We can see in the output that the new column contains the first three characters of the existing column.
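Square-bracket indexing also accepts a single position or a negative index, and it passes missing values through rather than raising an error. The sketch below is only an illustration: the names Series, its missing third entry, and the lower-casing step are assumptions added here, not part of the original example.

```python
import pandas as pd

# Hypothetical data with one missing name
names = pd.Series(["Shivesh Jha", "Sanay Shah", None])

# A single position returns one character per row
initials = names.str[0]

# A slice can be chained with other string methods, here to build
# lower-case usernames from the first three characters
usernames = names.str[:3].str.lower()

print(initials.tolist()[:2])     # ['S', 'S']
print(usernames.tolist()[:2])    # ['shi', 'san']
print(usernames.isna().tolist()) # [False, False, True]
```

The missing entry stays missing in both results, so a later step can decide how to fill it.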
Use the str.extract() Function to Get the Substring of a Column in Pandas
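This approach extracts the user’s surname from the name. We will use the str.extract() function, which returns the text matched by the capture group of a regular expression. The pattern used here captures the last run of word characters in each name.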
Code:
df["LastName"] = df["Name"].str.extract(r"\b(\w+)$", expand=False)
Now, let us check the updated data frame.
print(df)
Output:
Name UserName LastName
0 Shivesh Jha Shi Jha
1 Sanay Shah San Shah
2 Rutwik Sonawane Rut Sonawane
As seen above, the LastName column holds each user’s surname. We can therefore get the substring of a column in Pandas using str.slice(), square-bracket indexing, or str.extract().
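As a further illustration of str.extract(), the sketch below shows two variations under stated assumptions: the extra "Madonna" row is hypothetical data added to show a non-matching value, and the group names First and Last are chosen here for the example. With expand=False and a single capture group the result is a Series. With named groups the result is a data frame whose columns take the group names.

```python
import pandas as pd

# Tutorial data plus one hypothetical single-word name
df = pd.DataFrame(
    {"Name": ["Shivesh Jha", "Sanay Shah", "Rutwik Sonawane", "Madonna"]}
)

# Require whitespace before the last word, so a single-word name does not match
last = df["Name"].str.extract(r"\s(\w+)$", expand=False)

# Named groups become the column names of the returned data frame
parts = df["Name"].str.extract(r"^(?P<First>\w+)\s+(?P<Last>\w+)$")

print(last.tolist()[:3])            # ['Jha', 'Shah', 'Sonawane']
print(list(parts.columns))          # ['First', 'Last']
print(parts["First"].tolist()[:3])  # ['Shivesh', 'Sanay', 'Rutwik']
print(last.isna().tolist())         # [False, False, False, True]
```

Rows that do not match the pattern come back as missing values, which makes it easy to spot names that do not fit the expected "first last" shape.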