How to Split a Single Column Into Multiple Columns in Pandas DataFrame Column
Pandas has a well-known method for splitting a string column or text column by dashes, whitespace, and return column (Series
) of lists; if we talk about pandas, the term Series
is called the Dataframe column.
We can use the pandas Series.str.split()
function to break up strings in multiple columns around a given separator or delimiter. It’s similar to the Python string split()
method but applies to the entire Dataframe column. We have the simplest way to separate the column below the following.
This method separates the Series
string from the initial index.
Series.str.split(pat=None, n=-1, expand=False)
Let’s try out to understand the working of this method
# import Pandas as pd
import pandas as pd
# innitilize Dataframe
df = pd.DataFrame(
{
"Email": [
"Alex.jhon@gmail.com",
"Hamza.Azeez@gmail.com",
"Harry.barton@hotmail.com",
],
"Number": ["+44-3844556210", "+44-2245551219", "+44-1049956215"],
"Location": ["Alameda,California", "Sanford,Florida", "Columbus,Georgia"],
}
)
print("Dataframe series:\n", df)
We have created a Dataframe df
with three columns, Email
, Number
and Location
. Note that the strings in the Email column have a specific pattern. However, if you take a closer look, this column can be split into two columns. We will solve the required problem very well.
Output:
Dataframe series :
Email Number Location
0 Alex.jhon@gmail.com +44-3844556210 Alameda,California
1 Hamza.Azeez@gmail.com +44-2245551219 Sanford,Florida
2 Harry.barton@hotmail.com +44-1049956215 Columbus,Georgia
We will use the Series.str.split()
function to separate the Number
column and pass the -
in split()
method . Make sure you pass True
to the expand
keyword.
Example 1:
print(
"\n\nSplit 'Number' column by '-' into two individual columns :\n",
df.Number.str.split(pat="-", expand=True),
)
This example will split every value of series (Number) by -
.
Output:
Split 'Number' column into two individual columns :
0 1
0 +44 3844556210
1 +44 2245551219
2 +44 1049956215
If we use only expand parameter Series.str.split(expand=True)
this will allow splitting whitespace but not feasible for separating with -
and ,
or any regular expression if exists in the string, you have to pass pat
parameter.
Let’s rename these split columns.
df[["Dialling Code", "Cell-Number"]] = df.Number.str.split("-", expand=True)
print(df)
We created two new series Dialling code
and Cell-Number
and assigned the values using Number
series.
Output:
Email Number Location Dialling Code \
0 Alex.jhon@gmail.com +44-3844556210 Alameda,California +44
1 Hamza.Azeez@gmail.com +44-2245551219 Sanford,Florida +44
2 Harry.barton@hotmail.com +44-1049956215 Columbus,Georgia +44
Cell-Number
0 3844556210
1 2245551219
2 1049956215
Example 2:
In this example we are going to split Location
series by ,
.
df[["City", "State"]] = df.Location.str.split(",", expand=True)
print(df)
Split Location
Series and store its values in individual series City
and State
.
Output:
Email Number Location City \
0 Alex.jhon@gmail.com +44-3844556210 Alameda,California Alameda
1 Hamza.Azeez@gmail.com +44-2245551219 Sanford,Florida Sanford
2 Harry.barton@hotmail.com +44-1049956215 Columbus,Georgia Columbus
State
0 California
1 Florida
2 Georgia
Let’s see the last example. We will separate the full name in the Email
series.
full_name = df.Email.str.split(pat="@", expand=True)
print(full_name)
Output:
0 1
0 Alex.jhon gmail.com
1 Hamza.Azeez gmail.com
2 Harry.barton hotmail.com
Now we separate first and last names by .
.
df[["First Name", "Last Name"]] = full_name[0].str.split(".", expand=True)
print(df)
Output:
Email Number Location First Name \
0 Alex.jhon@gmail.com +44-3844556210 Alameda,California Alex
1 Hamza.Azeez@gmail.com +44-2245551219 Sanford,Florida Hamza
2 Harry.barton@hotmail.com +44-1049956215 Columbus,Georgia Harry
Last Name
0 jhon
1 Azeez
2 barton
n=-1
parameter will not work if expand=True
is passed in the .split()
method.
print(df["Email"].str.split("@", n=-1, expand=True))
Output:
0 1
0 George Washington
1 Hamza Azeez
2 Harry Walker
The whole example code is below.
# import Pandas as pd
import pandas as pd
# create a new Dataframe
df = pd.DataFrame(
{
"Email": [
"Alex.jhon@gmail.com",
"Hamza.Azeez@gmail.com",
"Harry.barton@hotmail.com",
],
"Number": ["+44-3844556210", "+44-2245551219", "+44-1049956215"],
"Location": ["Alameda,California", "Sanford,Florida", "Columbus,Georgia"],
}
)
print("Dataframe series :\n", df)
print(
"\n\nSplit 'Number' column by '-' into two individual columns :\n",
df.Number.str.split(pat="-", expand=True),
)
df[["Dialling Code", "Cell-Number"]] = df.Number.str.split("-", expand=True)
print(df)
df[["City", "State"]] = df.Location.str.split(",", expand=True)
print(df)
full_name = df.Email.str.split(pat="@", expand=True)
print(full_name)
df[["First Name", "Last Name"]] = full_name[0].str.split(".", expand=True)
print(df)
Output:
Dataframe series :
Email Number Location
0 Alex.jhon@gmail.com +44-3844556210 Alameda,California
1 Hamza.Azeez@gmail.com +44-2245551219 Sanford,Florida
2 Harry.barton@hotmail.com +44-1049956215 Columbus,Georgia
Split 'Number' column by '-' into two individual columns :
0 1
0 +44 3844556210
1 +44 2245551219
2 +44 1049956215
Email Number Location Dialling Code \
0 Alex.jhon@gmail.com +44-3844556210 Alameda,California +44
1 Hamza.Azeez@gmail.com +44-2245551219 Sanford,Florida +44
2 Harry.barton@hotmail.com +44-1049956215 Columbus,Georgia +44
Cell-Number
0 3844556210
1 2245551219
2 1049956215
Email Number Location Dialling Code \
0 Alex.jhon@gmail.com +44-3844556210 Alameda,California +44
1 Hamza.Azeez@gmail.com +44-2245551219 Sanford,Florida +44
2 Harry.barton@hotmail.com +44-1049956215 Columbus,Georgia +44
Cell-Number City State
0 3844556210 Alameda California
1 2245551219 Sanford Florida
2 1049956215 Columbus Georgia
0 1
0 Alex.jhon gmail.com
1 Hamza.Azeez gmail.com
2 Harry.barton hotmail.com
Email Number Location Dialling Code \
0 Alex.jhon@gmail.com +44-3844556210 Alameda,California +44
1 Hamza.Azeez@gmail.com +44-2245551219 Sanford,Florida +44
2 Harry.barton@hotmail.com +44-1049956215 Columbus,Georgia +44
Cell-Number City State First Name Last Name
0 3844556210 Alameda California Alex jhon
1 2245551219 Sanford Florida Hamza Azeez
2 1049956215 Columbus Georgia Harry barton