How to Split Strings Into Two List Columns Using str.split in Python Pandas
- Use the str.split() Function to Split Strings Into Two List/Columns in Python Pandas
- Use the Basic Syntax to Split a String Column in a Pandas Dataframe Into Multiple Columns
- Convert a String to List in Python Pandas
- Create Separate Columns From Strings in Python Pandas
- Conclusion
Pandas has a method for splitting strings based on a separator/delimiter. We will use the pandas str.split() function.
Use the str.split() Function to Split Strings Into Two List/Columns in Python Pandas
After splitting, the result can be stored as a list in a series, or it can be used to build a multi-column dataframe from a single delimited string.
The function works like Python's default split() method. The difference is that split() can only be applied to a single string, while str.split() is applied to every string in a series.
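As a minimal sketch of that difference, the snippet below splits one plain Python string and then a whole series. The variable names and sample values are made up for illustration.

```python
import pandas as pd

# Plain Python: split() works on one string at a time.
single = "Anu,Ais"
parts = single.split(",")

# pandas: Series.str.split() applies the same split to every element.
names = pd.Series(["Anu,Ais", "Bag,Box", "fox,fix"])
split_names = names.str.split(",")  # a series whose elements are lists

print(parts)
print(split_names.tolist())
```

Here parts is a single list, while split_names holds one list per row of the series.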
Syntax:
Series.str.split(pat=None, n=-1, expand=False)
Let's define each of the parameters of this syntax.
Parameters:
pat: String value; the separator or delimiter at which the strings are split. If it is not given, the strings are split on whitespace.
n: The maximum number of splits to make in a single string; the default is -1, which means all.
expand: Boolean value. If True, a dataframe is returned with the split values in separate columns. Otherwise, a series containing lists of strings is returned.
return: A series of lists or a dataframe, depending on the expand parameter.
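The effect of n and expand may be easier to see side by side. The following is a small sketch on a made-up series (the "-" separator and the values are assumptions chosen for illustration).

```python
import pandas as pd

s = pd.Series(["a-b-c", "d-e-f"])

# Default n=-1: split at every occurrence of the separator.
all_splits = s.str.split("-")

# n=1: make at most one split per string, counting from the left.
one_split = s.str.split("-", n=1)

# expand=True: return a dataframe with one column per piece.
as_columns = s.str.split("-", n=1, expand=True)

print(all_splits.tolist())
print(one_split.tolist())
print(as_columns)
```

With expand=True the resulting columns are labeled 0 and 1, so they can be picked out as as_columns[0] and as_columns[1].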
First, we explain with a simple example, and then with a CSV file.
Use the Basic Syntax to Split a String Column in a Pandas Dataframe Into Multiple Columns
data[["A", "B"]] = data["A"].str.split(",", n=1, expand=True)
See the examples below, which demonstrate the use of this syntax in practice.
Split Column by Comma:
import pandas as pd
df = pd.DataFrame(
{"Name": ["Anu,Ais ", "Bag, Box", "fox, fix"], "points": [112, 104, 127]}
)
df
Output:
# split the Name column into two columns at the comma
df[["Name", "lastname"]] = df["Name"].str.split(",", n=1, expand=True)
df
Output:
The examples below read their data from a CSV file, StudentsPerformance.csv.
Student Performance data is contained in the dataframe used in the following examples. An image of the dataframe before any manipulations is attached.
We explain two ways of splitting the strings.
- Convert a string to a list
- Create separate columns from strings
Convert a String to List in Python Pandas
The str.split() function is used on this data to split the lunch column at the letter d. The n option is set to 1, so the maximum number of separations in a single string is 1.
The expand argument is set to False, so a series of lists of strings is returned instead of a dataframe.
import pandas as pd
df = pd.read_csv("/content/drive/MyDrive/StudentsPerformance.csv")
# dropping rows with null values to avoid errors
df.dropna(inplace=True)
# replace the lunch column with lists of the split values
df["lunch"] = df["lunch"].str.split("d", n=1, expand=False)
# df display
df.head(9)
Output:
The output image shows that the lunch column now holds a list in each row because the expand option was set to False.
Each list has at most two elements because the n option was set to 1: the string was separated at the first occurrence of d and not at subsequent occurrences (max 1 separation in a string).
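Since the CSV file may not be at hand, here is a minimal self-contained sketch of the same step. The series below is a hypothetical stand-in for the lunch column, and its values are assumed for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the lunch column (values assumed for illustration).
lunch = pd.Series(["standard", "free/reduced", "standard"])

# Split at the first "d" only; expand=False keeps the pieces as a list per row.
lunch_lists = lunch.str.split("d", n=1, expand=False)

# The .str accessor can also index into each list, e.g. to take the first piece.
first_part = lunch_lists.str[0]

print(lunch_lists.tolist())
print(first_part.tolist())
```

Each row holds a two-element list, because every sample string contains at least one d and only the first one is used as a split point.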
Create Separate Columns From Strings in Python Pandas
In this example, the parental level of education column is split at a space " ", and the expand option is set to True.
This means a dataframe is returned with the separated strings in different columns. That dataframe is then used to build new columns.
The old parental level of education column is then removed using the drop() method.
import pandas as pd
df = pd.read_csv("/content/drive/MyDrive/StudentsPerformance.csv")
# dropping rows with null values to avoid errors
df.dropna(inplace=True)
new = df["parental level of education"].str.split(" ", n=1, expand=True)
df["educational level"] = new[0]
df["institute"] = new[1]
# dropping the old parental level of education column
df.drop(columns=["parental level of education"], inplace=True)
# df display
df.head(9)
Output:
The str.split() function returned a new dataframe, which was used to create two new columns (educational level and institute) in the original dataframe.
The figure above shows the new columns. We can also inspect the split result directly through the new variable, which holds the dataframe returned by str.split().
new.head(9)
Output:
Conclusion
It is common to have a column in a Pandas dataframe that needs to be split into two columns.
For instance, if one of the columns in your dataframe is a full name, you may need to divide it into first and last names.
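The full-name case could look roughly like the sketch below. The dataframe, its column names, and the sample names are all hypothetical.

```python
import pandas as pd

# Hypothetical data; column names are chosen for illustration only.
people = pd.DataFrame(
    {"full_name": ["Ada Lovelace", "Alan Turing", "Grace Brewster Hopper"]}
)

# Split at the first space only, so everything after it stays together.
people[["first_name", "last_name"]] = people["full_name"].str.split(
    " ", n=1, expand=True
)

print(people)
```

Because n is 1, a name with more than two words keeps all remaining words in the second column.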