Split a Single Column Into Multiple Columns in a Pandas DataFrame
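Pandas has a well-known method for splitting a string or text column on a dash, whitespace, or another separator, and it can return the pieces as a column (Series) of lists. In pandas, a single DataFrame column is a Series.
We can use the pandas Series.str.split() function to split strings around a given separator or delimiter and spread the pieces across multiple columns. It works like Python's string split() method, but it applies to an entire DataFrame column at once. It is the simplest way to separate a column.
The method splits each string in the Series, working from the beginning of the string.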
Series.str.split(pat=None, n=-1, expand=False)
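To make the role of expand concrete, here is a minimal sketch that calls the method twice on the same data. The numbers Series is a hypothetical sample that mirrors the phone-number format used in this article; only the expand argument changes between the two calls.

```python
import pandas as pd

# Hypothetical sample values in the same format as the Number column.
numbers = pd.Series(["+44-3844556210", "+44-2245551219"])

# expand=False (the default): each element becomes a list of pieces.
as_lists = numbers.str.split(pat="-")

# expand=True: the pieces are spread across the columns of a new DataFrame.
as_frame = numbers.str.split(pat="-", expand=True)

print(as_lists)
print(as_frame)
```

The first call keeps a single column whose values are lists, while the second returns a DataFrame that can be assigned directly to new column names.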
Let's see how this method works.
# Import pandas under its conventional alias
import pandas as pd
# Initialize the DataFrame
df = pd.DataFrame(
{
"Email": [
"Alex.jhon@gmail.com",
"Hamza.Azeez@gmail.com",
"Harry.barton@hotmail.com",
],
"Number": ["+44-3844556210", "+44-2245551219", "+44-1049956215"],
"Location": ["Alameda,California", "Sanford,Florida", "Columbus,Georgia"],
}
)
print("Dataframe series :\n", df)
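We created a DataFrame df with three columns: Email, Number, and Location. Note that the strings in the Email column follow a specific pattern: a name, an @ sign, and a domain. Because of that regularity, the column can be split into two columns, and the same is true of Number and Location.
Output: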
Dataframe series :
Email Number Location
0 Alex.jhon@gmail.com +44-3844556210 Alameda,California
1 Hamza.Azeez@gmail.com +44-2245551219 Sanford,Florida
2 Harry.barton@hotmail.com +44-1049956215 Columbus,Georgia
We will use the Series.str.split() function to separate the Number column, passing - as the separator. Make sure to pass True to the expand keyword.
Example 1:
print(
"\n\nSplit 'Number' column by '-' into two individual columns :\n",
df.Number.str.split(pat="-", expand=True),
)
This example splits every value in the Number column on -.
Output:
Split 'Number' column by '-' into two individual columns :
0 1
0 +44 3844556210
1 +44 2245551219
2 +44 1049956215
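If we pass only the expand argument, as in Series.str.split(expand=True), the strings are split on whitespace. To split on -, on a comma, or on a regular expression, you must pass the separator through the pat parameter.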
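A short sketch of that difference follows. The three Series are hypothetical samples invented for illustration: one separated by spaces, one by commas, and one by a mix of separators that needs a regular expression.

```python
import pandas as pd

# Hypothetical samples, each using a different kind of separator.
spaced = pd.Series(["Alameda California", "Sanford   Florida"])
commas = pd.Series(["Alameda,California", "Sanford,Florida"])
mixed = pd.Series(["Alameda, California", "Sanford;Florida"])

# No pat: split on runs of whitespace.
by_space = spaced.str.split(expand=True)

# No pat on comma-separated text: nothing to split on, so one column remains.
unsplit = commas.str.split(expand=True)

# A pattern longer than one character is treated as a regular expression
# by default: split on a comma or semicolon plus any following spaces.
by_regex = mixed.str.split(pat=r"[,;]\s*", expand=True)

print(by_space)
print(unsplit)
print(by_regex)
```

Leaving out pat is only useful for whitespace-separated text; for anything else the result keeps each string whole in a single column.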
Now let's give the split columns meaningful names.
df[["Dialling Code", "Cell-Number"]] = df.Number.str.split("-", expand=True)
print(df)
We created two new columns, Dialling Code and Cell-Number, and filled them with the pieces of the Number column.
Output:
Email Number Location Dialling Code \
0 Alex.jhon@gmail.com +44-3844556210 Alameda,California +44
1 Hamza.Azeez@gmail.com +44-2245551219 Sanford,Florida +44
2 Harry.barton@hotmail.com +44-1049956215 Columbus,Georgia +44
Cell-Number
0 3844556210
1 2245551219
2 1049956215
Example 2:
In this example, we split the Location column on a comma.
df[["City", "State"]] = df.Location.str.split(",", expand=True)
print(df)
This splits the Location column and stores the pieces in two separate columns, City and State.
Output:
Email Number Location City \
0 Alex.jhon@gmail.com +44-3844556210 Alameda,California Alameda
1 Hamza.Azeez@gmail.com +44-2245551219 Sanford,Florida Sanford
2 Harry.barton@hotmail.com +44-1049956215 Columbus,Georgia Columbus
State
0 California
1 Florida
2 Georgia
Let's look at one last example. We will separate the full name from the rest of the address in the Email column.
full_name = df.Email.str.split(pat="@", expand=True)
print(full_name)
Output:
0 1
0 Alex.jhon gmail.com
1 Hamza.Azeez gmail.com
2 Harry.barton hotmail.com
Now we split the first and last names on the . character.
df[["First Name", "Last Name"]] = full_name[0].str.split(".", expand=True)
print(df)
Output:
Email Number Location First Name \
0 Alex.jhon@gmail.com +44-3844556210 Alameda,California Alex
1 Hamza.Azeez@gmail.com +44-2245551219 Sanford,Florida Hamza
2 Harry.barton@hotmail.com +44-1049956215 Columbus,Georgia Harry
Last Name
0 jhon
1 Azeez
2 barton
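As an alternative to the two-step approach, the same pieces can probably be obtained in a single call by splitting on a regular expression. This is a sketch under stated assumptions, not the method used above: the Domain column name is hypothetical, and n=2 is used to cap the number of splits so that the dot inside the domain is left alone.

```python
import pandas as pd

# Same addresses as the Email column in this article.
emails = pd.Series(
    [
        "Alex.jhon@gmail.com",
        "Hamza.Azeez@gmail.com",
        "Harry.barton@hotmail.com",
    ]
)

# Split on either "." or "@", but perform at most two splits per string,
# so the remainder ("gmail.com") stays in one piece.
parts = emails.str.split(pat=r"[.@]", n=2, expand=True)

# "Domain" is a hypothetical column name chosen for this illustration.
parts.columns = ["First Name", "Last Name", "Domain"]
print(parts)
```

The two-step version is easier to read; the single call only works here because every address has exactly one dot before the @ sign.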
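The n parameter limits the number of splits. Its default value, n=-1, places no limit on them, so passing n=-1 together with expand=True to .split() gives the same result as leaving n out.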
print(df["Email"].str.split("@", n=-1, expand=True))
Output:
0 1
0 Alex.jhon gmail.com
1 Hamza.Azeez gmail.com
2 Harry.barton hotmail.com
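A positive n does change the result. The sketch below uses a hypothetical codes Series whose values contain two dashes, so the effect of limiting the number of splits is visible.

```python
import pandas as pd

# Hypothetical values with two "-" separators each.
codes = pd.Series(["+44-3844-556210", "+44-2245-551219"])

# Default n=-1: split at every "-", giving three columns.
unlimited = codes.str.split(pat="-", expand=True)

# n=1: split only at the first "-", giving two columns.
limited = codes.str.split(pat="-", n=1, expand=True)

print(unlimited)
print(limited)
```

With n=1 the second column keeps the rest of the string, including the remaining dash.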
The complete example code is as follows.
# Import pandas under its conventional alias
import pandas as pd
# Create a new DataFrame
df = pd.DataFrame(
{
"Email": [
"Alex.jhon@gmail.com",
"Hamza.Azeez@gmail.com",
"Harry.barton@hotmail.com",
],
"Number": ["+44-3844556210", "+44-2245551219", "+44-1049956215"],
"Location": ["Alameda,California", "Sanford,Florida", "Columbus,Georgia"],
}
)
print("Dataframe series :\n", df)
print(
"\n\nSplit 'Number' column by '-' into two individual columns :\n",
df.Number.str.split(pat="-", expand=True),
)
df[["Dialling Code", "Cell-Number"]] = df.Number.str.split("-", expand=True)
print(df)
df[["City", "State"]] = df.Location.str.split(",", expand=True)
print(df)
full_name = df.Email.str.split(pat="@", expand=True)
print(full_name)
df[["First Name", "Last Name"]] = full_name[0].str.split(".", expand=True)
print(df)
Output:
Dataframe series :
Email Number Location
0 Alex.jhon@gmail.com +44-3844556210 Alameda,California
1 Hamza.Azeez@gmail.com +44-2245551219 Sanford,Florida
2 Harry.barton@hotmail.com +44-1049956215 Columbus,Georgia
Split 'Number' column by '-' into two individual columns :
0 1
0 +44 3844556210
1 +44 2245551219
2 +44 1049956215
Email Number Location Dialling Code \
0 Alex.jhon@gmail.com +44-3844556210 Alameda,California +44
1 Hamza.Azeez@gmail.com +44-2245551219 Sanford,Florida +44
2 Harry.barton@hotmail.com +44-1049956215 Columbus,Georgia +44
Cell-Number
0 3844556210
1 2245551219
2 1049956215
Email Number Location Dialling Code \
0 Alex.jhon@gmail.com +44-3844556210 Alameda,California +44
1 Hamza.Azeez@gmail.com +44-2245551219 Sanford,Florida +44
2 Harry.barton@hotmail.com +44-1049956215 Columbus,Georgia +44
Cell-Number City State
0 3844556210 Alameda California
1 2245551219 Sanford Florida
2 1049956215 Columbus Georgia
0 1
0 Alex.jhon gmail.com
1 Hamza.Azeez gmail.com
2 Harry.barton hotmail.com
Email Number Location Dialling Code \
0 Alex.jhon@gmail.com +44-3844556210 Alameda,California +44
1 Hamza.Azeez@gmail.com +44-2245551219 Sanford,Florida +44
2 Harry.barton@hotmail.com +44-1049956215 Columbus,Georgia +44
Cell-Number City State First Name Last Name
0 3844556210 Alameda California Alex jhon
1 2245551219 Sanford Florida Hamza Azeez
2 1049956215 Columbus Georgia Harry barton