Pandas DataFrame.dropna() Function
Minahil Noor
January 30, 2023
- pandas.DataFrame.dropna() Syntax
- Example Code: DataFrame.dropna() to Drop Rows
- Example Code: DataFrame.dropna() to Drop Columns
- Example Code: DataFrame.dropna() With how="all"
- Example Code: DataFrame.dropna() With a Specified Subset or Threshold
- Example Code: DataFrame.dropna() With inplace=True
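The pandas.DataFrame.dropna() function removes null (missing) values from a DataFrame by dropping the rows or columns that contain them. NaN (Not a Number) and NaT (Not a Time) represent null values. DataFrame.dropna() detects these values and filters the DataFrame accordingly.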
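To make the notion of a null value concrete, here is a minimal sketch. The `events` DataFrame and its column names are hypothetical and are not part of the original examples. It assumes that None in an object column counts as missing alongside NaN and NaT, which is how `isna()` reports it.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: every row except the first holds one kind of missing value.
events = pd.DataFrame(
    {
        "score": [1.5, np.nan, 3.0, 4.0],
        "when": pd.to_datetime(["2023-01-30", "2023-01-31", None, "2023-02-02"]),
        "label": ["a", "b", "c", None],
    }
)

# isna() flags NaN, NaT and None alike as missing.
print(events.isna())

# dropna() keeps only the rows without any flagged value.
cleaned = events.dropna()
print(cleaned)
```

Only the first row survives, because each of the other rows contains exactly one missing value.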
pandas.DataFrame.dropna() Syntax
DataFrame.dropna(axis, how, thresh, subset, inplace)
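Parameters
axis | Determines whether rows or columns are dropped. If it is 0 or 'index', rows containing missing values are dropped. If it is 1 or 'columns', columns containing missing values are dropped. The default is 0 |
how | Determines when a row or column is dropped. It accepts only two strings, any or all, and defaults to any. any: drop the row or column if it contains any null value. all: drop the row or column only if all of its values are null |
thresh | An integer giving the minimum number of non-missing values a row or column must contain to be kept |
subset | An array-like of labels along the other axis that limits where missing values are looked for. When dropping rows, these are the column names to check |
inplace | A boolean. If set to True, the caller DataFrame is modified in place. The default is False |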
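As a quick illustration of the axis parameter described above, the sketch below checks that the integer and string forms behave the same. The small `df` frame is a hypothetical example, not data from this article.

```python
import pandas as pd

# Hypothetical sample data: column "a" has one missing value, column "b" has none.
df = pd.DataFrame(
    {
        "a": [1.0, None, 3.0],
        "b": ["x", "y", "z"],
    }
)

# The integer and string forms of axis select the same behaviour.
rows_by_int = df.dropna(axis=0)
rows_by_name = df.dropna(axis="index")
cols_by_int = df.dropna(axis=1)
cols_by_name = df.dropna(axis="columns")

print(rows_by_int.equals(rows_by_name))  # True
print(cols_by_int.equals(cols_by_name))  # True
```

Note that the string form for columns is 'columns' (plural).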
Return Value
It returns a filtered DataFrame from which rows or columns have been dropped according to the passed arguments.
Example Code: DataFrame.dropna() to Drop Rows
By default, axis is 0 (rows), so the function drops rows unless another axis is given.
import pandas as pd
dataframe = pd.DataFrame(
    {
        "Attendance": {0: 60, 1: None, 2: 80, 3: None, 4: 95},
        "Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
        "Obtained Marks": {0: None, 1: 75, 2: 82, 3: 64, 4: None},
    }
)
print(dataframe)
The example DataFrame is as follows.
Attendance Name Obtained Marks
0 60.0 Olivia NaN
1 NaN John 75.0
2 80.0 Laura 82.0
3 NaN Ben 64.0
4 95.0 Kevin NaN
All parameters of this function are optional. If we pass no arguments, the function drops every row that contains at least one null value.
import pandas as pd
dataframe = pd.DataFrame(
{
"Attendance": {0: 60, 1: None, 2: 80, 3: None, 4: 95},
"Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
"Obtained Marks": {0: None, 1: 75, 2: 82, 3: 64, 4: None},
}
)
dataframe1 = dataframe.dropna()
print(dataframe1)
Output:
Attendance Name Obtained Marks
2 80.0 Laura 82.0
Every row containing at least one missing value has been dropped.
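To see which rows a default call would remove before dropping anything, one option is a boolean mask built with `isna()`. This is a sketch of an equivalent formulation, not a description of how pandas implements dropna() internally. The names `has_missing` and `kept` are illustrative.

```python
import pandas as pd

dataframe = pd.DataFrame(
    {
        "Attendance": {0: 60, 1: None, 2: 80, 3: None, 4: 95},
        "Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
        "Obtained Marks": {0: None, 1: 75, 2: 82, 3: 64, 4: None},
    }
)

# True for every row that holds at least one missing value.
has_missing = dataframe.isna().any(axis=1)
print(dataframe[has_missing])

# Keeping the complementary rows gives the same result as dropna().
kept = dataframe[~has_missing]
print(kept.equals(dataframe.dropna()))  # True
```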
Example Code: DataFrame.dropna() to Drop Columns
import pandas as pd
dataframe = pd.DataFrame(
{
"Attendance": {0: 60, 1: None, 2: 80, 3: None, 4: 95},
"Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
"Obtained Marks": {0: None, 1: 75, 2: 82, 3: 64, 4: None},
}
)
dataframe1 = dataframe.dropna(axis=1)
print(dataframe1)
Output:
Name
0 Olivia
1 John
2 Laura
3 Ben
4 Kevin
Because we set axis=1 in the DataFrame.dropna() method, it dropped every column that contains at least one missing value.
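A short sketch of why only the Name column survives: counting missing values per column with `isna().sum()` shows which columns have at least one. The variable name `missing_per_column` is illustrative.

```python
import pandas as pd

dataframe = pd.DataFrame(
    {
        "Attendance": {0: 60, 1: None, 2: 80, 3: None, 4: 95},
        "Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
        "Obtained Marks": {0: None, 1: 75, 2: 82, 3: 64, 4: None},
    }
)

# Number of missing values in each column.
missing_per_column = dataframe.isna().sum()
print(missing_per_column)

# Columns with a count of zero are the ones dropna(axis=1) keeps.
print(list(missing_per_column[missing_per_column == 0].index))  # ['Name']
```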
Example Code: DataFrame.dropna() With how="all"
import pandas as pd
dataframe = pd.DataFrame(
{
"Attendance": {0: 60, 1: None, 2: 80, 3: None, 4: 95},
"Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
"Obtained Marks": {0: None, 1: 75, 2: 82, 3: 64, 4: None},
}
)
dataframe1 = dataframe.dropna(axis=1, how="all")
print(dataframe1)
Output:
Attendance Name Obtained Marks
0 60.0 Olivia NaN
1 NaN John 75.0
2 80.0 Laura 82.0
3 NaN Ben 64.0
4 95.0 Kevin NaN
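The columns containing missing values were not dropped, because how is set to all, which means a column is dropped only when every value in it is null.
When every value along the specified axis is missing, as in the Obtained Marks column below, DataFrame.dropna() drops that column with how set to all.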
import pandas as pd
dataframe = pd.DataFrame(
{
"Attendance": {0: 60, 1: None, 2: 80, 3: None, 4: 95},
"Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
"Obtained Marks": {0: None, 1: None, 2: None, 3: None, 4: None},
}
)
print(dataframe)
print("--------")
dataframe1 = dataframe.dropna(axis=1, how="all")
print(dataframe1)
Output:
Attendance Name Obtained Marks
0 60.0 Olivia None
1 NaN John None
2 80.0 Laura None
3 NaN Ben None
4 95.0 Kevin None
--------
Attendance Name
0 60.0 Olivia
1 NaN John
2 80.0 Laura
3 NaN Ben
4 95.0 Kevin
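The same rule applies along rows. The sketch below uses a small hypothetical frame (not the article's data) in which one row is entirely empty and another is only partly empty, and calls dropna() with how="all" on the default axis.

```python
import pandas as pd

# Hypothetical data: row 1 is entirely empty, row 2 is only partly empty.
df = pd.DataFrame(
    {
        "Attendance": [60, None, 80],
        "Obtained Marks": [70, None, None],
    }
)

# Only rows in which every value is missing are dropped.
rows_kept = df.dropna(how="all")
print(rows_kept)
```

Row 1 is removed, while row 2 stays because it still holds one non-null value.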
Example Code: DataFrame.dropna() With a Specified Subset or Threshold
import pandas as pd
dataframe = pd.DataFrame(
{
"Attendance": {0: 60, 1: None, 2: 80, 3: None, 4: 95},
"Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
"Obtained Marks": {0: None, 1: 75, 2: 82, 3: 64, 4: None},
}
)
dataframe1 = dataframe.dropna(thresh=3)
print(dataframe1)
Output:
Attendance Name Obtained Marks
2 80.0 Laura 82.0
The value of thresh is 3, which means a row needs at least 3 non-null values to avoid being dropped.
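One way to reason about thresh is to count the non-null values per row and compare the count with the threshold. The following sketch expresses that as a mask; it is an equivalent formulation for illustration rather than pandas' own implementation, and `non_null_counts` is an illustrative name.

```python
import pandas as pd

dataframe = pd.DataFrame(
    {
        "Attendance": {0: 60, 1: None, 2: 80, 3: None, 4: 95},
        "Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
        "Obtained Marks": {0: None, 1: 75, 2: 82, 3: 64, 4: None},
    }
)

# Non-null values in each row: 2, 2, 3, 2, 2.
non_null_counts = dataframe.notna().sum(axis=1)
print(non_null_counts)

# Rows meeting the threshold match what dropna(thresh=3) keeps.
kept_by_mask = dataframe[non_null_counts >= 3]
print(kept_by_mask.equals(dataframe.dropna(thresh=3)))  # True

# Lowering the threshold to 2 keeps all five rows.
print(len(dataframe.dropna(thresh=2)))  # 5
```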
We can also specify subset.
import pandas as pd
dataframe = pd.DataFrame(
{
"Attendance": {0: 60, 1: None, 2: 80, 3: None, 4: 95},
"Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
"Obtained Marks": {0: None, 1: 75, 2: 82, 3: 64, 4: None},
}
)
dataframe1 = dataframe.dropna(subset=["Attendance", "Name"])
print(dataframe1)
Output:
Attendance Name Obtained Marks
0 60.0 Olivia NaN
2 80.0 Laura 82.0
4 95.0 Kevin NaN
Rows with missing values in the Attendance or Name columns are dropped. If only another column, such as Obtained Marks here, has a missing value, the record is kept.
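As a further sketch, subset can name a single column, and it can be combined with how. The variable names `by_marks` and `either_present` are illustrative.

```python
import pandas as pd

dataframe = pd.DataFrame(
    {
        "Attendance": {0: 60, 1: None, 2: 80, 3: None, 4: 95},
        "Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
        "Obtained Marks": {0: None, 1: 75, 2: 82, 3: 64, 4: None},
    }
)

# Check only the "Obtained Marks" column: rows 0 and 4 are dropped.
by_marks = dataframe.dropna(subset=["Obtained Marks"])
print(by_marks)

# With how="all", a row is dropped only if both listed columns are missing.
# No row in this data meets that condition, so all five rows remain.
either_present = dataframe.dropna(subset=["Attendance", "Obtained Marks"], how="all")
print(len(either_present))  # 5
```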
Example Code: DataFrame.dropna() With inplace=True
If inplace is set to True, DataFrame.dropna() modifies the caller DataFrame in place.
import pandas as pd
dataframe = pd.DataFrame(
{
"Attendance": {0: 60, 1: None, 2: 80, 3: None, 4: 95},
"Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
"Obtained Marks": {0: None, 1: 75, 2: 82, 3: 64, 4: None},
}
)
dataframe1 = dataframe.dropna(inplace=True)
print(dataframe1)
Output:
None
With inplace=True, the method modified the caller DataFrame in place and returned None.
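Since the return value is None, the filtered data has to be read from the caller itself. A minimal sketch, using the illustrative name `result` for the return value:

```python
import pandas as pd

dataframe = pd.DataFrame(
    {
        "Attendance": {0: 60, 1: None, 2: 80, 3: None, 4: 95},
        "Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
        "Obtained Marks": {0: None, 1: 75, 2: 82, 3: 64, 4: None},
    }
)

# The call returns None; do not assign it if the data is needed afterwards.
result = dataframe.dropna(inplace=True)
print(result)  # None

# The caller itself now holds only the fully populated row.
print(dataframe)
```

When the filtered DataFrame should live under a new name, leaving inplace at its default and assigning the return value, as in the earlier examples, avoids ending up with None.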