Drop Rows With NaN in Pandas
Suraj Joshi
January 30, 2023
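- Drop Rows With NaN in Pandas Using the DataFrame.notna() Method
- Drop Only Rows Where All Columns Are NaN Using the DataFrame.dropna() Method
- Drop Rows With NaN in a Specific Column Using the DataFrame.dropna() Method
- Drop Rows With NaN in Any Column Using the DataFrame.dropna() Method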
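This tutorial explains how to drop rows that contain NaN values from a Pandas DataFrame using the DataFrame.notna() and DataFrame.dropna() methods.
We will use the DataFrame below in the example code. The dropna() examples use the same data with a numeric Id column in place of the Name column.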
import pandas as pd

# None in a numeric column is stored as NaN.
data = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [19, None, 18, 21, None],
        "Income($)": [4000, 5000, None, 3500, None],
        "Expense($)": [3000, 2000, 2500, 25000, None],
    }
)
print(data)
Output:
      Name   Age  Income($)  Expense($)
0    Alice  19.0     4000.0      3000.0
1   Steven   NaN     5000.0      2000.0
2  Neesham  18.0        NaN      2500.0
3    Chris  21.0     3500.0     25000.0
4    Alice   NaN        NaN         NaN
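Drop Rows With NaN in Pandas Using the DataFrame.notna() Method
The DataFrame.notna() method returns a Boolean object with the same number of rows and columns as the calling DataFrame. Each element that is not NaN maps to True in the Boolean object, and each NaN element maps to False. Calling notna() on a single column (a Series) returns a Boolean Series in the same way.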
import pandas as pd

data = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [19, None, 18, 21, None],
        "Income($)": [4000, 5000, None, 3500, None],
        "Expense($)": [3000, 2000, 2500, 25000, None],
    }
)
print("Initial DataFrame:")
print(data)
print("")

# Keep only the rows whose Income($) value is not NaN.
data = data[data["Income($)"].notna()]
print("DataFrame after removing rows with NaN value in Income Field:")
print(data)
Output:
Initial DataFrame:
      Name   Age  Income($)  Expense($)
0    Alice  19.0     4000.0      3000.0
1   Steven   NaN     5000.0      2000.0
2  Neesham  18.0        NaN      2500.0
3    Chris  21.0     3500.0     25000.0
4    Alice   NaN        NaN         NaN

DataFrame after removing rows with NaN value in Income Field:
     Name   Age  Income($)  Expense($)
0   Alice  19.0     4000.0      3000.0
1  Steven   NaN     5000.0      2000.0
3   Chris  21.0     3500.0     25000.0
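Here, we apply the notna() method to the Income($) column of data. It returns a Series holding True for each value in the column that is not NaN and False for each NaN. When we pass this Boolean Series as an index to the original DataFrame, we get only the rows whose Income($) value is not NaN. Row 1 is kept even though its Age is NaN, because only the Income($) column is checked.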
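The example above calls notna() on one column. As a minimal sketch of the DataFrame-level form, the code below calls notna() on the whole DataFrame and keeps the rows where every column is non-NaN. The variable names mask and complete_rows are illustrative and not part of the original example.

```python
import pandas as pd

data = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [19, None, 18, 21, None],
        "Income($)": [4000, 5000, None, 3500, None],
        "Expense($)": [3000, 2000, 2500, 25000, None],
    }
)

# DataFrame.notna() returns a Boolean DataFrame with the same shape as data.
mask = data.notna()

# all(axis=1) is True only for rows in which every column is non-NaN.
complete_rows = data[mask.all(axis=1)]
print(complete_rows)
```

With this data, only rows 0 and 3 have no NaN in any column, so those are the rows that remain.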
Drop Only Rows Where All Columns Are NaN Using the DataFrame.dropna() Method
import pandas as pd

data = pd.DataFrame(
    {
        "Id": [621, 645, 210, 345, None],
        "Age": [19, None, 18, 21, None],
        "Income($)": [4000, 5000, None, 3500, None],
        "Expense($)": [3000, 2000, 2500, 25000, None],
    }
)
print("Initial DataFrame:")
print(data)
print("")

# how="all" drops a row only when every column in it is NaN.
data = data.dropna(how="all")
print("DataFrame after removing rows with NaN value in All Columns:")
print(data)
Output:
Initial DataFrame:
      Id   Age  Income($)  Expense($)
0  621.0  19.0     4000.0      3000.0
1  645.0   NaN     5000.0      2000.0
2  210.0  18.0        NaN      2500.0
3  345.0  21.0     3500.0     25000.0
4    NaN   NaN        NaN         NaN

DataFrame after removing rows with NaN value in All Columns:
      Id   Age  Income($)  Expense($)
0  621.0  19.0     4000.0      3000.0
1  645.0   NaN     5000.0      2000.0
2  210.0  18.0        NaN      2500.0
3  345.0  21.0     3500.0     25000.0
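This removes only the rows in which every field is NaN. Setting how='all' in the dropna() method makes it drop a row only when all of its column values are NaN. Here that is row 4. Rows 1 and 2 each contain a single NaN, so they are kept.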
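To connect this with the notna() approach, the sketch below checks that dropna(how="all") selects the same rows as a Boolean mask built with notna().any(axis=1). The names dropped and masked are illustrative, and the comparison is only meant as a sanity check on this small dataset.

```python
import pandas as pd

data = pd.DataFrame(
    {
        "Id": [621, 645, 210, 345, None],
        "Age": [19, None, 18, 21, None],
        "Income($)": [4000, 5000, None, 3500, None],
        "Expense($)": [3000, 2000, 2500, 25000, None],
    }
)

# Drop rows in which every column is NaN.
dropped = data.dropna(how="all")

# Equivalent mask: keep rows that have at least one non-NaN value.
masked = data[data.notna().any(axis=1)]

# DataFrame.equals() treats NaN values in the same position as equal.
print(dropped.equals(masked))
```

Both expressions keep rows 0 through 3 here. dropna(how="all") is the shorter spelling, while the mask form is easier to combine with other conditions.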
Drop Rows With NaN in a Specific Column Using the DataFrame.dropna() Method
import pandas as pd

data = pd.DataFrame(
    {
        "Id": [621, 645, 210, 345, None],
        "Age": [19, None, 18, 21, None],
        "Income($)": [4000, 5000, None, 3500, None],
        "Expense($)": [3000, 2000, 2500, 25000, None],
    }
)
print("Initial DataFrame:")
print(data)
print("")

# subset limits the NaN check to the listed columns.
data = data.dropna(subset=["Id"])
print("DataFrame after removing rows with NaN value in Id Column:")
print(data)
Output:
Initial DataFrame:
      Id   Age  Income($)  Expense($)
0  621.0  19.0     4000.0      3000.0
1  645.0   NaN     5000.0      2000.0
2  210.0  18.0        NaN      2500.0
3  345.0  21.0     3500.0     25000.0
4    NaN   NaN        NaN         NaN

DataFrame after removing rows with NaN value in Id Column:
      Id   Age  Income($)  Expense($)
0  621.0  19.0     4000.0      3000.0
1  645.0   NaN     5000.0      2000.0
2  210.0  18.0        NaN      2500.0
3  345.0  21.0     3500.0     25000.0
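This drops every row that has a NaN value in the Id column, regardless of the values in the other columns. In this example only row 4 is removed. Rows 1 and 2 stay because their NaN values are in other columns.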
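The subset parameter also accepts more than one column and can be combined with how. The following sketch, using the same data, shows both variants. The names either and both are illustrative.

```python
import pandas as pd

data = pd.DataFrame(
    {
        "Id": [621, 645, 210, 345, None],
        "Age": [19, None, 18, 21, None],
        "Income($)": [4000, 5000, None, 3500, None],
        "Expense($)": [3000, 2000, 2500, 25000, None],
    }
)

# Drop a row if Age OR Income($) is NaN (how defaults to "any").
either = data.dropna(subset=["Age", "Income($)"])
print(either.index.tolist())  # [0, 3]

# Drop a row only if Age AND Income($) are both NaN.
both = data.dropna(subset=["Age", "Income($)"], how="all")
print(both.index.tolist())  # [0, 1, 2, 3]
```

Columns outside subset are ignored by the check in both cases.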
Drop Rows With NaN in Any Column Using the DataFrame.dropna() Method
import pandas as pd

data = pd.DataFrame(
    {
        "Id": [621, 645, 210, 345, None],
        "Age": [19, None, 18, 21, None],
        "Income($)": [4000, 5000, None, 3500, None],
        "Expense($)": [3000, 2000, 2500, 25000, None],
    }
)
print("Initial DataFrame:")
print(data)
print("")

# With no arguments, dropna() drops every row that contains any NaN.
data = data.dropna()
print("DataFrame after removing rows with NaN value in any column:")
print(data)
Output:
Initial DataFrame:
      Id   Age  Income($)  Expense($)
0  621.0  19.0     4000.0      3000.0
1  645.0   NaN     5000.0      2000.0
2  210.0  18.0        NaN      2500.0
3  345.0  21.0     3500.0     25000.0
4    NaN   NaN        NaN         NaN

DataFrame after removing rows with NaN value in any column:
      Id   Age  Income($)  Expense($)
0  621.0  19.0     4000.0      3000.0
3  345.0  21.0     3500.0     25000.0
By default (how='any'), the dropna() method drops every row that contains at least one NaN value.
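The output above shows that the surviving rows keep their original index labels (0 and 3). As a small sketch of one way to handle this, the code below renumbers the rows with reset_index(drop=True). The names cleaned and renumbered are illustrative, and whether to renumber depends on whether the original labels still matter to you.

```python
import pandas as pd

data = pd.DataFrame(
    {
        "Id": [621, 645, 210, 345, None],
        "Age": [19, None, 18, 21, None],
        "Income($)": [4000, 5000, None, 3500, None],
        "Expense($)": [3000, 2000, 2500, 25000, None],
    }
)

# dropna() returns a new DataFrame; data itself is left unchanged.
cleaned = data.dropna()
print(cleaned.index.tolist())  # [0, 3]

# reset_index(drop=True) renumbers rows from 0 and discards the old labels.
renumbered = cleaned.reset_index(drop=True)
print(renumbered.index.tolist())  # [0, 1]
```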
Author: Suraj Joshi
Suraj Joshi is a backend software engineer at Matrice.ai.