Combine the Values of Multiple Columns Into One Column in a Pandas DataFrame

Fariba Laiq  February 15, 2024  Pandas  Pandas DataFrame

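Write Code From Scratch to Combine the Values of Multiple Columns Into One Column in a Pandas DataFrame
Use DuckDB to Run an SQL Query That Combines the Values of Multiple Columns Into One Column in a Pandas DataFrame
Use the combine_first() Method to Combine the Values of Multiple Columns Into One Column in a Pandas DataFrame
Use the bfill() Method to Combine the Values of Multiple Columns Into One Column in a Pandas DataFrame
Use the mask() Method to Combine the Values of Multiple Columns Into One Column in a Pandas DataFrame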

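This tutorial demonstrates how to coalesce multiple columns of a Python Pandas DataFrame, that is, how to place the first non-null value found across several columns into another column.

For example, the new column 3 takes the value of column 1 if it is not null; if column 1 is null, column 3 takes the value of column 2 instead.

There are several ways to accomplish this in a Pandas DataFrame.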
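The rule above is a plain if-else applied to each row. For exactly two columns, a minimal sketch using numpy.where() expresses it directly; the column names col1, col2, and col3 and the data are hypothetical and only mirror the wording of the example.

```python
import numpy as np
import pandas as pd

# Hypothetical data: col1 has gaps that col2 should fill
df = pd.DataFrame(
    {
        "col1": [1.0, np.nan, 3.0, np.nan],
        "col2": [10.0, 20.0, np.nan, np.nan],
    }
)

# Per row: take col1 if it is not null, otherwise take col2
df["col3"] = np.where(df["col1"].notna(), df["col1"], df["col2"])
print(df)
```

Nesting more np.where() calls for additional columns quickly becomes hard to read, which is why the approaches in this tutorial handle any number of columns.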

Write Code From Scratch to Combine the Values of Multiple Columns Into One Column in a Pandas DataFrame

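We can write the logic for combining the values from scratch. The following code creates a Pandas DataFrame with three columns named Age_in_Years, Age_in_Months, and Age_in_Days.

The DataFrame also contains some missing values. To display an age, we first use the age in years.

If that value is null, we display the age in months instead. Likewise, if the value in months is also null, we display the age in days.

To do this, we write a function that returns the value of the first non-null column. It iterates over the given columns of a row and returns a value as soon as it finds a non-null one; otherwise, it moves on to the next column. If every column in the row is null, it returns None.

Example code: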

# Python 3.x
import pandas as pd

df_age = pd.DataFrame(
    {
        "Age_in_Years": ["4 y", None, None, None],
        "Age_in_Months": ["48 m", "24 m", None, None],
        "Age_in_Days": ["1440 d", None, "2520 d", None],
    }
)


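def get_first_non_null(dfrow, cols):
    # Check the columns in priority order and return the first non-null value
    for c in cols:
        if pd.notnull(dfrow[c]):
            return dfrow[c]
    # Every column in this row is null
    return None


# Priority order: years first, then months, then days
cols = ["Age_in_Years", "Age_in_Months", "Age_in_Days"]
# axis=1 passes each row (as a Series) to the function
df_age["Age"] = df_age.apply(lambda x: get_first_non_null(x, cols), axis=1)
print(df_age)  # inside Jupyter, display(df_age) also works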

Output:

[Image: Pandas coalesce output]
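The apply() version calls a Python function once per row. A column-wise alternative, sketched here as an assumption rather than as the author's method, starts from the highest-priority column and fills its remaining gaps from each later column with Series.fillna(); the helper name coalesce_columns is hypothetical.

```python
import pandas as pd

df_age = pd.DataFrame(
    {
        "Age_in_Years": ["4 y", None, None, None],
        "Age_in_Months": ["48 m", "24 m", None, None],
        "Age_in_Days": ["1440 d", None, "2520 d", None],
    }
)


def coalesce_columns(df, cols):
    # Hypothetical helper: start from the first (highest-priority) column
    result = df[cols[0]]
    for c in cols[1:]:
        # Fill only the positions that are still null, aligned on the index
        result = result.fillna(df[c])
    return result


cols = ["Age_in_Years", "Age_in_Months", "Age_in_Days"]
df_age["Age"] = coalesce_columns(df_age, cols)
print(df_age)
```

As with get_first_non_null(), the order of cols sets the priority; the difference is that the loop runs once per column instead of once per row.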

Use DuckDB to Run an SQL Query That Combines the Values of Multiple Columns Into One Column in a Pandas DataFrame

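DuckDB is a database management system with a Python API that lets us work with data through SQL queries, including queries that run directly on a Pandas DataFrame. SQL provides a built-in coalesce function that returns the first non-null value among its arguments.

In the SQL query, we pass the column names to coalesce.

Example code: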

# Python 3.x
import pandas as pd
import duckdb

df_age = pd.DataFrame(
    {
        "Age_in_Years": ["4 y", None, None, None],
        "Age_in_Months": ["48 m", "24 m", None, None],
        "Age_in_Days": ["1440 d", None, "2520 d", None],
    }
)
# DuckDB finds the Pandas DataFrame df_age by its Python variable name
df_age = duckdb.query(
    """SELECT Age_in_Years, Age_in_Months, Age_in_Days,
              coalesce(Age_in_Years, Age_in_Months, Age_in_Days) AS Age
       FROM df_age"""
).to_df()
print(df_age)

Output:

[Image: Pandas coalesce output]
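COALESCE is standard SQL rather than a DuckDB-only feature. As a dependency-free sketch, this example swaps in Python's built-in sqlite3 module, which is a different engine from DuckDB: it copies the DataFrame into an in-memory SQLite table with DataFrame.to_sql() and reads the coalesced result back with pd.read_sql_query(). The table name ages is an illustrative choice.

```python
import sqlite3

import pandas as pd

df_age = pd.DataFrame(
    {
        "Age_in_Years": ["4 y", None, None, None],
        "Age_in_Months": ["48 m", "24 m", None, None],
        "Age_in_Days": ["1440 d", None, "2520 d", None],
    }
)

# In-memory SQLite database; nothing is written to disk
conn = sqlite3.connect(":memory:")
# Copy the DataFrame into a table; None values are stored as SQL NULL
df_age.to_sql("ages", conn, index=False)

query = """
SELECT Age_in_Years, Age_in_Months, Age_in_Days,
       coalesce(Age_in_Years, Age_in_Months, Age_in_Days) AS Age
FROM ages
ORDER BY rowid  -- keep the original row order
"""
result = pd.read_sql_query(query, conn)
conn.close()
print(result)
```

The SQL is the same as in the DuckDB query; only the way the DataFrame reaches the engine differs.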

Use the `combine_first()` Method to Combine the Values of Multiple Columns Into One Column in a Pandas DataFrame

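The combine_first() method combines two objects by filling the null values of the first with the non-null values at the same positions in the second. It is available on both DataFrame and Series objects; here we use it on columns, which are Series.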

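In the following code, we build the Age column from the existing column values: we combine Age_in_Years with Age_in_Months, and then combine that result with Age_in_Days.

The result takes the value from Age_in_Years. If that is null, it takes the value from Age_in_Months. Likewise, if that is also null, it takes the value from Age_in_Days.

The data in the original columns does not change; the desired values appear in the new Age column.

Example code: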

# Python 3.x
import pandas as pd

df_age = pd.DataFrame(
    {
        "Age_in_Years": ["4 y", None, None, None],
        "Age_in_Months": ["48 m", "24 m", None, None],
        "Age_in_Days": ["1440 d", None, "2520 d", None],
    }
)
df_age["Age"] = (
    df_age["Age_in_Years"]
    .combine_first(df_age["Age_in_Months"])
    .combine_first(df_age["Age_in_Days"])
)
print(df_age)

Output:

[Image: Pandas coalesce output]
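Chaining one combine_first() call per column works well for three columns. For an arbitrary list of columns, a minimal sketch folds the same calls with functools.reduce(); this generalization is an illustration, not part of the original approach.

```python
from functools import reduce

import pandas as pd

df_age = pd.DataFrame(
    {
        "Age_in_Years": ["4 y", None, None, None],
        "Age_in_Months": ["48 m", "24 m", None, None],
        "Age_in_Days": ["1440 d", None, "2520 d", None],
    }
)

cols = ["Age_in_Years", "Age_in_Months", "Age_in_Days"]

# Fold the columns left to right: each step keeps the values found so far
# and fills the remaining nulls from the next column
df_age["Age"] = reduce(
    lambda acc, col: acc.combine_first(df_age[col]),
    cols[1:],
    df_age[cols[0]],
)
print(df_age)
```

This is equivalent to the explicit chain above, and adding a column only means extending cols.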

Use the `bfill()` Method to Combine the Values of Multiple Columns Into One Column in a Pandas DataFrame

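bfill stands for backward fill. The method replaces each NaN with the next non-null value along the chosen axis, that is, from a following row or, with axis=1, from a following column.

Here, we specify axis=1 so that, if the value in the current column is null, the value from the next column is used. After filling, the first column of each row holds that row's first non-null value, so we select it with iloc[:, 0].

Example code: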

# Python 3.x
import pandas as pd

df_age = pd.DataFrame(
    {
        "Age_in_Years": ["4 y", None, None, None],
        "Age_in_Months": ["48 m", "24 m", None, None],
        "Age_in_Days": ["1440 d", None, "2520 d", None],
    }
)
df_age["Age"] = df_age.bfill(axis=1).iloc[:, 0]
print(df_age)

Output:

[Image: Pandas coalesce output]
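To make the intermediate step visible, here is a small sketch on hypothetical numeric columns a, b, and c, listed in priority order. It prints the back-filled frame and then the first column that iloc[:, 0] selects.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric data, columns in priority order a > b > c
df = pd.DataFrame(
    {
        "a": [1.0, np.nan, np.nan],
        "b": [2.0, 5.0, np.nan],
        "c": [3.0, 6.0, 9.0],
    }
)

# Each NaN takes the next non-null value to its right in the same row
filled = df.bfill(axis=1)
print(filled)

# The first column now holds the first non-null value of every row
first_non_null = filled.iloc[:, 0]
print(first_non_null.tolist())  # [1.0, 5.0, 9.0]
```

Because iloc[:, 0] picks by position, the priority comes entirely from column order. Selecting the columns explicitly, as in df[["a", "b", "c"]].bfill(axis=1), keeps unrelated columns out of the fill and makes that order visible.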

Use the `mask()` Method to Combine the Values of Multiple Columns Into One Column in a Pandas DataFrame

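The mask() method works like an if-then statement: where the condition is True, the value is replaced with the given alternative; where it is False, the original value is kept.

Here, the condition is pd.isnull. If a value in a column is not null, that value is used. Otherwise, the value is taken from the other specified column.

Example code: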

# Python 3.x
import pandas as pd

df_age = pd.DataFrame(
    {
        "Age_in_Years": ["4 y", None, None, None],
        "Age_in_Months": ["48 m", "24 m", None, None],
        "Age_in_Days": ["1440 d", None, "2520 d", None],
    }
)
df_age["Age"] = (
    df_age["Age_in_Years"]
    .mask(pd.isnull, df_age["Age_in_Months"])
    .mask(pd.isnull, df_age["Age_in_Days"])
)
print(df_age)

Output:

[Image: Pandas coalesce output]
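mask() is the inverse of where(): mask() replaces values where the condition is True, while where() keeps them. A small sketch with hypothetical Series s and fallback shows that a boolean condition, a callable condition such as pd.isnull (which pandas calls on the Series itself), and the equivalent where() call all give the same coalesced result.

```python
import pandas as pd

# Hypothetical Series: s is the preferred source, fallback fills its gaps
s = pd.Series(["4 y", None, None])
fallback = pd.Series(["48 m", "24 m", None])

# mask(): replace values where the condition is True
masked = s.mask(s.isna(), fallback)

# A callable condition is evaluated on s, as in the example above
masked_callable = s.mask(pd.isnull, fallback)

# where(): keep values where the condition is True, the inverse of mask()
kept = s.where(s.notna(), fallback)

print(masked.tolist())
print(masked_callable.tolist())
print(kept.tolist())
```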


Author: Fariba Laiq

I am Fariba Laiq from Pakistan, an Android app developer, technical content writer, and coding instructor. Writing has always been one of my passions. I love to learn, implement, and convey my knowledge to others.
