Sum of Selected Columns of an R Data Frame

Jesse John, January 30, 2023
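  1. Use the rowSums() Function of Base R to Calculate the Sum of Selected Columns of a Data Frame
  2. Use the apply() Function of Base R to Calculate the Sum of Selected Columns of a Data Frame
  3. Use Tidyverse Functions to Calculate the Sum of Selected Columns of a Data Frame in R
  4. Conclusion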

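We often want to create a new column from the values of other columns. This article shows how to create a new column that holds the sum of selected columns of a data frame in R.

We will look at three ways to create the new column: the rowSums() and apply() functions of base R, and a combination of functions from the Tidyverse.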




# Create five variables.
Student = c("Student A", "Student B", "Student C")
Hobby = c("Music", "Sports", "Cycling")
Maths = c(40, 35, 30)
Statistics = c(30, 35, 20)
Programming = c(25, 20, 35)

# Create a data frame from the variables.
df_students = data.frame(Student, Hobby, Maths, Statistics, Programming)

# View the data frame.
df_students

Use the rowSums() Function of Base R to Calculate the Sum of Selected Columns of a Data Frame

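We will create the new column with the data_frame$new_column syntax and assign it the values returned by rowSums(). The columns to add are given directly in the function call using subsetting syntax.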


# This adds the new column to the data frame.
df_students$myRowSums = rowSums(df_students[,c("Maths", "Statistics", "Programming")])

# View the data frame with the added column.
df_students

# We can also give a vector of column positions.
# df_students$myRowSums = rowSums(df_students[,c(3:5)])


> # View the data frame with the added column.
> df_students
    Student   Hobby Maths Statistics Programming myRowSums
1 Student A   Music    40         30          25        95
2 Student B  Sports    35         35          20        90
3 Student C Cycling    30         20          35        85

We can also save the names of the columns to add as a vector of strings and pass that vector to the rowSums() call.


# Save the list of columns as a vector of strings.
col_list = c("Maths", "Statistics", "Programming")

# Pass the vector of strings to the subsetting square brackets.
df_students$myRowSums = rowSums(df_students[,col_list])

# View the data frame with the added column.
df_students

The documentation states that rowSums() is equivalent to apply() with FUN = sum, but is much faster.
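The equivalence and the speed difference can be checked directly. The following is a minimal sketch, not a rigorous benchmark: the matrix `m` and its size are arbitrary choices made for illustration, and the timings will vary by machine.

```r
# Hypothetical test data: a numeric matrix with 100,000 rows and 5 columns.
set.seed(1)
m <- matrix(runif(5e5), ncol = 5)

# Both calls compute one sum per row.
sums_fast <- rowSums(m)
sums_apply <- apply(m, 1, sum)

# The two results agree up to floating-point tolerance.
same_result <- isTRUE(all.equal(sums_fast, sums_apply))
same_result

# Rough timings only; rowSums() is typically much faster.
system.time(rowSums(m))
system.time(apply(m, 1, sum))
```

For a three-row data frame like the one in this article the difference is negligible; it starts to matter on large data.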

The documentation also notes that, because rowSums() is written for speed, it blurs some of the subtleties of NaN and NA.
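In practice, missing values are the main thing to watch. The sketch below uses a hypothetical data frame `scores` (not part of this article's data) to show that a row containing NA or NaN yields a missing sum unless na.rm = TRUE is passed. Whether that missing sum is reported as NA or NaN is the kind of detail that may differ.

```r
# Hypothetical data with one NA and one NaN.
scores <- data.frame(
  Maths = c(40, NA, 30),
  Statistics = c(30, 35, NaN),
  Programming = c(25, 20, 35)
)

# By default, rows 2 and 3 give a missing result.
sums_default <- rowSums(scores)
is.na(sums_default)

# na.rm = TRUE drops both NA and NaN before summing.
sums_na_rm <- rowSums(scores, na.rm = TRUE)

# apply() forwards extra arguments such as na.rm to sum().
sums_apply_na_rm <- apply(scores, 1, sum, na.rm = TRUE)
```

Note that is.na() returns TRUE for both NA and NaN, which makes it a convenient check when the distinction does not matter.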

Use the apply() Function of Base R to Calculate the Sum of Selected Columns of a Data Frame

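We will pass these three arguments to the apply() function.

  1. The required columns of the data frame.
  2. The dimension of the data frame to retain. 1 means rows.
  3. The function that we want to compute, sum.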
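To make the second argument concrete, here is a small sketch on a hypothetical matrix `marks` (introduced only for illustration). With 1 the function is applied to each row, with 2 to each column.

```r
# Hypothetical 3 x 2 matrix; values are filled column by column.
marks <- matrix(
  c(40, 35, 30,   # Maths
    30, 35, 20),  # Statistics
  nrow = 3,
  dimnames = list(c("A", "B", "C"), c("Maths", "Statistics"))
)

# 1 keeps the rows: one sum per student.
row_totals <- apply(marks, 1, sum)

# 2 keeps the columns: one sum per subject.
col_totals <- apply(marks, 2, sum)
```

Here `row_totals` has one element per row of `marks` and `col_totals` has one element per column.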


# We will recreate the data frame from the variables.
df_students = data.frame(Student, Hobby, Maths, Statistics, Programming)

# In base R, we can delete a column by setting its name to NULL.
# df_students$myRowSums = NULL

# A new column gets created.
df_students$myApplySums = apply(df_students[,col_list], 1, sum)

# View the data frame with the added column.
df_students



# Names of columns as a vector of strings.
df_students$myApplySums = apply(df_students[,c("Maths", "Statistics", "Programming")], 1, sum)

# Vector of column positions.
df_students$myApplySums = apply(df_students[,c(3, 4, 5)], 1, sum)

Use Tidyverse Functions to Calculate the Sum of Selected Columns of a Data Frame in R

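We can use the mutate() function of dplyr together with other Tidyverse functions to create the sum column.

When using the Tidyverse approach, we need to be aware of a few details. Tibbles drop row names, and they have different defaults for significant digits, trailing zeros, and trailing decimals.

First, we need to load the dplyr package and create a tibble.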


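We will use the following:

  1. The pipe operator, %>%, to avoid nesting function calls.
  2. rowwise() to make the other functions work on rows.
  3. mutate() to add the column.
  4. sum() for the addition.
  5. c_across(), which is designed to work with rowwise().
  6. all_of() to select the columns named in a character vector.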

rowwise() is a kind of grouping. After using it, we may need to call ungroup(data_frame_name) and save the ungrouped version as an object.


# We will recreate the data frame from the variables.
df_students = data.frame(Student, Hobby, Maths, Statistics, Programming)

# Load the dplyr package.
library(dplyr)

# Create a tibble from the data frame.
# This could have been combined with the next step, but that would obscure the main point.
tb_students = as_tibble(df_students)

# We have to assign the RHS to an object to save the column to the object.
# It can be the same as the original tibble.
tb_students = tb_students %>% rowwise() %>% mutate(myTidySum = sum(c_across(all_of(col_list))))

# View the rowwise tibble with the added column.
tb_students


> # View the rowwise tibble with the added column.
> tb_students
# A tibble: 3 x 6
# Rowwise:
  Student   Hobby   Maths Statistics Programming myTidySum
  <chr>     <chr>   <dbl>      <dbl>       <dbl>     <dbl>
1 Student A Music      40         30          25        95
2 Student B Sports     35         35          20        90
3 Student C Cycling    30         20          35        85
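
The "# Rowwise:" line in the output is a reminder that the tibble is still grouped by row. The following sketch, which assumes dplyr is installed and uses a small hypothetical tibble `tb`, shows how ungroup() returns an ordinary tibble so that later column-wide operations behave as usual.

```r
library(dplyr)

# Hypothetical tibble used only for this illustration.
tb <- tibble(Maths = c(40, 35, 30), Statistics = c(30, 35, 20))

# The result of rowwise() keeps its row-by-row grouping.
tb_rowwise <- tb %>%
  rowwise() %>%
  mutate(total = sum(c_across(c(Maths, Statistics))))
is_rowwise_before <- inherits(tb_rowwise, "rowwise_df")

# ungroup() removes the grouping; save the result as an object.
tb_plain <- ungroup(tb_rowwise)
is_rowwise_after <- inherits(tb_plain, "rowwise_df")

# A column-wide sum now collapses to a single value.
grand_total <- tb_plain %>% summarise(grand_total = sum(total))
```

Without the ungroup() step, sum(total) would be evaluated once per row rather than once for the whole column.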


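A major advantage of the Tidyverse approach is that it offers many ways to specify the columns.

Depending on our needs, we can use selection helpers such as starts_with(), contains(), and where(), combine selections using union and intersection, specify the columns that we do not want to select, and so on.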



tb_students = as_tibble(df_students)
# Take the union of the column names.
tb_students = tb_students %>% rowwise() %>% mutate(myTidySum = sum(c_across(Maths | Statistics | Programming)))

tb_students = as_tibble(df_students)
# Give a range of columns as a range of names.
tb_students = tb_students %>% rowwise() %>% mutate(myTidySum = sum(c_across(Maths:Programming)))

tb_students = as_tibble(df_students)
# Give a range of columns as a range of column positions.
tb_students = tb_students %>% rowwise() %>% mutate(myTidySum = sum(c_across(3:5)))

tb_students = as_tibble(df_students)
# Select all columns having 'at' or 'am'
tb_students = tb_students %>% rowwise() %>% mutate(myTidySum = sum(c_across(contains('at') | contains('am'))))

tb_students = as_tibble(df_students)
# Select all columns except Student and Hobby.
# Make sure the tibble only has the required columns before running the next line.
tb_students = tb_students %>% rowwise() %>% mutate(myTidySum = sum(c_across(!c(Student, Hobby))))
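
The where() and starts_with() helpers and the intersection operator can be used in the same way. This is a minimal sketch that assumes dplyr 1.0 or later; the tibble `tb_demo` and the new column names are made up for illustration.

```r
library(dplyr)

# Hypothetical tibble with the same columns as the article's data.
tb_demo <- tibble(
  Student = c("Student A", "Student B", "Student C"),
  Hobby = c("Music", "Sports", "Cycling"),
  Maths = c(40, 35, 30),
  Statistics = c(30, 35, 20),
  Programming = c(25, 20, 35)
)

# where() selects by column type: here, every numeric column.
tb_numeric <- tb_demo %>%
  rowwise() %>%
  mutate(numeric_sum = sum(c_across(where(is.numeric)))) %>%
  ungroup()

# & takes the intersection: numeric columns that do not start with "Prog".
tb_partial <- tb_demo %>%
  rowwise() %>%
  mutate(partial_sum = sum(c_across(where(is.numeric) & !starts_with("Prog")))) %>%
  ungroup()
```

As with the earlier examples, where(is.numeric) picks up every numeric column present at that point, so it is safest to run it on a tibble that does not already contain a computed sum column.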


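See the chapter on pipes in R for Data Science to learn about the pipe operator.

For help with rowwise() and c_across(), see the Tidyverse function reference.

For the tidyselect helper functions, see the tidyselect selection language.

For help with rowSums() and apply() in RStudio, click Help > Search R Help and type the function name without parentheses in the search box. Alternatively, type a question mark followed by the function name at the R console prompt.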
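
At the console, the question-mark form looks like this. The sketch uses only base R; help() and args() are shown as equivalent or complementary ways to get the same information.

```r
# Open the help page for a function (no parentheses after the name).
?rowSums

# help() does the same thing with the name given as a string.
help("apply")

# args() prints a function's argument list without opening the help page.
rowsums_args <- args(rowSums)
rowsums_args
```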


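Conclusion

The rowSums() and apply() functions are simple to use. The columns to add can be specified directly in the function call using names or column positions, or supplied as a character vector.

The Tidyverse approach is a little more involved, but it offers many alternative ways to specify the columns to add.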
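
As a quick sanity check, the three approaches can be run side by side to confirm that they give the same totals. This sketch assumes dplyr is installed; the object names `df_check`, `sum_rowsums`, `sum_apply`, and `sum_tidy` are chosen here for illustration.

```r
library(dplyr)

# Same data as the article, rebuilt so the sketch is self-contained.
df_check <- data.frame(
  Student = c("Student A", "Student B", "Student C"),
  Maths = c(40, 35, 30),
  Statistics = c(30, 35, 20),
  Programming = c(25, 20, 35)
)
col_list <- c("Maths", "Statistics", "Programming")

# Base R: rowSums() and apply().
sum_rowsums <- rowSums(df_check[, col_list])
sum_apply <- apply(df_check[, col_list], 1, sum)

# Tidyverse: rowwise() with c_across(), then extract the new column.
sum_tidy <- as_tibble(df_check) %>%
  rowwise() %>%
  mutate(total = sum(c_across(all_of(col_list)))) %>%
  ungroup() %>%
  pull(total)

# unname() drops any row names so only the values are compared.
methods_agree <- isTRUE(all.equal(unname(sum_rowsums), sum_tidy)) &&
  isTRUE(all.equal(unname(sum_apply), sum_tidy))
methods_agree
```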


Author: Jesse John

Jesse is passionate about data analysis and visualization. He uses the R statistical programming language for all aspects of his work.