
Find the name of the maximum value by row, over a range of columns, in R?

2024-03-14 09:30:05

I have a problem similar to the one outlined in this question. The problem is that I have more than 200 columns, so I can't list them all as this code does:

df %>%
  rowwise %>%
  mutate(Max = names(.)[which.max(c(x, y, z))]) %>%
  ungroup

I have tried the following code, but it gives the max over all my columns, and I only need the max of columns 3 to 223. Columns 1 and 2 are ID and year, and I need to keep them.

df %>%
  rowwise() %>%
  mutate(Max = names(.)[which.max(c_across(3:223))]) %>%
  ungroup()

All my columns have different names, so I can't use

mutate(Max = names(.)[which.max(c_across(starts_with("X")))])

How can I find the name of the column with the maximum value in each row of a dataset with around 200 columns, without listing all 200 column names?

Solution:

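Your second approach is on the right track; you just need to subset names(.) as well. which.max() returns a position counted within c_across(3:223), i.e. starting at column 3, while names(.) counts from column 1, so both must cover the same columns: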

set.seed(13)
library(dplyr)

# example data
df <- data.frame(id = 1:5, year = 2020:2024)
for (L in LETTERS[1:5]) df[[L]] <- sample(5)

df %>%
  rowwise() %>%
  mutate(Max = names(.[3:7])[which.max(c_across(3:7))]) %>%
  ungroup()
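To see why the subset matters, here is a minimal sketch with a named vector standing in for one row of the data (the values and the column names A, B, C are made up for illustration). The position returned by which.max() only lines up with names taken from the same subset:

```r
# One made-up row: id and year first, then three value columns
row <- c(id = 1, year = 2020, A = 3, B = 9, C = 1)

# which.max() on the subset returns position 2 within that subset (column B)
idx <- which.max(row[3:5])

names(row)[idx]       # "year": names counted from column 1, off by two
names(row[3:5])[idx]  # "B": names subset to the same columns, so it lines up
```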

But it is faster and more succinct to use the vectorized base R function max.col(), which avoids rowwise() entirely:

df %>%
  mutate(Max = names(.[3:7])[max.col(.[3:7], "first")])

Result:

  id year A B C D E Max
1  1 2020 3 4 3 5 5   D
2  2 2021 4 5 1 4 3   B
3  3 2022 1 1 2 1 1   C
4  4 2023 2 2 4 2 2   C
5  5 2024 5 3 5 3 4   A
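For the real data, the value columns can also be chosen by name rather than by counting positions. The sketch below uses only base R and assumes, as in the question, that the identifier columns are called id and year; the small data frame and its column names alpha, beta, gamma are made up for illustration. It also shows how max.col() resolves ties: row 1 of the result above has D and E tied at 5, and "first" picks D.

```r
# Made-up data: identifiers first, then value columns with arbitrary names
df <- data.frame(
  id    = 1:3,
  year  = 2020:2022,
  alpha = c(3, 7, 2),
  beta  = c(9, 1, 2),
  gamma = c(4, 7, 1)
)

# Every column except the identifiers (assumed to be named id and year);
# this works the same way with 221 value columns
value_cols <- setdiff(names(df), c("id", "year"))

# max.col() gives, per row, the position of the maximum within value_cols;
# "first" resolves ties to the leftmost column (the default is "random")
df$Max <- value_cols[max.col(df[value_cols], ties.method = "first")]

# "last" resolves the same ties to the rightmost column instead
df$MaxLast <- value_cols[max.col(df[value_cols], ties.method = "last")]

df
```

Using value_cols <- names(df)[3:223] would select the same columns by position instead.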