EN VI

mutate column based on difference in col A, with rows identified by relative values in col B?

2024-03-14 00:00:15
How to mutate column based on difference in col A, with rows identified by relative values in col B

I have the dataframe below, showing sampling events and when each individual was observed, I want a new column showing the number of sampling events between the current observation and the previous most recent observation of an individual.

so what I have is this:

tibble(ind=c(rep("A",10),rep("B",10)),
   event=c(c(1:10),c(1:10)),
   obs=c(NA,NA,1,2,NA,NA,NA,4,NA,5,1,NA,NA,NA,2,NA,3,4,NA,5))

    # A tibble: 20 × 3
   ind   event   obs
   <chr> <int> <dbl>
 1 A         1    NA
 2 A         2    NA
 3 A         3     1
 4 A         4     2
 5 A         5    NA
 6 A         6    NA
 7 A         7    NA
 8 A         8     4
 9 A         9    NA
10 A        10     5
11 B         1     1
12 B         2    NA
13 B         3    NA
14 B         4    NA
15 B         5     2
16 B         6    NA
17 B         7     3
18 B         8     4
19 B         9    NA
20 B        10     5

And what I want is this, where missed.events = (event at obs[i])-(event at obs[i-1])-1:

# A tibble: 20 × 4
   ind   event   obs missed.events
   <chr> <int> <dbl>         <dbl>
 1 A         1    NA            NA
 2 A         2    NA            NA
 3 A         3     1            NA
 4 A         4     2             0
 5 A         5    NA            NA
 6 A         6    NA            NA
 7 A         7    NA            NA
 8 A         8     4             3
 9 A         9    NA            NA
10 A        10     5             1
11 B         1     1            NA
12 B         2    NA            NA
13 B         3    NA            NA
14 B         4    NA            NA
15 B         5     2             3
16 B         6    NA            NA
17 B         7     3             1
18 B         8     4             0
19 B         9    NA            NA
20 B        10     5             1

I'm sure I am missing something obvious here, and I've tried various sorts of case_when() and if_else() conditions within a group_by(ind) %>% mutate() argument, but can't quite figure this out.

thanks in advance

Solution:

Obtain the subset with missed events and join back to the original dataframe

df %>%
  filter(!is.na(obs)) %>%
  mutate(missed.events=event-lag(event)-1, .by=ind) %>%
  right_join(df)

Joining, by = c("ind", "event", "obs")
# A tibble: 20 x 4
   ind   event   obs missed.events
   <chr> <int> <dbl>         <dbl>
 1 A         1    NA            NA
 2 A         2    NA            NA
 3 A         3     1            NA
 4 A         4     2             0
 5 A         5    NA            NA
 6 A         6    NA            NA
 7 A         7    NA            NA
 8 A         8     4             3
 9 A         9    NA            NA
10 A        10     5             1
11 B         1     1            NA
12 B         2    NA            NA
13 B         3    NA            NA
14 B         4    NA            NA
15 B         5     2             3
16 B         6    NA            NA
17 B         7     3             1
18 B         8     4             0
19 B         9    NA            NA
20 B        10     5             1
Answer

Login


Forgot Your Password?

Create Account


Lost your password? Please enter your email address. You will receive a link to create a new password.

Reset Password

Back to login