EN VI

mutate column based on difference in col A, with rows identified by relative values in col B?

How to mutate column based on difference in col A, with rows identified by relative values in col B

I have the dataframe below, showing sampling events and when each individual was observed, I want a new column showing the number of sampling events between the current observation and the previous most recent observation of an individual.

so what I have is this:

tibble(ind=c(rep("A",10),rep("B",10)),
   event=c(c(1:10),c(1:10)),
   obs=c(NA,NA,1,2,NA,NA,NA,4,NA,5,1,NA,NA,NA,2,NA,3,4,NA,5))

    # A tibble: 20 × 3
   ind   event   obs
   <chr> <int> <dbl>
 1 A         1    NA
 2 A         2    NA
 3 A         3     1
 4 A         4     2
 5 A         5    NA
 6 A         6    NA
 7 A         7    NA
 8 A         8     4
 9 A         9    NA
10 A        10     5
11 B         1     1
12 B         2    NA
13 B         3    NA
14 B         4    NA
15 B         5     2
16 B         6    NA
17 B         7     3
18 B         8     4
19 B         9    NA
20 B        10     5

And what I want is this, where missed.events = (event at obs[i])-(event at obs[i-1])-1:

# A tibble: 20 × 4
   ind   event   obs missed.events
   <chr> <int> <dbl>         <dbl>
 1 A         1    NA            NA
 2 A         2    NA            NA
 3 A         3     1            NA
 4 A         4     2             0
 5 A         5    NA            NA
 6 A         6    NA            NA
 7 A         7    NA            NA
 8 A         8     4             3
 9 A         9    NA            NA
10 A        10     5             1
11 B         1     1            NA
12 B         2    NA            NA
13 B         3    NA            NA
14 B         4    NA            NA
15 B         5     2             3
16 B         6    NA            NA
17 B         7     3             1
18 B         8     4             0
19 B         9    NA            NA
20 B        10     5             1

I'm sure I am missing something obvious here, and I've tried various sorts of case_when() and if_else() conditions within a group_by(ind) %>% mutate() argument, but can't quite figure this out.

thanks in advance

Solution:

Obtain the subset with missed events and join back to the original dataframe

df %>%
  filter(!is.na(obs)) %>%
  mutate(missed.events=event-lag(event)-1, .by=ind) %>%
  right_join(df)

Joining, by = c("ind", "event", "obs")
# A tibble: 20 x 4
   ind   event   obs missed.events
   <chr> <int> <dbl>         <dbl>
 1 A         1    NA            NA
 2 A         2    NA            NA
 3 A         3     1            NA
 4 A         4     2             0
 5 A         5    NA            NA
 6 A         6    NA            NA
 7 A         7    NA            NA
 8 A         8     4             3
 9 A         9    NA            NA
10 A        10     5             1
11 B         1     1            NA
12 B         2    NA            NA
13 B         3    NA            NA
14 B         4    NA            NA
15 B         5     2             3
16 B         6    NA            NA
17 B         7     3             1
18 B         8     4             0
19 B         9    NA            NA
20 B        10     5             1