Suppose I have a data frame with 3 columns: the first identifies each individual, and the other two hold values corresponding to different events.
ID <- c(1,1,1,2,2,2,2,3,3)
Event1 <- c(2,3,5,2,5,2,1,3,5)
Event2 <- c(1,6,2,1,2,1,2,5,6)
dataFrame <- data.frame(individuo=ID, evento1=Event1, evento2=Event2)
An individual may have more than one row. Is it possible to identify which row contains the highest value, between both columns, for each individual?
For example, the highest value for individual 1 is found in the second row, third column.
If I needed to identify the row based on a single column of values (for example, evento1), I would apply the following code to get the desired result:
library(dplyr)
Resultado_deseado <- dataFrame %>%
  group_by(individuo) %>%
  mutate(ranking = dense_rank(evento1)) %>%
  filter(ranking == max(ranking))
But I don't see how to do something similar considering the values of both columns.
Any help will be greatly appreciated!
Perhaps I understood your question from a different angle: you want to identify the row that contains the highest value for each individual, considering more than one column. If so, the following could help you:
First, row_number() creates an identifier for each row, and pivot_longer() creates an event_type variable that holds the different events, since you may have many columns to search for the maximum value.
Second, we calculate the ranking to find the row that contains the maximum value across the different events.
Finally, pivot_wider() returns the data to its original format, restoring event_type to its two columns in this particular case; in another case there could be more columns.
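The three steps described above could be sketched roughly as follows. This is only one possible reading of them, assuming dplyr and tidyr are installed; the names fila, valor and resultado are made up here for illustration, while event_type comes from the description.

```r
library(dplyr)
library(tidyr)

# Example data from the question
dataFrame <- data.frame(
  individuo = c(1, 1, 1, 2, 2, 2, 2, 3, 3),
  evento1   = c(2, 3, 5, 2, 5, 2, 1, 3, 5),
  evento2   = c(1, 6, 2, 1, 2, 1, 2, 5, 6)
)

resultado <- dataFrame %>%
  # Step 1: row identifier (fila is a hypothetical name), then long format
  mutate(fila = row_number()) %>%
  pivot_longer(c(evento1, evento2),
               names_to = "event_type", values_to = "valor") %>%
  # Step 2: rank all event values within each individual and keep
  # every original row that holds the top-ranked value
  group_by(individuo) %>%
  mutate(ranking = dense_rank(valor)) %>%
  filter(fila %in% fila[ranking == max(ranking)]) %>%
  ungroup() %>%
  # Step 3: drop the helper column and go back to one column per event
  select(-ranking) %>%
  pivot_wider(names_from = event_type, values_from = valor)

resultado
```

With the example data this should keep rows 2, 5 and 9 of the original data frame (the fila column). If an individual reaches the maximum in more than one row, all of those rows are kept.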