If I have the following data frame:
a <- sample(c("A", "B", "C"), 50, TRUE)
b <- sample(2010:2020, 50, TRUE)
d <- sample(1:10, 50, TRUE)
e <- sample(1000:5000, 50, TRUE)
df <- data.frame(a, b, d, e)
df
a b d e
1 C 2018 3 4458
2 B 2011 1 2870
3 C 2012 10 4262
4 C 2011 1 2803
5 A 2015 7 4638
6 A 2016 10 2525
7 B 2010 4 1779
8 B 2018 8 1084
9 A 2016 4 2401
10 A 2015 3 3308
11 B 2017 4 4410
12 C 2017 3 1882
13 C 2020 4 2944
14 A 2017 2 1722
15 B 2014 3 4607
16 B 2011 10 4768
17 C 2011 3 4987
18 A 2016 9 3916
19 C 2010 2 3237
20 A 2020 5 3422
21 A 2011 1 4959
22 A 2016 10 3097
23 B 2017 2 1906
24 A 2010 4 3621
25 B 2015 10 2606
26 B 2018 6 3892
27 B 2010 5 3759
28 B 2018 7 4247
29 B 2018 8 1523
30 C 2016 3 4817
31 C 2017 9 3350
32 C 2018 1 1711
33 B 2014 1 3695
34 A 2010 1 3184
35 A 2019 5 4451
36 A 2019 10 4535
37 A 2010 7 4926
38 C 2014 2 3750
39 B 2017 10 4187
40 B 2010 5 2756
41 A 2014 2 4466
42 C 2017 2 3538
43 C 2016 5 3823
44 C 2019 3 2895
45 C 2019 9 1290
46 A 2016 7 2715
47 C 2014 2 3898
48 B 2012 10 4126
49 A 2015 1 3755
50 A 2013 3 4545
How can I generate a variable in the data frame that indicates the maximum value of b for given values of the variables a and d? That is, find the maximum b for each combination of a and d.
In the example, the maximum value of b when a is "B" and d is 10 is 2017.
Thank you very much in advance.
I hope you can help me.
With the tidyverse package this is really simple. First, we group by the variables a and d and get the maximum of b for each group. If we are only interested in a certain group, we can filter for it before calling group_by(), which reduces the work group_by() has to do.

The same can be done in base R with the function aggregate() and a simple subsetting operation. The notation b ~ a + d is a formula that indicates how the grouping operates: the data are grouped by a and d, and b is the dependent variable, the one to which the function given in the third parameter (max()) is applied.
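As a minimal sketch of the two approaches described above, the following uses a handful of rows copied from the data frame printed in the question, so the "B"/10 result can be checked by eye. The names max_by_group and max_b are made up for illustration, and the ave() step is an addition that is not part of the answer: it is one possible way to store the group maximum as a new column, which is what the question literally asks for. The dplyr part assumes that package is installed and is skipped otherwise.

```r
# A few rows copied from the data frame printed in the question
# (original rows 16, 25, 39, 48, 6, 22, 36, 3, 2 and 33).
df <- data.frame(
  a = c("B", "B", "B", "B", "A", "A", "A", "C", "B", "B"),
  b = c(2011, 2015, 2017, 2012, 2016, 2016, 2019, 2012, 2011, 2014),
  d = c(10, 10, 10, 10, 10, 10, 10, 10, 1, 1),
  e = c(4768, 2606, 4187, 4126, 2525, 3097, 4535, 4262, 2870, 3695)
)

# Base R: maximum of b for every combination of a and d.
max_by_group <- aggregate(b ~ a + d, data = df, FUN = max)

# Select a single group from the aggregated result.
max_by_group[max_by_group$a == "B" & max_by_group$d == 10, "b"]  # 2017

# Addition (not in the answer): ave() writes the group maximum back onto
# every row. drop = TRUE avoids empty a/d combinations being passed to max().
df$max_b <- ave(df$b, interaction(df$a, df$d, drop = TRUE), FUN = max)

# tidyverse (dplyr) version, skipped if dplyr is not installed.
if (requireNamespace("dplyr", quietly = TRUE)) {
  library(dplyr)

  # Maximum of b per group.
  df %>%
    group_by(a, d) %>%
    summarise(max_b = max(b), .groups = "drop") %>%
    print()

  # Filter first, then group, when only one group is of interest.
  df %>%
    filter(a == "B", d == 10) %>%
    group_by(a, d) %>%
    summarise(max_b = max(b), .groups = "drop") %>%
    print()
}
```

With dplyr, replacing summarise() by mutate() in the first pipeline would keep all rows and add max_b as a column, the same result as the ave() line.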