I have two df:
DF1:
df1 <- read.table(text="col1 col2 col3
1 pepe 12
2 juan 2
3 rojo 33
4 rojo 44
5 azul 45
6 azul 5
7 amarillo 100
8 amarillo 200
9 amarillo 150
10 rojo 200
11 azul 25
12 amarillo 23
13 azul 50",
header=TRUE, stringsAsFactors=FALSE)
DF2:
df2 <- read.table(text="
col1 col2 col3
1 pepe 12
2 juan 2
3,4,10 rojo 277
5 azul 45
6 azul 5
7,8,9,12 amarillo 473
11 azul 25
13 azul 50",
header=TRUE, stringsAsFactors=FALSE)
I want to see if the two df's have the same information. To do this I group some rows of df1 using this solution
df1 %>%
mutate(col1=case_when(
col1 %in% c(3, 4, 10) ~ '3,4,10',
col1 %in% c(7, 8, 9, 12) ~ '7,8,9,12',
TRUE ~ as.character(col1)
)) %>%
group_by(col1, col2) %>%
summarise(col3=sum(col3)) -> df11
So my df11 is now:
col1 col2 col3
1 pepe 12
11 azul 25
13 azul 50
2 juan 2
3,4,10 rojo 277
5 azul 45
6 azul 5
7,8,9,12 amarillo 473
To compare the df I order them by the same column
df11[order(df11$col1),] -> df11
df2[order(df2$col1),] -> df2
and compare them
compare(df11,df2)[2]
$detailedResult
col1 col2 col3
TRUE TRUE TRUE
The problem is that this part has to be put in by hand.
mutate(col1=case_when(
col1 %in% c(3, 4, 10) ~ '3,4,10',
col1 %in% c(7, 8, 9, 12) ~ '7,8,9,12'
And the df to compare are changing. Sometimes the groupings are many, so entering it by hand is cumbersome.
I would like the groupings to detect them automatically.
df1$col1 <- as.character(df1$col1)
anti_join(df2, df1, by="col1")
col1 col2 col3
1 3,4,10 rojo 277
2 7,8,9,12 amarillo 473
So I know that the groupings are (3,4,10) and (7,8,9,12) but I don't know how to put it in this part automatically
mutate(col1=case_when(
col1 %in% c(3, 4, 10) ~ '3,4,10',
col1 %in% c(7, 8, 9, 12) ~ '7,8,9,12'
What you can do is build the groups from the
df2
, for example:Expanding the values of
col1
individually and preserving the group, now it is much simpler, we can joindf1
anddf2
through aleft_join
and with the groups already defined we can group:Note:
mutate(grupo=ifelse(is.na(grupo), col1, grupo))
it is for the eventual case in which you had values ofcol1
in thedf1
but not in thedf2