I have a data.frame with the following structure:
ucba <- data.frame(UCBAdmissions)
ucba
      Admit Gender Dept Freq
1  Admitted   Male    A  512
2  Rejected   Male    A  313
3  Admitted Female    A   89
4  Rejected Female    A   19
5  Admitted   Male    B  353
6  Rejected   Male    B  207
7  Admitted Female    B   17
8  Rejected Female    B    8
9  Admitted   Male    C  120
10 Rejected   Male    C  205
11 Admitted Female    C  202
12 Rejected Female    C  391
13 Admitted   Male    D  138
14 Rejected   Male    D  279
15 Admitted Female    D  131
16 Rejected Female    D  244
17 Admitted   Male    E   53
18 Rejected   Male    E  138
19 Admitted Female    E   94
20 Rejected Female    E  299
21 Admitted   Male    F   22
22 Rejected   Male    F  351
23 Admitted Female    F   24
24 Rejected Female    F  317
I would like to reshape it into the following form:
  Dept Male/Admitted Male/Rejected Female/Admitted Female/Rejected
1    A           512           313              89              19
2    B           353           207              17               8
3    C           120           205             202             391
4    D           138           279             131             244
5    E            53           138              94             299
6    F            22           351              24             317
Basically:
- We group by department
- We summarize in columns the acceptance/rejection values (Admit) and the gender (Gender)
- The final output should be another data.frame, and the column names should be self-explanatory
I've looked into various options (aggregate and xtabs), but so far none of them has fully convinced me.
An alternative is to use the functions in the tidyr package.
First we join the columns Gender and Admit using the function unite (the opposite function is separate). Then we spread the data frame (transform it to wide format) using the function spread (the opposite is gather).
One of the nice things about the functions in tidyr (and those in dplyr) is that both input and output are data.frames, so it is easy to chain them together. They also make the code more readable because each function is a verb.
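The two tidyr steps described above (unite, then spread) could look roughly like the following. This is a minimal sketch, assuming tidyr is installed; the intermediate column name Group and the "/" separator are illustrative choices, not something given in the original answer.

```r
library(tidyr)

ucba <- data.frame(UCBAdmissions)

# Step 1: join Gender and Admit into a single column.
# "Group" is a hypothetical name chosen for this sketch.
united <- unite(ucba, "Group", Gender, Admit, sep = "/")

# Step 2: spread to wide format, one column per Gender/Admit combination,
# filled with the matching Freq value.
wide <- spread(united, key = Group, value = Freq)

# The same two verbs chained with the pipe that tidyr re-exports.
wide_chained <- ucba %>%
  unite("Group", Gender, Admit, sep = "/") %>%
  spread(key = Group, value = Freq)
```

Because the united column is a character vector, spread() orders the new columns alphabetically, so the Female columns come before the Male ones. Newer tidyr versions mark spread() as superseded by pivot_wider(), although spread() is still available.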
aggregate()
A harder-to-read but feasible approach is to use aggregate().
Explanation:
- With the initial aggregate() call, we group by Dept and create a column that is a list holding the Freq values of each subgroup (Gender and Admit)
- With sapply(), we apply the function unlist() and "open" the list into columns
- We convert the result to a data.frame and set the column names to something clearer using setNames()
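The aggregate() steps listed above can be sketched as follows. This is a reconstruction from the explanation, not necessarily the exact code of the original answer; the intermediate names agg and counts are illustrative, and the column labels are taken from the desired output in the question.

```r
ucba <- data.frame(UCBAdmissions)

# Step 1: group by Dept. FUN = list keeps the four Freq values of each
# department together, so Freq becomes a list column (one vector per Dept).
agg <- aggregate(Freq ~ Dept, data = ucba, FUN = list)

# Step 2: unlist each element and "open" the list into columns.
# sapply() returns one column per department, so transpose with t().
counts <- t(sapply(agg$Freq, unlist))

# Step 3: build a data.frame and set clearer column names.
# The column order follows the row order inside each Dept in ucba.
wide <- setNames(
  data.frame(agg$Dept, counts),
  c("Dept", "Male/Admitted", "Male/Rejected",
    "Female/Admitted", "Female/Rejected")
)
```

Note that the column names are assigned by position, so this relies on every department listing its four Gender/Admit combinations in the same order, as they do in ucba.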
xtabs()
A much simpler option is to use contingency tables via xtabs(); a single call is enough to achieve the output we are looking for. This directly generates the expected output. The problem is that the result is an object of class xtabs/table and not a data.frame, so we must convert it, and to preserve the structure of the table we must use as.data.frame.matrix().
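A minimal sketch of the xtabs() route, under the assumption that Gender and Admit are first combined into one factor so that the table is two-dimensional. The helper column Group and the use of interaction() are illustrative choices and may differ from the original answer's code.

```r
ucba <- data.frame(UCBAdmissions)

# Combine Gender and Admit into one factor ("Group" is a hypothetical name).
# lex.order = TRUE makes Gender vary slowest: Male/Admitted, Male/Rejected,
# Female/Admitted, Female/Rejected.
ucba$Group <- interaction(ucba$Gender, ucba$Admit, sep = "/", lex.order = TRUE)

# Two-way contingency table: departments in rows, groups in columns.
tab <- xtabs(Freq ~ Dept + Group, data = ucba)

# as.data.frame() would return the long format again, so convert with
# as.data.frame.matrix() to keep the wide layout.
wide <- as.data.frame.matrix(tab)

# The departments end up as row names; move them into a regular column.
wide <- cbind(Dept = rownames(wide), wide)
rownames(wide) <- NULL
```

The last two lines are only needed if Dept should be an ordinary column, as in the desired output, rather than row names.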