Given a table like this:
df:
Habitat    Spp_site         Spp          Site  NGS  RT-PCR
Crop       Amaranthus_M2V   Amaranthus   M2V   1    1
Crop       Amaranthus_M3V   Amaranthus   M2V   1    0
Crop       Convolvulus_M1V  Convolvulus  M1V   0    0
Wasteland  Convolvulus_E1P  Convolvulus  E1P   1    1
Oak        Convolvulus_Q2P  Convolvulus  Q2P   1    1
Oak        Anchusa_Q1P      Anchusa      Q1P   0    1
I would like to condense this table based on the data in the NGS and RT-PCR columns, getting output like this:
df_out:
Spp          Habitat    NGS  RTPCR
Amaranthus   Crop       2/2  1/2
Convolvulus  Crop       0/1  0/1
Convolvulus  Wasteland  1/1  1/1
Convolvulus  Oakwood    1/1  1/1
Anchusa      Oakwood    0/1  1/1
The NGS column would give the number of entries of a species that have been sequenced out of the total for that species, taking each habitat into account. Likewise, the RT-PCR column would give the sum of the RT-PCR values out of the same total.
I can't find a way to do this. Thanks in advance. All the best.
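Assuming the data from the question are loaded into a data frame named df.
Important: note the column name RT-PCR. It is not a syntactically valid name in R, so you should change it to something less ambiguous. In this case that happened automatically when the data were read, and the column was renamed to RT.PCR.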
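The answer's original data listing did not survive, so here is a minimal sketch of one way to load the question's table, assuming it is read with read.table from inline text. With the default check.names = TRUE, R turns the invalid name RT-PCR into RT.PCR; the variable name txt is only illustrative.

```r
# The question's table as whitespace-separated text (txt is a hypothetical name)
txt <- "Habitat Spp_site Spp Site NGS RT-PCR
Crop Amaranthus_M2V Amaranthus M2V 1 1
Crop Amaranthus_M3V Amaranthus M2V 1 0
Crop Convolvulus_M1V Convolvulus M1V 0 0
Wasteland Convolvulus_E1P Convolvulus E1P 1 1
Oak Convolvulus_Q2P Convolvulus Q2P 1 1
Oak Anchusa_Q1P Anchusa Q1P 0 1"

# check.names = TRUE (the default) passes the header through make.names(),
# which replaces the "-" in RT-PCR with "."
df <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)

names(df)
```

If you would rather keep the original header, check.names = FALSE leaves it as RT-PCR, but then the column has to be written in backticks every time it is used.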
With dplyr you can write a fairly simple solution, which will depend on the number of columns you have. The trick is to group by Habitat and Spp, then summarise each column in terms of the two values you are looking for: a) the number of rows in the group, and b) the sum of the values.
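The grouping and summarising steps described above can be sketched as follows. This is a reconstruction under stated assumptions, not necessarily the exact code of the original answer: it rebuilds only the four columns it needs so that it runs on its own, it assumes the renamed column RT.PCR, and the .groups argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# The question's six rows, reduced to the columns used here
# (RT.PCR is the automatically renamed RT-PCR column)
df <- data.frame(
  Habitat = c("Crop", "Crop", "Crop", "Wasteland", "Oak", "Oak"),
  Spp     = c("Amaranthus", "Amaranthus", "Convolvulus",
              "Convolvulus", "Convolvulus", "Anchusa"),
  NGS     = c(1, 1, 0, 1, 1, 0),
  RT.PCR  = c(1, 0, 0, 1, 1, 1),
  stringsAsFactors = FALSE
)

df_out <- df %>%
  group_by(Spp, Habitat) %>%
  summarise(
    # b) sum of the values, then a) number of rows in the group
    NGS   = paste0(sum(NGS), "/", n()),
    RTPCR = paste0(sum(RT.PCR), "/", n()),
    .groups = "drop"   # return an ungrouped result
  )

print(df_out)
```

The result has one row per Spp and Habitat combination, sorted by the grouping columns, so the row order differs from the one shown in the question. If there are many indicator columns, the same two expressions could be applied to all of them with across(), but with only two columns it is clearer to write them out.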