I start with these data:
datos1 <- read.table(text = '
municipio años poblacion
Mun1 2000 50
Mun2 2000 300
Mun3 2000 1
Mun4 2000 100
Mun5 2000 5
Mun6 2000 20
Mun7 2000 20
Mun8 2000 3
Mun9 2000 20
Mun10 2000 20
Mun1 2001 100
Mun2 2001 50
Mun3 2001 10
Mun4 2001 30
Mun5 2001 20
Mun6 2001 60
Mun7 2001 40
Mun8 2001 20
Mun9 2001 20
Mun10 2001 3
Mun1 2002 20
Mun2 2002 5
Mun3 2002 20
Mun4 2002 5
Mun5 2002 20
Mun6 2002 10
Mun7 2002 20
Mun8 2002 20
Mun9 2002 5
Mun10 2002 0
Mun1 2003 20
Mun2 2003 25
Mun3 2003 20
Mun4 2003 10
Mun5 2003 20
Mun6 2003 10
Mun7 2003 30
Mun8 2003 20
Mun9 2003 60
Mun10 2003 20
Mun1 2004 20
Mun2 2004 10
Mun3 2004 20
Mun4 2004 10
Mun5 2004 20
Mun6 2004 34
Mun7 2004 20
Mun8 2004 20
Mun9 2004 34
Mun10 2004 21
', header = TRUE, stringsAsFactors = FALSE)
I want to calculate the year-over-year rate of change of the population for each municipality, that is:
2001 compared to 2000,
2002 compared to 2001,
and so on.
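The rate described here is the discrete (simple) rate of change, (P_t - P_{t-1}) / P_{t-1}, computed separately for each municipality. A minimal base-R sketch with the values of Mun1 and Mun10 copied from `datos1` (the helper name `rate` is just for illustration):

```r
# Population of Mun1 and Mun10 for 2000-2004, copied from datos1
mun1  <- c(50, 100, 20, 20, 20)
mun10 <- c(20, 3, 0, 20, 21)

# Discrete rate of change: current value / previous value - 1.
# The first year has no previous value, so its rate is NA.
rate <- function(x) x / c(NA, head(x, -1)) - 1

rate(mun1)   # NA  1.0 -0.8  0.0  0.0
rate(mun10)  # NA -0.85 -1.00  Inf  0.05
```

Note that Mun10 has a population of 0 in 2002, so its 2003 rate is a division by zero and comes out as `Inf`; you may want to decide how to treat that before exporting.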
To do this, I first sort by year:
datos1[order(datos1$años),] -> datos1
I calculate the rate:
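library(dplyr)  # for %>% and mutate()
library(TTR)    # for ROC()
# n = 10: each year has 10 municipalities, so row i - 10 is the same one a year earlier
tasa <- datos1 %>%
  mutate(Tasa = ROC(poblacion, n = 10, type = "discrete"))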
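`ROC(..., type = "discrete")` computes x[t] / x[t - n] - 1 over the whole column, ignoring groups, so `n = 10` only gives the right rate because, after sorting by year, the row 10 positions earlier is the same municipality in the previous year. That holds only while every year has exactly 10 municipalities in the same order. The base-R sketch below (no TTR needed; `shift_rate`, `group_rate` and the toy data are hypothetical) shows the fixed shift agreeing with a per-municipality lag on complete data and pairing rows from different municipalities once one row is missing:

```r
# Hypothetical toy data: 3 municipalities x 3 years, sorted by year
toy <- data.frame(
  municipio = rep(c("A", "B", "C"), times = 3),
  anio      = rep(2000:2002, each = 3),
  poblacion = c(10, 20, 40, 20, 10, 40, 40, 20, 20)
)

# Same formula as a discrete ROC: x[t] / x[t - n] - 1, ignoring groups
shift_rate <- function(x, n) x / c(rep(NA, n), head(x, -n)) - 1

# Per-municipality rate: the previous value is taken within each municipio
group_rate <- function(df) {
  ave(df$poblacion, df$municipio,
      FUN = function(x) x / c(NA, head(x, -1)) - 1)
}

# Complete data: shifting by n = 3 (number of municipalities) matches
isTRUE(all.equal(shift_rate(toy$poblacion, 3), group_rate(toy)))    # TRUE

# Drop municipality B in 2001: C 2001 is now compared with B 2000
toy2 <- toy[-5, ]
isTRUE(all.equal(shift_rate(toy2$poblacion, 3), group_rate(toy2)))  # FALSE
```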
I remove the population column so I can switch to wide format:
tasa <- tasa[, -3]  # drop the 3rd column (poblacion)
I round the rate:
round(tasa$Tasa,2) -> tasa$Tasa
I reshape it to wide format:
library(tidyr)  # for pivot_wider()
tasa %>%
  pivot_wider(names_from = años, values_from = Tasa) -> tasa
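Dropping the population column by position (`-c(3)`) breaks silently if the column order ever changes. A sketch of an alternative, assuming tidyr 1.0 or later: `pivot_wider()` has an `id_cols` argument, and columns not listed there or in `names_from`/`values_from` are left out of the result, so the removal step is not needed. The small `long` data frame is hypothetical, with values taken from Mun1 and Mun2 in `datos1`:

```r
library(tidyr)

# Hypothetical long data that still has the poblacion column
long <- data.frame(
  municipio = rep(c("Mun1", "Mun2"), times = 2),
  anio      = rep(c(2001, 2002), each = 2),
  poblacion = c(100, 50, 20, 5),
  Tasa      = c(1, -0.83, -0.8, -0.9)
)

# poblacion is neither an id, a names, nor a values column,
# so pivot_wider() drops it from the result
wide <- pivot_wider(long, id_cols = municipio,
                    names_from = anio, values_from = Tasa)
wide
```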
I save it:
write.csv(tasa, 'Tasa_Poblacion.csv', quote = FALSE, row.names = FALSE)
Questions:
Is there a more direct way to do this?
How would you do it if you had more columns and wanted the rate for each one? For example, with data like this:
data2 <- structure(list(
  municipality = rep(paste0("Mun", 1:10), times = 5),
  years = rep(c(2000, 2001, 2002, 2003, 2004), each = 10),
  population = c(50, 300, 1, 100, 5, 20, 20, 3, 20, 20,
                 100, 50, 10, 30, 20, 60, 40, 20, 20, 3,
                 20, 5, 20, 5, 20, 10, 20, 20, 5, 0,
                 20, 25, 20, 10, 20, 10, 30, 20, 60, 20,
                 20, 10, 20, 10, 20, 34, 20, 20, 34, 21),
  births = c(4, 0, 3, 0, 4, 3, 5, 1, 2, 1,
             2, 2, 1, 3, 3, 2, 0, 5, 0, 5,
             4, 1, 5, 3, 0, 5, 5, 1, 5, 0,
             2, 0, 1, 0, 5, 0, 4, 2, 1, 5,
             1, 2, 2, 2, 3, 3, 0, 2, 4, 1),
  deaths = c(2, 2, 1, 3, 1, 0, 0, 3, 2, 4,
             3, 4, 0, 3, 5, 1, 5, 4, 3, 0,
             5, 1, 4, 2, 4, 2, 3, 2, 2, 2,
             1, 1, 0, 3, 2, 4, 3, 1, 4, 2,
             0, 1, 1, 4, 5, 3, 3, 5, 3, 4)),
  class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -50L))
You can calculate the rate directly, and in a more generalizable way, by grouping by municipality and using `lag(poblacion)` to obtain the previous value.

If you have more variables, you can use `mutate_at` to apply the rate calculation to several of them.

Or, even more generically, you can apply it to all the variables except municipality and years.
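A minimal sketch of these three approaches, assuming dplyr 1.0 or later (for `across()`) and tidyr 1.0 or later. To keep it self-contained it runs on a small subset of `data2` (Mun1 and Mun2, 2000-2002); `rate()` is a hypothetical helper name. `mutate_at()` still works but is superseded by `across()` in current dplyr.

```r
library(dplyr)
library(tidyr)

# Subset of data2 (Mun1 and Mun2, years 2000-2002) so the block runs alone
d <- data.frame(
  municipality = rep(c("Mun1", "Mun2"), times = 3),
  years        = rep(2000:2002, each = 2),
  population   = c(50, 300, 100, 50, 20, 5),
  births       = c(4, 0, 2, 2, 4, 1),
  deaths       = c(2, 2, 3, 4, 5, 1)
)

# Hypothetical helper: discrete rate against the previous row.
# dplyr::lag() is called explicitly; stats::lag() does not shift
# the values of a plain vector.
rate <- function(x) x / dplyr::lag(x) - 1

# 1) One variable: the previous value is taken within each municipality
one <- d %>%
  group_by(municipality) %>%
  arrange(years, .by_group = TRUE) %>%
  mutate(Tasa = round(rate(population), 2)) %>%
  ungroup()

one_wide <- one %>%
  pivot_wider(id_cols = municipality, names_from = years, values_from = Tasa)

# 2) Several named variables with mutate_at() (superseded, still works);
#    the selected columns are overwritten with their rates
several <- d %>%
  group_by(municipality) %>%
  arrange(years, .by_group = TRUE) %>%
  mutate_at(vars(population, births), ~ round(rate(.), 2)) %>%
  ungroup()

# 3) Every column except years; across() skips the grouping
#    column (municipality) automatically
all_rates <- d %>%
  group_by(municipality) %>%
  arrange(years, .by_group = TRUE) %>%
  mutate(across(-years, ~ round(rate(.x), 2))) %>%
  ungroup()

# values_from accepts several columns; names become e.g. deaths_2001
all_wide <- all_rates %>%
  pivot_wider(id_cols = municipality, names_from = years,
              values_from = c(population, births, deaths))
```

Sorting inside each group with `arrange(years, .by_group = TRUE)` replaces the global sort by year, and `lag()` then looks at the previous row of the same municipality, so the result no longer depends on how many municipalities each year has. A previous value of 0 (Mun2 births in 2000 here) still produces `Inf`.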