I have an Excel file that looks like this once imported:
structure(list(Territorio = c("Aguadulce", "Alanís", "Albaida del Aljarafe",
"Alcalá de Guadaíra", "Alcalá del Río", "Alcolea del Río",
"Algaba (La)", "Algámitas", "Almadén de la Plata"), `2007` = c(1,
2, 3, 4, 5, 6, 7, 8, 9), `2008` = c(10, 11, 12, 13, 14, 15, 16,
17, 18), `2009` = c(19, 20, 21, 22, 23, 24, 25, 26, 27), `2010` = c(28,
29, 30, 31, 32, 33, 34, 35, 36), `2011` = c(37, 38, 39, 40, 41,
42, 43, 44, 45), `2012` = c(46, 47, 48, 49, 50, 51, 52, 53, 54
)), row.names = c(NA, -9L), class = c("tbl_df", "tbl", "data.frame"
))-> datos
I convert it to long format:
library(tidyr)
dat_nuevos = gather(datos, key = Año, value = Valores, -Territorio)
But how do I do it when the data looks like this, with a two-level header (sector and year)?
structure(list(...1 = c("Territorio", "Aguadulce", "Alanís",
"Albaida del Aljarafe", "Alcalá de Guadaíra", "Alcalá del Río",
"Alcolea del Río", "Algaba (La)", "Algámitas", "Almadén de la Plata"
), Industria = c("2007", "1", "2", "3", "4", "5", "6", "7", "8",
"9"), ...3 = c("2008", "10", "11", "12", "13", "14", "15", "16",
"17", "18"), ...4 = c("2009", "19", "20", "21", "22", "23", "24",
"25", "26", "27"), Energia = c("2007", "28", "29", "30", "31",
"32", "33", "34", "35", "36"), ...6 = c("2008", "37", "38", "39",
"40", "41", "42", "43", "44", "45"), ...7 = c("2009", "46", "47",
"48", "49", "50", "51", "52", "53", "54")), row.names = c(NA,
-10L), class = c("tbl_df", "tbl", "data.frame"))
I assume that your data has exactly the same structure as in the example, and that this is how it comes out after reading the Excel file. As is often the case with data cleaning, the solution is very specific to the data at hand, although I admit that this kind of poorly named variables in Excel is endemic.
This is the solution I found. It is long and convoluted, and the resulting data should be validated very carefully, because it could fail in many ways.
I call the problematic data datos2.
I get the result:
The commented code:
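A minimal sketch of one possible approach, assuming datos2 has exactly the structure shown in the question: sector names in the column names (with blank header cells auto-named ...N on import) and years in the first row. The names cabecera, limpio and datos_largos are illustrative, and the "_" separator assumes that no sector name contains an underscore.

```r
library(dplyr)
library(tidyr)

# Data as posted in the question
datos2 <- structure(list(...1 = c("Territorio", "Aguadulce", "Alanís",
"Albaida del Aljarafe", "Alcalá de Guadaíra", "Alcalá del Río",
"Alcolea del Río", "Algaba (La)", "Algámitas", "Almadén de la Plata"
), Industria = c("2007", "1", "2", "3", "4", "5", "6", "7", "8",
"9"), ...3 = c("2008", "10", "11", "12", "13", "14", "15", "16",
"17", "18"), ...4 = c("2009", "19", "20", "21", "22", "23", "24",
"25", "26", "27"), Energia = c("2007", "28", "29", "30", "31",
"32", "33", "34", "35", "36"), ...6 = c("2008", "37", "38", "39",
"40", "41", "42", "43", "44", "45"), ...7 = c("2009", "46", "47",
"48", "49", "50", "51", "52", "53", "54")), row.names = c(NA,
-10L), class = c("tbl_df", "tbl", "data.frame"))

# Rebuild the two-level header: sector from the column names,
# year from the first row (the Territorio column is skipped)
cabecera <- tibble(
  Sector = names(datos2)[-1],
  Año    = unname(unlist(datos2[1, -1]))
) %>%
  # Auto-generated names such as "...3" mark blank header cells
  mutate(Sector = if_else(grepl("^\\.\\.\\.", Sector), NA_character_, Sector)) %>%
  # Carry the last real sector name down over the blanks
  fill(Sector)

# Drop the row that holds the years and give every column a full name
limpio <- datos2[-1, ]
names(limpio) <- c("Territorio", paste(cabecera$Sector, cabecera$Año, sep = "_"))

# Reshape to long format, splitting "Sector_Año" into two columns
datos_largos <- limpio %>%
  pivot_longer(-Territorio,
               names_to = c("Sector", "Año"),
               names_sep = "_",
               values_to = "Valores") %>%
  mutate(Valores = as.numeric(Valores))
```

Everything is read as text because the years share columns with the values, so Valores is converted to numeric at the end. Año stays as character in this sketch.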
Bonus
With pivot_wider it's easy to convert the data back to wide format, with correct column names.
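As a hedged illustration of that step, the sketch below builds a small long-format table with the columns Territorio, Sector, Año and Valores (a subset of the values from the question; the object names largos and anchos are illustrative) and widens it so that each sector and year pair becomes one column.

```r
library(tibble)
library(tidyr)

# Small long-format example: two territories, two sectors, two years
largos <- tibble(
  Territorio = rep(c("Aguadulce", "Alanís"), each = 4),
  Sector     = rep(rep(c("Industria", "Energia"), each = 2), times = 2),
  Año        = rep(c("2007", "2008"), times = 4),
  Valores    = c(1, 10, 28, 37, 2, 11, 29, 38)
)

# One column per Sector/Año combination, named like "Industria_2007"
anchos <- pivot_wider(largos,
                      names_from = c(Sector, Año),
                      values_from = Valores,
                      names_sep = "_")
```

Unlike the original Excel header, every column name here carries both the sector and the year, so the table can be read back without any header repair.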