I need to import a dataset in csv format into R. The problem is that the file is laid out in a way that makes it impossible to import correctly as it is. Of the four rows of text above each table, I only need to keep the year of that table.
What I finally need to get is something like the following:
Mes Día Año Precipitaciones
Ene 1 2018 10
Ene 2 2018 23
Ene 3 2018 22
Ene 4 2018 11
......
Ene 1 2019 13
Ene 2 2019 31.3
Ene 3 2019
......
The link to the data is: https://www.dropbox.com/s/rq8cql40r5dk413/190068%20B.%20Juarez.txt?dl=0
Quite an elegant solution with tidyr. The idea is to read the entire text file with readLines, so that each line is an element in a vector of character strings. Then quickly convert that vector to a data.frame and use dplyr + tidyr to clean and separate the data.

1st solution, much more complicated
I found a solution for it. The files in which that data is distributed are horrible, so the solution has to be a bit of a Frankenstein. In this case it mixes tidyverse with base R, which could be standardized on one or the other to make the code more maintainable.
I try to explain it in the comments, but it's still complicated because it uses regular expressions and iteration over lists.
The idea is to read each line of the file as a vector, then separate the data (rainfall per day and month) from the years to which they correspond, then within each line/day with data separate the data for each month and finally put everything back together.
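As a rough sketch of those steps, here is a base R only version (not the original code of this answer). Since the real file is not reproduced here, the `lines` vector is a made-up stand-in for the output of readLines, and its layout (four text rows per table, the year on a row starting with `ANIO:`, one row per day with one column per month) is purely an assumption. The column names are written in ASCII (`Dia`, `Anio`) to avoid encoding problems and can be renamed afterwards.

```r
# Hypothetical sample standing in for readLines("your_file.txt").
# The real file layout may differ; adjust the regular expressions accordingly.
lines <- c(
  "ESTACION: 190068",
  "ANIO: 2018",
  "PRECIPITACION (mm)",
  "DIA,ENE,FEB,MAR",
  "1,10,0,5",
  "2,23,1.5,",
  "ESTACION: 190068",
  "ANIO: 2019",
  "PRECIPITACION (mm)",
  "DIA,ENE,FEB,MAR",
  "1,13,2,0",
  "2,31.3,,4"
)

months <- c("Ene", "Feb", "Mar")  # assumed month columns, in file order

# Separate the data lines from the lines that carry the year
is_data <- grepl("^[0-9]+,", lines)
is_year <- grepl("^ANIO:", lines)
years <- as.integer(sub("^ANIO: *", "", lines[is_year]))

# Each line belongs to the table opened by the most recent year line
table_id <- cumsum(is_year)

# For every day line, split the months apart and build long-format rows
rows <- lapply(which(is_data), function(i) {
  fields <- strsplit(lines[i], ",", fixed = TRUE)[[1]]
  values <- as.numeric(fields[-1])   # empty fields become NA
  length(values) <- length(months)   # pad trailing blanks dropped by strsplit
  data.frame(
    Mes = months,
    Dia = as.integer(fields[1]),
    Anio = years[table_id[i]],
    Precipitaciones = values,
    stringsAsFactors = FALSE
  )
})

# Put everything back together, ordered by year, month and day
rain <- do.call(rbind, rows)
rain <- rain[order(rain$Anio, match(rain$Mes, months), rain$Dia), ]
rownames(rain) <- NULL
print(head(rain))
```

With the real file you would replace the sample vector with a call to readLines and extend `months` to all twelve columns. Missing readings stay as NA instead of being dropped.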
And this is the result:
It might be simpler at some intermediate step to write a decent .csv and read it back in, but with this solution you don't depend on having write permissions.
An alternative with base R. The idea is relatively simple: use read.csv() to read each data.frame individually, then add each individual data.frame to the final data.frame, together with the Año column.
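A minimal sketch of that idea, under stated assumptions: the `lines` vector below is invented sample data in place of the real file, each table is assumed to start with four text rows (the second one holding the year after `ANIO:`, the fourth one being the column header), and the final reshaping to one row per day and month is my own addition to reach the layout asked for in the question. The year column is called `Anio` here only to keep the code ASCII.

```r
# Hypothetical sample standing in for readLines("your_file.txt").
lines <- c(
  "ESTACION: 190068",
  "ANIO: 2018",
  "PRECIPITACION (mm)",
  "DIA,ENE,FEB,MAR",
  "1,10,0,5",
  "2,23,1.5,",
  "ESTACION: 190068",
  "ANIO: 2019",
  "PRECIPITACION (mm)",
  "DIA,ENE,FEB,MAR",
  "1,13,2,0",
  "2,31.3,,4"
)

# Locate each table: assumed to begin one line before its year line
year_idx <- grep("^ANIO:", lines)
starts <- year_idx - 1
ends <- c(starts[-1] - 1, length(lines))

# Read every table on its own with read.csv() and attach its year
tables <- Map(function(start, end, year_line) {
  block <- lines[(start + 3):end]  # skip the three text rows above the header
  df <- read.csv(text = paste(block, collapse = "\n"), header = TRUE)
  df$Anio <- as.integer(sub("^ANIO: *", "", year_line))
  df
}, starts, ends, lines[year_idx])

# Stack the individual data.frames into the final (wide) one
wide <- do.call(rbind, tables)

# Reshape to one row per day and month
months <- c("ENE", "FEB", "MAR")
long <- data.frame(
  Mes = rep(months, each = nrow(wide)),
  Dia = rep(wide$DIA, times = length(months)),
  Anio = rep(wide$Anio, times = length(months)),
  Precipitaciones = unlist(wide[months], use.names = FALSE),
  stringsAsFactors = FALSE
)
long <- long[order(long$Anio, match(long$Mes, months), long$Dia), ]
rownames(long) <- NULL
print(head(long))
```

Passing the block through the `text` argument of read.csv() avoids writing temporary files, so this also works without write permissions.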