I have a database in Excel like this:
I need to fit a multiple regression with yield (`rindess`) as the dependent variable and the December, January, and February precipitation as regressors, once for each of the departments. That is, one regression for 25 de Mayo, another for 9 de Julio, and so on.
My code is the following:
library(readxl)

# Read the three precipitation sheets and the yield sheet from the workbook
precip_dic <- read_excel("C:/Users/kevin/Desktop/rtrabajo.xlsx",
                         sheet = "precip_dic")
precip_ene <- read_excel("C:/Users/kevin/Desktop/rtrabajo.xlsx",
                         sheet = "precip_ene")
precip_feb <- read_excel("C:/Users/kevin/Desktop/rtrabajo.xlsx",
                         sheet = "precip_feb")
rindes <- read_excel("C:/Users/kevin/Desktop/rtrabajo.xlsx",
                     sheet = "rindess")

# Build the regression data for a single department. attach() is not needed,
# because every column is referenced explicitly with `$`.
db <- data.frame(dic     = precip_dic$`CORONEL SUAREZ`,
                 ene     = precip_ene$`CORONEL SUAREZ`,
                 feb     = precip_feb$`CORONEL SUAREZ`,
                 rindeas = rindes$`CORONEL SUAREZ`)

# Multiple regression of yield on the three monthly precipitation series
modelo_simple <- lm(rindeas ~ dic + ene + feb, data = db)
First, let's imagine an initial dataset like this:
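A data set with that shape could be simulated as in the sketch below. Everything in it is an assumption made for illustration: the values are random, the range of periods is arbitrary, and `crear_df()` is a hypothetical helper. Only the layout (four `data.frame`s, each with a `periodo` column plus one column per locality) follows the description.

```r
# Hypothetical example data: all values are simulated, not real measurements.
set.seed(123)
periodos    <- 2001:2012
localidades <- c("CORONEL SUAREZ", "25 DE MAYO", "9 DE JULIO")

# Hypothetical helper: builds one wide data.frame with a `periodo` column
# and one simulated column per locality.
crear_df <- function(media, desvio) {
  n <- length(periodos) * length(localidades)
  valores <- matrix(pmax(0, round(rnorm(n, media, desvio), 1)),
                    nrow = length(periodos),
                    dimnames = list(NULL, localidades))
  # check.names = FALSE keeps the spaces in the locality names
  data.frame(periodo = periodos, valores, check.names = FALSE)
}

precip_dic <- crear_df(90, 25)     # December precipitation
precip_ene <- crear_df(100, 25)    # January precipitation
precip_feb <- crear_df(85, 25)     # February precipitation
rindes     <- crear_df(3000, 400)  # yields

head(precip_dic, 3)
```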
As you can see, I have 4 `data.frame`s, smaller than yours (only 3 localities as an example). The initial conditions would be:

- Each `data.frame` has a period column (`periodo`).
- It has one column for each locality.
- Each `df` has the same number of observations and the same periods.
- There are no `NA` values. In your example there are, so you will have to solve this issue first, since I do not know the criteria for what to do with these cases.

There would be a classic iterative way to deal with the problem, but it is complex and unclear, so I am going to present a similar solution based on the `tidyverse` that makes use of the `broom` package (ideal for working with models).

First, we build a new, single `data.frame` that is more "elegant" and clearer for what we are going to do. With `gather()` we move the locality columns and their precipitation values into rows under a `localidad` column, and we "join" all the months and the yields by means of two joins. The result is a `df` with `periodo`, `localidad`, and the 4 variables that matter to us.

Now we generate the models, 1 per locality. We fit them on the original `df` minus the last period, so that this period can be used as data to test the prediction later. And magically, we have a row per locality and a `modelo` column of class `lm`, that is, the linear model.

If we now need to make a particular prediction, we could do something traditional, or use `broom` and work in a much more comfortable and simple way. In the `broom` output, the `.fitted` column is the new fitted value for each row. It is also very convenient for easily obtaining the coefficients with which we built each model by locality.
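The steps described above can be sketched end to end as follows. This is only a sketch under stated assumptions, not the code of the original answer: the input data are simulated; `simular()`, `a_largo()`, and the object names are hypothetical; the per-locality fit uses `tidyr::nest()` with `purrr::map()`, which is one common way to get a list-column of models; and combining four tables here takes three `inner_join()` calls, so the join structure may differ from the one the text describes.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

# Hypothetical simulated input: 4 wide data.frames (periodo + 3 localities)
set.seed(42)
periodos    <- 2001:2015
localidades <- c("CORONEL SUAREZ", "25 DE MAYO", "9 DE JULIO")
simular <- function(media, desvio) {
  m <- matrix(rnorm(length(periodos) * length(localidades), media, desvio),
              ncol = length(localidades), dimnames = list(NULL, localidades))
  data.frame(periodo = periodos, m, check.names = FALSE)
}
precip_dic <- simular(90, 20)
precip_ene <- simular(100, 20)
precip_feb <- simular(85, 20)
rindes     <- simular(3000, 400)

# Wide -> long: one row per (periodo, localidad), value column named `nombre`
a_largo <- function(datos, nombre) {
  largo <- gather(datos, key = "localidad", value = "valor", -periodo)
  names(largo)[names(largo) == "valor"] <- nombre
  largo
}

# Single long data.frame with periodo, localidad and the 4 variables
claves <- c("periodo", "localidad")
df <- a_largo(precip_dic, "dic") %>%
  inner_join(a_largo(precip_ene, "ene"), by = claves) %>%
  inner_join(a_largo(precip_feb, "feb"), by = claves) %>%
  inner_join(a_largo(rindes, "rinde"),   by = claves)

# Hold out the last period to test the prediction later
ultimo        <- max(df$periodo)
entrenamiento <- filter(df, periodo < ultimo)
prueba        <- filter(df, periodo == ultimo)

# One linear model per locality, stored in the list-column `modelo`
modelos <- entrenamiento %>%
  nest(data = -localidad) %>%
  mutate(modelo = map(data, ~ lm(rinde ~ dic + ene + feb, data = .x)))

# Traditional prediction for a single locality with predict()
modelo_cs <- modelos$modelo[[which(modelos$localidad == "CORONEL SUAREZ")]]
pred_cs   <- predict(modelo_cs,
                     newdata = filter(prueba, localidad == "CORONEL SUAREZ"))

# With broom: augment() adds a .fitted column to each held-out row
predicciones <- modelos %>%
  inner_join(nest(prueba, nuevos = -localidad), by = "localidad") %>%
  mutate(pred = map2(modelo, nuevos, ~ augment(.x, newdata = .y))) %>%
  select(localidad, pred) %>%
  unnest(pred)

# With broom: tidy() returns one row per coefficient of each model
coeficientes <- modelos %>%
  mutate(coefs = map(modelo, tidy)) %>%
  select(localidad, coefs) %>%
  unnest(coefs)

predicciones
coeficientes
```

In this sketch, `predicciones` holds one held-out row per locality, with its `.fitted` value next to the observed `rinde`, and `coeficientes` has one row per term and locality.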