To test a multiple linear regression, I need to delete one row (an observation) from my data frame, fit the model on the remaining data, and then predict the value of Y from the independent variables of the deleted row. Part of my code is:
library(dplyr)  # provides filter()
df <- data.frame(regresion_agregada)
df$periodo <- 1999:2017
for (i in 1999:2017) {
  dt <- filter(df, periodo != i)  # drop the observation for period i
  model <- lm(y ~ xa + xb + xc, data = dt)
}
How can I test the model object inside the loop on the deleted observation, that is, with the values of xa, xb and xc for period i?
You need to use predict.lm(), passing it the model and the test data (in this case, the data filtered to contain only the held-out period). I added a print() so you can see the resulting values in the console. Note that the model object left in your environment when the for loop ends is the one fitted in the last iteration, that is, the one for which the last period (2017) was held out.

Samuel's answer is adequate, but to give more alternatives, the cross-validation examples at https://rubenfcasal.github.io/intror/programacion.html#aplicacion-validacion-cruzada may be of interest.
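For reference, here is a minimal sketch of the leave-one-out loop with predict() on the held-out row. The data are simulated stand-ins (the real regresion_agregada is not shown in the question), and the names train, test, pred and loo_error are only illustrative. It uses base R subsetting instead of dplyr so that it runs without extra packages.

```r
# Simulated stand-in for the asker's data (hypothetical values).
set.seed(1)
df <- data.frame(
  periodo = 1999:2017,
  xa = rnorm(19), xb = rnorm(19), xc = rnorm(19)
)
df$y <- 1 + 2 * df$xa - df$xb + 0.5 * df$xc + rnorm(19)

# One out-of-sample prediction per held-out period, named by period.
pred <- setNames(numeric(nrow(df)), df$periodo)

for (i in df$periodo) {
  train <- df[df$periodo != i, ]  # every period except i
  test  <- df[df$periodo == i, ]  # only the held-out period i
  model <- lm(y ~ xa + xb + xc, data = train)
  pred[as.character(i)] <- predict(model, newdata = test)
}

# Leave-one-out prediction error for each period.
loo_error <- df$y - pred
print(data.frame(periodo = df$periodo, y = df$y, pred = pred,
                 error = loo_error, row.names = NULL))
```

Storing each prediction inside the loop avoids the problem that only the last fitted model survives once the loop ends.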
Also keep in mind that if what you are interested in are the residuals of traditional leave-one-out cross-validation (LOOCV), they can be obtained directly from the model fitted to all the data, without refitting it in a loop. From the R help (e.g. ?influence.measures): "For linear models, rstandard(*, type = "predictive") provides leave-one-out cross validation residuals, and the “PRESS” statistic (PREdictive Sum of Squares, the same as the CV score) of model model is PRESS <- sum(rstandard(model, type="pred")^2)"
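As a sketch of that shortcut, the example below fits the model once to all the data, takes the leave-one-out residuals from rstandard(), and cross-checks them against an explicit refit-and-predict loop. The data are again simulated stand-ins, and the names full, loo_resid, press and loop_resid are hypothetical.

```r
# Simulated stand-in for the asker's data (hypothetical values).
set.seed(1)
df <- data.frame(
  periodo = 1999:2017,
  xa = rnorm(19), xb = rnorm(19), xc = rnorm(19)
)
df$y <- 1 + 2 * df$xa - df$xb + 0.5 * df$xc + rnorm(19)

# Fit once with all observations.
full <- lm(y ~ xa + xb + xc, data = df)

# Leave-one-out residuals and the PRESS statistic, without any refitting.
loo_resid <- rstandard(full, type = "predictive")
press <- sum(loo_resid^2)

# Cross-check: drop row k, refit, and predict the dropped row.
loop_resid <- sapply(seq_len(nrow(df)), function(k) {
  fit <- lm(y ~ xa + xb + xc, data = df[-k, ])
  df$y[k] - predict(fit, newdata = df[k, ])
})

# TRUE when both routes give the same residuals (up to numerical tolerance).
print(all.equal(unname(loo_resid), unname(loop_resid)))
print(press)
```

For an ordinary lm() fit the two routes should agree, so the explicit loop is mainly useful when you also need the predictions themselves or a model other than a linear one.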