I have a data frame with several missing values, and I want to perform an analysis of variance (ANOVA; for `glm` fits this is strictly an analysis of deviance) to compare two binomial logit models:
- Model A: contains a set of variables.
- Model B: contains the same variables as Model A plus 3 study variables.
We import the data:
Model A:
```r
modelo_logit <- glm(SAP ~ sexo + edad + peso + niv_est + enf_cron + sit_lab +
                      frec_act_fis + ingreso_eq + GHQ_12,
                    data = datos_modelo, family = binomial(link = "logit"),
                    na.action = na.omit)
```
Model B:
```r
modelo_logit_viv <- glm(SAP ~ sexo + edad + peso + niv_est + enf_cron + sit_lab +
                          frec_act_fis + ingreso_eq + GHQ_12 +
                          n_dormitorios + cont_indus + delincuencia,  # study variables
                        data = datos_modelo, family = binomial(link = "logit"),
                        na.action = na.omit)
```
To perform the ANOVA, I run `anova(modelo_logit, modelo_logit_viv)` and get the following error:
```
Error in anova.glmlist(c(list(object), dotargs), dispersion = dispersion, :
  models were not all fitted to the same size of dataset
```
Both models must be fitted to the same data set, but because there are several missing values, model B drops more rows than model A: it contains more variables, so more observations have an NA in at least one of them and are removed.
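The mismatch can be reproduced with a small simulated data set. This is only a sketch under assumptions: `datos`, `y`, `x1` and `x2` are hypothetical stand-ins for `datos_modelo`, `SAP` and the predictors, and only the extra variable of the larger model contains missing values. `nobs()` reports how many rows each fit actually used.

```r
# Hypothetical simulated data standing in for datos_modelo
set.seed(123)
n <- 200
datos <- data.frame(
  y  = rbinom(n, 1, 0.5),
  x1 = rnorm(n),
  x2 = rnorm(n)
)
# Put NAs only in x2, the extra variable of the larger model
datos$x2[sample(n, 20)] <- NA

m_a <- glm(y ~ x1,      data = datos, family = binomial, na.action = na.omit)
m_b <- glm(y ~ x1 + x2, data = datos, family = binomial, na.action = na.omit)

nobs(m_a)  # 200: y and x1 have no NAs
nobs(m_b)  # 180: the 20 rows with NA in x2 are dropped

# anova() refuses to compare fits that used different numbers of rows;
# capture the error message instead of stopping
res <- tryCatch(anova(m_a, m_b), error = function(e) conditionMessage(e))
print(res)
```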
My question, then, is: how can I obtain the data frame used in model B (`modelo_logit_viv`) so that I can estimate model A (`modelo_logit`) with it, and thus use the same data to estimate both models and then carry out the ANOVA? There must be some element inside the object returned by `glm()` that contains the data frame actually used for the estimation once the NAs have been removed, but I can't find it.
The element you are looking for is called `model`. Taking the example from the `glm` help page and fitting a second model that adds a new variable containing `NA`s, you can see that the `model` element of that second model has fewer rows than the original data, precisely because the rows with missing values are omitted.

Now, regarding what you are trying to do, as I understand it you have only two possible paths:
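- Handle the `NA` values in the original data.
- Build a new `data.frame` containing only the rows that have no `NA` in any of the variables of interest, for example `na.omit(tu_dataframe[, columnas_para_los_modelos])` (where `tu_dataframe` is your data frame and `columnas_para_los_modelos` the columns used by the models); the resulting `data.frame` is then the one that feeds both models.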
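Both ideas can be sketched on simulated data. The names `datos`, `y`, `x1`, `x2` and `columnas` are hypothetical, not the asker's variables. Option A refits the smaller model on the `model` element of the larger one; option B applies `na.omit()` to all the columns used by either model before fitting both. Option A only works when the larger model contains every variable of the smaller one and the formulas use plain column names, since `$model` stores transformed terms such as `log(x)` under those literal names.

```r
# Hypothetical simulated data: 20 NAs in x2 only
set.seed(123)
n <- 200
datos <- data.frame(y = rbinom(n, 1, 0.5), x1 = rnorm(n), x2 = rnorm(n))
datos$x2[sample(n, 20)] <- NA

m_b <- glm(y ~ x1 + x2, data = datos, family = binomial, na.action = na.omit)

# Option A: m_b$model is the model frame, i.e. exactly the rows
# (and variables) that were used to fit m_b
nrow(m_b$model)  # 180
m_a1 <- glm(y ~ x1, data = m_b$model, family = binomial)

# Option B: keep only complete rows across all columns used by either
# model, then fit both models to that same data frame
columnas <- c("y", "x1", "x2")
datos_completos <- na.omit(datos[, columnas])
m_a2 <- glm(y ~ x1,      data = datos_completos, family = binomial)
m_b2 <- glm(y ~ x1 + x2, data = datos_completos, family = binomial)

# Likelihood-ratio test of the nested models, now on identical rows
comparacion <- anova(m_a2, m_b2, test = "Chisq")
print(comparacion)
```

Option B is arguably the safer choice, because it does not rely on the larger model happening to include every variable of the smaller one; with either option, the comparison only makes sense once both models have been fitted to the same rows.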