I have two binomial logit models:
- Model A:
modelo_logit_viv <- glm(SAP ~ sexo + edad + peso + niv_est + enf_cron + sit_lab +
                          frec_act_fis + GHQ_12 + ingreso_eq +
                          n_dormitorios + cont_indus + delincuencia,  # study variables
                        data = datos_modelo, family = binomial(link = "logit"),
                        na.action = "na.omit")
- Model B (nested within A):
modelo_logit <- glm(SAP ~ sexo + edad + peso + niv_est + enf_cron + sit_lab +
                      frec_act_fis + ingreso_eq + GHQ_12,
                    data = modelo_logit_viv$model, family = binomial(link = "logit"))
The variables c(edad, peso, ingreso_eq, GHQ_12) are continuous; the rest are categorical (factors).
I want to analyze whether the characteristics of the dwelling (the study variables) influence self-perceived health status (SAP). All variables are significant in both models. However, I want to perform an analysis of variance (ANOVA) between these two models to check that model A is better than model B. So I run:
anova(modelo_logit,modelo_logit_viv)
And I get the following table:
  Resid. Df Resid. Dev Df Deviance
1     16805      15439
2     16802      15420  3   18.644
Can this table tell me whether housing characteristics influence health status?
Another way to ask this question: is 18.644 the F test statistic that I should compare against Snedecor's F table to determine whether or not to accept the null hypothesis that there is a difference in means between the two models?
If not, how can I compare these two models with an ANOVA in R?
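The Deviance term that we see when running anova(modelo_logit, modelo_logit_viv) is the difference between the residual deviances of the two models, where the residual deviance is a likelihood-based measure of lack of fit. So if modelo_logit fits worse than modelo_logit_viv, the residual deviance should decrease when the study variables are added, as is the case here. This suggests that the study variables do have explanatory power, or at least some association with the dependent variable SAP.
Note that 18.644 is not an F statistic: it is a likelihood-ratio statistic, and it is compared against a chi-square distribution with 3 degrees of freedom (the Df column). We can check this using the Chi-square test that is implemented in anova(), by passing test = "Chisq". An example with the iris data frame: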
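The following is a minimal sketch of such a test on iris, not the original author's code. The binary outcome is_virginica and the names datos, m_reduced and m_full are made up for illustration; they only stand in for SAP, modelo_logit and modelo_logit_viv. The last line applies the same calculation to the Deviance and Df reported in the question.

```r
# Illustrative data only: iris with a made-up binary outcome (is_virginica).
datos <- iris
datos$is_virginica <- as.integer(datos$Species == "virginica")

# Reduced model (plays the role of modelo_logit).
m_reduced <- glm(is_virginica ~ Sepal.Length,
                 data = datos, family = binomial(link = "logit"))

# Full model (plays the role of modelo_logit_viv): one extra predictor.
m_full <- glm(is_virginica ~ Sepal.Length + Sepal.Width,
              data = datos, family = binomial(link = "logit"))

# Likelihood-ratio (deviance) test; test = "Chisq" adds the p-value column.
lrt <- anova(m_reduced, m_full, test = "Chisq")
print(lrt)

# The same p-value by hand: drop in residual deviance, compared against a
# chi-square distribution with df = number of extra parameters.
lr_stat  <- deviance(m_reduced) - deviance(m_full)
df_diff  <- df.residual(m_reduced) - df.residual(m_full)
p_manual <- pchisq(lr_stat, df = df_diff, lower.tail = FALSE)

# Applied to the table in the question (Deviance = 18.644, Df = 3).
p_question <- pchisq(18.644, df = 3, lower.tail = FALSE)
print(p_question)
```

For the table in the question, the resulting p-value is below 0.001, so the null hypothesis that the three housing coefficients are all zero would be rejected at the usual significance levels.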
On the other hand, if the study variables included in the model are significant under the Wald test (we can see this by running summary(modelo_logit_viv)), the likelihood-ratio test should lead to a similar conclusion, so we also calculate McFadden's pseudo R^2. We see that the values are very similar, c(0.49, 0.48), and the Chi-square test in my model is also significant. We can then conclude that the study variables are relevant for explaining the state of health (SAP).
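McFadden's pseudo R^2 compares the log-likelihood of a fitted model with that of an intercept-only model: 1 - logLik(model) / logLik(null). Below is a hedged sketch on iris; the helper mcfadden_r2 and the outcome is_virginica are hypothetical names introduced here, and the output is not meant to reproduce the values c(0.49, 0.48) quoted above.

```r
# Illustrative data only: iris with a made-up binary outcome (is_virginica).
datos <- iris
datos$is_virginica <- as.integer(datos$Species == "virginica")

# Intercept-only, reduced and full models, all fitted on the same rows.
m_null    <- glm(is_virginica ~ 1, data = datos, family = binomial)
m_reduced <- glm(is_virginica ~ Sepal.Length, data = datos, family = binomial)
m_full    <- glm(is_virginica ~ Sepal.Length + Sepal.Width,
                 data = datos, family = binomial)

# Hypothetical helper: McFadden's pseudo R^2 = 1 - logLik(model) / logLik(null).
mcfadden_r2 <- function(model, null_model) {
  1 - as.numeric(logLik(model)) / as.numeric(logLik(null_model))
}

r2 <- c(reduced = mcfadden_r2(m_reduced, m_null),
        full    = mcfadden_r2(m_full, m_null))
print(round(r2, 3))
```

For the models in the question, the intercept-only model would need to be fitted on the same rows as the other two, for example with data = modelo_logit_viv$model, so that the log-likelihoods are comparable.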