I want to graph a series of predictions on a logit model in R. The model is as follows:
modelo_logit3 <- glm(formula = Sold ~ price+age+poor_prop+airport, data = datos, family = binomial)
summary(modelo_logit3)
Call:
glm(formula = Sold ~ price + age + poor_prop + airport, family = binomial,
data = datos)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.8327  -1.0676  -0.3743   1.0907   1.9014
Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)  4.275016   0.743781   5.748 9.05e-09 ***
price       -0.148547   0.021930  -6.774 1.26e-11 ***
age          0.009497   0.004592   2.068   0.0386 *
poor_prop   -0.184504   0.029633  -6.226 4.78e-10 ***
airportYES   0.871132   0.200409   4.347 1.38e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 697.28 on 505 degrees of freedom
Residual deviance: 610.46 on 501 degrees of freedom
AIC: 620.46
Number of Fisher Scoring iterations: 4
I would like to show three series of sale probabilities in a scatter plot, one for each of three values of price: 20, 30 and 40. The variables age and airport are held constant and poor_prop is the one that varies. On the chart, the Y-axis shows the probabilities and the X-axis shows poor_prop. This is what I have done:
# Compute the predictions and store them in variables for later use:
a = predict(modelo_logit3, newdata = data.frame(price=20, age=50,
poor_prop=c(5,25,35,50,65),
airport= 'YES'), type ="response")
b = predict(modelo_logit3, newdata = data.frame(price=30, age=50,
poor_prop=c(5,25,35,50,65),
airport= 'YES'), type ="response")
c = predict(modelo_logit3, newdata = data.frame(price=40, age=50,
poor_prop=c(5,25,35,50,65),
airport= 'YES'), type ="response")
# Now create a data frame with the results of these predictions for the
# different combinations of "price" and "poor_prop":
predicciones <- data.frame(
price = c(rep(20, times=5), rep(30, times=5), rep(40, times=5)),
fitted_values = c(a,b,c),
poor_prop = c(5,25,35,50,65)
)
# Show the data frame we created:
predicciones
# Attach the data frame:
attach(predicciones)
# Finally, plot the predictions (ggplot() comes from the ggplot2 package):
library(ggplot2)
ggplot(data = predicciones, aes(x = poor_prop, y = fitted_values,
col = price)) + geom_point() + geom_line() +
scale_color_gradient(low="blue", high="red")
The dataframe I have made is:
price fitted_values poor_prop
20 8.490973e-01 5
20 1.231930e-01 25
20 2.171980e-02 35
20 1.392686e-03 50
20 8.759648e-05 65
30 5.602225e-01 5
30 3.082831e-02 25
30 5.001293e-03 35
30 3.156376e-04 50
30 1.983277e-05 65
40 2.238433e-01 5
40 7.149899e-03 25
40 1.136666e-03 35
40 7.147629e-05 50
40 4.490112e-06 65
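As a sanity check on these values, the price = 20 rows can be reproduced by hand from the coefficients printed in the summary above. This is only a sketch: the names coefs, eta and check_20 are made up for illustration, and the coefficients are the rounded values from the output, so the results agree only to a few decimal places.

```r
# Coefficients copied from the summary() output above (rounded as printed)
coefs <- c(intercept = 4.275016, price = -0.148547, age = 0.009497,
           poor_prop = -0.184504, airportYES = 0.871132)

poor_prop_values <- c(5, 25, 35, 50, 65)

# Linear predictor for price = 20, age = 50, airport = "YES"
eta <- coefs[["intercept"]] + coefs[["price"]] * 20 + coefs[["age"]] * 50 +
  coefs[["poor_prop"]] * poor_prop_values + coefs[["airportYES"]]

# The inverse logit turns the linear predictor into a probability,
# which is what predict(..., type = "response") returns for a binomial glm
check_20 <- plogis(eta)
print(check_20)
```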
And the graph I get is the following:
However, what I want is for each line to connect only the points for its own price, giving three separate probability series. I do not understand why all the points are being joined into a single line. If anyone has any ideas and can help, I'd really appreciate it.
All the best!
Update: I was given a solution on the English Stack Overflow site. Because price is numeric, ggplot2 maps it to a continuous colour scale and keeps every point in a single group, so geom_line() joins them all. Converting price to a factor gives three categories, one per value, and each category gets its own line.
I leave the link with the answer:
Graphical representation of a series of probabilities from logistic model with R
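For reference, here is a minimal sketch of that fix. It rebuilds the predicciones data frame from the fitted values printed above, so it runs without the model or the original data; the colours passed to scale_colour_manual() are my own choice, not part of the linked answer.

```r
library(ggplot2)

# Rebuild the data frame from the table above: poor_prop varies fastest,
# so the rows come out in the same order as the printed fitted values
predicciones <- expand.grid(poor_prop = c(5, 25, 35, 50, 65),
                            price = c(20, 30, 40))
predicciones$fitted_values <- c(
  8.490973e-01, 1.231930e-01, 2.171980e-02, 1.392686e-03, 8.759648e-05,
  5.602225e-01, 3.082831e-02, 5.001293e-03, 3.156376e-04, 1.983277e-05,
  2.238433e-01, 7.149899e-03, 1.136666e-03, 7.147629e-05, 4.490112e-06
)

# A factor is discrete, so ggplot2 forms one group (and one line) per price
predicciones$price <- factor(predicciones$price)

# scale_color_gradient() only accepts continuous values, so it is replaced
# here by a manual discrete scale (colours chosen for illustration)
p <- ggplot(predicciones, aes(x = poor_prop, y = fitted_values, colour = price)) +
  geom_point() +
  geom_line() +
  scale_colour_manual(values = c("20" = "blue", "30" = "purple", "40" = "red"))
print(p)
```

An alternative that keeps price numeric, and therefore keeps the original blue-to-red gradient, is to add group = price inside aes(); the lines are then split by price while the colour scale stays continuous.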