Given:
set.seed(2020)
y_fitted<-runif(10,min=0,max = 1)
y_fitted
0.197 0.078 0.818 0.942 0.884 0.166 0.355 0.748 0.451 0.556
y<-sample(0:1,10,replace=T)
y
1 1 1 0 0 1 1 1 1 0
I want to create a data frame such that:
- Column 1: 1 if `y_fitted >= 0.5`, and 0 if `y_fitted < 0.5`.
- Column 2: `y`, unchanged.
- Column 3: `TRUE` if `y_fitted == y` (using the 0/1 values of column 1), `FALSE` if `y_fitted != y`.

Finally, get the percentage of `TRUE` values in the third column.
The problem I am having is with creating column 1: I don't understand why, when I run the following lines of code, the values are changed to 1 as I want, but the values between 0 and 0.4999 are not changed to 0.
y_fitted[y_fitted==0:0.499999999]<-0
y_fitted[y_fitted==0.5:1]<-1
I want to do all this to assess the goodness of fit of my binomial logit model, where `y_fitted` holds the values estimated by my model and `y` holds the observed values of the variable to be explained (the endogenous variable).
I used `0.499999999` to make sure no value is left out of the change I want to make in column 1, but I'm sure there is a cleaner way to do it.
The percentage I am asking for would then indicate the proportion of correct predictions my model makes within the sample I am working with.
What is the most efficient way to do this?
This clause:
y_fitted[y_fitted==0:0.499999999]<-0
(and the following one as well) does not mean what you intend, for a few reasons:

- `0:0.499999999`: I understand that you want to generate a sequence of values from 0 to 0.49..., but the `:` operator builds sequences with a step of 1 and cannot produce a smaller step, so the result is a vector with the single value `0`. You can check this by evaluating `0:0.49999`. To generate non-integer sequences you have to use `seq()`; however, even if you build the sequence as `seq(from = 0, to = 0.49, by = 0.1)`, it will always be a set of discrete values. You can add more values by making the step in `by` smaller, but the result is still a vector of discrete values.
- `a == b`: when you compare two vectors this way, each value of `a` is compared with the value of `b` in the same position, i.e. `a[1] == b[1]`, `a[2] == b[2]`, and so on, recycling values if one vector is longer than the other. That certainly does not solve the problem: what you need is a range comparison with `<` and `>=`, not an equality test.

What you are looking for should actually be written like this:
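A minimal sketch of that idea, replacing the equality tests with range comparisons. It assumes the `y_fitted` values printed in the question, typed in by hand so the result does not depend on the random number generator:

```r
# Fitted probabilities as printed in the question (copied by hand)
y_fitted <- c(0.197, 0.078, 0.818, 0.942, 0.884, 0.166, 0.355, 0.748, 0.451, 0.556)

# Each condition selects a continuous range of values, not a list of exact values.
# The order is safe: values already set to 0 are still below 0.5.
y_fitted[y_fitted < 0.5] <- 0
y_fitted[y_fitted >= 0.5] <- 1

y_fitted
# y_fitted is now 0 0 1 1 1 0 0 1 0 1
```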
Or, much more concisely:
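One way to write it as a single expression is with `ifelse()`. Here `col1` is only an illustrative name for column 1, and the input values are again copied from the question:

```r
y_fitted <- c(0.197, 0.078, 0.818, 0.942, 0.884, 0.166, 0.355, 0.748, 0.451, 0.556)

# Vectorised if/else: 1 where the condition is TRUE, 0 where it is FALSE
col1 <- ifelse(y_fitted >= 0.5, 1, 0)

col1
# col1 is 0 0 1 1 1 0 0 1 0 1
```

Unlike the two-step version, this leaves `y_fitted` untouched and stores the classification in a new vector.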
Or, why not, take advantage of the conversion from logical to integer, since the values you want, 1 and 0, correspond exactly to `TRUE` and `FALSE`:
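A sketch of this last variant that also assembles the data frame described in the question and computes the percentage of `TRUE` values. The column names `pred`, `y` and `hit` are hypothetical, and both input vectors are copied from the values printed in the question:

```r
# Values as printed in the question (copied by hand)
y_fitted <- c(0.197, 0.078, 0.818, 0.942, 0.884, 0.166, 0.355, 0.748, 0.451, 0.556)
y <- c(1, 1, 1, 0, 0, 1, 1, 1, 1, 0)

# The logical vector is coerced to integers: TRUE -> 1L, FALSE -> 0L
pred <- as.integer(y_fitted >= 0.5)

# Column 1: predicted class, column 2: observed y, column 3: whether they match
df <- data.frame(pred = pred, y = y, hit = pred == y)

# mean() of a logical vector is the proportion of TRUE values
pct_correct <- 100 * mean(df$hit)
pct_correct
# pct_correct is 20 for these values
```

Because `mean()` of a logical vector is the share of `TRUE` values, multiplying by 100 gives the percentage of correct classifications at the 0.5 cutoff; with these particular values only 2 of the 10 observations match.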