I am trying to implement the Wilcoxon signed-rank test (Wilcoxon's T) on my own, so that I can compare my results with those of R's built-in function, wilcox.test().
For now I am testing it with a for loop, and I would like to find a more "elegant" equivalent using a function from the apply family.
This is the one-sample case, and we want to test whether the median is 5.
me0 = 5
muestra = c(4, 5, 6, 5, 3, 4, 2, 7, 6, 5, 4, 3, 8, 8, 9, 4, 6, 7, 2, 5, 6)
I compute the differences between the sample values and the hypothesized median:
diferencias = muestra - me0
Result:
[1] -1 0 1 0 -2 -1 -3 2 1 0 -1 -2 3 3 4 -1 1 2 -3 0 1
To determine the ranks, I remove the zeros, take the absolute values of the differences and sort them in ascending order. I am not fully convinced by this approach, because it discards information (the signs) that I would have to recover later:
absolutas = sort(abs(diferencias[diferencias != 0]))
Result:
[1] 1 1 1 1 1 1 1 1 2 2 2 2 3 3 3 3 4
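One way to avoid losing the signs is to skip the sort and keep the non-zero differences in their original order, letting base R's rank() assign the ranks. Its default ties.method = "average" gives tied values the mean of their positions, which is what the loop further down does by hand. This is a sketch; the names d and r are illustrative and not part of the original code.

```r
me0 <- 5
muestra <- c(4, 5, 6, 5, 3, 4, 2, 7, 6, 5, 4, 3, 8, 8, 9, 4, 6, 7, 2, 5, 6)

# Drop the zero differences but keep the rest in their original order,
# so each rank stays paired with the sign of its own difference
d <- (muestra - me0)[muestra != me0]

# Rank the absolute differences; tied values get the average of their positions
r <- rank(abs(d))

print(data.frame(d = d, rango = r))
```

Because d and r are aligned element by element, the signs are still available afterwards (for example through d > 0) without having to recover them.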
Next, I find the distinct values:
niveles = as.numeric(levels(factor(absolutas)))
Result:
[1] 1 2 3 4
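A slightly more direct way to get the distinct values is unique(), which returns them as numbers and avoids the round trip through factor levels, which are character strings. With non-integer data that round trip could in principle lose precision (as.character() keeps about 15 significant digits), and the later absolutas == nivel comparison could then miss values. A minimal sketch:

```r
me0 <- 5
muestra <- c(4, 5, 6, 5, 3, 4, 2, 7, 6, 5, 4, 3, 8, 8, 9, 4, 6, 7, 2, 5, 6)
absolutas <- sort(abs((muestra - me0)[muestra != me0]))

# unique() keeps the values numeric; sort() is only a safeguard here,
# since absolutas is already sorted
niveles <- sort(unique(absolutas))
print(niveles)
```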
Then, with a for loop, I compute the average rank of each value:
rangos = c()
for(nivel in niveles)
{
rangos = c(rangos, mean(which(absolutas == nivel)))
}
Result:
[1] 4.5 10.5 14.5 17.0
This is the expected result.
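With the ranks available, the hand computation can be checked against wilcox.test(). For the one-sample case R reports the statistic V, the sum of the ranks of the positive differences (some textbooks instead define T as the smaller of the two rank sums). The sketch below assumes the default wilcox.test() behaviour of discarding zero differences and averaging tied ranks. The warnings about ties and zeros (no exact p-value) are silenced on purpose.

```r
me0 <- 5
muestra <- c(4, 5, 6, 5, 3, 4, 2, 7, 6, 5, 4, 3, 8, 8, 9, 4, 6, 7, 2, 5, 6)

d <- muestra - me0
d <- d[d != 0]           # zero differences are discarded
r <- rank(abs(d))        # tied absolute values get average ranks

V <- sum(r[d > 0])       # sum of the ranks of the positive differences

# Ties and zeros rule out an exact p-value, so R warns and uses the
# normal approximation; only the statistic is compared here
prueba <- suppressWarnings(wilcox.test(muestra, mu = me0))
print(c(manual = V, R = unname(prueba$statistic)))
```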
So the question is how to replace this for loop with a function from the apply family.
With sapply() you can obtain the vector you are looking for directly, applying the body of the loop, mean(which(absolutas == nivel)), to each element of niveles.
On the other hand, it is worth noting that replacing a for() with one of the *apply() functions is mostly a matter of taste, or a way to make the code more compact and more in keeping with R's usual style. It makes no great difference to performance, since the *apply() functions also iterate over the elements one by one.
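A sketch of what the sapply() version could look like, reusing the question's variable names. vapply() and tapply() are included as alternatives: vapply() also checks that each result is a single number, and tapply() groups the positions 1..n by value and averages each group, so it does not need niveles at all.

```r
me0 <- 5
muestra <- c(4, 5, 6, 5, 3, 4, 2, 7, 6, 5, 4, 3, 8, 8, 9, 4, 6, 7, 2, 5, 6)
absolutas <- sort(abs((muestra - me0)[muestra != me0]))
niveles <- as.numeric(levels(factor(absolutas)))

# sapply(): apply the loop body to each level and collect the results
rangos <- sapply(niveles, function(nivel) mean(which(absolutas == nivel)))

# vapply(): same, but each result must be a single numeric value
rangos_v <- vapply(niveles, function(nivel) mean(which(absolutas == nivel)),
                   numeric(1))

# tapply(): average the sorted positions within each group of equal values
rangos_t <- tapply(seq_along(absolutas), absolutas, mean)

print(rangos)
```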