My question is not about a specific language but rather how to solve a problem.
How can I generate random numbers but with different probabilities of coming out, for example,
5 --> 63%
10 --> 29.99%
25 --> 7%
50 --> 0.01%
80 --> 0.0001%
The percentages are taken from a game as an example, nothing more.
A fairly generic method would be to have in a data structure (for example in two separate lists) the possible numbers to be generated and the possible "relative weights" of each one, for example the probability in percent that they come out.
From the list of weights we get a list of probabilities. We must ensure that the probabilities add up to 1, so in the list that contains those weights we could divide each element by the sum of all of them.
From the list of probabilities we build another list of "cumulative" probabilities, which are, for each element, its probability plus the sum of all the previous probabilities.
For example, if the input lists are:
that of "readjusted" weights would be:
and the cumulative probabilities would be:
So, once we have the latter, the algorithm would be:
acumuladas
and find the index of the first element that exceeds the number obtained in the previous section (in the example, the first number that meets it is 0.92989907, whose index is 1)datos
the one with that index from the list (continuing with the example, the selected one would bedatos[1]
10.If the first step generates the random numbers uniformly distributed between 0 and 1, this algorithm will select the numbers
datos
with the specified probabilities.Extension
In case someone finds it useful, and to compare with other languages, here is a possible implementation in Python:
To see how it goes, I generate 10,000 random numbers with a given distribution and count how many times each one occurs:
The colleague @abulafia was ahead of me, but since he had already prepared a code in C# that demonstrated the method, I'll leave it here.
Basically, the method is the same. Is about:
1- Have two collections of equal size, one with the values, and another with the probabilities
2- The accumulated weights are computed, that is, for each array position, the value of the position plus the sum of the previous ones is placed.
3- A random number between 0 and the sum of the weight values is obtained (ideally, 1)
4- The index of the first value greater than the random number obtained is obtained, and that is the index of the value that is selected.
Let's go with the code:
As you can see in this example, I do the random roll 10,000 times to see if the method works correctly. This is an example of the output of a run:
As you can see, this method adjusts quite well to input percentages.
To generate discrete random variables there are two cases: the first one has already been developed - with replacement - and the second is without replacement .
As it was mentioned that the inspiration came from a game, I thought of Playing Cards. These are shuffled or permuted before being distributed, the latter should not be confused with the change of variable or Monty Hall problem since the probabilities change with each extraction.
Continuing with the case of cards, at the beginning the probability of obtaining any card is 1/54, when drawing the second it is 1/53 and so on.
Taken the following input values
Suppose we need to extract three data.
Before the First extraction, there is a cumulative probability diagram, which is identical to the case with detailed replacement already by several in this trend . When generating the pseudo-random number 0.725... -between 0 and 1- which falls on the ordinate axis (y-axis) it corresponds to the number 10 of the abscissas (x-axis)
Since 10 comes out, the remaining probabilities are added and this result then divides each of them... as in the example of the cards, one of the 54 comes out, I add all the remaining ones, which gives 53 and that total divides each card 1/53.
Therefore, the input data for the second extraction would be as follows:
Being its graphical representation this:
As number 5 came out, the input data and its corresponding graph of probabilities, remain like this for a third extraction if it is necessary to carry it out.
The extraction bears the name of sample and it is this amount that goes in a loop -as a condition- until the data that is needed is extracted.
Summarizing the algorithm can be verbalized as follows:
We have data ( d ), its probabilities ( p ), the sample size ( m ) and the current amount extracted ( q ) which is initially zero
A final clarification, if the sample is the size of all the data, this is equivalent to permuting the data, that is, to shuffling the cards, although unlike these, your case is interesting since the probabilities are different.
A practical application of generating discrete random variables is the virtual keyboards of the banks, where the keys {1,2,3,4,5,6,7,8,9,0} must be mixed or permuted for each session and / or user.
Here is a Lua implementation of discrete va generation with and without replacement
You can run the Lua code here in replit
Finally, I made the graphics in R, the source code is in replit , but I edited it in Skitch .