Given the function:
def md5(r, n):
i = 0
while i != n:
r = hashlib.md5(r[:9]).hexdigest()
i+=1
return r
Results:
md5("00000000000000000000000000000000", 0) #00000000000000000000000000000000
md5("00000000000000000000000000000000", 1) #4c93008615c2d041e33ebac605d14b5b
md5("00000000000000000000000000000000", 2) #c6246d2a39695fb74971b09ced30874f
md5("4c93008615c2d041e33ebac605d14b5b", 1) #c6246d2a39695fb74971b09ced30874f
md5("00000000000000000000000000000000", 2017) #57ab27ac3626d4565913485ff42b037b
Is there a way to calculate:
md5("00000000000000000000000000000000", 2017201720172017)
Without dying first, of course.
md5("00000000000000000000000000000000", 20172017)
It takes me 84 seconds, so 2017201720172017 is 100000001 times that. Approximately 2 million hours.
I don't use xrange
or range
since the value 2017201720172017 is too big.
Yes, but with a quantum supercomputer.
I agree with this comment:
Planck time says that in one second there are 10 43 instants of time.
The length of that is 32, so the number of MD5 hashes that can exist is:
Assuming that each hash costs a thousand instants of time:
This means that it is physically possible to hash in a very short time, but at the moment no computer is that fast. Let's take 100 years as acceptable, there are people who live longer than that. So we do the math, 100 years divided by 16 32 .
According to the Gregorian calendar , there are 3 k 155 1 695 200 seconds in 100 years . That divided by 16 32 gives:
The result is one over 100 thousand quadrillion seconds, that is, 9 * 10 -30 seconds.
That's the amount of time the supercomputer needs to check each hash, accepting that it can take 100 years. Currently a normal computer is around 10 -8 each hash, with the help of an entire humanity it might be possible, or perhaps in a future time, not too far away.
Dear all, after racking my brain for a few days I was able to arrive at the result without using a quantum supercomputer. It should be clarified that it is only for the calculation of
md5("00000000000000000000000000000000", 2017201720172017)
Since the first nine characters are taken for the next iteration, presumably they were going to be repeated at some point. Saving and comparing the previous results, it was possible to determine that after iteration 333399, a repetition of known results of 60301 iterations was produced, that is:
They all give the same result.
Therefore, we proceeded to calculate:
(2017201720172017 - 333399) Mod 60301 = 45742
The result of 45742 would be the offset of iteration 2017201720172017 in the interval of 60301 that repeats.
Then:
Therefore it was not necessary to calculate the 2017201720172017 iterations, but only 379141.
In generic formula it would be something like this (I am not a mathematician):
r(i)=r((i - 333399) Mod 60301 + 333399)
Ex:
r(401265) = r((401265 - 333399) Mod 60301 + 333399) = r(340964) r(74589651) = r((74589651 - 333399) Mod 60301 + 333399) = r(359120) r(201720172017) = r((201720172017 - 333399) Mod 60301 + 333399) = r(362302) r(20122017) = r((20122017 - 333399) Mod 60301 + 333399) = r(343289) r(2017201720172017) = r((2017201720172017 - 333399) Mod 60301 + 333399) = r(379141)
Where i is the iteration sought and i >= 333399
r is the result of calling the md5 function with parameters "00000000000000000000000000000000" ei
Ex:
r(401265) = md5("00000000000000000000000000000000", 401265) r(340964) = md5("00000000000000000000000000000000", 340964) r(2017201720172017) = md5("00000000000000000000000000000000", 201720172017) #Imposible de resolver r(379141) = md5("00000000000000000000000000000000", 379141) #Mismo resultado que el anterior
A big hello to all.
As I indicated in my comment to the auto-response, the solution is not clear to me at all. After much thought, I am encouraged to write this.
To the point: Yes .
If we analyze the sample code shown in the question
we clearly observe that the only thing that interests us are the first 72 bits (8 * 9). Also, except for the first time, we use the first 72 bits of the output as input.
Analyzing the question, we can rewrite it like this:
Which, in turn, can be rewritten as 2 separate questions:
Given a sequence of input data, composed of data
72
bits, will this sequence have repeated elements before reaching the maximum possible value for those 72 bits?If the answer is yes, how many bits do we need to be sure of finding the first repeated element?
So, as Jack the Ripper said , let's go by parts...
Will there be repetition before the 72-bit limit is reached?
Technically, what we are looking for is the existence or not of a summary collision . Copied directly from the above link
Why do we look for a collision? so very simple. If we are able, within the range of 72 bits, to find 2 values that generate the same 72 bits as output, we will inevitably enter a loop, from which there is no way out.
Recall that the output data is, in turn, used as input data. Some hypothetical output data (which are inputs of the next iteration):
It is observed that both
9
generate17
the same output (4 in this example). The17
loop begins there.Now, after consulting how MD5 works , we come to the first stumbling block . Although we only use 72 bits, the function works, internally , with 128 bits. And it delivers results of that same size.
Intuition tells us that there will be no collisions , since the width of the outputs is greater than the inputs. We soon get into the detail that we only use 72 bits . The rest of the data (56 bits) is always repeated, on all inputs . With this, we are artificially limiting the width of the outputs. For practical purposes, the outputs will contain only 72 bits.
Even so, it is not clear that repetitions can exist. It is necessary to review more thoroughly the interior of MD5, until we realize one thing:
That is enough. It is impossible to map 96 bits into just 32 without getting repeated outputs for different input data . Even if we mix different operations, the loss of information will occur, and with it the certainty that we will have repeated outputs.
Therefore, the answer to our first question is:
Now, on to our second question.
When will that repetition take place?
The topic of hash collisions has been extensively studied; the best known method is the
Ataque del cumpleaños
. Said attack is based on probability ; tells us the probability that, after a certain number of attempts, we find two input data that generate the same result.Unfortunately, my mathematical knowledge is not that great . Fortunately, in the previous link we have a table. Using it, we deduce that, for 72 bits of data, the number of trials required, for a 50% probability, will be between
5.1 × 10^9
and2.2 × 10^19
. As we can see, a not insignificant number .Now, considering that we are talking about probabilities , the answer to our second question is somewhat more imprecise :
Therefore, the answer to the original question is it possible to do it in a reasonable time ? is curious :
And why is it possible?
For what was said before about the
ataque del cumpleaños
. We are talking about probabilities , not certain data .It would be possible (although unlikely) that we would loop on the second iteration (not on the first one, sure). It is also possible that we enter the loop in the last one.
Once the roll is released , I reaffirm my comment. Although not indicated, it has all the earmarks of being an exercise. The coincidence of choosing these specific data is quite unlikely . Therefore, I conclude that it is a known feature of MD5, surely discovered during the multiple tests to which it has been subjected.