I am rephrasing this question from another user, because the original one was unsolvable and the question will possibly be closed, but I think the general technique to solve it might be interesting.
It is essentially about completing the following text, so that the MD5 hash of the complete text is equal to a given one. This is the incomplete text:
with an iron rod) that there are too many redundancies and that only the argument should suffice. What I can say is my name. My name is ......."
We are given the following data:
- The missing word at the end (which I have represented by
.......
) has 7 letters, all capital ascii. - That word starts with
A
and ends withO
. - The MD5 hash of the final text, encoded in CP-1252, must be
7292e0ae2227b2fd1cb089656aeff5e2
What is the missing word?
The functions
hash
are irreversible. That is, the fact that we know the final hash (7292e0ae2227b2fd1cb089656aeff5e2
) does not allow us to know the starting text.Cryptographic hashes also fulfill other properties that make it impossible to find which text would generate a given hash, if not by brute force , that is, trying all possible texts and finding the hash of each one, until finding one whose hash matches the given.
Luckily this time the search space is relatively small. With the data they give us, we know that the only thing we have to try are all the possible combinations of the 26 uppercase Ascii letters, taken 5 by 5. Concatenating each of these combinations one
A
before and oneO
behind, we have the different words that could complete the text. There are 26 5 cases, which although a large number (11881376) can be exhaustively traversed in a couple of minutes.The following solution uses Python to generate the combinations (
itertools.product
it greatly simplifies the code) and calculates the md5 hashs, until it finds the one that produces the desired result.After 54 seconds the answer appears on the screen:
The searched word was "HELP". The search time could have been reduced by using multiprocessing, to check several cases in parallel (as many cores as the machine on which it is executed has, more than that would no longer represent a gain in time).
The text, by the way, was by Roberto Bolaño
I can think of 3 ways:
Considering the context, the letter space can be reduced to 26, 24 or even 22, if what is searched for by context is a proper noun in capital letters, there is less chance that certain letters will be used (X,Ñ,W,Q, Ü,Z), or the low incidence of repeated letters, it may be that accentuation rules are ignored in capital letters (PERU/PERÚ, SANCHEZ/SÁNCHEZ).
As the final number of combinations is within the computable, the complete corpus can be used and give the combinatorics.
note can be extended or reduced when considering names with letters of the Latin alphabet such as: Walter, Ximena, Aarón, Zoilo, Güemes, Gonçalves. etc...
By context, the space of words can be limited to proper names in Spanish, more common in Mexico, in the last part of the 20th century, etc...
Part of the known string is searched in google and if there is a match, it is tried