I was wondering if anyone knows what kind of methods are the most performant to implement in very intensive searches (in arrays) (I am talking about arrays of millions of elements) from C#, for example it is better to use IndexOf or BinarySearch to obtain the index of the element ? should I use arrays or HashSet? and how could I use the HashSet to find the matches in the array?
A logical example of the algorithm I need:
Block: 071827490593720123213023498230402000813
Value to find: 40200
Objective: Return the index where said group of numbers is found
What should return: 30
Code that I currently implement:
int offSet = 0;
while ((offSet = Array.IndexOf(bloque, encontrar[0], offSet)) != -1)
{
if (encontrar.Length > 1)
for (int i = 1; i < encontrar.Length; i++)
{
if (bloque.Length <= offSet + encontrar.Length) break;
else if (encontrar[i] != bloque[offSet + i])
{
if (bloque[offSet + i] == encontrar[0])
{ offSet += (i - 1); break; }
else if (i == encontrar.Length - 1)
{ offSet += i; break; }
break;
}
else if (i == encontrar.Length - 1)
addresses.Add(new IntPtr((int)baseAddress + offSet));
}
else addresses.Add(new IntPtr((int)baseAddress + offSet));
offSet++;
}
This algorithm is not slow, but it is not as fast searching as the program I am comparing it to. The program I am developing opens the processes and looks for values in their memory regions (Yes, I am comparing it with Cheat Engine). As you can see it is more or less similar to Boyer Moore's algorithm, but I need to know if I can replace functions to increase performance or if I should remove or change something in the logic of the algorithm to increase performance.
https://es.wikipedia.org/wiki/Algorithm_of_b%C3%BAsqueda_de_cadenas_Boyer-Moore
First of all, I would profile the executions to see if they tell me anything about the behavior of the code.
Things I would try, because these things I understand are more trial and error than anything else. Maybe in this time you have done:
encontrar
is 1 (we get aif
from loop.while
with what I understand you could eliminate theoffSet += (i - 1)
leaving only with thebreak
addresses.Add(new IntPtr((int)baseAddress + offSet))
. instantiation and addition to the collection could be costly relative to loop execution.UPDATE
Rational: You're doing a cast, creating an IntPtr object and adding it to address. The second operation in particular may be expensive compared to the rest of the algorithm, after all creating objects is not free (it's a hypothesis you should test).
What could be done about it? Well, you could try delegating that task to another process or thread.
That is, build a producer/consumer scheme, where the producer (your current process) passes the baseAddress+offset to the consumer and the consumer creates the IntPtr and adds it to its collection.
Then on execution completion, you can ask the consumer that created the IntPtr to return the generated IntPtr collection.
The idea is that in a multiprocessor environment the search process and the creation process fall on different processors and with this you could gain improved performance.
Here an article about it.
THE END
Of course, I can be wrong about everything. It's probably much more interesting to read your results in response form.
You didn't evaluate use
Regular Expression
to fetch the blockwith this you could search all the matches
Source: Is there a function that returns index where RegEx match starts?