Although I have seen several questions about it, I still have doubts. I have to say that my knowledge of STL containers is very limited. In fact, I've only ever used std::list for anything other than examples.
Let's see if I can explain my problem well. I have a program that represents a graph by the adjacency list method. Each node stores a field called Code, and there can never be two nodes with the same code in each graph. When I use the program there are no problems, inserting a new node involves going through the list to see if there is already a node with the same code, so that if it does not exist, it must be created and linked, but if it does exist, it is only necessary to link it where it corresponds. This is an expensive operation, but when it comes to inserting one or a few nodes, it is imperceptible,
The problem comes to me when I have to read a file with the standard exchange format, and it contains a database with about 60,000 nodes and between 3 and 5 relationships for each node. Opening a database can take more than 10 minutes, and I understand that this is neither normal nor acceptable.
I have made, as I have already said, I have no idea of using containers other than lists, a program (it is very bad but it helps me to check the performance differences) that compares a search for an element in a std::list
and in a std::map
. Although I have read that the function clock()
is not very precise, at least it gives me consistent results (in a list of 10 m of int, it takes me 10 times longer when the value to search for is at the end than at the beginning, for example). The difference is so abysmal that I understand that my solution goes through there. At least when reading files with the standard interchange format.
My idea is to implement a std::map
next to the adjacency list, whose key is the code and whose value is the node. Something like std::map<std::string, nodo>mapa
, that inserts values at the same time that I insert in the list, and that when it has to look for a code, it uses the map.
Anyway, in short, these are my questions:
- Is keeping a list and a map running in parallel an acceptable idea or is it just idiocy of mine?
- For what I want to do, is there another more appropriate container?
I also put the code of the program that I have made to evaluate the times in case someone sees any inconsistency (it is a rehash of what is already out there):
#include <iostream>
#include <list>
#include <ctime>
#include <map>
using namespace std;
bool existe(const list<long unsigned int>&lista, long unsigned int valor);
bool existe(const std::map<std::string,long unsigned int> &mapa, std::string valor);
std::string numToString(long unsigned int num);
const long unsigned int total= 10000000;
const int multiplicador = 1000;
int main()
{
std::list<long unsigned int>lista;
std::map<std::string,long unsigned int> mapa;
//lista de int
for (long unsigned int i=0; i<total; i++)
{
lista.push_back(i);
}
//map de char-int
for (long unsigned int i=0; i<total; i++)
{
mapa.insert (std::pair<std::string,long unsigned int>(numToString(i),i));
}
long unsigned int valor;
std::string cadena;
do
{
cout<<"Ingresa un valor a buscar entre 0 y "<<total<<endl;
cin>>valor;
std::string cadena = numToString(valor);
if (existe(lista,valor))
{
cout<<"La lista contiene el valor: "<<valor<<endl;
}
else
{
cout<<"La lista NO contiene el valor: "<<valor<<endl;
}
if (existe(mapa,cadena))
{
cout<<"El mapa contiene el valor: "<<cadena<<endl;
}
}
while (valor!=-1);
return 0;
}
bool existe(const list<long unsigned int>&lista, long unsigned int valor)
{
clock_t t_ini, t_fin;
double secs;
t_ini=clock();
for (auto elem:lista)
{
if (elem == valor)
{
t_fin = clock();
secs = (double)(t_fin - t_ini) / CLOCKS_PER_SEC;
cout<<"Tiempo: "<<secs*multiplicador<<endl;
return true;
}
}
t_fin = clock();
secs = (double)(t_fin - t_ini) / CLOCKS_PER_SEC;
cout<<"Tiempo: "<<secs*multiplicador<<endl;
return false;
}
bool existe(const std::map<std::string,long unsigned int> &mapa, std::string valor)
{
clock_t t_ini, t_fin;
double secs;
t_ini=clock();
auto it = mapa.find(valor);
std::cout<<"El valor: "<<mapa.at(valor)<<std::endl;
t_fin = clock();
secs = (double)(t_fin - t_ini) / CLOCKS_PER_SEC;
cout<<"Tiempo: "<<secs*multiplicador<<endl;
if (it!=mapa.end())
{
return true;
}
return false;
}
std::string numToString(long unsigned int num)
{
long unsigned aux,indice=0,D;
string numstr;
while(num!=0)
{
D = num % 10;
char var = D+48;
auto it = numstr.begin();
numstr.insert(it,var);
indice++;
num/=10;
}
return numstr;
}
Applied to your scenario...
std::list
has two problems:std::map
has three problems:So... which container to use?
For your case I would recommend
std::unordered_map
. What this container does is group the elements based on the hash of their key. This way of working allows the insert and seek times to be stable regardless of the number of elements in the collection.As a possible disadvantage, to use our own object as a key we will have to work on the corresponding hash function, but in your case, resorting to
std::string
this will not be a problem.This wrapper is available as of the C++11 standard. If you can't make use of that standard or a more modern one you might be able to resort to boost containers (they're pretty much the same).