
Question

user23371

Asked: 2020-06-21 11:33:47 +0800 CST

How do I reuse spaces in a hash table, disregarding the oldest ones?


I have to process a lot of sentences. In fact, their number is unlimited: they are obtained from a source on the Internet.

The goal is to count how many times a given sentence is repeated. Since their number is, as I say, unlimited, I obviously cannot store all of them in memory.

Additionally, certain conditions are imposed on me: the complexity of any operation must be O(1). There is no mention of collisions, so I am ignoring them for now.

An example of the expected behavior:

1. 4096 sentences are read, all of them different. Coincidentally, none of them produces a hash collision with another.
2. The hash table is full. The 4096 slots are occupied by the sentences read.
3. A new sentence is read. Its hash is the same as that of one of the previous sentences, but the sentence itself is different from all 4096 previous ones.
4. I remove from the table the item that has gone unused the longest. In this case, it would be the first one I inserted. I do not remove the element with the matching hash value, because it is more recent.

EDIT

Currently, I have this:

#include <string.h>

typedef struct phrase_s {
  char text[256];
  unsigned long long count;
  struct phrase_s *prev;
  struct phrase_s *next;
} Phrase;

Phrase List[4096];

// Returns the hash of a string, limited to 12 bits (0 - 4095).
size_t makeHash( const char * );
// Returns the OLDEST string, the one that must be removed.
Phrase *old( void );


void addString( const char *str ) {
  size_t hash = makeHash( str );

  if( !( List[hash].count ) ) {
    // Easy case. No stored string has a matching hash.
    // Bounded copy that always leaves the text null-terminated.
    strncpy( List[hash].text, str, sizeof List[hash].text - 1 );
    List[hash].text[sizeof List[hash].text - 1] = '\0';
    List[hash].count = 1;
    List[hash].prev = NULL;
    List[hash].next = NULL;

    return;
  }

  // INTERESTING case. The hash MATCHES another one.
  // TO DO.
}

How do I implement point 4 above? Can I continue from the code I already have, or are other modifications necessary?

I reiterate the issue of complexity. The number of operations to perform must be independent of the size of the table.

Note: This question was posted a couple of days ago and I found it interesting. This is my best reconstruction of the original, which its author has since deleted.

EDIT 2

I still find it an interesting question; surely there are more good answers that show the logic to follow in these cases. Answers in C++ are now also accepted.

EDIT 3

As has been pointed out to me, the meaning of O(1) must be made precise. Normally it would denote constant complexity. Since we are talking about a hash table, we add the qualifier "average"; that is, the chosen algorithm must have, on average, O(1) complexity.

Also, so that std::unordered_map can be used, answers in C++11 are accepted.

1 Answer


eferion · Answer 1 · 2020-06-23T12:43:48+08:00

The problem, as you have stated, has no solution:

A hash table does not store information about the age of each string, which makes it impossible for you to identify and discard the oldest elements.
If you compute a hash and then store the entries in a table that cannot cover the full range of that hash, what good is the hash?
What happens if the hash of the second element inserted collides with that of the first? Do you discard a string while 4095 slots are still free?

The simplest implementation, in my opinion, is a circular buffer. How is it implemented? Very easy:

typedef struct
{
  char text[256];
} nodo;

#define NUMELEMS 4096

typedef struct
{
  nodo elementos[NUMELEMS];
  int num_elementos;
  int indice;
} buffer;

Well, we have already defined the buffer:

An array that stores 4096 elements, each element able to hold a string of up to 255 characters plus the terminating null.
An integer indicating the number of elements in the buffer.
An integer that acts as an index. Index of what? This variable will allow us to write sequentially to the buffer.

Buffer initialization: Simple function...both integers to 0.

void InitBuffer(buffer* b)
{
  b->num_elementos = 0;
  b->indice = 0;
}

Function to increment the index: I don't like to repeat code.

void IncrementaIndice(int* indice)
{
  *indice= (*indice + 1) % NUMELEMS;
}

Insert an element in the buffer: we replace the value at the position given by the variable indice, then advance the index by one position. As the buffer is circular, when the index reaches the end it wraps around to the beginning. The element counter is updated until it reaches the maximum number of elements; once that limit is reached, the oldest elements start being replaced and the number of elements remains constant.

void AddItem(buffer* b, char* str)
{
  strcpy(b->elementos[b->indice].text,str);

  IncrementaIndice(&b->indice);

  if( b->num_elementos < NUMELEMS )
    b->num_elementos++;
}

How is it used? Easy:

#include <stdio.h>
#include <string.h>

struct nodo
{
  char text[256];
};

#define NUMELEMS 6

struct buffer
{
  struct nodo elementos[NUMELEMS];
  int num_elementos;
  int indice;
};

void InitBuffer(struct buffer* b)
{
  b->num_elementos = 0;
  b->indice = 0;
}

void IncrementaIndice(int* indice)
{
  *indice= (*indice + 1) % NUMELEMS;
}

void AddItem(struct buffer* b, char* str)
{
  strcpy(b->elementos[b->indice].text,str);
  IncrementaIndice(&b->indice);

  if( b->num_elementos < NUMELEMS )
    b->num_elementos++;
}

int main()
{
  struct buffer miBuffer;
  InitBuffer(&miBuffer);

  for( int i=0; i<10; i++ )
  {
    char cad[256];
    // The width limit keeps scanf from overflowing cad; stop on end of input
    if( scanf("%255s",cad) != 1 )
      break;
    AddItem(&miBuffer,cad);
  }

  printf("Number of elements: %d\n",miBuffer.num_elementos);
  printf("Stored strings:\n");

  int indice = miBuffer.indice;
  for( int i=0; i<miBuffer.num_elementos; i++ )
  {
    printf(" - %s\n",miBuffer.elementos[indice].text);
    IncrementaIndice(&indice);
  }
}

As you can see, inserting an element has an O(1) complexity since it does not require loops of any kind... and the number of elements stored does not matter. In addition, it will always overwrite the oldest values and its management is very simple.

Edit prompted by the ever-attentive @Trauma: I had not taken into account that duplicates are not allowed. The solution, then, is to integrate a hash into the circular buffer, as you suggest in the question.

A possible hash for the strings could be the following (source). I have crudely adapted it to unsigned short: collisions become more likely, but the range of values to cover is smaller:

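unsigned short get_hash(const char *str)
{
    unsigned short hash = 5381;
    int c;

    /* Read each byte as unsigned char; the loop stops at the terminating null */
    while ((c = (unsigned char)*str++) != 0)
        hash = (unsigned short)(((hash << 5) + hash) + c); /* hash * 33 + c */

    return hash;
}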
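
The table in the question has 4096 entries, so its makeHash needs a value from 0 to 4095, while the hash above produces a wider value. As a minimal sketch, assuming a 16-bit hash and assuming that simply masking off the top bits is acceptable, the two could be bridged as follows. The names hash16 and hash12 are hypothetical and do not appear in the original code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hash of the form hash * 33 + c, truncated to 16 bits on every step. */
uint16_t hash16(const char *str)
{
    uint16_t hash = 5381;
    int c;

    assert(str != NULL);
    while ((c = (unsigned char)*str++) != 0)
        hash = (uint16_t)(hash * 33u + (unsigned)c);

    return hash;
}

/* Keep only the low 12 bits, giving an index in the range 0..4095. */
size_t hash12(const char *str)
{
    return (size_t)(hash16(str) & 0x0FFFu);
}
```

Masking is only one option; a narrower index means more collisions, which is the trade-off already mentioned above.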

Now that we can generate a hash, it is time to combine the two worlds. My idea is to keep two arrays: the circular one we have already seen and one of hash flags. That way, finding out whether a hash is already taken is as simple as looking it up in the flag table. How are the two related? I think it is best to keep it simple: the flag table has one entry per possible hash value (which is why I made the hash an unsigned short). Initially, all entries are 0. When a string is about to be inserted, the entry at the position given by its hash is checked: if it is 1, a string with that hash already exists and the new one is not inserted into the buffer; if it is 0, the way is clear.

To size the hash table correctly, the header limits.h must be included. That way the table will have the right size regardless of the system the program is compiled on:

#include <limits.h>

struct buffer
{
  struct nodo elementos[NUMELEMS];
  char hashes[USHRT_MAX + 1]; // one flag per possible hash value, 0..USHRT_MAX
  int num_elementos;
  int indice;
};

Now the initialization has to be updated. Setting performance aside (the difference would be negligible), it now makes sense to use memset:

void InitBuffer(struct buffer* b)
{
  memset(b,0,sizeof(struct buffer));
}

We already have all the bytes of the buffer at 0 (including the hash table)... Let's modify the function to add elements:

void AddItem(struct buffer* b, char* str)
{
  unsigned short strHash = get_hash(str);
  if( b->hashes[strHash] == 0 )
  {
    b->hashes[strHash] = 1;

    char* posicion = b->elementos[b->indice].text;
    if( *posicion != 0 )
      b->hashes[get_hash(posicion)] = 0;

    strcpy(posicion,str);

    IncrementaIndice(&b->indice);

    if( b->num_elementos < NUMELEMS )
      b->num_elementos++;
  }
}

The logic is simple:

It checks that the flag for the new string's hash is 0 and, if so, continues and sets that flag to 1.
If a string is about to be overwritten, the flag for its hash is cleared first (remember that there are no duplicates).
The new string is copied
Index and counter are incremented
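
The flag table cannot tell a genuine repeat from two different strings that happen to share a hash. One possible variation, sketched here under assumptions and not part of the original answer, stores in the table the slot index instead of a flag, so that a hit can be confirmed with strcmp and counted. The names (struct entry, struct table, table_add, hash16), the small capacity and the 31-character limit are all hypothetical. Eviction is still by insertion order, as in the circular buffer above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SLOTS  4        /* hypothetical small capacity, to keep the demo short */
#define HASHES 65536    /* one entry per possible 16-bit hash value */

struct entry {
    char text[32];          /* assumption: strings fit in 31 characters */
    uint16_t hash;          /* cached so eviction does not rehash the text */
    unsigned long count;    /* 0 means the slot has never been used */
};

struct table {
    struct entry slot[SLOTS];
    uint16_t where[HASHES]; /* slot index + 1, or 0 if the hash is unused */
    int next;               /* slot that the next new string overwrites */
};

/* Hash of the form hash * 33 + c, truncated to 16 bits. */
uint16_t hash16(const char *str)
{
    uint16_t hash = 5381;
    int c;

    while ((c = (unsigned char)*str++) != 0)
        hash = (uint16_t)(hash * 33u + (unsigned)c);
    return hash;
}

/* Count one occurrence of str. Returns its updated count, or 0 if str was
   skipped because a different stored string has the same hash.
   The table must start zeroed (static storage or memset). */
unsigned long table_add(struct table *t, const char *str)
{
    assert(t != NULL && str != NULL);
    uint16_t h = hash16(str);

    if (t->where[h] != 0) {
        struct entry *hit = &t->slot[t->where[h] - 1];
        if (strcmp(hit->text, str) == 0)
            return ++hit->count;   /* genuine repeat */
        return 0;                  /* collision with a different string */
    }

    struct entry *e = &t->slot[t->next];
    if (e->count != 0)
        t->where[e->hash] = 0;     /* forget the string being overwritten */

    strncpy(e->text, str, sizeof e->text - 1);
    e->text[sizeof e->text - 1] = '\0';
    e->hash = h;
    e->count = 1;
    t->where[h] = (uint16_t)(t->next + 1);
    t->next = (t->next + 1) % SLOTS;
    return 1;
}
```

Every path does a fixed amount of work besides hashing and comparing the string itself, so the cost does not depend on the number of slots.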

Finally, we are missing a function that I had overlooked. We need a function that gives us the starting index of the buffer, which will be:

0 while the buffer is not completely filled
The current index otherwise. It is the next position to be overwritten, so it is easy to deduce that it holds the oldest element.

In code:

int inicio(struct buffer* b)
{
  if( b->num_elementos == NUMELEMS )
    return b->indice;
  else
    return 0;
}

And now it only remains to correct the printing loop in main so that it starts from this index instead of the previous one:

for( int i=0, indice = inicio(&miBuffer); i<miBuffer.num_elementos; i++ )
{
  printf(" - %s\n",miBuffer.elementos[indice].text);
  IncrementaIndice(&indice);
}
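
To see why the starting index depends on whether the buffer is full, here is a small sketch of the index arithmetic on its own. The names oldest_index and nth_oldest are hypothetical; the capacity of 6 matches the example program.

```c
#include <assert.h>

#define NUMELEMS 6   /* same small capacity as the example program */

/* Index of the oldest element, given how many elements are stored (count)
   and where the next write will land (next). */
int oldest_index(int count, int next)
{
    assert(count >= 0 && count <= NUMELEMS);
    return (count == NUMELEMS) ? next : 0;
}

/* Index of the n-th oldest element (n = 0 is the oldest). */
int nth_oldest(int count, int next, int n)
{
    assert(n >= 0 && n < count);
    return (oldest_index(count, next) + n) % NUMELEMS;
}
```

With three strings stored, the oldest one sits at index 0. After ten insertions into six slots the write index is 4, so the oldest string is at index 4 and the newest at index 3.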

C++11 version

In this case I have tried to push things a little so that it is not a mere translation of the C code:

The list of elements is built as a doubly linked list, so removing an element from the list (the duplicate case) does not require expensive operations.
To find out whether an element is in the list, a std::unordered_map is used. This map stores the values found in the list and a pointer to the node that contains each value.
The list is built with std::unique_ptr. Using raw pointers was too easy and the idea was to come up with something totally different... and yes, with std::unique_ptr you can create a doubly linked list.
The list is based on templates, so it can be customized to your liking.
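
For comparison, the reason a doubly linked list makes removal cheap can be shown in plain C with raw pointers. This is only a sketch of the general technique, not a translation of the C++ class; struct dnode and the ring_* functions are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

struct dnode {
    int value;
    struct dnode *prev;
    struct dnode *next;
};

/* Make n a one-element circular list. */
void ring_init(struct dnode *n)
{
    assert(n != NULL);
    n->prev = n;
    n->next = n;
}

/* Insert n right after pos. O(1): four pointer writes. */
void ring_insert_after(struct dnode *pos, struct dnode *n)
{
    assert(pos != NULL && n != NULL);
    n->prev = pos;
    n->next = pos->next;
    pos->next->prev = n;
    pos->next = n;
}

/* Unlink n from its list. O(1): the neighbours are reached through
   n's own pointers, so no traversal is needed. */
void ring_unlink(struct dnode *n)
{
    assert(n != NULL);
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->prev = n;
    n->next = n;
}
```

Given a pointer to the node (which is what the map lookup provides), unlinking touches only the node and its two neighbours, regardless of the length of the list.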

And well, here I leave you the result of typing for a while. The code can be complex to follow if you don't have a good level of C++.

#include <functional>
#include <iostream>
#include <memory>
#include <unordered_map>

namespace UniquePtrUtils
{
  // Alias for the deleter function type
  template<class T>
  using Functor = std::function<void(T*)>;

  // unique_ptr alias, to avoid repeating code
  template<class T>
  using UniquePtr = std::unique_ptr<T,Functor<T>>;

  // Dummy function. Used as the deleter of some unique_ptr
  template<class T>
  void NoDelete(T*)
  { /* does nothing */ }

  // Returns a unique_ptr that points to the same object as another unique_ptr
  // The returned unique_ptr does not delete the pointed-to object
  template<class T>
  UniquePtr<T> RefUnique(UniquePtr<T> & original)
  {
    return UniquePtr<T>(original.get(),NoDelete<T>);
  }

  // Helper to create a unique_ptr with little typing
  template<class T>
  UniquePtr<T> NewUnique(T * ptr)
  {
    return UniquePtr<T>(ptr,std::default_delete<T>());
  }
}

// Represents a node of the list
template<class T>
struct Node
{
    // Alias... to avoid repetition
    using NodePtr = UniquePtrUtils::UniquePtr<Node<T>>;

    T value;          // value stored by the node
    NodePtr next;     // pointer to the next node
    NodePtr previous; // pointer to the previous node

    // Node constructor
    Node(T && value)
      : value{std::forward<T>(value)}
    { }

    // Destructor. Uncomment the body if you want to see when the nodes
    // are actually deleted
    ~Node()
    { /* std::cout << "Delete Node" << std::endl; */ }
};

// Class that manages the list of elements
template<class T, int SIZE>
class RingBuffer
{
public:

  using RingNode = Node<T>;
  using RingNodePtr = typename RingNode::NodePtr;

  RingBuffer()
    : m_bufferSize{0}
  { }

  // Adds a new element to the list
  void Add(T && value)
  {
    // If the element already exists and the list has more than one element
    // (single-element lists need no work)
    // we take the old node out of the list.
    auto it = m_hash.find(std::forward<T>(value));
    if( it != m_hash.end() )
    {
      if( m_bufferSize == 1 )
        return;

      // Get a reference to the previous and next nodes
      auto prevPtr = Ref(it->second->previous);
      auto nextPtr = Ref(it->second->next);

      // this move lets the node being removed delete itself
      // when the code leaves the if
      auto tempPtr = std::move(prevPtr->next);

      // Relink the next node back to the previous node
      nextPtr->previous = Ref(it->second->previous);

      // If the node being removed is the first node of the list
      // we have to update that first node so that
      // the list is not deleted by mistake
      if( m_first.get() == tempPtr.get() )
      {
        m_first.release();
        m_first = std::move(it->second->next);
        it->second->next = Ref(m_first);

        // Store a reference... the last node links to the first one
        // through a non-deleting pointer. We do not want a double delete
        prevPtr->next = Ref(it->second->next);
      }
      else
      {
        // otherwise it is enough to relink the next node
        prevPtr->next = std::move(it->second->next);
      }

      // remove the old node from the hash map
      m_hash.erase(it);

      // and of course, the list now has one node fewer
      m_bufferSize--;
    }

    // If the list is full a node has to be overwritten
    if( m_bufferSize == SIZE )
    {
      m_current = Next(m_current);    // advance the cursor
      m_hash.erase(m_current->value); // remove the value being overwritten from the hash map
      m_current->value = std::forward<T>(value); // assign the new value
      // register the new value too, so that later duplicates of it are detected
      m_hash.insert(std::make_pair(m_current->value,Ref(m_current)));
    }
    else
    {
      // If the list is not full we add a new node
      RingNodePtr node = UniquePtrUtils::NewUnique(new RingNode(std::forward<T>(value)));
      m_bufferSize++;
      if( !m_first ) // If it is the first node of the list...
      {
        m_first = std::move(node);
        m_first->next = Ref(m_first);
        m_first->previous = Ref(m_first);
        m_current = Ref(m_first);
      }
      else
      {
        // The list has at least one node...
        m_current->next = std::move(node);
        m_current->next->previous = Ref(m_current);
        m_current->next->next = Ref(m_first);
        m_first->previous = Next(m_current);

        m_current = Next(m_current);
      }

      // The new value is added to the hash map. The key is read from the
      // node because value has already been moved into it
      m_hash.insert(std::make_pair(m_current->value,Ref(m_current)));
    }
  }

  // Function to print the list
  // I thought about implementing iterators... but it got late
  // Implementing the iterators could be an interesting exercise
  void Print()
  {
    if( !m_first )
    {
      std::cout << "Empty list\n";
      return;
    }

    RingNodePtr node = Ref(m_current);
    do
    {
        std::cout << node->value << ' ';
        node = Prev(node);
    } while( node->value != m_current->value );
    std::cout <<'\n';
  }

private:

  size_t m_bufferSize;
  RingNodePtr m_first;
  RingNodePtr m_current;
  std::unordered_map<T,RingNodePtr> m_hash;

  // Helper to advance to the next node
  RingNodePtr Next(RingNodePtr& node)
  {
    return Ref(node->next);
  }

  // Helper to step back to the previous node
  RingNodePtr Prev(RingNodePtr& node)
  {
    return Ref(node->previous);
  }

  // Returns a unique_ptr that does not delete
  RingNodePtr Ref(RingNodePtr& node)
  {
    return UniquePtrUtils::RefUnique<RingNode>(node);
  }
};

To see the working example: link

There are easier ways to do this but my intention is not to do it the easy way. I hope other answers will explore new options.
