
Question

Patricio Nicolas

Asked: 2020-12-17 16:34:05 +0800 CST

Most efficient way to compare two arrays


I would like to know the most efficient way to compare the contents of two arrays, that is, the way that takes the least time and uses the fewest resources.

Below is an example in Python of how I do it on a daily basis. In this example there are few elements and it takes very little time, but as we add elements, the times increase exponentially.

If you want to give an example in another language, you are welcome to; the idea is to understand the logic of how to do it, independently of the language.

Here is the example:

#!/usr/bin/python
# -*- coding: utf-8 -*-

arreglo1   = [1,2,3,4,5,6]
arreglo2   = [0,2,4,6,8,10]
repetidos  = []

for x in arreglo1:
    for y in arreglo2:
        if x == y:
            repetidos.append(x)

print "Los Repetidos son"
for z in repetidos:
    print z

The output is the line "Los Repetidos son" followed by 2, 4 and 6, each on its own line.

5 Answers

dwarandae · Answer 1 · 2020-12-20T22:07:30+08:00

Since the question has the language-agnostic tag, I will answer it as a computational problem, from the standpoint of algorithm analysis.

The computational problem to solve is the following: given an array A of n elements and an array B of m elements, with n and m nonnegative integers, return an array C of k elements in which every element belongs both to array A and to array B. Clearly, if array C is empty, there are no elements common to both arrays.

A trivial solution to this problem in pseudocode is the following:

REPETIDOS(A[0..n-1], B[0..m-1])
// Compares two arrays to identify common elements
// Input: two arrays A and B to compare
// Output: an array C containing the elements common to both arrays
1. create C, an empty array
2. for i ← 0 to n-1
3.     for j ← 0 to m-1
4.         if A[i] = B[j]
5.             append the element A[i] to array C
6. return C

The trivial solution consists of comparing every element of array B with each element of array A, adding those that are repeated to array C, and returning C. This solution is essentially the same as the one proposed in the Python code that accompanies the question.

The question states: "[...] as we add elements, the times increase exponentially."

Let us clarify the previous statement. The pseudocode above contains a doubly nested loop. The outer loop (line 2) is executed n times (because array A has n elements), and for each of those executions line 3 is executed m times (because array B has m elements). The number of comparisons carried out by this procedure is therefore n times m, so the analysis shows that its complexity is O(nm). (This complexity is an abstraction that, under a defined computational model, lets us estimate how the execution time of the algorithm grows for large input sizes.) If we assume that both arrays have the same number of elements n, the complexity of the algorithm is O(n^2). That is, the complexity of the algorithm is not 'exponential', as the question colloquially puts it, but quadratic with respect to n, the size of the input of the problem.
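The n times m count can be checked empirically. The following is a minimal sketch, assuming Python 3; the function name count_comparisons is hypothetical and is used here only for illustration.

```python
def count_comparisons(a, b):
    """Nested-loop comparison that also counts how often x == y is evaluated."""
    comparisons = 0
    common = []
    for x in a:            # runs n times
        for y in b:        # runs m times per outer iteration
            comparisons += 1
            if x == y:
                common.append(x)
    return common, comparisons


for n in (10, 100, 1000):
    a = list(range(n))
    b = list(range(0, 2 * n, 2))
    _, comparisons = count_comparisons(a, b)
    # With both arrays of size n, the count is exactly n * n.
    print(n, comparisons, comparisons == n * n)
```

Multiplying n by 10 multiplies the number of comparisons by 100, which is the signature of quadratic rather than exponential growth.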

Given the above context, the question now is: is there an algorithm better than O(n^2) for comparing two arrays of size n? The answer is yes.

Add the elements of the first array to a data structure called a hash (a hash table), then iterate through the elements of the second array and check whether each one is in the structure; if it is, add it to array C.

REPETIDOSCONHASH(A[0..n-1], B[0..m-1])
// Compares two arrays to identify common elements
// Input: two arrays A and B to compare
// Output: an array C containing the elements common to both arrays
1. create C, an empty array, and D, a hash structure
2. for i ← 0 to n-1
3.     add A[i] to D
4. for j ← 0 to m-1
5.     if B[j] is in D
6.         append the element B[j] to array C
7. return C

Adding an element to a hash structure and querying it both have computational complexity O(1): an amortized analysis shows that, in the average case, insertion and lookup take constant time. In the algorithm above, line 2 is executed n times (we need to add the n elements of array A to the hash D), and for each execution of line 2, line 3 is executed once (because inserting into the structure takes constant time). Line 4 is executed m times, and on each execution it takes constant time to check whether the j-th element of array B is in the hash structure; if it is, the element is clearly repeated and we add it to array C.

From the above it is clear that the computational complexity of the new algorithm is O(n+m), or O(n) when both arrays have n elements; that is, it is linear with respect to the size of the arrays. This is a substantial improvement over the previous algorithm with complexity O(n^2). Remember that the goal of determining the complexity is to characterize the execution time as the size of the input tends to larger and larger values.

A question then arises: is it possible to improve on the previous algorithm? The answer is no. Clearly, in order to identify which elements are repeated in both arrays, we must go through the arrays at least once, so at least on the order of n operations must be performed.
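The REPETIDOSCONHASH pseudocode can be sketched in Python as follows. This is a minimal sketch, assuming Python 3, with the built-in set playing the role of the hash structure D; the function name repetidos_con_hash is hypothetical, and the elements are assumed to be hashable.

```python
def repetidos_con_hash(a, b):
    """Return the elements of b that also appear in a, using a hash-based set."""
    d = set(a)              # lines 2-3: insert every element of A into D
    c = []
    for item in b:          # line 4: one pass over B
        if item in d:       # line 5: average constant-time lookup
            c.append(item)  # line 6
    return c


arreglo1 = [1, 2, 3, 4, 5, 6]
arreglo2 = [0, 2, 4, 6, 8, 10]
print(repetidos_con_hash(arreglo1, arreglo2))  # [2, 4, 6]
```

As in the pseudocode, an element that occurs several times in B is reported once per occurrence, and the result follows the order of B.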

Other algorithms also achieve decent computational complexities. One possible strategy is to sort both arrays and then do a linear traversal comparing their elements (not pairwise as in the first algorithm, but using two iterators that together traverse the two arrays in O(n+m), or O(n) when both arrays have n elements). If we assume a sorting algorithm of order O(n log n), the complexity of this algorithm is precisely O(n log n). Another possibility is to assume that both A and B are already sorted, in which case the computational complexity, using the traversal just described, is O(n).

In short: the most efficient way to compare two arrays, without making any additional assumptions, is to use a hash, which in most cases gives a computational complexity of O(n).
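The sort-then-traverse strategy can be sketched like this. It is a minimal sketch, assuming Python 3 and elements that can be ordered with <; the function name repetidos_ordenados is hypothetical. If both inputs are already sorted, the two sorted() calls can be dropped and only the linear traversal remains.

```python
def repetidos_ordenados(a, b):
    """Common elements of a and b via sorting plus a two-iterator traversal."""
    a = sorted(a)   # O(n log n)
    b = sorted(b)   # O(m log m)
    i = j = 0
    c = []
    # Each step advances at least one index, so the loop runs at most n + m times.
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            i += 1
        elif a[i] > b[j]:
            j += 1
        else:
            c.append(a[i])  # equal values: common element
            i += 1
            j += 1
    return c


print(repetidos_ordenados([5, 1, 3, 6, 2, 4], [10, 8, 6, 4, 2, 0]))  # [2, 4, 6]
```

Note that this variant pairs up duplicates: a value that occurs twice in both arrays is reported twice, which differs slightly from the nested-loop version.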

FJSevilla · Answer 2 · 2020-12-17T17:47:28+08:00

There is no general method that is always the most efficient. It depends on many factors, and a very important one is the language used: in each language one approach may be more efficient than another. A Python list is not the same as a C array or a C++ vector. There are always general ideas, such as minimizing the number of complete passes over the array, using hash tables where possible, and so on, but the details depend on each language and even on the data type.

Speaking specifically of Python and based on your example, the problem with that method is that for each element of arreglo1 you loop through the entire arreglo2. In Python, a much more efficient way to find the elements present in two iterables is to use the intersection() method of sets (set), which takes two sets and returns another set with the elements present in both:

#!/usr/bin/python
# -*- coding: utf-8 -*-

arreglo1   = [1,2,3,4,5,6]
arreglo2   = [0,2,4,6,8,10]
repetidos = set(arreglo1) & set(arreglo2)

print "Los Repetidos son"
for z in repetidos:
    print z

Edit:

I am adding a small comparison using two lists of 100,000 elements each. I compare your code with a more efficient alternative that uses in and list comprehensions, and with set.intersection(). All three functions return a list.

#!/usr/bin/python
# -*- coding: utf-8 -*-
import random
from time import time

def ciclo_for(a, b):
    repetidos = []
    for x in a:
        for y in b:
            if x == y:
                repetidos.append(x)
    return repetidos

def set_intersection(a, b):
    repetidos = list(set(a) & set(b))
    return repetidos

def list_comprehensions(a, b):
    return [n for n in a if n in b]

arreglo1 = [n for n in xrange(0,100000)]
arreglo2 = [n for n in xrange(0,200000,2)]

t0 = time()
ciclo_for(arreglo1, arreglo2)
print 'ciclo for:', time()-t0

t0 = time()
list_comprehensions(arreglo1, arreglo2)
print 'list_comprehensions:', time()-t0

t0 = time()
set_intersection(arreglo1, arreglo2)
print 'set_intersection:', time()-t0

In addition to what was said above, the append method of lists reduces efficiency even further. The results leave little doubt (times in seconds):

ciclo for: 199.760999918
list_comprehensions: 92.1570000648
set_intersection: 0.0139999389648
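One variation that is not part of the measurements above: when the order of the first list and its repeated elements need to be preserved, the list comprehension can be combined with a set used only for the membership test. This is a minimal sketch, assuming Python 3; the function name list_comprehension_with_set is hypothetical, and no timing is claimed for it here.

```python
def list_comprehension_with_set(a, b):
    """Elements of a that also appear in b, in the order they occur in a."""
    b_set = set(b)  # built once; 'in' on a set is constant time on average
    return [n for n in a if n in b_set]


arreglo1 = list(range(0, 100000))
arreglo2 = list(range(0, 200000, 2))
repetidos = list_comprehension_with_set(arreglo1, arreglo2)
print(len(repetidos))  # 50000
```

Unlike set(a) & set(b), the result is a list whose order follows a, and a value that appears several times in a is kept each time.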

Marcos · Answer 3 · 2020-12-17T17:52:17+08:00

In PHP there is array_intersect:

$array1 = array(1,2,3,4,5,6);
$array2 = array(0,2,4,6,8,10);
$result = array_intersect($array1, $array2);
print_r($result);

// Output
Array
(
    [1] => 2
    [3] => 4
    [5] => 6
)

In Python there is intersection:

array1 = [1,2,3,4,5,6]
array2 = [0,2,4,6,8,10]

array3 = set(array1).intersection(array2)
print array3

// Output
set([2, 4, 6])

Yoel Rodriguez · Answer 4 · 2020-12-17T17:13:35+08:00

In PHP there is a related function, array_diff(). It compares two arrays and returns the values of the first that are not present in the second (to obtain the common elements instead, use array_intersect()). Example:

$array1    = array("a" => "a", "b", "c", "d");
$array2    = array("b" => "a", "b", "d");

$resultado = array_diff($array1, $array2);

print_r($resultado);

Samael Olascoaga · Answer 5 · 2020-12-17T17:53:46+08:00

This is how I would do it:

arreglo1   = [1,2,3,4,5,6]
arreglo2   = [0,2,4,6,8,10]
repetidos  = []

print("Elementos repetidos: ")
for elemento in arreglo1:
    if elemento in arreglo2:
        repetidos.append(elemento)
        print(elemento)

I tried both algorithms with array1 containing the numbers from 1 to 6000 in steps of 1 and array2 containing the numbers from 1 to 10000 in steps of 2.

Your implementation takes 6.003 seconds; the one I propose takes 2.001 seconds.
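A measurement of this kind can be reproduced with the standard timeit module. The following is a minimal sketch, assuming Python 3 and assuming the ranges described above include their end points; the function names nested_loops and membership_test are hypothetical, and the times will vary by machine.

```python
import timeit


def nested_loops(a, b):
    """The implementation from the question: explicit double loop."""
    repetidos = []
    for x in a:
        for y in b:
            if x == y:
                repetidos.append(x)
    return repetidos


def membership_test(a, b):
    """The implementation from this answer: 'in' against the second list."""
    repetidos = []
    for elemento in a:
        if elemento in b:   # still a linear scan of b, but done internally
            repetidos.append(elemento)
    return repetidos


arreglo1 = list(range(1, 6001))       # 1 to 6000 in steps of 1
arreglo2 = list(range(1, 10001, 2))   # 1 to 10000 in steps of 2

for func in (nested_loops, membership_test):
    seconds = timeit.timeit(lambda: func(arreglo1, arreglo2), number=1)
    print(func.__name__, seconds)
```

Both versions remain O(nm), because in on a list scans the list element by element; the difference is a constant factor, not a change in how the time grows.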

All the best.
