Here is a very peculiar piece of C++ code. For some strange reason sorting the data miraculously causes the code to run 3 times faster.
#include <algorithm>
#include <cstdlib>
#include <ctime>
#include <iostream>

int main()
{
    // Generate data
    const unsigned arraySize = 32768;
    int data[arraySize];

    for (unsigned c = 0; c < arraySize; ++c)
        data[c] = std::rand() % 256;

    // !!! With this, the following loop runs faster.
    std::sort(data, data + arraySize);

    // Test
    clock_t start = clock();
    long long sum = 0;

    for (unsigned i = 0; i < 100000; ++i)
    {
        // Primary loop
        for (unsigned c = 0; c < arraySize; ++c)
        {
            if (data[c] >= 128)
                sum += data[c];
        }
    }

    double elapsedTime = static_cast<double>(clock() - start) / CLOCKS_PER_SEC;

    std::cout << elapsedTime << std::endl;
    std::cout << "sum = " << sum << std::endl;
}
- Without std::sort(data, data + arraySize); the code runs in 12.41 seconds.
- With the data sorted, it runs in 3.82 seconds.
At first I thought it might be a language or compiler anomaly, so I tried it in Java:
import java.util.Arrays;
import java.util.Random;

public class Main
{
    public static void main(String[] args)
    {
        // Generate data
        int arraySize = 32768;
        int data[] = new int[arraySize];

        Random rnd = new Random(0);
        for (int c = 0; c < arraySize; ++c)
            data[c] = rnd.nextInt() % 256;

        // !!! With this, the following loop runs faster.
        Arrays.sort(data);

        // Test
        long start = System.nanoTime();
        long sum = 0;

        for (int i = 0; i < 100000; ++i)
        {
            // Primary loop
            for (int c = 0; c < arraySize; ++c)
            {
                if (data[c] >= 128)
                    sum += data[c];
            }
        }

        System.out.println((System.nanoTime() - start) / 1000000000.0);
        System.out.println("sum = " + sum);
    }
}
This gives a similar but less extreme result: 10.7 seconds unsorted and 6.2 seconds sorted.
My first thought was that sorting brings the data into the cache, but then I realized that the array had just been generated, so it would already be in the cache before sorting.
- What's going on?
- Why is a sorted array faster to process than an unsorted one?
- The code sums independent terms, so the order should not matter: either way, all the elements are traversed and the same ones are always added, giving the same result whether the data is sorted or not.
This is a question translated from the English original, Why is processing a sorted array faster than an unsorted array? by GManNickG, and adapted to the results on my computer.
You are a victim of branch prediction failure.
What is branch prediction?
Consider a railroad junction:
Image by Mecanismo, courtesy of Wikimedia Commons. Used under the CC-BY-SA 3.0 license.
Suppose we are back in the 19th century - long before long-distance or radio communication.
You are the operator of a junction and you hear a train coming. You have no idea which way it is supposed to go, so you stop the train and ask the driver which direction he is headed, and then you set the switch accordingly.
Trains are heavy and have a lot of inertia, so it takes them a long time to stop.
Is there a better way? You could guess which direction the train will go!
If you guess right every time, the train never has to stop.
If you guess wrong too often, the train will spend a lot of time stopping, backing up, and restarting.
Consider an if statement: at the processor level, it is a conditional branch instruction.
You are the processor and you see a conditional branch. You have no idea which way it will go. What do you do? You halt execution and wait until the previous instructions have finished. Then you continue down the correct path.
Modern processors are complicated and deeply pipelined, so it takes them a long time to _start executing_ and to _stop executing_.
Is there a better way? You guess which way the branch will go!
If you guess right every time, execution never has to stop.
If you guess wrong too often, you spend a lot of time stalling, rolling back, and restarting.
This is branch prediction. I admit it is not the best possible analogy, since the train driver could just signal the direction with a flag. But in computers, the processor does not know whether a branch will be taken until the last moment.
So what prediction strategy should you use to minimize the number of times the train has to back up and go down the other path? You look at the past history! If the train goes left 99% of the time, then you guess left. If it alternates, then you alternate your guesses. If it goes one way every three times, you guess the same...
In other words, you try to identify a pattern and follow it. This is more or less how branch predictors work.
Most applications have well-behaved branches, so modern branch predictors typically achieve hit rates above 90%. But when faced with unpredictable branches with no recognizable pattern, branch predictors are virtually useless.
Further reading: the "Branch predictor" article on Wikipedia.
The above gives us a clue as to where the problem lies: the if statement.
Note that the data is evenly distributed between 0 and 255. When the data is sorted, roughly the first half of the iterations will not enter the if statement; after that, they will always enter the if statement.
This is very good for the branch predictor, since the branch goes the same way many times in a row.
Even a simple saturating counter will predict the branch correctly, except for a few iterations right after it switches direction.
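For illustration, here is a minimal C++ sketch of a 2-bit saturating counter, a simplified model of the kind of predictor mentioned above (not the actual hardware implementation):

#include <cstdint>

// Hypothetical, minimal 2-bit saturating counter: states 0-1 predict
// "not taken", states 2-3 predict "taken". Each observed outcome nudges
// the state one step toward that outcome, saturating at 0 and 3.
struct SaturatingCounter
{
    std::uint8_t state = 2;

    bool predict() const { return state >= 2; }   // true = predict "taken"

    void update(bool taken)
    {
        if (taken && state < 3) ++state;
        else if (!taken && state > 0) --state;
    }
};

Fed the sorted data, such a counter mispredicts only around the point where the values cross 128 and is correct everywhere else.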
Quick view:
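A sketch of the branch outcomes over the sorted array (T = taken, i.e. data[c] >= 128; N = not taken):

data[] = 0, 1, 2, 3, ... 126, 127, 128, 129, ... 254, 255
branch = N  N  N  N  ...   N    N    T    T  ...   T    T

       = NNNNNNNNNN ... NNNTTTTTTT ... TTTTTTTT   (easy to predict)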
However, when the data is completely random, the branch predictor is useless because it cannot predict random data.
There will probably be around 50% misprediction, which is no better than random guessing.
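With unsorted data, by contrast, the sequence of outcomes looks like noise (the values below are made up for illustration):

data[] = 226, 185, 38, 97, 145, 63, 205, 41, ...
branch =   T    T   N   N    T   N    T   N  ...   (impossible to predict)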
What can be done?
If the compiler cannot optimize the branch into a conditional move, there are some tricks you can try, if you are willing to sacrifice readability for performance.
Replace:
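That is, the data-dependent branch in the inner loop of the code above:

if (data[c] >= 128)
    sum += data[c];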
With:
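One branch-free formulation consistent with the description below is the following sign-bit trick (a sketch; the exact snippet from the original answer may differ slightly):

// (data[c] - 128) is negative exactly when data[c] < 128. An arithmetic
// right shift by 31 turns that into a mask: 0 when data[c] >= 128,
// all ones (-1) otherwise.
int t = (data[c] - 128) >> 31;
sum += ~t & data[c];   // adds data[c] only when data[c] >= 128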
This removes the conditional jump and replaces it with some bitwise operations.
(Note that this trick is not strictly equivalent to the original if statement, but in this case it is valid for all the values of data[].)

Benchmarks: Core i7 920 @ 3.5 GHz
C++ - Visual Studio 2010 - x64 Release
Java - Netbeans 7.1.1 JDK 7 - x64
Observations:
With the branch: There is a big difference between sorted and unsorted data.
With the trick: There is no difference between sorted and unsorted data.
As a general rule of thumb, data-dependent conditional jumps should be avoided in critical loops (like the one in this example).
Update:

GCC 4.6.1 with -O3 or -ftree-vectorize on x64 is able to generate a conditional move, so there is no difference between sorted and unsorted data - both are fast.

VC++ 2010 is unable to generate conditional moves for this branch, even under /Ox.

Intel Compiler 11 does something miraculous: it interchanges the two loops, thereby hoisting the unpredictable branch to the outer loop. Not only is it immune to mispredictions, it is also twice as fast as whatever VC++ and GCC can generate. In other words, ICC took advantage of the test loop to defeat the benchmark...

If you give the Intel compiler the branchless code, it just outright vectorizes it... and it is exactly as fast as the branching version (with the loop interchange).
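As an illustration of what a conditional-move-friendly form of this loop body could look like (a sketch, not part of the original answer; whether a cmov is actually emitted depends on the compiler and flags):

// Written as a conditional expression, the update gives the optimizer a
// natural opportunity to use a conditional move instead of a branch.
sum += (data[c] >= 128) ? data[c] : 0;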
This shows that even in modern and mature compilers there can be wild differences in their ability to optimize code...
This answer is a translation of the original English answer by Mysticial.