I am plotting a histogram of a very important data set using geom_histogram(), and I have noticed that as I increase its "definition" by raising the number of bins (bars), it gets slower and slower. Compared with a base R histogram, the ratio is at least 10 to 1 in time. Example:
library("ggplot2")
library("microbenchmark")
set.seed(2019)
x <- rnorm(100000)
df <- data.frame(x=x)
ggplot_hist <- function(data, bins=100000){
  print(ggplot(data, aes(x=x)) + geom_histogram(bins=bins))
}
base_hist <- function(x, breaks=100000){
  print(hist(x, breaks=breaks))
}
microbenchmark(
  base_hist(x),
  ggplot_hist(df),
  times=3L
)
Unit: seconds
expr min lq mean median uq max neval
base_hist(x) 4.503556 4.632358 4.680143 4.761159 4.768436 4.775713 3
ggplot_hist(df) 56.330033 57.249490 60.182923 58.168946 62.109369 66.049791 3
Is there a way to optimize a histogram in ggplot?
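One way to probe where the time goes is to benchmark both approaches at increasing bin counts. A minimal sketch (the bin counts and `times = 1L` are illustrative choices, and a null device is used so that drawing to the screen is not measured):

```r
library(ggplot2)
library(microbenchmark)

set.seed(2019)
x <- rnorm(100000)
df <- data.frame(x = x)

pdf(NULL)  # render to a null device: we want computation time, not screen time
for (n_bins in c(100, 1000, 10000)) {
  cat("bins =", n_bins, "\n")
  print(microbenchmark(
    ggplot = print(ggplot(df, aes(x = x)) + geom_histogram(bins = n_bins)),
    base   = hist(x, breaks = n_bins),  # note: base hist() treats breaks as a suggestion
    times  = 1L
  ))
}
dev.off()
```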
According to the hypothesis of this interesting answer, the bottleneck would be in the calculation of the bins, that is, of the bars. We can try to test it: as we incorporate more detail by increasing the number of bins, the time grows rapidly. On the other hand, if we study the base histogram in the same way, we see that its growth in time as the bins increase is minimal.

So the idea proposed in the mentioned answer is to replace the calculation of the bins with the base function hist() and then draw the bars using geom_rect(). And we see that with such an ad-hoc function, quick_hist(), we manage to improve the performance of the ggplot histogram radically, and with a very similar visual result.
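The original quick_hist() code did not survive the formatting here; a sketch of the idea as described above (the function name comes from the answer, the column names and fill colour are my own choices): let base hist() compute the bins without plotting, then hand the pre-computed rectangles to ggplot via geom_rect().

```r
library(ggplot2)

# Compute the bins with base hist() (fast), then draw them with geom_rect(),
# bypassing geom_histogram()'s slow bin calculation.
quick_hist <- function(data, breaks = 100000) {
  h <- hist(data$x, breaks = breaks, plot = FALSE)
  bars <- data.frame(
    xmin = head(h$breaks, -1),  # left edge of each bar
    xmax = tail(h$breaks, -1),  # right edge of each bar
    ymax = h$counts             # bar height = count in the bin
  )
  ggplot(bars) +
    geom_rect(aes(xmin = xmin, xmax = xmax, ymin = 0, ymax = ymax),
              fill = "grey35")
}

set.seed(2019)
df <- data.frame(x = rnorm(100000))
p <- quick_hist(df, breaks = 1000)
# print(p)  # renders far faster than geom_histogram() with many bins
```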