I am trying to calculate the 80th percentile for a store wait time; that is, the value that leaves 80% of the records below it.
I have found an interesting solution in an international SO thread:
https://stackoverflow.com/a/38644788/11030842
And I have successfully applied it to my case:
    SELECT
        t1.na_store AS na_store,
        t1.dt_day AS dt_day,
        t1.qt_waiting_time_s AS qt_waiting_time_s_p_80
    FROM
        (
            SELECT
                t.id AS id_ticket,
                q.description AS na_store,
                date(t.issue_time) AS dt_day,
                timestampdiff(second, t.issue_time, t.called_in_time) AS qt_waiting_time_s,
                @row_num:= @row_num + 1 AS row_num
            FROM
                ticket AS t
                INNER JOIN queue AS q ON t.queue_id = q.id,
                (SELECT @row_num:= 0) AS c
            ORDER BY
                qt_waiting_time_s ASC
        ) AS t1
    WHERE
        t1.row_num = round(0.8 * @row_num)
My problem is that this code computes the P80 over all the data, but what if I want to calculate the P80 grouped by store and by day?
The count would have to be done per store and per day instead of with a global counter, that is, the grouping would already have to happen in the subquery, but the truth is that I don't know whether that is possible.
Let's see if you can give me a hand, thanks in advance.
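Knowing any percentile of a data set implies:
1. Sorting the rows by the value in ascending order and numbering them.
2. Counting the total number of rows.
3. Taking the row whose number corresponds to the percentile applied to that total.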
That's the theory; it works either on the total rows or by partitioning the data into groups. From the code in your question, I understand that you are on a MySQL version prior to 8, so you do not have ROW_NUMBER(), which is what easily solves point 1 and can also be partitioned by store and day. You therefore have to find an alternative, which is basically something similar to what you already have, except that we have to "reset" the row number every time the store or the day changes.

Let's imagine the problem more conceptually: you have a table similar to this:

tienda (store), dia (day) and tiempo (time) are the data. To generate an enumerator ascending by tiempo and partitioned by tienda and dia, we can do something like this:

One possible output:
The data is just an example to illustrate the idea of the enumerator. With this, point 1 is solved; we still have to calculate the total number of rows for each group and then obtain the rows at the indicated percentile. In my opinion, the simplest way is to work with temporary tables:
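As a minimal sketch of the temporary-table idea, assuming a hypothetical source table named datos with the columns tienda, dia and tiempo, the two tables could be built roughly like this. The names tmp_numerado, tmp_totales, @rn and @grp are illustrative only. MySQL does not guarantee the evaluation order of expressions that assign user variables, so the counter is updated inside a single CASE expression, and the LIMIT in the derived table is a commonly used workaround to keep its ORDER BY in effect rather than documented behavior to rely on.

```sql
-- Assumption: datos(tienda, dia, tiempo) is a hypothetical source table.
SET @rn := 0, @grp := NULL;

-- Table 1: every original row plus its position inside its (tienda, dia) group.
CREATE TEMPORARY TABLE tmp_numerado AS
SELECT
    s.tienda,
    s.dia,
    s.tiempo,
    @rn := CASE
        -- Same group as the previous row: keep counting.
        WHEN @grp = CONCAT_WS('|', s.tienda, s.dia) THEN @rn + 1
        -- New group: remember it and restart the counter at 1.
        WHEN (@grp := CONCAT_WS('|', s.tienda, s.dia)) IS NOT NULL THEN 1
    END AS row_num
FROM (
    SELECT tienda, dia, tiempo
    FROM datos
    ORDER BY tienda, dia, tiempo
    -- A LIMIT prevents the derived table from being merged into the outer
    -- query, so the rows are read in sorted order (workaround, not a guarantee).
    LIMIT 18446744073709551615
) AS s;

-- Table 2: total number of rows in each (tienda, dia) group.
CREATE TEMPORARY TABLE tmp_totales AS
SELECT tienda, dia, COUNT(*) AS total
FROM tmp_numerado
GROUP BY tienda, dia;
```

It is worth checking tmp_numerado on a small sample before trusting the numbering, since the whole approach depends on the rows being read in sorted order.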
We now have two tables, one with each original row and its order number and another with the total rows of each group; we simply have to combine them with a join and apply the desired filter:

Notes:
floor is used because that way the rounding criterion is always the same.
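The join and filter described above can be sketched as follows. This assumes two temporary tables with hypothetical names: tmp_numerado (tienda, dia, tiempo, row_num), holding each row with its position inside its group, and tmp_totales (tienda, dia, total), holding the row count of each group. The GREATEST(1, ...) guard is an addition of this sketch, not something stated above: without it, a group with a single row would give FLOOR(0.8 * 1) = 0 and return nothing.

```sql
-- Assumption: tmp_numerado and tmp_totales are hypothetical temporary tables
-- with the columns described in the paragraph above.
SELECT
    n.tienda,
    n.dia,
    n.tiempo AS tiempo_p80
FROM tmp_numerado AS n
INNER JOIN tmp_totales AS t
    ON t.tienda = n.tienda
   AND t.dia = n.dia
-- Keep, for each group, the row sitting at 80% of the group's row count.
WHERE n.row_num = GREATEST(1, FLOOR(0.8 * t.total));
```

Applied to the tables in the question, tienda would correspond to q.description, dia to date(t.issue_time) and tiempo to the timestampdiff between issue_time and called_in_time.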