Question

Asked: 2020-08-13 11:49:10 +0800 CST

Class intervals and absolute frequency

I have the following DataFrame with observation data.

import numpy as np
import pandas as pd

data = np.array([[15, 38, 14, 13, 29, 25], [20, 13, 16, 32, 44, 39], [45, 46, 19, 23, 24, 18],
                 [19, 20, 21, 18, 25, 33], [13, 18, 22, 24, 27, 27] ])
# Creating pandas dataframe from numpy array
datos = pd.DataFrame({'Column1': data[:, 0], 'Column2': data[:, 1], 'Column3': data[:, 2], 'Column4': data[:, 3], 'Column5': data[:, 4]})


# Returns one list per row
lista_datos = datos.values.tolist()

# Sort the data
lista_ordenada_ventas = np.sort(lista_datos, axis=None)

# Build an ndarray from the sorted values
array_datos_ordenados = np.array(lista_ordenada_ventas.reshape(datos.shape[0], datos.shape[1]))

# Build a DataFrame from the ndarray
ventas_semanales_ordenadas = pd.DataFrame(array_datos_ordenados)

# Build a single-column DataFrame from the sorted values
ventas =  pd.DataFrame(lista_ordenada_ventas)
ventas.columns = (["Valores"])

I create a DataFrame with the limits of the class intervals applicable to said observations for analysis.

datos = np.zeros((6, 2))
intervalos = pd.DataFrame(datos, columns = ["LimInf", "LimSup"] )

intervalos.iloc[0,0] = 13
c = 6
intervalos.iloc[0, 1] = 13 + c -1
for i in range (1, intervalos.shape[0]):
    intervalos.iloc[i,0] = intervalos.iloc[i-1,1] + 1
    intervalos.iloc[i, 1] = intervalos.iloc[i,0] + c -1
intervalos

Next, I compute the exact limits:

intervalos["LimExacInf"] = 0.0
intervalos["LimExacSup"] = 0.0
# Compute the exact upper limit
for i in range (0, intervalos.shape[0]):
    intervalos.loc[i , "LimExacSup"]  = (intervalos.loc[i, "LimSup"]  + intervalos.loc[i, "LimInf"])/2
for i in range (1, intervalos.shape[0]):
    intervalos.loc[i , "LimExacInf"]  = intervalos.loc[i-1, "LimExacSup"] 

c = intervalos.loc[1, "LimExacSup"] - intervalos.loc[1, "LimExacInf"] 
intervalos.loc[0, "LimExacInf"] = intervalos.loc[0, "LimInf"] 
intervalos["MarcaClase"] = (intervalos["LimExacSup"] + intervalos["LimExacInf"]) / 2
intervalos

Next I want to add the absolute frequency column ("FrecAbsolutas"), counting how many values in the table fall within each interval, that is, how many are >= the lower limit of the interval and < the upper limit. I tried this script, and it gives me an error that I can't quite interpret.

frecuencia = 0
intervalos["FrecAbsolutas"]= 0
lista_frecuencias= [0]
for i in range (0, ventas.shape[0]):
    for j in range (0, intervalos.shape[0]): 
        if ventas.iloc[i,0] >= intervalos.iloc[j, 3] or ventas.iloc[i,0] < intervalos.iloc[j, 4]:
            lista_frecuencias[i] = lista_frecuencias[i] + 1

intervalos["FrecAbsolutas"]= lista_frecuencias

intervalos

The absolute frequencies should be: 5, 8, 5, 2, 1 and 3. The script returns the error: IndexError: list index out of range. I would appreciate suggestions to modify the script.

On the other hand, is there a function in pandas, scipy, etc. that performs this task? I would appreciate your help.

3 Answers

FJSevilla · Answer 1 · 2020-08-14T07:12:11+08:00

You can use pandas.IntervalIndex to generate the class intervals and pandas.cut to segment the data (and, in this case, calculate the absolute frequencies):

import numpy as np
import pandas as pd


data = np.array(
    [[15, 38, 14, 13, 29, 25], [20, 13, 16, 32, 44, 39],
     [45, 46, 19, 23, 24, 18], [19, 20, 21, 18, 25, 33],
     [13, 18, 22, 24, 27, 27]]
    )

datos = data.flatten()


freq = 6                 # Interval width
inf = datos.min()        # Lower limit of the first interval
dif = (datos.min() - datos.max()) % freq or freq
sup = datos.max() + dif  # Upper limit of the last interval

intervals = pd.interval_range(
    start=inf,
    end=sup,
    freq=freq,
    name="Intervalo",
    closed="left"
    )

df = pd.DataFrame(index=intervals)
df["FreqAbs"] = pd.cut(datos, bins=df.index).value_counts()
df["Marca"]  = df.index.mid

           FreqAbs  Marca
Intervalo                
[13, 19)         9   16.0
[19, 25)         9   22.0
[25, 31)         5   28.0
[31, 37)         2   34.0
[37, 43)         2   40.0
[43, 49)         3   46.0

If you want to have the limits in two columns you can do:

df["LimInf"] = df.index.left
df["LimSup"] = df.index.right

           FreqAbs  Marca  LimInf  LimSup
Intervalo                                
[13, 19)         9   16.0      13      19
[19, 25)         9   22.0      19      25
[25, 31)         5   28.0      25      31
[31, 37)         2   34.0      31      37
[37, 43)         2   40.0      37      43
[43, 49)         3   46.0      43      49

Or, using Sturges' rule as you mention in your comment, we can do something like this:

import numpy as np
import pandas as pd
import math


data = np.array(
    [[15, 38, 14, 13, 29, 25], [20, 13, 16, 32, 44, 39],
     [45, 46, 19, 23, 24, 18], [19, 20, 21, 18, 25, 33],
     [13, 18, 22, 24, 27, 27]]
    )

pd.set_option('display.precision', 2)
datos = data[:,:].flatten()


# Compute the number of intervals (Sturges' rule)
# If the integer part of k is odd, round down; otherwise round up
k = 1 + 3.322 * math.log10(len(datos))
numero = int(k)
if numero % 2 == 0:
    periodos = math.ceil(k)
else:
    periodos = int(k)

inf = datos.min()        # Lower limit of the first interval
sup = datos.max() + 1    # Upper limit of the last interval

intervals = pd.interval_range(
    start=inf,
    end=sup,
    periods=periodos,
    name="Intervalo",
    closed="left")

df = pd.DataFrame(index=intervals)
df["FreqAbs"] = pd.cut(datos, bins=df.index).value_counts()
df["Marca"]  = df.index.mid

              FreqAbs    Marca
Intervalo                 
[13.0, 19.8)       11     16.4
[19.8, 26.6)        9     23.2
[26.6, 33.4)        5     30.0
[33.4, 40.2)        2     36.8
[40.2, 47.0)        3     43.6

I add one to the maximum (46) because otherwise, since the last interval is open at its upper limit, [..., 46), observation 46 would fall outside the interval.
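The boundary case can be checked in isolation. What follows is only a minimal sketch: the three values in valores and the names sin_margen, con_margen, fuera_sin and fuera_con are made up for the illustration and are not part of the code above. It counts how many values pd.cut leaves without an interval (NaN) when the last left-closed interval ends exactly at the maximum, and when the end is extended by one.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, chosen only to illustrate the boundary case
valores = np.array([13, 20, 46])

# The last interval ends exactly at the maximum: [35, 46) does not contain 46
sin_margen = pd.interval_range(start=13, end=46, periods=3, closed="left")

# Extending the end by one makes the last interval reach 47, so 46 is inside
con_margen = pd.interval_range(start=13, end=47, periods=3, closed="left")

# pd.cut marks values that fall in no interval as NaN
fuera_sin = int(pd.isna(pd.cut(valores, bins=sin_margen)).sum())
fuera_con = int(pd.isna(pd.cut(valores, bins=con_margen)).sum())

print("Unassigned without the extra unit:", fuera_sin)
print("Unassigned with the extra unit:   ", fuera_con)
```

Without the extra unit one value (46) is left unassigned; with it, none are.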

efueyo · Answer 2 · 2020-08-14T11:42:56+08:00

That's along the right lines! The flatten() function, new to me, flattens the ndarray into a single one-dimensional array. I have changed the statement to

datos = data[:,:].flatten()

since data[:,:-1] drops the last value of each row of the ndarray, and that affects the calculation of the absolute frequencies. The interval_range() function, also new to me, builds class intervals of a predefined width, with exact lower and upper bounds for each interval, which then serve as the index of an empty DataFrame.

The cut function, also new to me, is very handy, as it assigns each value to one of the intervals of the DataFrame that was created. What I don't quite understand is how it calculates the frequencies. The sum of the absolute frequencies has to equal the number of values, in this case 30. Using your statement

datos = data[:,:-1].flatten()

the sum of the absolute frequencies is 19. If I change that statement for the reason mentioned above, it returns:

           FreqAbs  Marca  LimInf  LimSup
Intervalo                                
(13, 19]         8   16.0      13      19
(19, 25]         9   22.0      19      25
(25, 31]         3   28.0      25      31
(31, 37]         2   34.0      31      37
(37, 43]         2   40.0      37      43

In this case the sum of the frequencies is 24, which is not correct either. Some values escape the count. For example, the interval 13-19 actually contains 9 values.
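One way to see which values escape the count is to ask pd.cut directly which observations were left without an interval. This is only a sketch: it assumes the table above came from right-closed intervals (13, 19] to (37, 43], rebuilt here with interval_range(start=13, end=43, freq=6), and the names categorias and sin_intervalo are introduced just for the illustration.

```python
import numpy as np
import pandas as pd

data = np.array(
    [[15, 38, 14, 13, 29, 25], [20, 13, 16, 32, 44, 39],
     [45, 46, 19, 23, 24, 18], [19, 20, 21, 18, 25, 33],
     [13, 18, 22, 24, 27, 27]]
    )
datos = data.flatten()

# Right-closed intervals (the default), assumed to match the table above:
# (13, 19], (19, 25], (25, 31], (31, 37], (37, 43]
intervals = pd.interval_range(start=13, end=43, freq=6)

# Values that fall in no interval are returned as NaN by pd.cut
categorias = pd.cut(datos, bins=intervals)
sin_intervalo = np.sort(datos[pd.isna(categorias)])

print("Counted:    ", len(datos) - len(sin_intervalo))
print("Not counted:", sin_intervalo)
```

Under that assumption the uncounted values are the three 13s, because (13, 19] is open on the left, plus 44, 45 and 46, which lie beyond the last interval. That accounts for the 6 missing observations.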

efueyo · Answer 3 · 2020-08-14T12:09:22+08:00

To determine the number of intervals from the sample size, in this case N = 30 values, Sturges' rule is applied. In our case it is better to use the periods parameter instead of the freq parameter:

k = 1 + 3.322 * math.log10(N)   # Sturges' rule, with N = len(datos)
periodos = math.ceil(k)         # number of intervals, rounded up
intervals = pd.interval_range(start=inf, end=sup, periods=periodos, name="Intervalo")

This gives an exact upper limit for the last interval that matches the maximum value.

Intervalo
(13.0, 18.5]
(18.5, 24.0]
(24.0, 29.5]
(29.5, 35.0]
(35.0, 40.5]
(40.5, 46.0]

However, the sum of absolute frequencies is still incorrect.
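A likely reason is that the first interval, (13.0, 18.5], is open on the left, so the minimum value 13 is not counted. One possible workaround, shown only as a sketch and not as the method used in the answers above, is to pass the interval edges to pd.cut as a plain array together with include_lowest=True, so that the first interval also includes its left edge. The names bordes, categorias and frecuencias are introduced for the illustration.

```python
import math

import numpy as np
import pandas as pd

data = np.array(
    [[15, 38, 14, 13, 29, 25], [20, 13, 16, 32, 44, 39],
     [45, 46, 19, 23, 24, 18], [19, 20, 21, 18, 25, 33],
     [13, 18, 22, 24, 27, 27]]
    )
datos = data.flatten()

# Sturges' rule for the number of intervals, rounded up
k = 1 + 3.322 * math.log10(len(datos))
periodos = math.ceil(k)

# periodos + 1 equally spaced edges from the minimum to the maximum
bordes = np.linspace(datos.min(), datos.max(), periodos + 1)

# include_lowest=True makes the first interval include the minimum
categorias = pd.cut(datos, bins=bordes, include_lowest=True)
frecuencias = pd.Series(categorias).value_counts(sort=False)

print(frecuencias)
print("Total:", frecuencias.sum())
```

With these edges every observation is assigned to an interval, so the frequencies add up to the 30 values in the sample.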
