Question

AlejandroR

Asked: 2022-07-25 06:51:05 +0800 CST

Elimination of outliers

772

I have written a function to help remove outliers (atypical values) from a dataframe column. It takes the dataframe and the column name as parameters:

import numpy as np
def outlier(df, col_name):
    q1 = np.percentile(np.array(df[col_name].tolist()), 25)
    q3 = np.percentile(np.array(df[col_name].tolist()), 75)
    IQR = q3 - q1
                      
    Q3 = q1+(3*IQR)
    Q1 = q3-(3*IQR)
    outlier_num = 0
                      
    for value in df[col_name].values.tolist():
        if (value < Q1) | (value > Q3):
            outlier_num +=1
    return Q1, Q3, outlier_num

The problem is when trying to pass the parameters:

df_covtype = df_covtype[(df_covtype['column_name'] > outlier(df_covtype, 'column_name')[0]) &
              (df_covtype['column_name'] < outlier(df_covtype, 'column_name')[1])]

It tells me the following:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-122-e4962bb5c2b0> in <module>()
----> 1 df_covtype = df_covtype[(df_covtype['column_name'] > outlier(df_covtype, 'column_name')[0]) &
      2               (df_covtype['column_name'] < outlier(df_covtype, 'column_name')[1])]
      3 df_covtype.shape

1 frames
<ipython-input-119-f1e12f2fd893> in outlier(df, col_name)
      2 import numpy as np
      3 def outlier(df, col_name):
----> 4     q1 = np.percentile(np.array(df[col_name].tolist()), 25)
      5     q3 = np.percentile(np.array(df[col_name].tolist()), 75)
      6     IQR = q3 - q1

/usr/local/lib/python3.7/dist-packages/pandas/core/generic.py in __getattr__(self, name)
   5485         ):
   5486             return self[name]
-> 5487         return object.__getattribute__(self, name)
   5488 
   5489     def __setattr__(self, name: str, value) -> None:

AttributeError: 'DataFrame' object has no attribute 'tolist'

If anyone can give me a hand, I'd appreciate it. Greetings and thank you

1 Answer

HeytalePazguato · Answer 1 · 2022-07-25T07:38:58+08:00

Good day,

There is a slightly easier way to do what you are looking for, using pandas.DataFrame.quantile and pandas.DataFrame.apply.

First I'll generate a dataframe for the example:

df = pd.DataFrame(np.random.randn(100, 3), columns = ['A', 'B', 'C'])

This returns a dataframe with 3 columns and 100 rows of random values.

Now we get the low and high bounds used to detect the outliers:

low = 0.25
high = 0.75

quant_df  = df.quantile([low, high])

This gives us a dataframe like the following:

        A           B           C
0.25    -0.675637   -0.552684   -0.823368
0.75    0.754698    0.863303    0.343633

Indicating the limits for each column
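Note that raw quartiles used as limits keep only roughly the middle half of each column. If you want fences based on the interquartile range instead, as the function in the question seems to intend, here is a minimal sketch. The helper name iqr_fences and the default k=3 (taken from the question's 3*IQR) are assumptions of mine, not part of pandas. The conventional fences are q1 - k*IQR and q3 + k*IQR, whereas the question's code adds to q1 and subtracts from q3.

```python
import pandas as pd


def iqr_fences(df, k=3.0):
    """Hypothetical helper: per-column (lower, upper) fences from the IQR."""
    quant_df = df.quantile([0.25, 0.75])
    q1, q3 = quant_df.loc[0.25], quant_df.loc[0.75]
    iqr = q3 - q1
    # Conventional fences: below q1 - k*IQR or above q3 + k*IQR is an outlier
    return q1 - k * iqr, q3 + k * iqr


# Small deterministic example with one obvious outlier
df = pd.DataFrame({"A": [0.0, 1.0, 2.0, 3.0, 4.0, 100.0]})
lower, upper = iqr_fences(df, k=3.0)

# Comparing a dataframe with a Series aligns the Series on the column labels
outlier_mask = (df < lower) | (df > upper)
print(lower["A"], upper["A"])          # -6.25 11.25
print(int(outlier_mask["A"].sum()))    # 1
```

The returned lower and upper Series are indexed by column name, so they could stand in for quant_df.loc[low] and quant_df.loc[high] in the filtering step.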

Finally, we use apply() on the original dataframe with the limits stored in quant_df:

df.apply(lambda x: x[(x > quant_df.loc[low,x.name]) & (x < quant_df.loc[high,x.name])], axis=0)
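To see what this apply() call returns, here is a small sketch on fixed data (the values are made up for illustration). Each column is filtered independently, so the results are aligned on the row index: a row survives if at least one column kept its value, and the columns that were out of range in that row become NaN.

```python
import pandas as pd

# Illustrative data, chosen so each column keeps a different row
df = pd.DataFrame({"A": [0, 1, 2, 3, 4], "B": [30, 10, 50, 20, 40]})

low, high = 0.25, 0.75
quant_df = df.quantile([low, high])  # A: 1.0 and 3.0, B: 20.0 and 40.0

filtered = df.apply(
    lambda x: x[(x > quant_df.loc[low, x.name]) & (x < quant_df.loc[high, x.name])],
    axis=0,
)

# Row 0 keeps only B (30), row 2 keeps only A (2); the other cell is NaN.
# Rows 1, 3 and 4 disappear because no column kept a value there.
print(filtered)
```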

Note: If you want to include the limits, you must use >= and <= respectively. In the above example the limits are excluded.

Note 2: If you want to exclude any column you must do it before applying the limits, you can do it in the following way:

filt_df = df.loc[:, df.columns != 'Columna_excluida']

In this case we would call apply on filt_df instead of the original dataframe.
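As a rough sketch of that exclusion step, assume a hypothetical non-numeric column named 'label' that should not be filtered. The column name and the final join() that brings it back are illustrative choices of mine, not something the steps above require.

```python
import pandas as pd

# 'label' is a hypothetical column we do not want to filter
df = pd.DataFrame({"A": [0, 1, 2, 3, 4], "label": list("abcde")})

# Exclude the column before computing the limits
filt_df = df.loc[:, df.columns != "label"]

low, high = 0.25, 0.75
quant_df = filt_df.quantile([low, high])  # A: 1.0 and 3.0

filtered = filt_df.apply(
    lambda x: x[(x > quant_df.loc[low, x.name]) & (x < quant_df.loc[high, x.name])],
    axis=0,
)

# Optionally bring the excluded column back; join() aligns on the row index
joined = filtered.join(df["label"])
print(joined)
```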

Finally, if you want to delete the rows that contain any NaN produced by apply, you can do it as follows:

df.dropna(inplace=True)
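Keep in mind that dropping rows with any NaN means a row survives only if every column was within its limits. A small sketch of the difference, using a hand-built frame that mimics a filtered result (the values are made up), and the subset parameter in case only some columns should decide:

```python
import numpy as np
import pandas as pd

# Hand-built frame mimicking the output of the filtering step
filtered = pd.DataFrame(
    {"A": [np.nan, 2.0, 3.0, 4.0], "B": [3.0, 2.0, np.nan, 4.0]},
    index=[0, 2, 3, 4],
)

# Default how="any": drop a row if any column is NaN
all_columns_ok = filtered.dropna()

# subset: only look at column A when deciding which rows to drop
only_a_checked = filtered.dropna(subset=["A"])

print(all_columns_ok.index.tolist())   # [2, 4]
print(only_a_checked.index.tolist())   # [2, 3, 4]
```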

Full example:

import pandas as pd
# NumPy is only used to generate the example dataframe
import numpy as np

# Generate the dataframe
df = pd.DataFrame(np.random.randn(100, 3), columns = ['A', 'B', 'C'])

low = 0.25
high = 0.75

quant_df  = df.quantile([low, high])

df = df.apply(lambda x: x[(x > quant_df.loc[low,x.name]) & (x < quant_df.loc[high,x.name])], axis=0)

# Optional: drop rows with NaN in any column
# (dropna() returns a new dataframe, so assign its result)
df = df.dropna()
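As for the AttributeError itself: the message says that df[col_name] returned a DataFrame rather than a Series. One plausible cause, and this is an assumption since the question does not show the dataframe's columns, is that the column label appears more than once. Passing a list such as df[['column_name']] has the same effect. A minimal sketch of the duplicate-label case:

```python
import pandas as pd

# Two columns share the label 'a' (assumed scenario, not taken from the question)
df = pd.DataFrame([[1, 2], [3, 4]], columns=["a", "a"])

selected = df["a"]
print(type(selected).__name__)       # DataFrame
print(hasattr(selected, "tolist"))   # False

# Detect duplicated labels, then keep only the first occurrence of each
has_duplicates = bool(df.columns.duplicated().any())
deduped = df.loc[:, ~df.columns.duplicated()]

values = deduped["a"].tolist()
print(has_duplicates, values)        # True [1, 3]
```

If df.columns.duplicated().any() is True for your dataframe, deciding which of the duplicated columns to keep (or renaming them) should make the original function work again.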
