Question

Asked: 2020-12-06 11:39:47 +0800 CST

Problems applying a normalization function

Here is a very small view of the variables of a database that I am using for an exercise on classification.

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 41188 entries, 0 to 41187
Data columns (total 21 columns):
age               41188 non-null int64
job               41188 non-null object
marital           41188 non-null object

dtypes: float64(5), int64(6), object(10)
memory usage: 6.6+ MB

I want to normalize the variable age, and for that I created a simple function:

def normalize(X):
    NORM = []
    for i in X:
        norm= (i-min(X))/(max(X)-min(X))
        NORM.append(norm)
    return NORM

The problem is that when I try to apply it with map or with apply to create a new variable, I get the same error:

log["age_n"]=  log["age"].apply(normalize)
TypeError: 'int' object is not iterable

log["age_n"]=  log["age"].map(normalize)
TypeError: 'int' object is not iterable

Any guidance on this error would be greatly appreciated.

1 Answer

FJSevilla · Answer 1 · 2020-12-06T12:11:50+08:00

apply passes each row's value to the function, and the function returns the corresponding value for that same row. Because the function receives the value of each row of the column age, an integer, you get the error shown when it tries to iterate over that value with for i in X, since X is a single observation and not the entire column.
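To see this concretely, here is a minimal sketch that records what apply hands to the function. The toy Series `ages`, the list `seen`, and the helper `record` are illustrative names that do not appear in the question.

```python
import pandas as pd

# Toy data, assumed for illustration only
ages = pd.Series([20, 34, 82])

seen = []

def record(x):
    # Called once per element: x is a single scalar, not the Series
    seen.append(x)
    return x

ages.apply(record)

# None of the received values can be iterated over, which is why
# `for i in X` inside the original function raises TypeError
is_iterable = [hasattr(x, "__iter__") for x in seen]
```

Every entry of `is_iterable` is False, so any function that loops over its argument fails as soon as apply calls it.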

The function should therefore be something like this:

min_age = log["age"].min()
dif_age = log["age"].max() - min_age

def normalize(row):
    return (row - min_age) / dif_age

log["age_n"] = log["age"].apply(normalize)

Or use an anonymous function:

min_age = log["age"].min()
dif_age = log["age"].max() - min_age
log["age_n"] = log["age"].apply(lambda row: (row - min_age) / dif_age)

However, for what you want to do you don't need to (and shouldn't) use apply. There is a simpler and more efficient way that takes advantage of the vectorization offered by Pandas/NumPy, without having to resort to a Python function.

Starting from a reproducible example:

>>> import pandas as pd

>>> data = {"age":     [20,  34,  82,  47,  95,  14,  58],
...         "job":     ["a", "a", "a", "a", "a", "a", "a"],
...         "marital": ["b", "b", "b", "b", "b", "b", "b"]}

>>> log = pd.DataFrame(data)

We can get the new column simply with:

>>> min_age = log["age"].min()
>>> dif_age = log["age"].max() - min_age
>>> log["age_n"] = (log["age"] - min_age) / dif_age
>>> log

   age job marital     age_n
0   20   a       b  0.074074
1   34   a       b  0.246914
2   82   a       b  0.839506
3   47   a       b  0.407407
4   95   a       b  1.000000
5   14   a       b  0.000000
6   58   a       b  0.543210
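If the same normalization is needed for several columns, the vectorized expression can be wrapped in a small helper. This is only a sketch: the name `min_max_normalize` is hypothetical, and returning zeros for a constant column (where max equals min and the division is undefined) is an assumption, not something the question specifies.

```python
import pandas as pd

def min_max_normalize(column: pd.Series) -> pd.Series:
    """Scale a numeric column to the range [0, 1] without a Python loop."""
    lo = column.min()
    span = column.max() - lo
    if span == 0:
        # Assumed convention: a constant column maps to all zeros
        return pd.Series(0.0, index=column.index)
    return (column - lo) / span

log = pd.DataFrame({"age": [20, 34, 82, 47, 95, 14, 58]})

# Call the helper on the whole column, not through apply or map
log["age_n"] = min_max_normalize(log["age"])
```

The helper is called directly with the whole column, so min and max are each computed once instead of once per row.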
