Here is a small excerpt of the columns of a dataset that I am using for a classification exercise.
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 41188 entries, 0 to 41187
Data columns (total 21 columns):
age 41188 non-null int64
job 41188 non-null object
marital 41188 non-null object
dtypes: float64(5), int64(6), object(10)
memory usage: 6.6+ MB
I want to normalize the variable age, and for that I created a simple function:
def normalize(X):
    NORM = []
    for i in X:
        norm = (i - min(X)) / (max(X) - min(X))
        NORM.append(norm)
    return NORM
The problem is that when I try to apply it with map or with apply to create a new variable, I get the same error:
log["age_n"]= log["age"].apply(normalize)
TypeError: 'int' object is not iterable
log["age_n"]= log["age"].map(normalize)
TypeError: 'int' object is not iterable
Any guidance on this error would be greatly appreciated.
apply passes each value of the column to the function, and the function returns the corresponding value for that same row. Because the function receives each individual value of the column age, an integer, you get the error shown when it tries to iterate over it with for i in X, since X is a single observation and not the entire column.

The function should therefore be something like this:
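A minimal sketch of such a function, assuming the column minimum and maximum are computed once beforehand and passed in as extra arguments. The parameter names x, x_min and x_max and the sample ages are illustrative assumptions, not taken from the question:

```python
import pandas as pd

# Hypothetical stand-in for the asker's DataFrame; the ages are made up.
log = pd.DataFrame({"age": [20, 30, 40, 60]})

def normalize(x, x_min, x_max):
    """Min-max scale a single value x into the range [0, 1]."""
    return (x - x_min) / (x_max - x_min)

# Compute the extremes once, outside the function, instead of on every call.
age_min = log["age"].min()
age_max = log["age"].max()

# Series.apply forwards extra positional arguments through `args`.
log["age_n"] = log["age"].apply(normalize, args=(age_min, age_max))
```

Here the function works on one value at a time, which is what apply expects, so nothing tries to iterate over an integer.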
Or use an anonymous function:
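A possible version with a lambda, again assuming the minimum and maximum are computed once up front. The sample DataFrame is a made-up stand-in for the real data:

```python
import pandas as pd

# Hypothetical stand-in for the asker's DataFrame; the ages are made up.
log = pd.DataFrame({"age": [20, 30, 40, 60]})

# Compute the extremes once so the lambda does not recompute them per value.
age_min, age_max = log["age"].min(), log["age"].max()

# The lambda receives one value of the column at a time.
log["age_n"] = log["age"].apply(lambda x: (x - age_min) / (age_max - age_min))
```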
However, for what you want to do you don't need to (and shouldn't) use apply. There is a simpler and more efficient way that takes advantage of the vectorization offered by pandas/NumPy, without having to resort to a Python function.

Starting from a reproducible example:
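One way to build such an example, assuming a single integer column named age as in the question. The seed, value range and number of rows are arbitrary choices for illustration:

```python
import numpy as np
import pandas as pd

# Seeded generator so the example gives the same values on every run.
rng = np.random.default_rng(0)

# Ten hypothetical ages between 18 and 89 (the upper bound is exclusive).
df = pd.DataFrame({"age": rng.integers(18, 90, size=10)})
```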
We can get the new column simply with:
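A sketch of the vectorized version, shown here on a small hypothetical frame so it runs on its own (the ages are made up):

```python
import pandas as pd

# Small hypothetical frame with made-up ages.
df = pd.DataFrame({"age": [20, 30, 40, 60]})

age = df["age"]

# Arithmetic on a Series is applied element-wise, so no explicit loop
# or apply call is needed; min() and max() are each evaluated once.
df["age_n"] = (age - age.min()) / (age.max() - age.min())
```

Note that if every value in the column were identical, the denominator would be zero and the result would be NaN, so that case may need separate handling.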