I have created a DataFrame using outer joins, which gives me the following table:
As you can see, except for customer_id, the other variables are repeated in blocks of three. I built it this way because I only needed to create new conditional columns for acl and monto, as follows:
merged_outer1['Delta_Monto'] = np.where((merged_outer1['monto_x']+merged_outer1['monto_y']+merged_outer1['monto'])>=1700, 1, 0)
merged_outer1['Delta_Acl'] = np.where((merged_outer1['acl_x']+merged_outer1['acl_y']+merged_outer1['acl'])>=9, 1, 0)
This gives me Delta_Monto and Delta_Acl as expected:
However, the column pendiente requires a different condition. If, in any record, pendiente changes sign across any of its three columns, I want to create a column called Delta_Pendiente with a 1; otherwise (that is, if there is no change of sign), it should be marked with 0. Could someone guide me on how to write this conditional? Thanks in advance.
What you need is the DataFrame method .apply(). This method lets you apply an operation by vector, that is, by rows or by columns, and when combined with a lambda function it lets you build the result from selected values of each vector.
To explain it with a reproducible problem, I am going to create an example of how it would work with your data. Let's suppose we have the following problem:
Perfect. First, let's create a DataFrame with random data.
Now we have a DataFrame like this, with 100 students:
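The original data is not shown here, so the following is only a minimal sketch of how such a DataFrame could be built. The column names (estudiante_id and the three subjects) and the 0 to 10 grading scale are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Seeded generator so the random grades are reproducible.
rng = np.random.default_rng(seed=0)

n_estudiantes = 100
# Hypothetical subject names; the original columns are not shown.
asignaturas = ["matematicas", "lengua", "historia"]

# Integer grades from 0 to 10 inclusive (assumed scale), one row per student.
notas = pd.DataFrame(
    rng.integers(0, 11, size=(n_estudiantes, len(asignaturas))),
    columns=asignaturas,
)
# Hypothetical identifier column, placed first.
notas.insert(0, "estudiante_id", np.arange(1, n_estudiantes + 1))

print(notas.head())
```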
How to use apply()?
The apply() method needs two parameters:
func: the function that we want to apply by vector. In this case it is a conditional that marks whether or not the student passes the course and returns one or zero.
axis: the vectors to which we want to apply the operation. In this case, since we want to iterate through each row, it is applied to the row vectors, so it will be axis=1 (if we put 0, it will be by column).
All we have to do is create a function that returns one or zero based on whether the corresponding subjects have been passed, so I create the function pasa_de_curso().
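The body of pasa_de_curso() is not shown here. One possible version, assuming a student passes the course only when every grade is at least 5 (the passing rule and the nota_minima parameter are assumptions), could look like this:

```python
def pasa_de_curso(*notas, nota_minima=5):
    """Return 1 if every grade reaches nota_minima, otherwise 0.

    The passing rule (all grades >= 5) is an assumption for illustration.
    """
    return 1 if all(nota >= nota_minima for nota in notas) else 0


# Quick check with individual values, which is how the lambda will call it.
print(pasa_de_curso(5, 7, 9))
print(pasa_de_curso(4, 10, 10))
```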
We already have everything we need to use .apply(), so we create a new column of zeros and ones based on our function.
Why do we use lambda?
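For reference, the call being discussed can be sketched as follows. The small hand-written table, the subject column names, the new column name pasa, and the passing rule inside pasa_de_curso() are all assumptions chosen so the result is easy to verify by eye.

```python
import pandas as pd

# Small hand-written table (hypothetical column names and grades).
notas = pd.DataFrame({
    "matematicas": [7, 4, 9, 5],
    "lengua": [6, 8, 3, 5],
    "historia": [5, 9, 10, 5],
})


def pasa_de_curso(*valores, nota_minima=5):
    # Assumed rule: pass (1) only if every grade is at least nota_minima.
    return 1 if all(v >= nota_minima for v in valores) else 0


# axis=1 makes apply() hand each row to the lambda; the lambda picks the
# values it needs by label and forwards them to pasa_de_curso().
notas["pasa"] = notas.apply(
    lambda fila: pasa_de_curso(fila["matematicas"], fila["lengua"], fila["historia"]),
    axis=1,
)

print(notas)
```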
As we said previously, apply() takes an entire vector (a row in our case) and applies an operation to it. However, we don't want to apply the operation to the entire row; we want to pick certain elements of that row. That is why our function pasa_de_curso() is wrapped in another function (the lambda).
What is happening is the following:
apply() passes the first row to the lambda function.
The lambda contains a call to pasa_de_curso().
The lambda receives the row; using the indices, we select the values of the row that we want, and they are passed to pasa_de_curso().
pasa_de_curso() runs and the result is returned, which is finally stored in our new variable. The same happens for every remaining row.
Finally, you can see all this in the official Pandas documentation.
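Applied to the question's data, the same pattern might look like the sketch below. It assumes the three columns are named pendiente_x, pendiente_y and pendiente (by analogy with monto and acl), that a "change of sign" means the row contains at least one strictly positive and one strictly negative value, and that zeros and the NaN values produced by outer joins are ignored. The helper name cambia_de_signo() and the sample values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample standing in for merged_outer1.
merged_outer1 = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "pendiente_x": [0.5, -0.2, 0.1, np.nan],
    "pendiente_y": [-0.3, -0.1, 0.4, 0.2],
    "pendiente": [0.2, -0.7, 0.0, -0.6],
})


def cambia_de_signo(a, b, c):
    # Ignore missing values, then look for both a positive and a negative one.
    valores = [v for v in (a, b, c) if pd.notna(v)]
    hay_positivo = any(v > 0 for v in valores)
    hay_negativo = any(v < 0 for v in valores)
    return 1 if (hay_positivo and hay_negativo) else 0


merged_outer1["Delta_Pendiente"] = merged_outer1.apply(
    lambda fila: cambia_de_signo(
        fila["pendiente_x"], fila["pendiente_y"], fila["pendiente"]
    ),
    axis=1,
)

# Vectorized alternative under the same assumptions, using np.sign.
columnas = ["pendiente_x", "pendiente_y", "pendiente"]
signos = np.sign(merged_outer1[columnas])
vectorizado = ((signos.max(axis=1) > 0) & (signos.min(axis=1) < 0)).astype(int)

print(merged_outer1)
```

On a large table the vectorized form is usually faster than a row-wise apply(), at the cost of being a little less explicit about the rule.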