Question

Alberto Tutxo

Asked: 2022-07-04 01:45:00 +0800 CST

Subtract two values from the same column in a Pandas dataframe

I have this DataFrame, named df:

Sample	Target	Ct	Ct Mean
C41	B2M	20.642399	20.680149
C41	B2M	20.717901	20.680149
C42	ULK1	29.097883	29.110802
C42	ULK1	29.123722	29.110802
C43	TBP	22.126412	21.4221565
C43	TBP	20.717901	21.4221565

The Ct Mean column is the average of the two values of the Ct column of the same Sample and Target (this is given by default by the .xlsx that I imported into pandas).

I want to check, pair by pair, that the difference between the two Ct values for the same sample and target is no greater than ±1. For example, for C41 (B2M) the difference between 20.642399 and 20.717901 is less than 1, so the Ct values should be returned as they are. For C43 (TBP), however, the difference between 22.126412 and 20.717901 is greater than 1, so the two Ct values for C43 should be replaced with "Undetermined". The expected result is:

Sample	Target	Ct	Ct Mean
C41	B2M	20.642399	20.680149
C41	B2M	20.717901	20.680149
C42	ULK1	29.097883	29.110802
C42	ULK1	29.123722	29.110802
C43	TBP	Undetermined	Undetermined
C43	TBP	Undetermined	Undetermined

I have tried several ways to subtract two elements of the same column of a dataframe, without success. My first attempt was to apply a loop to that column that computes the difference between each pair of values, stepping by two so that it then moves on to the next pair:

def loop(i):
    for i in range(0,96,2):
        if i-(i+1)>1 or i-(i+1)<(-1):
                i=="Undetermined"
        else:
            return i
prueba = df["Ct"].apply(loop)
prueba

Output:

0     0
1     0
2     0
3     0
4     0
     ..
91    0
92    0
93    0
94    0
95    0
Name: Ct, Length: 96, dtype: int64

NOTE: My dataframe has 96 rows; I have only shown the first 6 for the example. The output is all 0s. While searching I saw that there is a .diff method that subtracts the previous element's value from each element, but I don't know how to apply it. Another approach I thought of is:

df["Ct"].sub(df[0,len(df),2], axis=0)

Obviously this raises an error, and the syntax is not correct either.

2 Answers

abulafia · Answer 1 · 2022-07-04T03:36:51+08:00

Solution

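def myfunc(g):
  # g is the sub-dataframe for one (Sample, Target) group
  if any(g.Ct.diff().abs()>1):
    g["Ct Mean"] = "Undetermined"
    g["Ct"] = "Undetermined"
  return g

# group_keys=False keeps the original row index in the result
df = df.groupby(["Sample", "Target"], group_keys=False).apply(myfunc)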

Example

If df initially contains:

  Sample Target         Ct     Ct Mean
0    C41    B2M  20.642399   20.680149
1    C41    B2M  20.717901   20.680149
2    C42   ULK1  29.097883   29.110802
3    C42   ULK1  29.123722   29.110802
4    C43    TBP  22.126412  21.4221565
5    C43    TBP  20.717901  21.4221565

the code above produces:

  Sample Target            Ct       Ct Mean
0    C41    B2M     20.642399     20.680149
1    C41    B2M     20.717901     20.680149
2    C42   ULK1     29.097883     29.110802
3    C42   ULK1     29.123722     29.110802
4    C43    TBP  Undetermined  Undetermined
5    C43    TBP  Undetermined  Undetermined

How it works

As you can see, everything is resolved in one line:

df.groupby(["Sample", "Target"]).apply(myfunc)

What this does is group the dataframe by "Sample" and "Target", gathering into several "sub-dataframes" the rows that have the same value in "Sample" and "Target". The function myfunc is then applied to each of these groups.

Therefore, this function receives a group in its parameter g. The group is really a dataframe, but "filtered" so that it contains only the rows with the same Sample and Target (in this case there are just two rows). More generally, the function receives a dataframe with an arbitrary number of rows, with the same columns as the original df, and with the same value of "Sample" and "Target" in all rows.
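To see what each group looks like, it can help to iterate over the groupby object directly. This is only a minimal sketch: the dataframe is rebuilt by hand from the six rows in the question, and the name groups is illustrative, not part of the answer.

```python
import pandas as pd

# Sample data copied from the question (first six rows only)
df = pd.DataFrame({
    "Sample": ["C41", "C41", "C42", "C42", "C43", "C43"],
    "Target": ["B2M", "B2M", "ULK1", "ULK1", "TBP", "TBP"],
    "Ct": [20.642399, 20.717901, 29.097883, 29.123722, 22.126412, 20.717901],
    "Ct Mean": [20.680149, 20.680149, 29.110802, 29.110802, 21.4221565, 21.4221565],
})

# Iterating over the groupby object yields (key, sub-dataframe) pairs;
# with two grouping columns, each key is a (Sample, Target) tuple
groups = {key: g for key, g in df.groupby(["Sample", "Target"])}

for key, g in groups.items():
    print(key, "->", len(g), "rows, columns:", list(g.columns))
```

Each sub-dataframe printed here is the kind of object that myfunc receives as g.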

What the function does is determine whether, in that group, the values of the "Ct" and "Ct Mean" columns must be changed to "Undetermined" or left as they were. It then returns the group in question, so that .apply() can concatenate it with the remaining groups to build the resulting dataframe.

The key to determining whether or not to put "Undetermined" is the following line:

  if any(g.Ct.diff().abs()>1):

g.Ct is the Ct column of the received group. Applying .diff() to it subtracts from each element the previous one in that column. The first element has no predecessor, so its result is NaN, but for the following ones the result is the difference. This gives a column of differences. .abs() is then applied to that column to keep the absolute value, so that the sign has no influence. The result is a column of numbers (the first of them NaN).

The column is compared with >1, which gives a new column, this time of booleans. For each difference greater than 1 there will be a True (and the rest will be False). The first element, which is NaN, always gives False. In your case there will be only one more element (because each group has only two rows), but in general the column could hold many booleans.

That column is passed to any(), which returns True if at least one of the elements is True. It returns False only if all of them are False.
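The chain described above can be traced step by step on the C43 group. This is a minimal sketch that assumes only the two Ct values from the question; the intermediate variable names are illustrative and do not appear in the answer.

```python
import pandas as pd

# Ct values of the C43/TBP group from the question
ct = pd.Series([22.126412, 20.717901], name="Ct")

differences = ct.diff()        # first element is NaN, second is the difference
absolute = differences.abs()   # drop the sign
flags = absolute > 1           # comparing NaN with > 1 gives False
exceeds = any(flags)           # True if at least one flag is True

print(differences.tolist())
print(flags.tolist())
print(exceeds)
```

Replacing the values with those of the C41 group makes every flag False, so any() returns False and the group would be left unchanged.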

The result is that if any of the values returned by .diff() is greater than 1 in absolute value, the body of the if is executed, which does:

g["Ct Mean"] = "Undetermined"
g["Ct"] = "Undetermined"

which assigns that value to the entire column, that is, to all rows in that group. If the if condition is not met, g is left untouched.

Finally, the function returns g (whether or not it has been modified).

César González · Answer 2 · 2022-07-04T04:18:51+08:00

You could use diff, but then you would have to pick out the second element of each group to read the difference and replace the values in each group. Conceptually that is a bit messier, so I will first propose an alternative and then explain how to do it with diff.

Option 1: Group by

The idea is to group by the Sample column, calculate the difference, and return a mask indicating whether the difference is greater than 1. With that mask you then modify the values of your dataframe in the rows where the condition is met:

# create the comparison function
def check_diff(series):
    values = series.tolist()
    return abs(values[0] - values[1]) > 1

# group using the Sample column as the key and our function as the aggregation
df_check = df[['Sample', 'Ct']].groupby('Sample').agg({'Ct': check_diff})

# merge the two dataframes
df = pd.merge(df, df_check, how='left', on='Sample')

At this point you have this dataframe:

Sample	Target	Ct_x	Ct Mean	Ct_y
C41	B2M	20.6424	20.6801	False
C41	B2M	20.7179	20.6801	False
C42	ULK1	29.0979	29.1108	False
C42	ULK1	29.1237	29.1108	False
C43	TBP	22.1264	21.4222	True
C43	TBP	20.7179	21.4222	True

The Ct_y column tells us whether the difference is greater than 1, so we can now replace the values:

# For example, you can do something like this (remember to import numpy as np)
df['Ct'] = np.where(df['Ct_y'] == True, 'Undetermined', df['Ct_x'])
df['Ct Mean'] = np.where(df['Ct_y'] == True, 'Undetermined', df['Ct Mean'])

df.drop(columns=['Ct_x', 'Ct_y'], inplace=True)

Our dataframe would look like this:

Sample	Target	Ct Mean	Ct
C41	B2M	20.680149	20.642399
C41	B2M	20.680149	20.717901
C42	ULK1	29.110802	29.097883
C42	ULK1	29.110802	29.123722
C43	TBP	Undetermined	Undetermined
C43	TBP	Undetermined	Undetermined

You could simplify this by skipping the True/False column and returning the difference directly, which would look something like this:

def get_diff(series):
    values = series.tolist()
    return abs(values[0] - values[1])


df_check = df[['Sample', 'Ct']].groupby('Sample').agg({'Ct': get_diff})

df = pd.merge(df, df_check, how='left', on='Sample')

df['Ct'] = np.where(df['Ct_y'] > 1, 'Undetermined', df['Ct_x'])
df['Ct Mean'] = np.where(df['Ct_y'] > 1, 'Undetermined', df['Ct Mean'])
df.drop(columns=['Ct_x', 'Ct_y'], inplace=True)
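As a variation that is not part of the original answer, the same mask can be built without the merge step by using groupby(...).transform, which broadcasts a per-group value back onto every row. This is a sketch under stated assumptions: the dataframe is rebuilt by hand from the question, the names spread and mask are illustrative, and max minus min is used as the difference, which matches abs(values[0] - values[1]) only when each sample has exactly two rows.

```python
import pandas as pd

# Sample data copied from the question (first six rows only)
df = pd.DataFrame({
    "Sample": ["C41", "C41", "C42", "C42", "C43", "C43"],
    "Target": ["B2M", "B2M", "ULK1", "ULK1", "TBP", "TBP"],
    "Ct": [20.642399, 20.717901, 29.097883, 29.123722, 22.126412, 20.717901],
    "Ct Mean": [20.680149, 20.680149, 29.110802, 29.110802, 21.4221565, 21.4221565],
})

# Spread (max - min) of Ct within each Sample, repeated on every row of the group
spread = df.groupby("Sample")["Ct"].transform(lambda s: s.max() - s.min())
mask = spread > 1

# Cast to object first so numbers and the text label can share a column,
# then keep the values where the mask is False and replace the rest
for col in ["Ct", "Ct Mean"]:
    df[col] = df[col].astype(object).where(~mask, "Undetermined")

print(df)
```

Unlike np.where, this keeps the untouched values as numbers instead of converting the whole column to strings.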

Option 2: Diff

First we calculate diff (assuming the dataframe is sorted so that the two rows of each sample are adjacent) and drop duplicates, keeping the last row of each sample:

df['Diff'] = df['Ct'].diff()

df_check = df.drop_duplicates('Sample', keep='last')
df_check = df_check[['Sample', 'Diff']]

Sample	Diff
C41	0.075502
C42	0.025839
C43	-1.40851

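With this dataframe we can now proceed much as in option 1. Note that df already contains a Diff column, so the merge names the two copies Diff_x and Diff_y, and that the difference can be negative, so its absolute value is compared:

df = pd.merge(df, df_check, how='left', on='Sample')

# Diff_y holds the per-sample difference taken from df_check
mask = df['Diff_y'].abs() > 1
df['Ct'] = np.where(mask, 'Undetermined', df['Ct'])
df['Ct Mean'] = np.where(mask, 'Undetermined', df['Ct Mean'])
df.drop(columns=['Diff_x', 'Diff_y'], inplace=True)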
