Question

Asked: 2020-02-15 04:44:42 +0800 CST

How to replace values in two independent DataFrame?

I have the following two DataFrames, each with a column of the same name, iden:

df1: 
iden   c    A1   A2    A3    
 11     1     1     1   NaN
 23     2     3     3   NaN
 11     3     2     2     1
 74     4   NaN     1   NaN
 74     1   NaN   NaN   NaN

df2= 
iden caso    
74     A
77     B
11     C
25     A
48     B

What I need is to replace all the values of the iden column in both DataFrames so that any value appearing in both DataFrames is assigned the same number. The values are identifiers. In the example, the answer would be:

df1: 
iden   c    A1   A2    A3    
  1     1     1     1   NaN
  2     2     3     3   NaN
  1     3     2     2     1
  3     4   NaN     1   NaN
  3     1   NaN   NaN   NaN

df2= 
iden caso    
3     A
4     B
1     C
5     A
6     B

I thought about creating a new column in each DataFrame, using isin to generate the numbers:

df1['new_iden'] = list(".." if x else ".." for x in df1.iden.isin(df2.iden))

and then deleting the original column.

But I don't know what value to put in the if so that it generates the numbers as required.

I would appreciate any help.

2 Answers

FJSevilla · Answer 1 · 2020-02-15T10:22:18+08:00

One possibility is to iterate over both columns (first df1.iden and then df2.iden) and assign new values in that order, using a dictionary as an intermediary to store the "old value": "new value" pairs. Then use loc/at to assign each cell its new value according to the dictionary:

import io
import pandas as pd


df1 = pd.read_csv(io.StringIO("""\
iden   c    A1   A2    A3    
 11     1     1     1   NaN
 23     2     3     3   NaN
 11     3     2     2     1
 74     4   NaN     1   NaN
 74     1   NaN   NaN   NaN
"""), sep=r"\s+"
)

df2 = pd.read_csv(io.StringIO("""\
iden caso    
74     A
77     B
11     C
25     A
48     B
"""), sep=r"\s+"
)

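new_ids = {}  # maps each old identifier to its new number
num = 1       # next number to hand out
# Series.items() replaces iteritems(), which was removed in pandas 2.0
for idx1, id1 in df1.iden.items():
    if (new_id1 := new_ids.get(id1)) is None:
        new_id1 = new_ids[id1] = num
        num += 1
    df1.at[idx1, "iden"] = new_id1

# df2 reuses the numbers already assigned for identifiers seen in df1
for idx2, id2 in df2.iden.items():
    if (new_id2 := new_ids.get(id2)) is None:
        new_id2 = new_ids[id2] = num
        num += 1
    df2.at[idx2, "iden"] = new_id2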

Output:

>>> df1

   iden  c   A1   A2   A3
0     1  1  1.0  1.0  NaN
1     2  2  3.0  3.0  NaN
2     1  3  2.0  2.0  1.0
3     3  4  NaN  1.0  NaN
4     3  1  NaN  NaN  NaN

>>> df2

   iden caso
0     3    A
1     4    B
2     1    C
3     5    A
4     6    B

If you are using Python < 3.8 (without assignment expressions), the code should be:

new_ids = {}
num = 1
for idx1, id1 in df1.iden.items():
    new_id1 = new_ids.get(id1)
    if new_id1 is None:
        new_id1 = new_ids[id1] = num
        num += 1
    df1.at[idx1, "iden"] = new_id1

for idx2, id2 in df2.iden.items():
    new_id2 = new_ids.get(id2)
    if new_id2 is None:
        new_id2 = new_ids[id2] = num
        num += 1
    df2.at[idx2, "iden"] = new_id2

There are, as always, other possibilities. Another option is to use collections.defaultdict to build the dictionary, together with itertools.count (as a generator of the new identifiers) and pandas.Series.replace to substitute the values based on the dictionary:

import itertools
import collections

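ngen = itertools.count(1)
# A missing key gets the next number the first time it is looked up
new_ids = collections.defaultdict(lambda: next(ngen))

# Look up every identifier once, df1 first, to fix the numbering order
for idn in itertools.chain(df1.iden, df2.iden):
    new_ids[idn]

# Assign the result back instead of relying on inplace=True on an attribute
df1["iden"] = df1["iden"].replace(new_ids)
df2["iden"] = df2["iden"].replace(new_ids)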

For DataFrames with a relatively large number of rows and replacements, this is in principle more efficient than the previous version.

Note: io.StringIO and pandas.read_csv are used only to make the code reproducible from the example in the question.
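
A vectorized variant of the same idea may also be worth considering. It is not part of the answer above. The sketch below assumes that numbering in order of first appearance (df1 first, then df2) is what is wanted, and it uses pandas.factorize on the concatenated columns instead of building a dictionary by hand. The DataFrames are cut-down copies of the question's example, and the names combined, codes and n1 are illustrative only.

```python
import pandas as pd

# Reduced copies of the DataFrames from the question (assumed data)
df1 = pd.DataFrame({"iden": [11, 23, 11, 74, 74], "c": [1, 2, 3, 4, 1]})
df2 = pd.DataFrame({"iden": [74, 77, 11, 25, 48], "caso": list("ABCAB")})

# Stack both identifier columns so equal values share a single code
combined = pd.concat([df1["iden"], df2["iden"]], ignore_index=True)

# factorize numbers distinct values from 0 in order of first appearance
codes, uniques = pd.factorize(combined)
codes = codes + 1  # start at 1, as in the question

# Split the codes back into the two DataFrames by position
n1 = len(df1)
df1["iden"] = codes[:n1]
df2["iden"] = codes[n1:]
```

Here uniques holds the original identifiers in the order their numbers were assigned, so the mapping back to the old values stays available if needed.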

jalm · Answer 2 · 2020-02-15T07:57:42+08:00

You can create a generator function that produces successive numbers on each iteration:

def auto_increment_gen():
    i = 0
    while True:
        yield i
        i += 1

auto_inc = auto_increment_gen()

df1['new_iden'] = list(next(auto_inc) for x in df1.iden.isin(df2.iden))

But be careful because if you try to evaluate the entire generator by iterating over it in a for loop, or using it to initialize other iterators like lists or sets, you will get into an infinite loop.

If you know the range of indices you need you can make it safer with a for loop.
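
One way to bound the generator could look like the sketch below. It uses itertools.islice rather than an explicit for loop, which is a substitution on my part, and n is a hypothetical row count standing in for something like len(df1).

```python
import itertools


def auto_increment_gen(start=0):
    """Yield start, start + 1, ... without ever stopping."""
    i = start
    while True:
        yield i
        i += 1


n = 5  # hypothetical number of rows, e.g. len(df1)

# islice stops after n items, so list() terminates
bounded = list(itertools.islice(auto_increment_gen(), n))
```

For a plain run of integers, list(range(n)) gives the same values. The generator is mainly useful when the numbers are requested one at a time.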
