Revolucion for Monica

Asked: 2020-08-23 07:00:57 +0800 CST

Add lines from one dataframe to another when they match values of a column

I have a first dataframe into which I need to insert rows from a second dataframe.

The first one looks more or less like this:

   QID  Questions    B  Answer1  Answer2  Answer3  F  G  H  I  J
0    3          a  4.0        a        a        a  a  e  g  i  l
1    4          b  5.0        b        b        b  a  r  h  m  p
2    5          d  5.0      NaN        e        d  b  u  e  i  z
3    6          e  5.0        d        h        r  b  c  z  i  3
...

And the second:

   QID  Questions    B  Answer1  Answer2  Answer3  F  ...
0    1          a  4.0        a        a        a  a
1    2          b  5.0        b        k        b  a
2  2_1          z  5.0        b        k        b  a
3  2_2          w  4.0        b        k        b  c
4    3          d  5.0      NaN        e        d  b
5    4          e  5.0        d        h        r  b
...

I would like to get:

   QID  Questions    B  Answer1  Answer2  Answer3  F  G  H  I  J
0    3          a  4.0        a        a        a  a  e  g  i  l
1    4          b  5.0        b        b        b  a  r  h  m  p
2  4_1          z  5.0        b        k        b  a  r  h  m  p
3  4_2          w  4.0        b        k        b  c  r  h  m  p
4    5          d  5.0      NaN        e        d  b  u  e  i  z
5    6          e  5.0        d        h        r  b  c  z  i  3
...

As you can see, both dataframes contain the question b, so I have added to the new dataframe the rows that follow it in the second dataframe and have _ in their QID.

In other words, the first dataframe and the second dataframe share texts t1 and t2 in the cells of the "Questions" column. For a given pair (t1, t2) where t1 == t2, when the second dataframe also has rows below that one whose QID contains _, I want to add those rows right after the matching row of the first dataframe.

So far I have tried:

rows_to_add = pd.DataFrame()
for i, row1 in df.iterrows():
  for j, row2 in df2.iterrows():
    if row1['Questions'] == row2['Questions']:
      # here I want to test if the next row has _ in his QID
      # if so I add all the lines with the same QID before _ but with row1 QID
      k = 0
      for _, next_row_df2 in df2[j+1:].iterrows():
        if "_" in str(next_row_df2['QID']):
          next_row_df2['QID'] = str(row1['QID']) + '_' + k
          rows_to_add += next_row_df2 # but I need to change the QID
        else:
          break # exit this loop and add the lines to the dataframe
        k += 1
      df = pd.concat([df.iloc[:i], rows_to_add, df.iloc[i:]]).reset_index(drop=True)
      rows_to_add = pd.DataFrame()

But it doesn't add the rows. Maybe this could be done more efficiently, for example by iterating only over the rows of df2 whose QID contains _? Or with map-reduce?

1 Answer

Diego Rueda · Answer 1 · 2020-08-27T10:47:25+08:00

Although your explanation is a bit confusing, I think I understand what you are asking for. Correct me if I have misunderstood anything.

1. For each question in dataframe_1, check whether the same question appears in dataframe_2.
2. If it does, look for its subquestions in dataframe_2, that is, the rows whose QID starts with that question's QID followed by an underscore.
3. Replace the QID prefix of those subquestions from dataframe_2 with the QID of the question in dataframe_1.

We can write a function that performs these steps:

import pandas as pd

# Read the data; QID is read as text so the string operations below work on both frames
df1 = pd.read_csv('dataframe_1.csv', index_col=0, dtype={'QID': str})
df2 = pd.read_csv('dataframe_2.csv', index_col=0, dtype={'QID': str})

# Drop rows whose question or QID is missing
df1 = df1.dropna(subset=['Questions', 'QID'])
df2 = df2.dropna(subset=['Questions', 'QID'])

def question_checker(qid, question, df2):
    """
    Input:
        qid: id of the question in df1
        question: text of the question in df1
        df2: dataframe in which to look for repeated questions and their subquestions
    Output:
        subquestions_df: dataframe with all subquestions found in df2 (None if there are none)
    """

    # Check whether the question appears in dataframe 2
    repeated_question = df2[df2['Questions'] == question]

    # If it does, look up its subquestions by the QID it has in dataframe 2
    if len(repeated_question) > 0:
        for i, row in repeated_question.iterrows():
            df2_qid = row['QID']
            # .copy() avoids modifying a view of df2 (SettingWithCopyWarning)
            sub_questions = df2[df2['QID'].str.startswith(f'{df2_qid}_')].copy()

            # If there are subquestions, swap only the leading df2 QID for the df1 QID
            # (a plain str.replace would also turn '2_2' into '4_4')
            if len(sub_questions) > 0:
                sub_questions['QID'] = qid + sub_questions['QID'].str[len(df2_qid):]
                return sub_questions

    return None

Finally, we use this function to check every question in dataframe_1:

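sub_frames = []
for i, row in df1.iterrows():
    qid = row['QID']
    question = row['Questions']
    sub_rows = question_checker(qid, question, df2)
    if sub_rows is not None:
        sub_frames.append(sub_rows)

# DataFrame.append was removed in pandas 2.0, so concatenate the pieces once at the end
new_df = pd.concat(sub_frames) if sub_frames else pd.DataFrame()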

You can then combine this new dataframe with dataframe_1, reset the index, sort it, and so on.
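As a rough sketch of that last step, the renamed subquestions can be placed directly after their parent row while the combined frame is being built, which avoids a separate sort. This is only an illustration under some assumptions: QID is stored as text, the small inline frames stand in for dataframe_1 and dataframe_2, the helper name `insert_sub_rows` is hypothetical, and copying the columns that exist only in dataframe_1 (G, H, I, J in the question) from the parent row is inferred from the desired output shown in the question.

```python
import pandas as pd

# Small inline stand-ins for dataframe_1 and dataframe_2 (illustrative data only)
df1 = pd.DataFrame({
    "QID": ["3", "4", "5"],
    "Questions": ["a", "b", "d"],
    "Answer1": ["a", "b", None],
    "G": ["e", "r", "u"],
})
df2 = pd.DataFrame({
    "QID": ["1", "2", "2_1", "2_2", "3"],
    "Questions": ["a", "b", "z", "w", "d"],
    "Answer1": ["a", "b", "b", "b", None],
})

def insert_sub_rows(df1, df2):
    """Hypothetical helper: put each renamed subquestion right after its parent row."""
    only_in_df1 = df1.columns.difference(df2.columns)
    pieces = []
    for pos in range(len(df1)):
        parent = df1.iloc[pos]
        pieces.append(df1.iloc[[pos]])  # the parent row itself, as a one-row frame
        # QIDs of the df2 rows that hold the same question text
        matches = df2.loc[df2["Questions"] == parent["Questions"], "QID"]
        for df2_qid in matches:
            subs = df2[df2["QID"].str.startswith(f"{df2_qid}_")].copy()
            if subs.empty:
                continue
            # Swap the leading df2 QID for the df1 QID, keeping the "_n" suffix
            subs["QID"] = parent["QID"] + subs["QID"].str[len(df2_qid):]
            # Columns that exist only in df1 are copied from the parent row
            for col in only_in_df1:
                subs[col] = parent[col]
            pieces.append(subs[df1.columns])
    return pd.concat(pieces, ignore_index=True)

result = insert_sub_rows(df1, df2)
print(result)
```

With this sample data, the rows 2_1 and 2_2 from the second frame end up as 4_1 and 4_2 immediately below the row with QID 4.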

I hope this helps.
