I have the following DataFrame:

df =
0 1 2 3 4 ... 43 44 45 46 47
0 B349 M179 R42X K040 R17X ... None None None None None
1 M545 Q729 R609 J00X F339 ... None None None None None
The actual dimensions of the DataFrame are [220957 rows x 48 columns].
I need to create a DataFrame with a single column that stores all the values of all the columns of df, ignoring the empty cells (the order does not matter). For example, if all the values in the columns after 4 were empty, the result would look like this:
0 B349
1 M179
2 R42X
3 K040
4 R17X
5 M545
6 Q729
7 R609
8 J00X
9 F339
I have tried to do it by transposing:
df = pd.concat([df.T[x] for x in df.T], ignore_index=True)
and I thought I would then eliminate the invalid values, but it takes a long time considering the amount of real data in the DataFrame. Can somebody help me? Thank you!
Since all your columns appear to be of type object (str), you don't have to worry about possible type conversions: you can use the method pandas.DataFrame.to_numpy to get the values as a NumPy array, flatten the array with numpy.ndarray.flatten, and apply a boolean filter to remove the None values. Then simply use the filtered array as the column of the new DataFrame.

You can change the order in which the array is flattened with the order argument of flatten if you wish: 'C' flattens row by row (the default), 'F' column by column. The names seem confusing, but they refer to the C and Fortran languages, and come from the way each language stores arrays in memory (row-major / column-major).
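A minimal sketch of that approach, using a small made-up frame in place of the real [220957 rows x 48 columns] one:

```python
import pandas as pd

# Toy stand-in for the real DataFrame (values invented for illustration).
df = pd.DataFrame({0: ["B349", "M545"],
                   1: ["M179", "Q729"],
                   2: [None, "R609"]})

arr = df.to_numpy().flatten()   # 2-D object array -> 1-D, row by row (order="C")
arr = arr[pd.notna(arr)]        # boolean mask drops the None cells
result = pd.DataFrame(arr)      # single-column DataFrame

# With order="F" the array is flattened column by column instead:
by_column = df.to_numpy().flatten(order="F")
```

Here pd.notna builds the boolean mask over the object array; it treats both None and NaN as missing, so it also covers the case where some cells were read in as NaN rather than None.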