I have the following DataFrame:

df =
0 1 2 3 4 ... 43 44 45 46 47
0 B349 M179 R42X K040 R17X ... None None None None None
1 M545 Q729 R609 J00X F339 ... None None None None None
The actual dimensions of the DataFrame are [220957 rows x 48 columns].
I need to create a DataFrame with a single column that stores all the values of all the columns of df, ignoring the empty cells (the order does not matter). For example, if all the values in the columns after 4 were empty, the result would look like this:
0 B349
1 M179
2 R42X
3 K040
4 R17X
5 M545
6 Q729
7 R609
8 J00X
9 F339
I have tried to do it by transposing:
df = pd.concat([df.T[x] for x in df.T], ignore_index=True)
and I thought I would then eliminate the invalid values, but it takes a long time considering the amount of real data in the DataFrame. Can somebody help me? Thank you!
Since all your columns appear to be of type object (str), you don't have to worry about possible type conversions: you can use the method pandas.DataFrame.to_numpy to get the values as a NumPy array, flatten the array with numpy.ndarray.flatten, and apply a boolean filter to remove the None values. Then simply use the filtered array as the column of the new DataFrame.

You can change the order in which the array is flattened with the order argument of flatten if you wish: 'C' flattens row by row (the default), 'F' column by column. The names seem confusing, but they refer to the C and Fortran languages, and come from the way each language stores arrays in memory (row-major / column-major).
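A minimal sketch of that approach, using a small made-up frame in place of the real [220957 rows x 48 columns] one:

```python
import pandas as pd

# Toy stand-in for the real DataFrame (values invented for illustration).
df = pd.DataFrame({0: ["B349", "M545"],
                   1: ["M179", "Q729"],
                   2: [None, "R609"]})

arr = df.to_numpy().flatten()   # 2-D object array -> 1-D, row by row (order="C")
arr = arr[pd.notna(arr)]        # boolean mask drops the None cells
result = pd.DataFrame(arr)      # single-column DataFrame

# With order="F" the array is flattened column by column instead:
by_column = df.to_numpy().flatten(order="F")
```

Here pd.notna builds the boolean mask over the object array; it treats both None and NaN as missing, so it also covers the case where some cells were read in as NaN rather than None.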