Good day,
I have a problem unnesting the following DataFrame:
df=
Node1 ; Node2
0 (22, {'Y': '996.3', 'X': 773.6}) ;(56, {'Y': '996.1', 'X': 773.1})
1 (23, {'Y': '996.5', 'X': 773.8}) ;(57, {'Y': '996.30', 'X': 773.2})
2 (24, {'Y': '996.8', 'X': 773.6}) ;(58, {'Y': '996.16', 'X': 773.69})
3 (25, {'Y': '996.7', 'X': 773.6}) ;(59, {'Y': '996.60', 'X': 773.15})
[4 rows x 2 columns]
type(df)
pandas.core.frame.DataFrame
How can I unnest this DataFrame and convert it into the following:
Node1 ; Node2
Num1 ; Y1; X1 ;Num2; Y2; X2
0 22; 996.3; 773.6 ;56; 996.1; 773.1
1 23; 996.5; 773.8 ;57; 996.30; 773.2
2 24; 996.8; 773.6 ;58; 996.16; 773.69
3 25; 996.7; 773.6 ;59; 996.60; 773.15
I would be grateful for your help, or for any idea of how I could do it.
Cheers
There are several difficulties with your input data. Each cell in your dataframe is a tuple, the second element of each tuple is a dictionary, and on top of that the dictionary values are either strings (in Y) or floats (in X). I understand that you want all of them converted to float in your final structure.

This is the value of df that I will use as input, the same as the one you provided, as shown by print():
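For anyone who wants to reproduce this, here is a minimal sketch that rebuilds that input, assuming the values are exactly those in your printout (the Y values as strings, the X values as floats):

```python
import pandas as pd

# Rebuild the question's input: every cell is a tuple (number, dict),
# and inside each dict 'Y' is a string while 'X' is a float.
df = pd.DataFrame({
    "Node1": [(22, {"Y": "996.3", "X": 773.6}),
              (23, {"Y": "996.5", "X": 773.8}),
              (24, {"Y": "996.8", "X": 773.6}),
              (25, {"Y": "996.7", "X": 773.6})],
    "Node2": [(56, {"Y": "996.1", "X": 773.1}),
              (57, {"Y": "996.30", "X": 773.2}),
              (58, {"Y": "996.16", "X": 773.69}),
              (59, {"Y": "996.60", "X": 773.15})],
})
print(df)
```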
The first thing that comes to mind is to take a column, for example df.Node1, and use its values to create a new dataframe with pd.DataFrame.from_items(), since this constructor expects a list whose elements are tuples (which matches) whose second elements are dictionaries (which also matches). However, it does not produce the desired result:
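Note that DataFrame.from_items() was deprecated in pandas 0.23 and removed in pandas 1.0, so that call fails on current versions. A sketch of an equivalent construction, assuming the numbers within a column are unique (they become labels): build a dict from the (number, dict) pairs and pass it to the ordinary DataFrame constructor. Presumably like the from_items() result described here, it has the numbers as columns and Y, X as rows, which is the layout the transpose below fixes:

```python
import pandas as pd

# Only the Node1 column of the question's input is needed here.
df = pd.DataFrame({
    "Node1": [(22, {"Y": "996.3", "X": 773.6}),
              (23, {"Y": "996.5", "X": 773.8}),
              (24, {"Y": "996.8", "X": 773.6}),
              (25, {"Y": "996.7", "X": 773.6})],
})

# dict() turns the (number, dict) pairs into {22: {...}, 23: {...}, ...}.
# .tolist() matters: dict(df.Node1) would key by the row index 0..3 instead,
# because a Series behaves like a mapping from its index.
pairs = dict(df.Node1.tolist())

# Outer keys become columns, inner keys ('Y', 'X') become the row index.
wide = pd.DataFrame(pairs)
print(wide)
```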
But we are very close. If we take the transpose of this (operator .T), which swaps rows and columns, we almost have it. Along the way, I can use .applymap() to convert all the elements to float. The only things still missing are that the numbers 22, 23, 24, 25 should be a column called "Num1" instead of the index, and that the columns "X" and "Y" should be renamed "X1" and "Y1". The first can be done by giving the index the name "Num1" and then calling reset_index(), and the second with rename().
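A sketch of these steps for Node1, assuming a recent pandas. One swap: it uses .astype(float) instead of .applymap(float), since DataFrame.applymap is deprecated as of pandas 2.1 in favor of DataFrame.map; for these values both yield the same floats. The variable name node1 is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Node1": [(22, {"Y": "996.3", "X": 773.6}),
              (23, {"Y": "996.5", "X": 773.8}),
              (24, {"Y": "996.8", "X": 773.6}),
              (25, {"Y": "996.7", "X": 773.6})],
})

# Transpose: the numbers 22..25 become the row index, 'Y' and 'X' the columns.
node1 = pd.DataFrame(dict(df.Node1.tolist())).T

# Convert every value to float ('996.3' -> 996.3; X values are already floats).
node1 = node1.astype(float)

# Name the index "Num1" so reset_index() turns it into a column of that name,
# then rename Y/X to Y1/X1 and fix the column order.
node1.index.name = "Num1"
node1 = node1.reset_index().rename(columns={"Y": "Y1", "X": "X1"})
node1 = node1[["Num1", "Y1", "X1"]]
print(node1)
```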
Once that is done, we can do the same with the column Node2 and finally use pd.concat() to join the dataframes obtained in each case side by side. The following code implements these ideas:
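A minimal sketch of that code, not necessarily identical to the original: the helper unnest() and its suffix parameter are hypothetical names that bundle the transpose, float conversion, reset_index() and renaming steps, so the same function can be applied to Node1 and Node2. It assumes the numbers within each column are unique.

```python
import pandas as pd

df = pd.DataFrame({
    "Node1": [(22, {"Y": "996.3", "X": 773.6}),
              (23, {"Y": "996.5", "X": 773.8}),
              (24, {"Y": "996.8", "X": 773.6}),
              (25, {"Y": "996.7", "X": 773.6})],
    "Node2": [(56, {"Y": "996.1", "X": 773.1}),
              (57, {"Y": "996.30", "X": 773.2}),
              (58, {"Y": "996.16", "X": 773.69}),
              (59, {"Y": "996.60", "X": 773.15})],
})

def unnest(column, suffix):
    """Hypothetical helper: flatten (number, {'Y': ..., 'X': ...}) tuples.

    Returns columns Num, Y, X with `suffix` appended; Y and X become floats,
    Num keeps the integer node numbers.
    """
    flat = pd.DataFrame(dict(column.tolist())).T.astype(float)
    flat.index.name = "Num"
    flat = flat.reset_index()[["Num", "Y", "X"]]
    return flat.add_suffix(suffix)  # Num -> Num1, Y -> Y1, X -> X1

# Both flattened frames share the default index 0..3, so axis=1 puts them
# side by side row for row.
r = pd.concat([unnest(df.Node1, "1"), unnest(df.Node2, "2")], axis=1)
print(r)
```

Here add_suffix() stands in for the rename() call, since every column of a group gets the same suffix.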
The result in r is the flat table from your question, with all values converted to float.

Additional note
I'm not entirely sure whether you also want the columns to be hierarchical, that is, to have the headers "Node1" and "Node2" grouping their three respective columns. If that is the case, just change the last line to:
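Assuming the last line is the pd.concat() call, the change is to pass keys=["Node1", "Node2"], which makes pd.concat() add those names as an outer column level. The block below includes a hypothetical unnest() helper (transpose, float conversion, reset_index(), suffix) so that it runs on its own:

```python
import pandas as pd

df = pd.DataFrame({
    "Node1": [(22, {"Y": "996.3", "X": 773.6}),
              (23, {"Y": "996.5", "X": 773.8}),
              (24, {"Y": "996.8", "X": 773.6}),
              (25, {"Y": "996.7", "X": 773.6})],
    "Node2": [(56, {"Y": "996.1", "X": 773.1}),
              (57, {"Y": "996.30", "X": 773.2}),
              (58, {"Y": "996.16", "X": 773.69}),
              (59, {"Y": "996.60", "X": 773.15})],
})

def unnest(column, suffix):
    # Hypothetical helper: (number, dict) tuples -> columns Num, Y, X + suffix.
    flat = pd.DataFrame(dict(column.tolist())).T.astype(float)
    flat.index.name = "Num"
    return flat.reset_index()[["Num", "Y", "X"]].add_suffix(suffix)

# keys= puts "Node1"/"Node2" as an outer level above the flattened columns.
r = pd.concat([unnest(df.Node1, "1"), unnest(df.Node2, "2")],
              axis=1, keys=["Node1", "Node2"])
print(r)
```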
to get "Node1" and "Node2" as an outer level of column headers.
In this case, though, I don't see any point in "renaming" the columns X, Y to X1, Y1 and X2, Y2: they could just as well keep their original names (and the columns Num1 and Num2 could both be called Num). Namely:
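A sketch of that variant, using the same kind of hypothetical unnest() helper but without the suffix, so every group keeps the plain names Num, Y and X:

```python
import pandas as pd

df = pd.DataFrame({
    "Node1": [(22, {"Y": "996.3", "X": 773.6}),
              (23, {"Y": "996.5", "X": 773.8}),
              (24, {"Y": "996.8", "X": 773.6}),
              (25, {"Y": "996.7", "X": 773.6})],
    "Node2": [(56, {"Y": "996.1", "X": 773.1}),
              (57, {"Y": "996.30", "X": 773.2}),
              (58, {"Y": "996.16", "X": 773.69}),
              (59, {"Y": "996.60", "X": 773.15})],
})

def unnest(column):
    # Hypothetical helper: (number, dict) tuples -> columns Num, Y, X.
    flat = pd.DataFrame(dict(column.tolist())).T.astype(float)
    flat.index.name = "Num"
    return flat.reset_index()[["Num", "Y", "X"]]

# The repeated names Num, Y, X stay distinct because of the outer level.
r = pd.concat([unnest(df.Node1), unnest(df.Node2)],
              axis=1, keys=["Node1", "Node2"])
print(r)
```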
This gives the same two groups, Node1 and Node2, each with the columns Num, Y and X.
There is no ambiguity in those repeated names, thanks to the hierarchy. To access, for example, the X column of Node2, you could write:
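A sketch that rebuilds the hierarchical r with plain column names (the unnest() helper is hypothetical) and then shows two ways of selecting: the outer level first and then the inner column, or a single (outer, inner) tuple:

```python
import pandas as pd

df = pd.DataFrame({
    "Node1": [(22, {"Y": "996.3", "X": 773.6}),
              (23, {"Y": "996.5", "X": 773.8}),
              (24, {"Y": "996.8", "X": 773.6}),
              (25, {"Y": "996.7", "X": 773.6})],
    "Node2": [(56, {"Y": "996.1", "X": 773.1}),
              (57, {"Y": "996.30", "X": 773.2}),
              (58, {"Y": "996.16", "X": 773.69}),
              (59, {"Y": "996.60", "X": 773.15})],
})

def unnest(column):
    # Hypothetical helper: (number, dict) tuples -> columns Num, Y, X.
    flat = pd.DataFrame(dict(column.tolist())).T.astype(float)
    flat.index.name = "Num"
    return flat.reset_index()[["Num", "Y", "X"]]

r = pd.concat([unnest(df.Node1), unnest(df.Node2)],
              axis=1, keys=["Node1", "Node2"])

# Select the outer level "Node2" (a DataFrame), then its "X" column.
x_node2 = r["Node2"]["X"]

# Equivalent single lookup with an (outer, inner) tuple.
same = r[("Node2", "X")]
print(x_node2)
```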