I have a list of 140 items. For each item I request 3 pieces of data from an API and store them in lists. I reuse the same lists throughout: I receive the data, store it in the lists, add it to the DataFrame, and then clear the lists to repeat, which ends up adding about 420 columns to my DataFrame. The code works fine, without problems, but the console shows me the following:
PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling `frame.insert` many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
My code is something like this:
import os
import pandas as pd

data = [1, 2, 3, 4, 5, 7, 8, 9, ..., 140]  # the 140 item ids
alto = []
largo = []
ancho = []
df = pd.DataFrame()
for i in data:
    alto.clear()
    largo.clear()
    ancho.clear()
    obj = client_get_data(i)
    """ API response:
    it gives me about 50 results similar to these, since the objects
    change their dimensions over time and I query from the moment the
    object was created
    [
        [
            150,      // alto (height)
            '20',     // ancho (width)
            '70',     // largo (length)
            1354255   // ignore
        ]
    ]"""
    for element in obj:
        alto.append(element[0])
        ancho.append(element[1])
        largo.append(element[2])
    # add the lists to the dataframe and convert two of the values to
    # float for their final use, since the API returns them as strings
    df[f'alto {i}'] = alto
    df[f'ancho {i}'] = ancho
    df[f'ancho {i}'] = pd.to_numeric(df[f'ancho {i}'], downcast="float")
    df[f'largo {i}'] = largo
    df[f'largo {i}'] = pd.to_numeric(df[f'largo {i}'], downcast="float")
    df[f'volumen {i}'] = df[f'alto {i}'] * df[f'ancho {i}'] * df[f'largo {i}']
    # drop those columns from the dataframe since I won't use them anymore
    df = df.drop([f'alto {i}', f'ancho {i}', f'largo {i}'], axis=1)

# to finish, I save the data to an Excel file
df.to_excel(os.path.join(os.path.dirname(__file__), "data.xlsx"), index=False)
Would there be a way to add the data to the DataFrame without that warning popping up?
The best way to add "too many" columns to the DataFrame turned out to be the following: use pd.concat, which builds a DataFrame from several pieces of data at once (in my case, lists), whereas append copies the whole frame before adding the new content, which is what caused the performance warning I was seeing. So, after modifying those columns as needed and dropping the ones I didn't want, I use pd.concat to grow the final DataFrame, the one that will be used once all the data has been collected. That would be all; the warning no longer appears and the code performs better. I hope my problem has been of help to someone else.
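A minimal sketch of that idea, assuming the same client_get_data helper from the question and placeholder item ids: each finished "volumen" column is collected in a plain Python list, and everything is joined with a single pd.concat(axis=1) at the end, so the frame is never grown column by column:

import os
import pandas as pd

data = list(range(1, 141))  # placeholder for the 140 item ids

pieces = []  # one finished 'volumen' column per item
for i in data:
    obj = client_get_data(i)  # API helper assumed from the question
    alto = pd.Series([row[0] for row in obj])
    ancho = pd.to_numeric(pd.Series([row[1] for row in obj]), downcast="float")
    largo = pd.to_numeric(pd.Series([row[2] for row in obj]), downcast="float")
    # compute the volume up front; the intermediate columns never
    # enter the DataFrame, so nothing has to be dropped later
    pieces.append((alto * ancho * largo).rename(f"volumen {i}"))

# join all 140 columns in one call instead of inserting them one by one
df = pd.concat(pieces, axis=1)
df.to_excel(os.path.join(os.path.dirname(__file__), "data.xlsx"), index=False)

Note that this assumes every item returns the same number of rows (about 50, per the question); pd.concat aligns the Series by index, so columns of different lengths would be padded with NaN rather than raising an error.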