I have to check the values of about 200 columns of a DataFrame. I have the following code:
for i in df.columns:
    if len(df[df[i].str.contains('Texto')])>0:
        print(i)
The code works fine for me and returns the names of the columns I want. Is there a more efficient way to do this?
PS: to speed things up, the search only needs to be done on the last row, which is the one that contains the data I want.
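Limiting the original loop to the last row could look like the sketch below. The sample DataFrame is hypothetical, and the `isinstance` check is an added assumption so that empty or non-string cells are skipped instead of raising an error.

```python
import pandas as pd

# Hypothetical sample data: only the last row matters.
df = pd.DataFrame({
    "a": ["x", "Texto 1"],
    "b": ["Texto 2", "y"],
    "c": [None, "Texto 3"],
})

last = df.iloc[-1]  # the last row, indexed by column name
found = []
for col in df.columns:
    value = last[col]
    # Only string cells can contain the substring; skip None/NaN/numbers.
    if isinstance(value, str) and "Texto" in value:
        found.append(col)

print(found)  # ['a', 'c']
```

This still loops over about 200 columns in Python, but each check now looks at a single cell instead of filtering a whole column.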
Note: your code doesn't work for me. Pandas complains that the truth value is ambiguous and suggests using methods such as .any() or .all() instead. Even so, here is the most optimized approach I managed to come up with.
For starters, for loops should be your last resort when working with Pandas. Many of its functions are implemented in compiled code (C/Cython), which is much faster than looping in Python.
Following that principle and others I'll mention later, we get this one-line solution:
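A minimal sketch of such a one-liner, reconstructed from the step-by-step explanation (the exact original line may differ). It assumes the last row holds string values; `na=False` is an added assumption so that non-string cells count as "no match", and the sample DataFrame is hypothetical.

```python
import pandas as pd

# Hypothetical sample data: the last row holds the values we care about.
df = pd.DataFrame({
    "a": ["x", "Texto 1"],
    "b": ["Texto 2", "y"],
    "c": ["z", "Texto 3"],
})

# One boolean per column: does the last-row value start with "Texto"?
mask = df.iloc[-1].str.startswith("Texto", na=False)

# Column names as a Series, filtered by position with the mask's values.
matching = pd.Series(df.columns)[mask.to_numpy()]

print(matching.tolist())  # ['a', 'c']
```

To mirror the question's substring test rather than a prefix test, `.str.contains("Texto", na=False)` can be used in place of `.str.startswith(...)`.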
Here is a step-by-step explanation:
We take the column labels (df.columns, which is an Index) and convert them to a Series so they are easier to work with in the following steps.
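As a small illustration of this step (hypothetical DataFrame): wrapping `df.columns` in a Series gives it a default 0..n-1 index, and a plain list or array of booleans then selects by position. A boolean Series indexed by column names would not line up with that 0..n-1 index, which is why converting the mask to a plain array is a reasonable precaution.

```python
import pandas as pd

# Hypothetical DataFrame; only the column labels matter here.
df = pd.DataFrame({"a": [1], "b": [2], "c": [3]})

# df.columns is an Index; pd.Series wraps it with a default 0..n-1 index.
cols = pd.Series(df.columns)
print(cols.index.tolist())  # [0, 1, 2]
print(cols.tolist())        # ['a', 'b', 'c']

# A plain list of booleans selects elements by position.
print(cols[[True, False, True]].tolist())  # ['a', 'c']
```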
Inside the square brackets, we index that Series with a sequence of booleans (a boolean mask).
We build that mask from the last row, df.iloc[-1] (since you say it is the only relevant one), and apply .str.startswith("Texto") to it. The result has one boolean per column, True when that column's value in the last row starts with "Texto". df.iloc lets us access a single value, a row, or a column by its position.
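To make the positional access concrete, here is a short sketch of `df.iloc` on a hypothetical two-row DataFrame, followed by the mask built from the last row (again with the assumed `na=False`).

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"a": ["x", "Texto 1"], "b": ["Texto 2", "y"]})

last_row = df.iloc[-1]     # whole row by position -> Series indexed by column names
first_col = df.iloc[:, 0]  # whole column by position
cell = df.iloc[-1, 0]      # single value by (row, column) position

print(last_row.tolist())   # ['Texto 1', 'y']
print(first_col.tolist())  # ['x', 'Texto 1']
print(cell)                # Texto 1

# One boolean per column, labelled by column name.
mask = last_row.str.startswith("Texto", na=False)
print(mask.tolist())       # [True, False]
```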