Here is my situation: I have a column called "ASSET" of type string whose values have 10 or 11 characters.
From this column I generate another one called "CLASE" with the following code:
join["CLASE"] = ""
for row in join["ASSET"]:
    join["CLASE"] = [row[0:3] if len(row)==11 else row[0:4] for row in join["ASSET"]]
In essence the code works and does what it is supposed to do: if the record has 11 characters, it returns the first 3 characters, and if it has 12 characters, it returns the first 4 characters.
However, it is very slow. I don't know whether the list comprehension is too much for the amount of data I'm handling (approx. 156,000 records) or whether something else is wrong, so I'm asking for help finding a more efficient way to do what this code snippet does. Right now that single cell takes about an hour to run.
Good day,
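The problem is that you have a `for` loop, then a list comprehension inside it, and then an `if`. The outer loop runs once per row, and on every iteration the list comprehension rebuilds the entire column, so the work grows with the square of the number of rows (roughly 156,000 × 156,000 slices).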
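For comparison, here is a sketch of the question's own logic with the outer `for` loop removed. The list comprehension already visits every row, so it only needs to run once. It assumes `join` is a pandas DataFrame; the sample values are hypothetical.

```python
import pandas as pd

# Hypothetical sample data (11 and 12 characters, matching the stated rule).
join = pd.DataFrame({"ASSET": ["AAA12345678", "BBBB12345678"]})

# Single pass over the column: same condition as the original code,
# but without the surrounding loop that repeated it for every row.
join["CLASE"] = [row[0:3] if len(row) == 11 else row[0:4] for row in join["ASSET"]]
print(join)
```

This alone turns the quadratic run time into a linear one, though it is still a Python-level loop.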
You can do it directly on the column values with the `pandas.Series.str` accessor.
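The `.str` accessor operates on the whole column at once. As a sketch, the original condition could be kept exactly and vectorized by combining `Series.str.len()`, `.str` slicing and `numpy.where`. The use of NumPy and the sample values are assumptions added here, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data.
join = pd.DataFrame({"ASSET": ["AAA12345678", "BBBB12345678"]})

asset = join["ASSET"]
# Vectorized form of the original rule:
# length 11 -> first 3 characters, otherwise -> first 4 characters.
join["CLASE"] = np.where(asset.str.len() == 11, asset.str[:3], asset.str[:4])
print(join)
```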
Since you didn't post your data as text, I created a generic example; you will have to adapt the column names.
The data that I am using in the "sample2.csv" file is this:
In your example you mention that if the string has 10 characters you want the first 3, and if it has 11 you want the first 4, which is why you use the `if`. In other words, what you want is to take the characters from the beginning of the string up to position -7, that is, everything except the last 7 characters. Everything you have can be reduced to one line:
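A minimal sketch of that one line, assuming the DataFrame is `join` with an `ASSET` column as in the question (the sample values are hypothetical). `.str[:-7]` keeps everything except the last 7 characters, so a 10-character value yields 3 characters and an 11-character value yields 4.

```python
import pandas as pd

# Hypothetical sample data: one 10-character and one 11-character value.
join = pd.DataFrame({"ASSET": ["AAA1234567", "BBBB1234567"]})

# Drop the last 7 characters of every value in one vectorized step.
join["CLASE"] = join["ASSET"].str[:-7]
print(join)
```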
This returns: