I have a DataFrame column containing heights in feet and inches, each marked with its corresponding symbol:
df.altura
0 5' 4"
1 5' 11"
2 5' 10"
3 5' 7"
What I need to do is create a new column with the same height in centimeters, that is, multiply the first number by 0.3048 and the second by 0.0254 and add them. If I split the values, the trailing quote character is still there. I could then slice, multiply, and add, but for that I first need to split them without leaving the quotes behind.
You can apply str.rstrip or a simple slice to remove the trailing double quote (").
However, a much simpler way, in my opinion, is to use pandas.Series.str.extract with a regular expression built from these elements:
\d -> Character class, any digit.
+ -> Quantifier, 1 or more.
* -> Quantifier, 0 or more.
() -> Capture groups, each one will form a new column.
\s -> Whitespace.
The expression is very simple and covers the given examples. It can be adapted as needed, for example if decimals are possible or if feet or inches can be missing in some rows.
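As a minimal sketch of the extraction step, assuming every value looks like 5' 4" (whole feet, then whole inches), one pattern that uses only the elements listed above is (\d+)'\s*(\d+)". The pattern and the variable names are illustrative assumptions, not necessarily the exact expression from the original answer:

```python
import pandas as pd

# Sample data mirroring the column shown in the question
df = pd.DataFrame({"altura": ['5\' 4"', '5\' 11"', '5\' 10"', '5\' 7"']})

# Assumed pattern: digits, a single quote, optional whitespace, digits, a double quote.
# A triple-quoted raw string avoids having to escape either quote character.
pattern = r'''(\d+)'\s*(\d+)"'''

# Each capture group becomes one column (0 = feet, 1 = inches); values are still strings
parts = df["altura"].str.extract(pattern)
print(parts)
```

Rows that do not match the pattern come back as NaN in both columns, so it may be worth checking for missing values before converting.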
This directly generates a DataFrame with two columns, one per capture group. We simply convert it to integers, multiply the columns by 0.3048 and 0.0254 respectively, and apply pandas.DataFrame.sum across each row. Note that these factors give the height in meters; for centimeters, use 30.48 and 2.54.
If, as I mentioned, decimals are possible, the expression needs to be modified to accept them.
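A sketch of the whole conversion, assuming decimals may appear in either part. The group (\d+(?:\.\d+)?) is one possible way to accept an optional fractional part. The column name altura_m and the sample values are made up for illustration. With the factors 0.3048 and 0.0254 the result is in meters:

```python
import pandas as pd

# Hypothetical sample data, including decimal feet and decimal inches
df = pd.DataFrame({"altura": ['5\' 4"', '5\' 11.5"', '5.5\' 10"', '5\' 7"']})

# Assumed pattern: each capture group accepts digits with an optional decimal part.
# (?:...) is a non-capturing group, so it does not create extra columns.
pattern = r'''(\d+(?:\.\d+)?)'\s*(\d+(?:\.\d+)?)"'''

# Two string columns (feet, inches) converted to floats
parts = df["altura"].str.extract(pattern).astype(float)

# Scale feet by 0.3048 and inches by 0.0254, then add the two columns row by row
df["altura_m"] = (parts * [0.3048, 0.0254]).sum(axis=1)
print(df)
```

For whole numbers only, the same code works with the simpler pattern and astype(int) in place of astype(float).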