I have strings like the following:
€14.5M, €50M, $14.8M, $$70.5M ,100M
How can I go through these strings and extract the number in each one, whether it is an integer or a float, without losing the decimal point? I got this far, but the point is lost, so 13.5 becomes 135:
dato = '€14.5M'
simple_dream_team.Wage.apply(lambda x: ''.join([n for n in x if n.isdigit()]))
Result: 145
I could replace symbols like €, $ or the M with '', but from what I can see the data could contain just about anything, which is why I haven't gone that way.
For the examples you provide, you could make your original idea work with something like this:
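The snippet that originally followed is not reproduced here. As a minimal sketch of the idea, assuming the only change needed is to keep the '.' character alongside the digits (keep_digits_and_dot is a hypothetical helper name, not something from the question):

```python
def keep_digits_and_dot(text):
    """Keep only digits and '.', then convert what is left to float.

    Hypothetical helper: it assumes every string contains exactly one
    number and no stray dots, which holds for the sample strings only.
    """
    cleaned = "".join(ch for ch in text if ch.isdigit() or ch == ".")
    return float(cleaned)


samples = ["€14.5M", "€50M", "$14.8M", "$$70.5M", "100M"]
numbers = [keep_digits_and_dot(s) for s in samples]
print(numbers)
```

On a DataFrame column the same function could be passed to apply, for instance simple_dream_team.Wage.apply(keep_digits_and_dot).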
However, it is neither very efficient nor very robust.
If you have your strings in a Series or in a column of a DataFrame, you can use a regular expression together with pandas.Series.str.extract.
The regular expression passed to pandas.Series.str.extract must define at least one capturing group, since each group forms a new column; hence the parentheses in the expression. It only returns the first match, which is all you need in your case. If a string can contain more than one match and you need them all, you can use pandas.Series.str.extractall.
If you want to convert the column to type float directly, just use Series.astype.
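The original example and its output are not reproduced here. A minimal sketch of the approach, assuming a hypothetical DataFrame with a Wage column holding the sample strings and an illustrative pattern of my own choosing, might look like this:

```python
import pandas as pd

# Hypothetical data mirroring the strings from the question.
df = pd.DataFrame({"Wage": ["€14.5M", "€50M", "$14.8M", "$$70.5M", "100M"]})

# One capturing group: digits, optionally followed by '.' and more digits.
# expand=False returns a Series (rather than a one-column DataFrame).
pattern = r"(\d+(?:\.\d+)?)"
df["Wage_num"] = df["Wage"].str.extract(pattern, expand=False).astype(float)

print(df["Wage_num"].tolist())
```

The inner (?:...) is a non-capturing group, so the pattern still produces a single column.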
Here is a script that does something similar to what you need. It uses the 're' module, which provides regular expression operations.
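The script itself is missing from this copy. A rough sketch of what such a script could look like, assuming re.search with a simple digits-and-optional-decimals pattern (extract_number is a hypothetical name used only for illustration):

```python
# -*- coding: utf-8 -*-
import re

# Digits, optionally followed by a '.' and more digits.
NUMBER_PATTERN = re.compile(r"\d+(?:\.\d+)?")


def extract_number(text):
    """Return the first number found in text as a float, or None if there is none."""
    match = NUMBER_PATTERN.search(text)
    return float(match.group()) if match else None


samples = ["€14.5M", "€50M", "$14.8M", "$$70.5M", "100M"]
results = [extract_number(s) for s in samples]
print(results)
```

Returning None when nothing matches avoids an exception on strings that contain no digits at all.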
When the UNICODE flag is not specified, \d matches any decimal digit; this is equivalent to the set [0-9] (in Python 3, str patterns are Unicode-aware by default, and re.ASCII restricts \d to [0-9]). Be careful: it is important to include
# -*- coding: utf-8 -*-
at the beginning of your script, since non-ASCII characters such as € will otherwise raise an error: SyntaxError: Non-UTF-8 code starting with b'\x80' in file...