I'm trying to parse lines like this one with regular expressions in Python:
21698213.20307 -4937213.445 7 -3801759.02548 21698206.56648
These values are "observations" of GPS signals. The line above holds 5 observations. If the observables are L1, L2, C1, C2, P2, the values I would like to extract are:
L1 : { observation -> 21698213.203, LossOfLockInd -> 0, SignalStrengthInd -> 7 }
L2 : { observation -> NONE, LossOfLockInd -> NONE, SignalStrengthInd -> NONE }
C1 : { observation -> -4937213.445, LossOfLockInd -> 0 (NONE), SignalStrengthInd -> 7 }
C2 : { observation -> -3801759.025, LossOfLockInd -> 4, SignalStrengthInd -> 8 }
P2 : { observation -> 21698206.566, LossOfLockInd -> 4, SignalStrengthInd -> 8 }
That is, I need to extract each decimal number with 3 decimal places (the observation) and each single digit or space that follows it (LossOfLock, SignalStrength). If one of the observables has no value, I would like to get 3 empty elements for it (when an observable is missing, the separation between the observables is 18).
So far I've been able to get the decimals and the integers separately, but I can't capture the blank spaces (LossOfLock) or turn the missing observables into 3 empty elements.
This is the expression I'm using at the moment:
([-+]?\d*\.\d{3}|\d)
Example of what it captures so far:
var match = '21698213.20307 -4937213.445 7 -3801759.02548 21698206.56648'.match(/([-+]?\d*\.\d{3}|\d)/g);
console.log(match);
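For reference, the same capture can be reproduced in Python. This is a minimal sketch using the line exactly as quoted above; the variable names are only illustrative. It shows the problem: the blank LossOfLock of C1 and the whole missing L2 observable disappear, so only 11 elements come back instead of 15.

```python
import re

# The line as quoted above
line = '21698213.20307 -4937213.445 7 -3801759.02548 21698206.56648'

# Same expression as in the JavaScript snippet
pattern = re.compile(r'([-+]?\d*\.\d{3}|\d)')

# findall returns the content of the single capture group for every match
matches = pattern.findall(line)
print(matches)
print(len(matches))  # 11 elements, not the 15 (5 observables x 3) that are wanted
```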
In the end I used the regular expression proposed by Mariano:
([-+ \d]{9}[. ][ \d]{3})([\d ])([\d ])
plus a couple of tweaks in code to fill in the gaps the regex leaves at the end:
## Assumes `import re` and `import itertools` at module level
## Get the observations
the_obs = re.findall(self.REGEX_PARSE_LINEA_OBS, ''.join(obsArray[obsindex : obsindex + step]))
## The regex returns a list of tuples;
## chain.from_iterable() flattens them into a single sequence of strings,
## and strip_ removes the surrounding spaces from each one.
## list() is needed so that len() also works on Python 3
the_obs = list(map(strip_, itertools.chain.from_iterable(the_obs)))
## The regex can leave gaps at the end if there are no observations,
## so we fill them in here
expected = len(self.header['OBSERV_TYPES']) * 3
if len(the_obs) < expected:
    ## Fill the missing slots AT THE END with empty strings
    the_obs.extend([''] * (expected - len(the_obs)))
Seeing as the original text has a fixed width for each element, instead of using a regular expression, I'd recommend retrieving each value based on its position.
A simplified example for the point raised would be:
Result:
Demonstration: Ideone.com
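The position-based idea can be sketched roughly as follows. This is a minimal sketch, not the code from the linked demo. It assumes each observable occupies 16 characters (a 14-character value, then one character for LossOfLock and one for SignalStrength), which is the RINEX 2 layout; adjust the widths if your file differs. The sample line is rebuilt with that spacing, and the names `parse_obs_line`, `FIELD_WIDTH` and `VALUE_WIDTH` are hypothetical.

```python
FIELD_WIDTH = 16   # assumption: 14-char value + 1-char LossOfLock + 1-char SignalStrength
VALUE_WIDTH = 14   # assumption: width of the numeric value inside each field

def parse_obs_line(line, obs_types, field_width=FIELD_WIDTH, value_width=VALUE_WIDTH):
    """Split a fixed-width observation line into one dict per observable."""
    # Pad on the right so that missing trailing fields still slice to blanks
    padded = line.rstrip("\n").ljust(field_width * len(obs_types))
    result = {}
    for i, name in enumerate(obs_types):
        chunk = padded[i * field_width:(i + 1) * field_width]
        result[name] = {
            "observation": chunk[:value_width].strip(),
            "loss_of_lock": chunk[value_width:value_width + 1].strip(),
            "signal_strength": chunk[value_width + 1:value_width + 2].strip(),
        }
    return result

# Sample line rebuilt with the assumed 16-character spacing; L2 is blank
sample = (
    "  21698213.20307"   # L1
    + " " * 16           # L2 (missing)
    + "  -4937213.445 7" # C1 (blank LossOfLock)
    + "  -3801759.02548" # C2
    + "  21698206.56648" # P2
)

parsed = parse_obs_line(sample, ["L1", "L2", "C1", "C2", "P2"])
for name, values in parsed.items():
    print(name, values)
```

A missing observable comes back as three empty strings, and a blank LossOfLock as a single empty string, without any extra padding step afterwards.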
If you still want to keep trying regular expressions, I would use the same logic: always capture each element by its fixed width. We can then use groups to separate the value of each column.
Example:
Result
Demonstration: rextester.com
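A regex version of the same fixed-width logic might look like the sketch below. It is not the pattern from the linked demo: the expression `(.{14})(.)(.)` and the 16-character field width are assumptions based on the RINEX 2 layout, and `parse_with_regex` is a hypothetical name. Each match consumes exactly one field, and the three groups give the value, LossOfLock and SignalStrength columns.

```python
import re

FIELD_WIDTH = 16  # assumption: 14-char value + 1-char LossOfLock + 1-char SignalStrength

# One match per field: group 1 = value, group 2 = LossOfLock, group 3 = SignalStrength
FIELD_RE = re.compile(r"(.{14})(.)(.)")

def parse_with_regex(line, n_obs):
    """Return one (value, loss_of_lock, signal_strength) tuple per observable."""
    # Pad on the right so that missing trailing fields still produce a match
    padded = line.rstrip("\n").ljust(FIELD_WIDTH * n_obs)
    return [
        tuple(group.strip() for group in groups)
        for groups in FIELD_RE.findall(padded)
    ]

# Sample line rebuilt with the assumed 16-character spacing; the second field is blank
sample_line = (
    "  21698213.20307"
    + " " * 16
    + "  -4937213.445 7"
    + "  -3801759.02548"
    + "  21698206.56648"
)

rows = parse_with_regex(sample_line, 5)
for row in rows:
    print(row)
```

Because every field is matched whether or not it contains digits, blank columns stay in place as empty strings and the result always has one tuple per observable.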