I have the following lines in Java:
1234,"Calle Jaime III, 34", 67,3,U
1235,Avenida Los Algodones, 12,1,L
1236,"Calle Principal""31234", 46,3,H
1237,"Calle Alfonso X,22", 65,2,J
I would like to split each line on the comma character (,), but as you can see in the example, the address is sometimes enclosed in quotes, so when a comma appears inside a quoted field, the split goes wrong.
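To see the problem concretely, a plain String.split(",") on the first sample line produces six pieces instead of five, because the comma inside the quoted address is treated as a separator like any other. A minimal sketch (the class and method names are illustrative):

```java
import java.util.Arrays;

public class NaiveSplitDemo {
    // Splits on every comma, with no awareness of quoted fields
    static String[] naiveSplit(String line) {
        return line.split(",");
    }

    public static void main(String[] args) {
        String line = "1234,\"Calle Jaime III, 34\", 67,3,U";
        String[] parts = naiveSplit(line);
        // The quoted address is cut in two, so there is one field too many
        System.out.println(parts.length); // prints 6
        System.out.println(Arrays.toString(parts));
    }
}
```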
I am trying to get the following:
1234 Calle Jaime III 34 67 3 U
1235 Avenida Los Algodones 12 1 L
1236 Calle Principal 31234 46 3 H
1237 Calle Alfonso X 22 65 2 J
I found a solution to your problem on the English-language Stack Overflow, in the following answer,
which uses the following regular expression. It splits on a comma only if that comma has zero or an even number of quotes ahead of it (later in the line), which means the comma is outside any quoted field.
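A minimal sketch of that idea, assuming the lookahead pattern ,(?=(?:[^"]*"[^"]*")*[^"]*$), which is one regular expression that fits the description (the exact expression in the linked answer may differ). The lookahead consumes quotes in pairs up to the end of the line, so it only succeeds when an even number of quotes follows the comma. The class and method names are illustrative.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class QuoteAwareSplit {
    // Matches a comma only when the rest of the line after it contains
    // zero or an even number of double quotes, i.e. the comma is outside quotes
    private static final Pattern COMMA_OUTSIDE_QUOTES =
            Pattern.compile(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");

    static String[] split(String line) {
        // limit -1 keeps trailing empty fields instead of discarding them
        return COMMA_OUTSIDE_QUOTES.split(line, -1);
    }

    public static void main(String[] args) {
        String[] lines = {
            "1234,\"Calle Jaime III, 34\", 67,3,U",
            "1235,Avenida Los Algodones, 12,1,L",
            "1236,\"Calle Principal\"\"31234\", 46,3,H",
            "1237,\"Calle Alfonso X,22\", 65,2,J"
        };
        for (String line : lines) {
            String[] fields = split(line);
            // Each sample line yields 5 fields; quotes and spaces are still present
            System.out.println(fields.length + " " + Arrays.toString(fields));
        }
    }
}
```

Precompiling the Pattern avoids recompiling the regex on every call, which String.split(regex) would do for a multi-character pattern like this one.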
Here is a small Java program to test this expression,
which prints the following:
I also tested the regular expression against your data on https://regex101.com, and it works correctly, as you can see on the following page:
If you also want to remove the quotes and the comma, you can do the following:
so that the result looks exactly like the data you want to get.
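A minimal sketch of the whole transformation, assuming the same lookahead regex as above and assuming that any quote or comma left inside a field should become a single space. That cleanup rule is chosen here only because it reproduces the four target lines from the question; the linked answer may clean the fields differently. The class name CleanRecords is illustrative.

```java
import java.util.ArrayList;
import java.util.List;

public class CleanRecords {
    // Lookahead regex: split on a comma only if an even number of quotes follows it
    private static final String SPLIT_REGEX = ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";

    // Turns one raw record into the space-separated form shown in the question
    static String clean(String line) {
        List<String> cleaned = new ArrayList<>();
        for (String field : line.split(SPLIT_REGEX, -1)) {
            // Assumption: every quote or comma left inside a field becomes a space,
            // then the field is trimmed and runs of whitespace collapse to one space
            String value = field.replaceAll("[\",]", " ").trim().replaceAll("\\s+", " ");
            cleaned.add(value);
        }
        return String.join(" ", cleaned);
    }

    public static void main(String[] args) {
        String[] lines = {
            "1234,\"Calle Jaime III, 34\", 67,3,U",
            "1235,Avenida Los Algodones, 12,1,L",
            "1236,\"Calle Principal\"\"31234\", 46,3,H",
            "1237,\"Calle Alfonso X,22\", 65,2,J"
        };
        for (String line : lines) {
            System.out.println(clean(line));
        }
    }
}
```

Run on the four sample lines, main prints the four target lines from the question, for example 1236 Calle Principal 31234 46 3 H for the record with the doubled quote.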
If your records always follow exactly those patterns, you can call replaceAll(", ", " ") on each record. This removes only the comma that "bothers" the split or StringTokenizer in the first line of the example, since the other commas are not followed by a space. Then do the split or StringTokenizer as usual, and finally call replaceAll("\"", "") to remove all the quotes. Repeating this procedure on each record should give the expected result. If you have other patterns, post all the examples and we can keep thinking about it.
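A minimal sketch of those three steps on the first sample record. It reads the first replacement as replaceAll(", ", " "), a comma followed by a space (that reading is an assumption), and uses String.split rather than StringTokenizer. The class and method names are illustrative.

```java
import java.util.Arrays;

public class ReplaceThenSplit {
    // Applies the three steps described above to one record
    static String[] process(String line) {
        // Step 1: replace each comma followed by a space with a single space
        String step1 = line.replaceAll(", ", " ");
        // Step 2: ordinary split on the remaining commas
        String[] tokens = step1.split(",");
        // Step 3: remove every double quote from each token
        for (int i = 0; i < tokens.length; i++) {
            tokens[i] = tokens[i].replaceAll("\"", "");
        }
        return tokens;
    }

    public static void main(String[] args) {
        String[] tokens = process("1234,\"Calle Jaime III, 34\", 67,3,U");
        System.out.println(Arrays.toString(tokens)); // prints [1234, Calle Jaime III 34 67, 3, U]
        System.out.println(String.join(" ", tokens)); // prints 1234 Calle Jaime III 34 67 3 U
    }
}
```

With the sample data exactly as shown, the comma before 67 is also followed by a space, so the address and 67 end up in the same token (four tokens instead of five); joined with spaces, the printed line still matches the target.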