I download a CSV (unicode) file from a website, the fields are separated by tabs and between double quotes and the line break does not make it at the end of each row as seen in the CSV, example of how the information is seen of the CSV when opening it with notepad:
"campo1" "campo2" "campo3" "campo4"
"campo5" "campo6" "000000AA1011" "000000AA1012"
"000000AA1013" "000000AA1014" "000000AA1015" "000000AA1016"
"000000AA1017" "000000AA1018" "000000AA1019" "000000AA1020"
"000000AA1021" "000000AA1022"
I want to convert it to CSV(utf-8) so that the fields are separated by commas and are not between double quotes and the line break is done as seen in the CSV and in this way I can do the LOAD DATA LOCAL INFILE
from MySQL
and insert the CSV records without any problem to the table MySQL
, example of how I want the CSV to look when opened with a notepad:
campo1,campo2,campo3,campo4,campo5,campo6
000000AA1011,000000AA1012,000000AA1013,000000AA1014,000000AA1015,000000AA1016
000000AA1017,000000AA1018,000000AA1019,000000AA1020,000000AA1021,000000AA1022
It is worth mentioning that in the 1st instance try to take the CSV(unicode) and insert it into the table MySQL
as follows:
LOAD DATA LOCAL INFILE 'C:/data.csv' INTO TABLE bd.down_data
FIELDS TERMINATED BY '\t' ESCAPED BY '"'
LINES TERMINATED BY '\n' IGNORE 1 LINES
(campo1, campo2, campo3);
This query inserts the data but puts a space after each character, example:
campo1 campo2 campo3
0 0 0 0 0 0 A A 1 0 1 1 0 0 0 0 0 0 A A 1 0 1 2 0 0 0 0 0 0 A A 1 0 1 3
0 0 0 0 0 0 A A 1 0 1 7 0 0 0 0 0 0 A A 1 0 1 8 0 0 0 0 0 0 A A 1 0 1 9
and try to remove the spaces as follows:
Select REPLACE(campo1,' ','') as campo1 from bd.down_data;
but it does not make any changes, I imagine that the problem is the encoding of the original CSV file and therefore I want to correct the problem from the CSV (have the fields separated by commas and without double quotes when doing the LOAD DATA LOCAL INFILE
) since if I save again the same CSV from the excel as CSV replacing it and I open it with the notepad it already appears separated by commas.
I am doing this in Python
:
import csv
path = 'data.csv'
with open(path, 'r', encoding='utf-8', errors='ignore') as infile, open('final.csv', 'w') as outfile:
inputs = csv.reader(infile)
output = csv.writer(outfile)
for index, row in enumerate(inputs):
if index == 0:
continue
output.writerow(row)
I create the new file final.csv
but without any record and I get the following error:
Traceback (most recent call last):
File "export.py", line 9, in <module>
for index, row in enumerate(inputs):
_csv.Error: line contains NULL byte
UPDATE:
actual file:
https://drive.google.com/open?id=1LnCJVVc83HPAXAc88BMtKoHFIHpP5xtW
The file initially uses Unicode but is encoded using UTF-16-LE as the encoding format (at least it's correct for the sample file). It is basically what Microsoft has been calling "UNICODE" files, although it is not a very specific term... Therefore we must open the file using UTF-16 and use UTF-8 for the output file:
Which generates the csv without quotes (unless they are necessary, for example a cell that has the separator as part of the data) and using the comma as a separator:
The file has an empty column at the end, no header and no data at least in this file. If you want to remove it, just slicing: