What is a promise in Javascript?

Question

El Cóndor

Asked: 2020-09-11 22:04:15 +0800 CST 2020-09-11 22:04:15 +0800 CST 2020-09-11 22:04:15 +0800 CST

How to convert a csv(unicode) file to csv(utf-8) in python 3.6.5?

772

I download a CSV (unicode) file from a website, the fields are separated by tabs and between double quotes and the line break does not make it at the end of each row as seen in the CSV, example of how the information is seen of the CSV when opening it with notepad:

"campo1"    "campo2"    "campo3"    "campo4"    
"campo5"    "campo6"    "000000AA1011"  "000000AA1012"  
"000000AA1013" "000000AA1014"   "000000AA1015"  "000000AA1016"  
"000000AA1017"  "000000AA1018"  "000000AA1019" "000000AA1020"   
"000000AA1021"  "000000AA1022"

I want to convert it to CSV(utf-8) so that the fields are separated by commas and are not between double quotes and the line break is done as seen in the CSV and in this way I can do the LOAD DATA LOCAL INFILEfrom MySQLand insert the CSV records without any problem to the table MySQL, example of how I want the CSV to look when opened with a notepad:

campo1,campo2,campo3,campo4,campo5,campo6
000000AA1011,000000AA1012,000000AA1013,000000AA1014,000000AA1015,000000AA1016
000000AA1017,000000AA1018,000000AA1019,000000AA1020,000000AA1021,000000AA1022

It is worth mentioning that in the 1st instance try to take the CSV(unicode) and insert it into the table MySQLas follows:

LOAD DATA LOCAL INFILE 'C:/data.csv' INTO TABLE bd.down_data
    FIELDS TERMINATED BY '\t' ESCAPED BY '"'
    LINES TERMINATED BY '\n' IGNORE 1 LINES
    (campo1, campo2, campo3);

This query inserts the data but puts a space after each character, example:

    campo1                       campo2                       campo3
0 0 0 0 0 0 A A 1 0 1 1     0 0 0 0 0 0 A A 1 0 1 2     0 0 0 0 0 0 A A 1 0 1 3
0 0 0 0 0 0 A A 1 0 1 7     0 0 0 0 0 0 A A 1 0 1 8     0 0 0 0 0 0 A A 1 0 1 9

and try to remove the spaces as follows:

Select REPLACE(campo1,' ','') as campo1 from bd.down_data;

but it does not make any changes, I imagine that the problem is the encoding of the original CSV file and therefore I want to correct the problem from the CSV (have the fields separated by commas and without double quotes when doing the LOAD DATA LOCAL INFILE) since if I save again the same CSV from the excel as CSV replacing it and I open it with the notepad it already appears separated by commas.

I am doing this in Python:

import csv

path =  'data.csv'

with open(path, 'r', encoding='utf-8', errors='ignore') as infile, open('final.csv', 'w') as outfile:
     inputs = csv.reader(infile)
     output = csv.writer(outfile)

     for index, row in enumerate(inputs):

         if index == 0:
             continue
         output.writerow(row)

I create the new file final.csvbut without any record and I get the following error:

Traceback (most recent call last):
  File "export.py", line 9, in <module>
    for index, row in enumerate(inputs):
_csv.Error: line contains NULL byte

UPDATE:

actual file:

https://drive.google.com/open?id=1LnCJVVc83HPAXAc88BMtKoHFIHpP5xtW

1 Answers

Voted

FJSevilla · Answer 1 · 2020-09-12T06:50:48+08:00

The file initially uses Unicode but is encoded using UTF-16-LE as the encoding format (at least it's correct for the sample file). It is basically what Microsoft has been calling "UNICODE" files, although it is not a very specific term... Therefore we must open the file using UTF-16 and use UTF-8 for the output file:

import csv

path =  'data.csv'

with open(path, 'r', encoding='utf-16-le',  newline='') as infile,\
     open('final.csv', 'w',  encoding='utf-8', newline='') as outfile:
     inputs = csv.reader(infile, delimiter="\t", quotechar='"')
     output = csv.writer(outfile, delimiter=",", quotechar='"')
     next(inputs) # Descartar primera fila (header)
     output.writerows(row for row in inputs)

Which generates the csv without quotes (unless they are necessary, for example a cell that has the separator as part of the data) and using the comma as a separator:

0000001NV462,MC1,AUTO,,
000000A34147,MC2,AUTO,3,
000000B51066,MC3,AUTO,,
000000B94242,MCG,AUTO,2,
000000C61056,MC9,AUTO,534,
4X30M30M39458,5SA40,AUTOMC76,MC2,5SA49458,MC2,
AUTO ,,
SA70A15440,MCG,AUTO,5,
SA70A15460,MCB,AUTO,1,
SA70A15467,MCB,AUTO,159,
SA70A15468,MCB,AUTO,159,
...

The file has an empty column at the end, no header and no data at least in this file. If you want to remove it, just slicing:

output.writerows(row[:-1] for row in inputs)

How to convert a csv(unicode) file to csv(utf-8) in python 3.6.5?

HTML button that sends you to another page

Why do I get the error "Call to undefined function mysql_connect()"?

How to create an HTML button that works as a link?

How to separate a String in Java. How to use split()

Filter by dates in sql server

How to limit the number of decimal places in a double?

For each in JavaScript?

Position footer ALWAYS glued to the footer

Definitive Guide to Type Conversion in Java

How to properly compare Strings (and objects) in Java?