While following several Pandas tutorials I ran into this issue: when reading CSV files I sometimes get the following error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe1 in position 17: invalid continuation byte
In all the videos they suggest opening the file with Sublime and then saving it as UTF-8.
Now, on the other hand, I found the following approach: open the file, read it, and apply chardet's detect function so that it works out what encoding the file uses.
import chardet

# Read the raw bytes and let chardet guess the file's encoding
with open('data/atp-tour/data.csv', 'rb') as f:
    result = chardet.detect(f.read())

result['encoding']
Output: 'Windows-1252'
Once this is done, I can pass the detected encoding to read_csv and the file loads normally:
import pandas as pd

datos = pd.read_csv('data/atp-tour/data.csv', encoding=result['encoding'])
Which is the better way to handle the encoding: the quick fix of re-saving with Sublime, or detecting the encoding and passing it explicitly?
On the other hand, when reading the file with the detected encoding, I sometimes get a warning that stops appearing when I add the parameter low_memory=False:
DtypeWarning: Columns (4,5,7,16) have mixed types. Specify dtype option on import or set low_memory=False.
What does this mean? From what I've investigated, the very same line sometimes produces the warning and sometimes doesn't when I run it.
FILE ENCODING
Opening the file with Sublime and re-saving it as UTF-8 is the simplest option. Alternatively, you can pass the encoding directly to read_csv; if that doesn't work, you can import chardet and detect the encoding in the same script, so everything stays in one place. This last option takes longer the more records the file has, because chardet has to scan the data.
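A minimal sketch of the in-script approach (the file path comes from the question; sampling only the first 100 KB is my own assumption, to keep detection fast on large files):

import chardet
import pandas as pd

# Detect the encoding from a sample of the raw bytes rather than the
# whole file; a reasonably large sample usually gives the same answer
# and avoids the slowdown on big files.
with open('data/atp-tour/data.csv', 'rb') as f:
    result = chardet.detect(f.read(100_000))

datos = pd.read_csv('data/atp-tour/data.csv', encoding=result['encoding'])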
Although that was my question (I'm just starting out and I'm an intern), they have since handed down the requirement that files be delivered in UTF-8, so I no longer have that problem; but for practice you can try what I describe above.
LOW MEMORY
I am leaving the translated information below; see the source provided by @abulafia.
The low_memory option is not properly deprecated, but it should be, since it does not actually do anything differently (see source).
The reason you get this low_memory warning is that guessing the dtype for each column is very memory intensive. Pandas tries to determine what dtype to set by analysing the data in each column.
Pandas can only determine what dtype a column should have once the entire file has been read. This means nothing can really be parsed before the whole file is read, unless you risk having to change the dtype of that column when you read the last value.
Consider the example of a file that has a column named user_id. It contains 10 million rows where the user_id is always numbers. Since pandas cannot know that it is only numbers, it will probably keep it as the original strings until it has read the entire file.
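To see that fallback in action, here is a small sketch with invented data: a column that mixes numbers and text ends up with the generic object dtype, i.e. plain strings.

import io
import pandas as pd

# Tiny in-memory CSV whose user_id column mixes numbers and text.
csv_data = io.StringIO("user_id,score\n1,10\n2,20\nfoobar,30\n")

df = pd.read_csv(csv_data)
print(df['user_id'].dtype)  # object: pandas kept the values as strings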
Specifying dtypes (which should always be done) by adding the dtype argument to the pd.read_csv() call lets pandas know, from the moment it starts reading the file, that this column contains only integers.
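The code for that dtype argument did not survive in the quote; judging from the surrounding text, it was presumably along these lines (the file name here is a placeholder):

import pandas as pd

# Declare the column's type up front so pandas never has to guess it
# (the user_id column name comes from the example above).
df = pd.read_csv('data.csv', dtype={'user_id': int})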
It's also worth noting that if the last line of the file had "foobar" written in the user_id column, loading would crash if the above dtype was specified.
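As a quick illustration with invented data, forcing the dtype makes pandas fail as soon as it meets the bad value instead of silently falling back to strings:

import io
import pandas as pd

# 'foobar' cannot be converted to the declared integer dtype,
# so read_csv raises a ValueError instead of loading the file.
csv_data = io.StringIO('user_id\n1\n2\nfoobar\n')
df = pd.read_csv(csv_data, dtype={'user_id': int})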