I have a local MySQL DB with utf8 encoding.
Then I have a txt file whose content looks like this, containing Ñ, accented vowels and other special characters:
9950543SÃO MATEUS MAD PT0830
My intention is to read this 50,000-line file record by record and insert each record into the MySQL table.
To do this I have written a Laravel command that reads the file, calls the PostalCode model and tries to insert the data.
public function __construct()
{
    parent::__construct();
    $content = fopen(storage_path("app\fichero_codigos_postales.TXT"), 'r');
    while (!feof($content)) {
        $line = utf8_decode(fgets($content));
        $cp = substr($line, 0, 7);
        $localidad = substr($line, 8, 24);
        $pais = substr($line, 32, 2);
        $param = substr($line, 34, 4);
        try {
            $item = PostalCode::create([
                'postal_code' => $cp,
                'town' => $localidad,
                'country' => $pais,
                'extra_param' => $param
            ]);
        } catch (\Exception $exception) {
            var_dump($exception->getMessage());
            dd($exception);
        }
    }
}
The command runs for me, but in MySQL the accented characters and the Ñ show up as "?".
Start by determining the encoding of the file. On Linux, a tool such as `file` will give you that information. For this specific case the exact tool does not matter; if instead you had to process an uncertain collection of files with variable encodings, you would have to automate that diagnosis.
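If you want to run that diagnosis from PHP itself, a minimal sketch with mb_detect_encoding() in strict mode could look like this. The byte strings here are illustrative samples (both encodings of "Ñ"), not your real file:

```php
<?php
// "\xC3\x91" is "Ñ" encoded in UTF-8; "\xD1" is "Ñ" in ISO-8859-1.
$utf8Bytes   = "\xC3\x91";
$latin1Bytes = "\xD1";

// Strict mode (third argument true) makes the UTF-8 check reliable:
// invalid UTF-8 byte sequences fall through to ISO-8859-1.
echo mb_detect_encoding($utf8Bytes,   ['UTF-8', 'ISO-8859-1'], true); // UTF-8
echo "\n";
echo mb_detect_encoding($latin1Bytes, ['UTF-8', 'ISO-8859-1'], true); // ISO-8859-1
```

In your case you would feed it a chunk of the file (e.g. the first few KB read with file_get_contents()) instead of these sample strings.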
If step 1 shows that the original file is in UTF-8, and the destination table is also in UTF-8, then don't transform the data at all, because otherwise you are converting it to the wrong encoding yourself.
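To see why (a minimal sketch, not your exact command), this is what utf8_decode() does to a UTF-8 "Ñ", and it is exactly why the table ends up with "?":

```php
<?php
// "\xC3\x91" is "Ñ" encoded in UTF-8.
$line = "\xC3\x91";

// If the file is already UTF-8, this call is the bug: it re-encodes
// the text as ISO-8859-1 bytes, which your utf8 table cannot interpret.
$wrong = utf8_decode($line);
echo bin2hex($wrong); // d1   (ISO-8859-1 "Ñ")
echo "\n";

// The fix is simply not converting at all:
$right = $line;
echo bin2hex($right); // c391 (UTF-8 "Ñ")
```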
If instead step 1 indicates that the file is ISO-8859-1, then what you are looking for is the inverse conversion: ISO-8859-1 in, UTF-8 out.
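A sketch of that inverse direction, again on an illustrative byte string rather than your file:

```php
<?php
// "\xD1" is "Ñ" encoded in ISO-8859-1.
$latin1Line = "\xD1";

// The inverse of utf8_decode(): Latin-1 bytes in, UTF-8 bytes out.
// (Note that utf8_encode()/utf8_decode() are deprecated since PHP 8.2.)
$utf8Line = utf8_encode($latin1Line);
echo bin2hex($utf8Line); // c391
```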
But in my experience utf8_decode() and utf8_encode() are the most brittle of PHP's encoding-conversion functions: they only know about ISO-8859-1, so any input with characters outside that range (UTF-8 can use up to 4 bytes per character) will corrupt the output. mb_convert_encoding() is preferable, and even better is iconv().
iconv() is not infallible, but it is very versatile with character sets, and it also accepts modifiers along the lines of "if a character has no equivalent, skip it instead of giving me a �".
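A sketch of iconv() with those modifiers. The //IGNORE and //TRANSLIT suffixes go on the *target* charset; the exact //TRANSLIT substitution depends on the underlying iconv implementation, so I deliberately don't promise a specific output for it:

```php
<?php
// Plain conversion: "España" from ISO-8859-1 ("\xF1" = "ñ") to UTF-8.
$latin1 = "Espa\xF1a";
$utf8   = iconv('ISO-8859-1', 'UTF-8', $latin1);
echo $utf8; // España
echo "\n";

// "€" (U+20AC) does not exist in ISO-8859-1, so a plain conversion
// would fail. //IGNORE drops the unmappable character instead of
// erroring; //TRANSLIT tries to approximate it (implementation-dependent).
$euro = "\xE2\x82\xAC"; // "€" in UTF-8
echo iconv('UTF-8', 'ISO-8859-1//IGNORE', $euro);
echo iconv('UTF-8', 'ISO-8859-1//TRANSLIT', $euro);
```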
There is one last edge case that can account for corrupt characters even when working with a UTF-8 file and a utf8 table, and that is because utf8_general_ci, and every MySQL utf8_xxxxxx_yy collation, is not really UTF-8.
As I said before, UTF-8 can use up to four bytes per character; MySQL's utf8 only stores up to 3. If you find yourself at that depth of the rabbit hole and are already questioning your sanity, the utf8mb4 charset ("mb4" for multibyte-4) is real UTF-8, and it is what is used today, for example, when installing Laravel. (I think WordPress keeps installing the old encoding.)
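In a current Laravel install that looks roughly like this (a fragment of config/database.php, from memory, so double-check it against your version):

```php
// config/database.php (fragment)
'connections' => [
    'mysql' => [
        // ...
        'charset'   => 'utf8mb4',
        'collation' => 'utf8mb4_unicode_ci',
        // ...
    ],
],
```

With utf8mb4 on both the connection and the table, 4-byte characters (emoji, rare CJK characters, etc.) survive the round trip instead of being mangled.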