This question was born based on this other My database was not utf-8 , how to fix the tildes, once PHP [duplicate] has been inserted, marked as duplicate for a possible answer here, why does the character inspector (�) appear in some data obtained from the Database?
It is true that in new projects there is no inconvenience in recreating the database with the appropriate character encodings if we did not establish it correctly at first, since the tables are not populated or contain few records.
But in some cases we assume projects with tables populated with thousands of records, maybe millions, in which it is essential to preserve the data and correct it, doing this by hand is not feasible. Therefore the question goes in this sense.
How can we migrate/fix the data already entered from a table in latin1 with data entered in utf8?
Here is an example as basic as possible:
--
-- Estructura para la tabla `tabla_latin1` con datos insertados en UTF8
--
DROP TABLE IF EXISTS `tabla_latin1`;
CREATE TABLE IF NOT EXISTS `tabla_latin1` (
`idTabla` int(11) NOT NULL AUTO_INCREMENT,
`texto` varchar(255) CHARACTER SET latin1 COLLATE latin1_general_ci NOT NULL,
PRIMARY KEY (`idTabla`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=latin1;
--
-- Volcado de datos para la tabla `tabla_latin1`
--
INSERT INTO `tabla_latin1` (`idTabla`, `texto`) VALUES
(1, 'Ç'),
(2, 'È'),
(3, 'É'),
(4, 'Ê'),
(5, 'Ë'),
(6, 'Ì'),
(8, 'ÃŽ'),
(11, 'Ñ'),
(12, 'Ã’'),
(13, 'Ó'),
(14, 'Ô'),
(15, 'Õ'),
(16, 'Ö');
--
-- datos correctos: ÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ
--
Assuming that we have a table in which records were saved in
UTF8
when the table and columns arelatin1
for example, we can play with CAST() and CONVERT() to obtain the appropriate result.For example for the given table:
First we change the encoding of the table:
We could run the following query, passing the column to binary and then to utf8 (utf8mb4) to review the data:
And we would opt for this:
If everything looks correct, we would only have to make a
UPDATE
column applying the same conversion:And now we can see the result with a
select
normal:Finally we change the encodings of the table: