I was used to putting this in the first line of all my XML files:
<?xml version="1.0" encoding="ISO-8859-1"?>
I had been doing it almost without thinking, but then I noticed that accented Spanish letters were displayed incorrectly. For example, in Chrome:
So I fixed it by using:
<?xml version="1.0" encoding="utf-8"?>
However, Wikipedia defines ISO-8859-1 as:
ISO 8859-1 is an ISO standard that defines the encoding of the Latin alphabet, including diacritics (such as accented letters, ñ, ç), and special letters (such as ß, Ø), necessary for...
and it lists all the characters used in Spanish.
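That claim is easy to check. A minimal Python sketch (the character list is an illustrative selection of Spanish letters and punctuation, not an exhaustive one) confirms that each of them has a single-byte code in ISO-8859-1, while UTF-8 needs two bytes for each:

```python
# Non-ASCII letters and punctuation used in Spanish (illustrative selection).
SPANISH_EXTRA = "áéíóúüñÁÉÍÓÚÜÑ¿¡"

# Raises UnicodeEncodeError if any character had no ISO-8859-1 code.
latin1 = SPANISH_EXTRA.encode("iso-8859-1")
utf8 = SPANISH_EXTRA.encode("utf-8")

# One byte per character in ISO-8859-1, two bytes per character in UTF-8.
print(len(SPANISH_EXTRA), len(latin1), len(utf8))  # 16 16 32
```

So the encoding itself can represent Spanish; what matters is that the bytes in the file really are the encoding that is declared.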
Question:
Why does the text look wrong in Chrome, and which encoding should I use to include Spanish text?
Chrome has no problem displaying text encoded as ISO-8859-1, but it has no way of guessing that this is the encoding used unless you tell it explicitly.
Consider the following file:
This is how it looks in Chrome:
This assumes, of course, that the text editor used to create the file actually saved it in ISO-8859-1.
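Chrome's own detection logic isn't reproduced here, but the same principle can be sketched with Python's standard XML parser standing in for any XML consumer. The document content and the `parse_text` helper are made-up examples. With the declaration the bytes decode correctly; without it, XML falls back to UTF-8 and the single ISO-8859-1 byte for "ñ" is rejected:

```python
import xml.etree.ElementTree as ET

# Bytes as an editor set to ISO-8859-1 would save them: "ñ" is the single byte 0xF1.
body = "<p>Año</p>".encode("iso-8859-1")

# The same bytes, preceded by a declaration that names their encoding.
declared = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + body


def parse_text(data):
    """Hypothetical helper: return the root element's text, or None if the
    bytes are not well-formed XML under the encoding the parser assumes."""
    try:
        return ET.fromstring(data).text
    except ET.ParseError:
        return None


print(parse_text(declared))  # Año
print(parse_text(body))      # None: without a declaration, 0xF1 is invalid UTF-8
```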
As for the question "Which encoding should I use?", it really depends on the context. As long as either 1) the producer and the consumer of the text agree on the encoding, or 2) there is an explicit mechanism that declares the encoding of the text (the <?xml?> declaration for XML files, or the Content-Type header for MIME and HTTP), there will be no problem. That said, nowadays there seems to be an informal consensus to use UTF-8 as the default encoding for everything, for a number of reasons summarized on sites such as this one: http://utf8everywhere.org
Both encodings work for you.
In most cases, encoding problems come from not saving the file in the encoding declared inside the file itself.
The text editor you use usually has an option to save the file in the encoding you need. If the two don't match, it is normal for the text to look wrong.
Another, less common, cause is that the server sends HTTP headers declaring a different encoding.
In short, you have to make sure that every place where the encoding is indicated says the same thing.
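A minimal Python sketch of that check, assuming the file has no byte-order mark and treating a missing declaration as UTF-8 (the XML default). `encoding_problems` is a hypothetical helper, not a library function; it compares the HTTP Content-Type charset, the XML declaration and the actual bytes:

```python
import codecs
import re
from email.message import Message

# encoding="..." or encoding='...' inside a leading <?xml ...?> declaration.
XML_DECL = re.compile(rb'^<\?xml[^>]*?\bencoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')


def encoding_problems(content_type, data):
    """Hypothetical helper: list disagreements between the Content-Type charset,
    the XML declaration and the bytes. An empty list means none were found."""
    problems = []

    msg = Message()
    msg["Content-Type"] = content_type
    header = msg.get_content_charset()  # lowercased charset, or None if absent

    m = XML_DECL.match(data)
    # Assumption: no BOM, so a missing declaration means UTF-8.
    declared = m.group(1).decode("ascii").lower() if m else "utf-8"

    # codecs.lookup() maps aliases such as "latin-1" and "iso-8859-1" to one name.
    if header and codecs.lookup(header).name != codecs.lookup(declared).name:
        problems.append(f"header says {header}, XML declaration says {declared}")

    try:
        data.decode(declared)
    except UnicodeDecodeError:
        problems.append(f"the bytes are not valid {declared}")

    return problems


doc = '<?xml version="1.0" encoding="UTF-8"?><p>Año</p>'
utf8_file = doc.encode("utf-8")
latin1_file = doc.encode("iso-8859-1")  # saved in the wrong encoding
question_case = '<?xml version="1.0" encoding="ISO-8859-1"?><p>Año</p>'.encode("utf-8")

print(encoding_problems("application/xml; charset=utf-8", utf8_file))         # []
print(encoding_problems("application/xml; charset=iso-8859-1", latin1_file))  # two problems
print(encoding_problems("application/xml; charset=iso-8859-1", question_case))  # []
```

Note that the last case comes back empty: every byte sequence is valid ISO-8859-1, so a UTF-8 file wrongly declared as ISO-8859-1 (the situation in the question) can't be caught by decoding alone; it only shows up as garbled text.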
The text you typed is actually in UTF-8, and you are treating it as if it were ISO-8859-1. A clear clue is that each accented character shows up as two characters: UTF-8 stores it as two bytes, and ISO-8859-1 decodes each byte as a separate character.
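The two-characters-per-accent symptom can be reproduced in a few lines of Python. A minimal sketch, using an arbitrary Spanish word as the example:

```python
word = "canción"

# UTF-8 stores "ó" (U+00F3) as two bytes: 0xC3 0xB3.
utf8_bytes = word.encode("utf-8")

# Reading those bytes as ISO-8859-1 turns each byte into its own character.
garbled = utf8_bytes.decode("iso-8859-1")
print(garbled)  # canciÃ³n

# Reversing the mistake recovers the original text.
repaired = garbled.encode("iso-8859-1").decode("utf-8")
print(repaired)  # canción
```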
For XML, UTF-8 is the recommended encoding. The XML specification requires every processor to support UTF-8 (and UTF-16, but I'd recommend UTF-8 instead).
Apart from what others have commented, within Chrome you can also specify the encoding used to display the document.
The best option, I would insist, is UTF-8. The reason is that UTF-8 can encode every character of the Unicode standard in the same document, without resorting to escape sequences in the syntax of the document format (in this case, XML).
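To illustrate the escape sequences mentioned above, a minimal Python sketch (the sample text is made up): characters with no code in ISO-8859-1 have to be written as numeric character references, while UTF-8 stores them directly. Both documents parse back to the same text:

```python
import xml.etree.ElementTree as ET

text = "español / 中文"

# ISO-8859-1 has no code for 中 or 文, so they become numeric character references.
latin1_body = text.encode("iso-8859-1", errors="xmlcharrefreplace")
print(latin1_body)  # b'espa\xf1ol / &#20013;&#25991;'

# UTF-8 stores every Unicode character directly, with no escapes.
utf8_body = text.encode("utf-8")

doc1 = b'<?xml version="1.0" encoding="ISO-8859-1"?><p>' + latin1_body + b"</p>"
doc2 = b'<?xml version="1.0" encoding="UTF-8"?><p>' + utf8_body + b"</p>"

# The parser resolves the references, so both documents carry the same text.
print(ET.fromstring(doc1).text == ET.fromstring(doc2).text == text)  # True
```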
The second-best option is ISO-8859-15. This encoding is a newer revision of the old ISO-8859-1, with a few minor changes. But among these changes there is a very important one: it adds the Euro sign (€), which ISO-8859-1 lacks.
And this example shows why UTF-8 is the best choice. UTF-8 can encode any Unicode text, and Unicode is the dominant international standard, to which new characters are added periodically. An application that uses UTF-8 generally needs no changes when new characters such as the Euro sign are introduced. In addition, with UTF-8 a single document can mix text written in many different languages. With UTF-8 you can easily write a Chinese-Spanish dictionary, but neither ISO-8859-15 nor ISO-8859-1 allows that.
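A minimal Python sketch of the difference (the sample strings and the `can_encode` helper are illustrative): the same byte 0xA4 means "¤" in ISO-8859-1 and "€" in ISO-8859-15, and only UTF-8 can hold both the euro sign and Chinese text:

```python
# In ISO-8859-15 the euro sign took the slot of the generic currency sign.
print(b"\xa4".decode("iso-8859-1"))   # ¤
print(b"\xa4".decode("iso-8859-15"))  # €


def can_encode(text, encoding):
    """Return True if every character in text has a code in the given encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


price = "10 €"
dictionary_entry = "diccionario: 中文"
for enc in ("iso-8859-1", "iso-8859-15", "utf-8"):
    print(enc, can_encode(price, enc), can_encode(dictionary_entry, enc))
# iso-8859-1 False False
# iso-8859-15 True False
# utf-8 True True
```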
To make sure the file is actually encoded correctly, use an editor like Notepad++ or similar. Sometimes you can have the database in UTF-8, the text in UTF-8, the declarations in UTF-8 and your whole life in UTF-8, but if the file was saved as ISO-8859-1, it's all for nothing...
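When a file turns out to have been saved as ISO-8859-1, re-saving it as UTF-8 fixes it, as long as the XML declaration is updated to match. A minimal Python sketch, assuming the file really is ISO-8859-1 and has no byte-order mark; `resave_as_utf8` is a hypothetical helper. Because every byte sequence decodes as ISO-8859-1, a wrong guess about the source encoding will not raise an error, it will silently produce garbled text:

```python
import re
from pathlib import Path

# Captures everything up to the value of encoding="..." in a leading <?xml ...?> declaration.
XML_DECL_ENCODING = re.compile(r'^(<\?xml[^>]*?\bencoding\s*=\s*)["\'][^"\']*["\']')


def resave_as_utf8(path, source_encoding="iso-8859-1"):
    """Hypothetical helper: re-encode a file from source_encoding to UTF-8 and
    rewrite its XML declaration so that it matches the new bytes."""
    p = Path(path)
    text = p.read_bytes().decode(source_encoding)
    text = XML_DECL_ENCODING.sub(r'\1"UTF-8"', text, count=1)
    p.write_bytes(text.encode("utf-8"))
```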