Context. I am trying to remove accent marks (written accents) and tildes so that I can compare two words. I wrote this function:
```javascript
let sinDiacriticos = (function () {
    // Characters to replace and, at the same index, their replacements
    let de = 'ÁÃÀÄÂÉËÈÊÍÏÌÎÓÖÒÔÚÜÙÛÑÇáãàäâéëèêíïìîóöòôúüùûñç',
        a  = 'AAAAAEEEEIIIIOOOOUUUUNCaaaaaeeeeiiiioooouuuunc',
        re = new RegExp('[' + de + ']', 'ug');

    return texto =>
        texto.replace(
            re,
            match => a.charAt(de.indexOf(match))
        );
})();

let prue = 'Épico año de mal agüero, sólo Óscar y Ángel ganarán ésta. -Ímpetú Úrsula. ¡Ñañdú corre rápido por ahí!';

console.log(sinDiacriticos(prue));
// -> Epico ano de mal aguero, solo Oscar y Angel ganaran esta. -Impetu Ursula. ¡Nandu corre rapido por ahi!
```
**Question 1.** Is there a direct way to replace **any [diacritic][1]** without the need to manually generate a replacement map? I am interested in covering the diacritical marks of any language.
**Question 2.** Bearing in mind that in Spanish the `ñ` is a different letter from `n`, can diacritics be removed except when the letter is an `ñ`?
* Question asked by blonfu in comments
Since ECMAScript 6 (2015), `String.prototype.normalize()` can be used to obtain the decomposed Unicode normalization form, NFD (see compatibility).
This means that a character (actually, a "code point") can be broken down into its equivalent base character followed by its combining marks. For example:
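A minimal sketch of that decomposition, using `á` as the sample character (the variable names are illustrative and not taken from the original answer):

```javascript
// "á" as a single precomposed code point (NFC form)
const composed = '\u00E1';
// "a" followed by U+0301 COMBINING ACUTE ACCENT (NFD form)
const decomposed = 'a\u0301';

console.log(composed, decomposed);                      // both render as "á"
console.log(composed === decomposed);                   // false: different code points
console.log(composed.normalize('NFD') === decomposed);  // true
console.log(decomposed.normalize('NFC') === composed);  // true
console.log(composed.length, decomposed.length);        // 1 2
```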
Both forms are equivalent and print the same.
In NFD form, the diacritics are separate code points (roughly, characters).
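To see those separate code points, one could list them for a decomposed string. In this sketch `codePoints` is a hypothetical helper introduced only for illustration:

```javascript
// Hypothetical helper: list the code points of a string as U+XXXX labels.
const codePoints = str =>
  Array.from(str, ch =>
    'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0'));

console.log(codePoints('\u00F1'));                  // ñ, precomposed: [ 'U+00F1' ]
console.log(codePoints('\u00F1'.normalize('NFD'))); // [ 'U+006E', 'U+0303' ]
console.log(codePoints('\u00FC'.normalize('NFD'))); // ü: [ 'U+0075', 'U+0308' ]
```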
And the important thing is that the most common diacritical marks all fall within a single range, the Combining Diacritical Marks block: `U+0300`-`U+036F`.

Code (for all languages)
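A minimal sketch of this approach; the function name `removeDiacritics` is an assumption chosen for illustration:

```javascript
// Decompose to NFD, then drop every code point in the
// Combining Diacritical Marks block (U+0300-U+036F).
const removeDiacritics = text =>
  text.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
```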
The code converts the text to its decomposed form and then removes every character in the Combining Diacritical Marks block.
Tests:
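A few checks one could run. The one-liner is restated so the snippet is self-contained (`removeDiacritics` is an illustrative name). The first sample comes from the question's test string, and the others are added here as assumptions to exercise other languages:

```javascript
const removeDiacritics = text =>
  text.normalize('NFD').replace(/[\u0300-\u036f]/g, '');

const samples = [
  'Épico año de mal agüero, sólo Óscar y Ángel ganarán ésta.', // Spanish
  'Crème brûlée à la française',                               // French
  'Žluťoučký kůň',                                             // Czech
  'Tiếng Việt',                                                // Vietnamese (stacked marks)
];

for (const s of samples) console.log(removeDiacritics(s));
// -> Epico ano de mal aguero, solo Oscar y Angel ganaran esta.
// -> Creme brulee a la francaise
// -> Zlutoucky kun
// -> Tieng Viet
```

Letters such as `ø` or `ł` have no canonical decomposition, so this approach leaves them untouched.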
Diacritics except on the `ñ` (Spanish only)

We can remove only the accents on vowels and the diaeresis on the `ü`. We decompose, strip the diacritics only from `áéíóúü`, and compose again.

Or we can remove any diacritic (for any language) except when it belongs to an `ñ`.

Tests:
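Both variants described above can be sketched and exercised as follows. The function names and regular expressions are assumptions written for illustration, not necessarily the original answer's code:

```javascript
// Spanish only: strip the acute accent from vowels and the diaeresis from u,
// leaving every other mark (including the tilde of ñ) in place.
const removeSpanishAccents = text =>
  text
    .normalize('NFD')
    .replace(/([aeiou])\u0301|(u)\u0308/gi, '$1$2')
    .normalize('NFC');

// Any language: strip every run of combining marks, unless the run is
// exactly a tilde (U+0303) on an n or N.
const removeDiacriticsExceptEnye = text =>
  text
    .normalize('NFD')
    .replace(/([^\u0300-\u036f])([\u0300-\u036f]+)/gu, (match, base, marks) =>
      (base === 'n' || base === 'N') && marks === '\u0303' ? match : base)
    .normalize('NFC');

const es = '¡Ñandú corre rápido por ahí! Año de mal agüero.';
console.log(removeSpanishAccents(es));
// -> ¡Ñandu corre rapido por ahi! Año de mal aguero.
console.log(removeDiacriticsExceptEnye(es));
// -> ¡Ñandu corre rapido por ahi! Año de mal aguero.

const fr = 'Crème brûlée';
console.log(removeSpanishAccents(fr));       // -> Crème brûlee
console.log(removeDiacriticsExceptEnye(fr)); // -> Creme brulee
```

The French sample shows the difference: the Spanish-only variant touches only the acute accent and the diaeresis on `u`, while the second variant strips every mark except the tilde of the `ñ`.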