Question
How to validate an e-mail that accepts all Latin characters?
- By Latin characters I mean accented letters, ñ, ç, and all those used by languages such as Spanish, Portuguese, Italian... Latin.
Context
- The goal is to display an icon next to the text as the user types their email address.
- I am not interested in accepting all valid cases. It was a design decision to cover only the most frequent emails. That is, letters (including accents and the like) and the symbols ._%+-
- I can use code from other sources, as long as they are popular (e.g. jQuery).
Code
document.getElementById('email').addEventListener('input', function (event) {
  // Take the event as a parameter and declare locals, instead of relying on
  // the non-standard global window.event and on implicit global variables.
  const campo = event.target;
  const valido = document.getElementById('emailOK');
  const emailRegex = /^[-\w.%+]{1,64}@(?:[A-Z0-9-]{1,63}\.){1,125}[A-Z]{2,63}$/i;
  // A text is shown as an example; later it will be an icon.
  if (emailRegex.test(campo.value)) {
    valido.innerText = "valid";
  } else {
    valido.innerText = "invalid";
  }
});
<p>
Email:
<input id="email">
<span id="emailOK"></span>
</p>
Cases
I am using the regex
/^[-\w.%+]{1,64}@(?:[A-Z0-9-]{1,63}\.){1,125}[A-Z]{2,63}$/i
which works perfectly in cases like
[email protected]
[email protected]
but it fails with accents and other Latin letters:
germá[email protected]
yo@mi-compañía.com
estaçã[email protected]
With this regular expression you can validate any email address that contains Unicode characters:
If you test it in a JavaScript console:
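The expression this answer refers to is not reproduced here. As a stand-in, this is a minimal sketch of a Unicode-aware variant of the question's regex, assuming an engine that supports `\p{...}` property escapes (they need the `u` flag). The name `isUnicodeEmail` and the sample addresses are my own, not the original answer's.

```javascript
// Sketch only: letters (\p{L}) and digits (\p{N}) from any script are accepted
// where the question's regex accepted \w or A-Z0-9.
const unicodeEmail =
  /^[\p{L}\p{N}._%+-]{1,64}@(?:[\p{L}\p{N}-]{1,63}\.){1,125}\p{L}{2,63}$/u;

function isUnicodeEmail(value) {
  // NFC turns "n" + combining tilde into the single letter "ñ";
  // a bare combining mark is not matched by \p{L}.
  return unicodeEmail.test(value.normalize('NFC'));
}

console.log(isUnicodeEmail('yo@mi-compañía.com'));     // true
console.log(isUnicodeEmail('user@example.com'));       // true
console.log(isUnicodeEmail('no-at-sign.example.com')); // false
```

Note that this accepts far more than Latin letters (Cyrillic, Greek, CJK and so on), which may or may not be what you want.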
Source
From there, and as you have very well mentioned, an expression that best suits your needs would be the following:
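The expression itself is missing from this copy of the answer. One way to read "Latin characters" is the Unicode Latin script, so here is a hedged sketch along those lines. The name `latinEmail` and the test addresses are assumptions of mine, and it requires the `u` flag.

```javascript
// Sketch only: \p{Script=Latin} covers A-Z plus accented Latin letters
// (ñ, ç, ã, ...) but not Cyrillic, Greek, etc. Digits are kept ASCII.
const latinEmail =
  /^[\p{Script=Latin}0-9._%+-]{1,64}@(?:[\p{Script=Latin}0-9-]{1,63}\.){1,125}\p{Script=Latin}{2,63}$/u;

console.log(latinEmail.test('estação@example.com')); // true
console.log(latinEmail.test('yo@mi-compañía.com'));  // true
console.log(latinEmail.test('почта@example.com'));   // false (Cyrillic)
```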
Email addresses are subject to certain restrictions; in general they should follow these rules:
Some addresses are restricted, for example if they contain:
Examples not accepted as valid email addresses:
See also:
https://en.wikipedia.org/wiki/Email_address
https://www.rfc-editor.org/rfc/rfc5322
Imagine an email address with Cyrillic characters. It gets even worse if you want to store that data in a database: which SQL collation would you use?
Still, the question is about how to validate this kind of address, and this is a script that would help with the task:
For instance:
The script would show you that the email address is correct.
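The script is not included in this copy, so I cannot show it. On the storage worry, though, one option (my suggestion, not the original script) is to convert the domain to its ASCII (Punycode) form before saving it. Below is a rough sketch that leans on the WHATWG `URL` parser available in browsers and Node.js; `toAsciiDomain` and `normalizeAddress` are hypothetical helper names.

```javascript
// Hypothetical helper: returns the ASCII form of a domain, or null if the
// URL parser rejects it.
function toAsciiDomain(domain) {
  // Refuse characters the URL parser would treat as delimiters.
  if (domain === '' || /[\s\/\\?#@:%\[\]]/.test(domain)) return null;
  try {
    return new URL('http://' + domain).hostname;
  } catch (e) {
    return null; // not a host name the parser accepts
  }
}

// Hypothetical helper: keeps the local part (NFC-normalized) and converts
// only the domain, since Punycode does not apply to the local part.
function normalizeAddress(address) {
  const at = address.lastIndexOf('@');
  if (at < 1) return null; // no "@" or empty local part
  const local = address.slice(0, at).normalize('NFC');
  const domain = toAsciiDomain(address.slice(at + 1));
  return domain === null ? null : local + '@' + domain;
}

console.log(normalizeAddress('yo@mi-compañía.com')); // domain becomes an xn-- label
```

The local part can still contain non-ASCII letters, so the column would still need a Unicode-capable encoding if you accept those.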
It is now possible to use international characters in domain names and email addresses.
Traditional email addresses are limited to characters from the English alphabet and a few other special characters. The following are valid traditional email addresses:
International email, by contrast, uses Unicode characters encoded as UTF-8, which allows the text of addresses to be encoded in most of the world's writing systems.
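One practical consequence of the UTF-8 encoding: a quantifier like `{1,64}` in a JavaScript regex counts UTF-16 code units, while the length limit on the local part is, as far as I understand the SMTP RFCs, expressed in octets. A small sketch to measure it, where `localPartOctets` is a hypothetical helper name:

```javascript
// Counts the UTF-8 octets of the local part (the text before the last "@").
function localPartOctets(address) {
  const at = address.lastIndexOf('@');
  const local = at === -1 ? address : address.slice(0, at);
  return new TextEncoder().encode(local.normalize('NFC')).length;
}

console.log(localPartOctets('german@example.com')); // 6
console.log(localPartOctets('germán@example.com')); // 7 ("á" takes two octets)
```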
The following are all valid international email addresses:
I've found an article here that talks about a few different regular expression statements that can verify email addresses based on the RFC standard. There are many different recommended regular expression statements and there is no single all-in-one solution. But this regex is probably the one I'd go with, adding accented characters to the list of valid characters as well.
The only 100% reliable way to verify that an email address is valid is to send a message to it. If the user typed the address wrong, they will simply retry.
According to RFC 5322, [email protected] is a "valid" email address, but is anyone going to receive it? Is there a server behind the domain that accepts email? Those are the concerns you should have. Whatever you are building (a mailing list, a registration form, etc.), you must send a confirmation email to validate the address. The implementation will depend on the stack you use (C#, PHP, Java?), and you will end up with valid addresses that someone actually receives mail at.
You can implement something on the client side that at least says "this is an email address", but it shouldn't be your "validation" tool; it only tries to make the user realize that what they typed is # ($^ %#$@^( #$^.com. If the client uses a modern browser, you can use <input type="email"> in your form, which eliminates the need to maintain the regex.
Simply to point out that, according to the official specification, the regex that represents an orthographically valid email address is the following:
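The expression is missing from this copy. The pattern below is the one I recall from the HTML Standard's definition of a valid email address for `type="email"`. It is quoted from memory, so please check it against the specification before relying on it. The variable name and sample addresses are mine.

```javascript
// Pattern as I recall it from the HTML Standard (ASCII only, no TLD required).
const htmlSpecEmail =
  /^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/;

console.log(htmlSpecEmail.test('user@example.com'));   // true
console.log(htmlSpecEmail.test('user@localhost'));     // true (no dot needed)
console.log(htmlSpecEmail.test('yo@mi-compañía.com')); // false (non-ASCII)
```

As the last line shows, this pattern alone rejects the accented addresses from the question.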
I use the term orthographically valid email address on purpose, because what makes an email address really valid is that it works, that is, that it exists and can receive emails.
It follows that a verification via JavaScript is not enough. It can help us do a spell check, provided JavaScript is enabled on the client side.
If you want to verify that the email address really exists, there is no other way than to send an email and have the recipient reply. This is what can properly be called real validation of an email address.
In fact, that is what all serious subscription services do: they send us an email that we must confirm in order to be definitively registered on their sites or in their distribution lists.
Allow me to show graphically the steps to validate an e-mail. We will see that what is discussed here is just stage 2 of a validation process that comprises 5 stages:
Until we reach stage 5, we cannot say that the email has been validated.
If the OP still asks for a validation method that accepts addresses with ñ and other characters not defined so far by the official w3.org spec (link above), the REGEX mentioned in a previous answer works.
The code that follows is the same as in the question, but it implements both the official regex and the regex that allows Latin characters such as ñ.
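That code did not survive in this copy. Here is a rough reconstruction of the idea, not the author's original. Both patterns are stand-ins: `officialRegex` follows the HTML Standard pattern as I remember it, and `latinRegex` is an assumed Latin-script variant of the question's regex. `checkEmail` is a hypothetical name.

```javascript
const officialRegex =
  /^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/;
const latinRegex =
  /^[\p{Script=Latin}0-9._%+-]{1,64}@(?:[\p{Script=Latin}0-9-]{1,63}\.){1,125}\p{Script=Latin}{2,63}$/u;

// Pure function, so the logic can be exercised without a page.
function checkEmail(value) {
  return { official: officialRegex.test(value), latin: latinRegex.test(value) };
}

// Wiring for the question's markup (ids "email" and "emailOK"); skipped
// when there is no DOM or the elements are missing.
if (typeof document !== 'undefined') {
  const input = document.getElementById('email');
  const output = document.getElementById('emailOK');
  if (input && output) {
    input.addEventListener('input', function (event) {
      const r = checkEmail(event.target.value);
      output.innerText = r.official ? 'valid (official)'
        : r.latin ? 'valid (Latin characters)'
        : 'invalid';
    });
  }
}
```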
Spell check in HTML5
HTML5 allows us to declare our input as type email and handles (part of) the validation for us, as MDN says:
The email type can be combined with the pattern attribute:
The downside is that not all clients support HTML5.
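To make the pattern attribute idea concrete: as far as I know, browsers anchor the attribute value at both ends and compile it as a Unicode-aware expression (the `u` flag, or `v` in newer engines), so `\p{L}` can be used and a literal `-` inside a class is safest written as `\-`. The sketch below approximates that behaviour in plain JavaScript. The pattern value and `matchesPattern` are assumptions of mine, not taken from MDN or from the answer.

```javascript
// A value one might put in pattern="..." (hypothetical).
const pattern = String.raw`[\p{L}\p{N}._%+\-]+@[\p{L}\p{N}.\-]+\.\p{L}{2,}`;

// Approximation of the browser's check: the whole value must match.
function matchesPattern(value) {
  return new RegExp('^(?:' + pattern + ')$', 'u').test(value);
}

// In a page, the same string can be assigned to the input's pattern property.
if (typeof document !== 'undefined') {
  const input = document.getElementById('email');
  if (input) input.pattern = pattern;
}

console.log(matchesPattern('yo@mi-compañía.com')); // true
console.log(matchesPattern('yo@localhost'));       // false (no dot in the domain)
```

Keep in mind that on an input with type email the browser's own email check still applies in addition to the pattern, so a plain text input may be needed if accented local parts must pass.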
According to RFC 6531, more characters than we are used to should be supported, but servers still restrict addresses according to the earlier standards. I don't see a solution with a single range that covers "all Latin characters". Although they seem to sit together (as in this table from 0080 to 00FF), there are other characters in between.
A possible regex for the Latin characters you might be interested in (source), incorporating the (suggestion):
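The linked regex is not shown here, so this is only a sketch of the kind of character class meant, using explicit code point ranges so that it also works in engines without `\p{...}` support. The ranges are my assumption: Latin-1 Supplement letters (skipping × at U+00D7 and ÷ at U+00F7, the "others in between") plus Latin Extended-A.

```javascript
// Letter ranges only; symbols such as × (U+00D7) and ÷ (U+00F7) are left out.
const LATIN_LETTERS =
  'A-Za-z\\u00C0-\\u00D6\\u00D8-\\u00F6\\u00F8-\\u00FF\\u0100-\\u017F';

const latinWord = new RegExp('^[' + LATIN_LETTERS + ']+$');

console.log(latinWord.test('compañía')); // true
console.log(latinWord.test('estação'));  // true
console.log(latinWord.test('×'));        // false
```

This assumes precomposed characters; input typed with combining marks would need `normalize('NFC')` first.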
It could be combined with your regex, with the ones already indicated above, or with one based on RFC 2822, like this, so that it does not exclude the ranges that interest you (there are many types of accents) (source):
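The combined expression is not reproduced in this copy. As an illustration of the combination only, the sketch below splices a set of extra letter ranges into the dot-atom style pattern commonly attributed to RFC 2822. Both the base pattern (written from memory) and the `EXTRA` ranges are assumptions, so treat it as a starting point rather than the source's regex.

```javascript
// Extra letters to allow: Latin-1 Supplement letters and Latin Extended-A.
const EXTRA = '\\u00C0-\\u00D6\\u00D8-\\u00F6\\u00F8-\\u017F';

// One "atom" of the local part and one label of the domain.
const atom = "[a-z0-9" + EXTRA + "!#$%&'*+/=?^_`{|}~-]+";
const label = '[a-z0-9' + EXTRA + '](?:[a-z0-9' + EXTRA + '-]*[a-z0-9' + EXTRA + '])?';

// atoms separated by single dots, "@", then two or more dot-separated labels.
const rfc2822Latin = new RegExp(
  '^' + atom + '(?:\\.' + atom + ')*@(?:' + label + '\\.)+' + label + '$', 'i');

console.log(rfc2822Latin.test('germán.lópez@mi-compañía.com')); // true
console.log(rfc2822Latin.test('user..name@example.com'));       // false
```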