Question: How can I verify that the format of a Mexican RFC is valid?
What is the RFC? The Federal Taxpayer Registry (RFC) is a unique key that every natural or legal person in Mexico needs in order to carry out any lawful economic activity. It is the tax code for individuals and companies, issued by the SAT.
The RFC is generated from the letters of the name and surname (individuals), or from the initials or the first letters of the name and the date of creation (companies). The generation and validation rules are described in the document "Algorithm to generate the RFC with homoclave for natural and legal persons" (.odt).
Context: I want to validate that an RFC could be valid. I'm not interested in seeing if it actually exists. I implemented a very generic validation that allows the last 3 digits to be optional:
/^[A-ZÑ&]{3,4}\d{6}(?:[A-Z\d]{3})?$/
but now I am interested in more strictly validating the complete RFC, seeing that the check digit is correct (the last character).
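For reference, the generic check above could be sketched in Python as follows. The function name `looks_like_rfc` is hypothetical, and the `re.ASCII` flag is an added assumption so that `\d` only matches the digits 0 to 9.

```python
import re

# Generic pattern from the question: 3 or 4 letters, 6 digits,
# and an optional 3-character ending.
# re.ASCII (an assumption, not part of the original) limits \d to 0-9.
GENERIC_RFC = re.compile(r"^[A-ZÑ&]{3,4}\d{6}(?:[A-Z\d]{3})?$", re.ASCII)


def looks_like_rfc(text: str) -> bool:
    """Return True if text has the general shape of an RFC.

    The check digit is not verified here.
    """
    return GENERIC_RFC.fullmatch(text) is not None
```

This accepts `GODE561231GR8` and also `GODE561231` (without the last 3 characters), which is exactly the looseness the stricter validation is meant to remove.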
Regular expression
The following regular expression checks:
Full validation
I publish the code in JavaScript to be able to run it here, but it is very easy to port it to any other language.
Description
Taking as reference the way the RFC is built:
([A-ZÑ&]{3,4})
Group 1: name.
* In this case, we could have validated that the second letter is only a vowel or an "X", but if the first surname has 1 or 2 letters, the first letter of the maternal surname is taken (it can be a consonant).
* For companies, if they do not have 3 words, the following letters of the first name are taken.

 ?(?:- ?)?
Separator. Accepts " ", "-", " -", " - ", "- ", or no hyphens or spaces.
* The pattern looks odd, but it is simply an optional space followed by an optional non-capturing group that matches a hyphen optionally followed by a space. If you don't want to allow spaces or hyphens, you can remove this pattern.

(\d{2}(?:0[1-9]|1[0-2])(?:0[1-9]|[12]\d|3[01]))
Group 2: date.
\d{2}: year.
(?:0[1-9]|1[0-2]): month.
(?:0[1-9]|[12]\d|3[01]): day.
* I accept days up to 31 for any month. I believe that a mistake made when entering the date would be caught later by the check digit. However, if you want to be stricter, and although it could be validated in the regular expression, I would recommend doing it with the date functions of the programming language used.

 ?(?:- ?)?
Separator, same as above.

([A-Z\d]{2})
Group 3.

([A\d])
Group 4: the last character, that is, the check digit.

After validating with the regex, we check that the expected check digit for the first 11 or 12 characters matches the check digit entered (the last character). An adaptation of the control-code method known as Modulo 11, as used in ISBN-10, is applied.
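Putting the pieces described above together in the order given, the format check could be sketched in Python like this. The names `RFC_PATTERN` and `split_rfc` are hypothetical, and `re.ASCII` is an added assumption to keep `\d` limited to 0-9.

```python
import re

# The sub-patterns described above, joined in order.
RFC_PATTERN = re.compile(
    r"^([A-ZÑ&]{3,4})"                                  # group 1: letters from the name
    r" ?(?:- ?)?"                                       # optional space/hyphen separator
    r"(\d{2}(?:0[1-9]|1[0-2])(?:0[1-9]|[12]\d|3[01]))"  # group 2: date as YYMMDD
    r" ?(?:- ?)?"                                       # optional space/hyphen separator
    r"([A-Z\d]{2})"                                     # group 3
    r"([A\d])$",                                        # group 4: last character
    re.ASCII,  # assumption: restrict \d to the digits 0-9
)


def split_rfc(text):
    """Return the four captured parts as a tuple, or None if the format does not match."""
    match = RFC_PATTERN.fullmatch(text)
    return match.groups() if match else None
```

For example, `split_rfc("GODE 561231 - GR8")` and `split_rfc("GODE561231GR8")` both yield the same four parts, while a month of 13 or a last character other than a digit or "A" is rejected.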
We already have separate captures of the text of the 4 parts of the RFC, where rfcSinDigito will be the first 11 or 12 characters and digitoVerificador will be the last character.
But if there are only 11 characters (the RFC of a legal person, that is, a company), it is adjusted so that the same algorithm can be used for both. You can prepend a space, or directly add the value that the space would contribute to the sum.
To calculate the expected digit, first multiply the value of each of the 12 characters by its weight, which runs from 13 down to 2, and add up the products. Each character has a value from 0 to 38 according to its position in this order (dictionary):
Then, from that sum, take the remainder of dividing by 11 (modulo 11, hence the name of the method) and subtract it from 11.
If the result is 11, it becomes 0. If it is 10, it becomes A.
Now we can compare whether the two digits match and return the result.
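The calculation just described can be sketched in Python as follows. The dictionary string is an assumption on my part (digits, then A to N, then &, then O to Z, then space, then Ñ, giving values 0 to 38), and the function name `expected_check_digit` is hypothetical.

```python
# Assumed dictionary: a character's value is its position in this string (0 to 38).
DICTIONARY = "0123456789ABCDEFGHIJKLMN&OPQRSTUVWXYZ Ñ"


def expected_check_digit(rfc_sin_digito: str) -> str:
    """Compute the expected last character from the first 11 or 12 characters."""
    if len(rfc_sin_digito) not in (11, 12):
        raise ValueError("expected 11 or 12 characters")
    # Company RFCs have 11 characters here: pad with a leading space to reach 12.
    padded = rfc_sin_digito.rjust(12)
    # Weights run from 13 (first character) down to 2 (twelfth character).
    total = sum(DICTIONARY.index(char) * (13 - i) for i, char in enumerate(padded))
    digit = 11 - total % 11
    if digit == 11:
        return "0"
    if digit == 10:
        return "A"
    return str(digit)
```

With this dictionary, `expected_check_digit("GODE561231GR")` gives `"8"`, which matches the last character of `GODE561231GR8`.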
But we add 2 exceptions for the special cases of generic RFCs (Questions and answers on Tax Verification, points 5 and 6), which are not valid RFCs of natural or legal persons, but are used for:
1. XAXX010101000: Operations carried out with the general public.
2. XEXX010101000: Operations carried out with residents abroad who are not registered in the RFC.
* As you can see, I am using an optional second parameter (aceptarGenerico), in case they should not be allowed.
Finally, if it passed all the previous rules, the clean RFC is returned.
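A minimal Python sketch of how the two generic RFCs could be handled around the check digit comparison. It assumes the input has already passed the regex and been cleaned to 12 or 13 characters. The names `rfc_valido`, `GENERIC_RFCS` and `check_digit_ok` are hypothetical, the dictionary is the same assumed value table, and unlike the description above this sketch returns a boolean rather than the clean RFC.

```python
DICTIONARY = "0123456789ABCDEFGHIJKLMN&OPQRSTUVWXYZ Ñ"  # assumed value table
GENERIC_RFCS = {"XAXX010101000", "XEXX010101000"}


def check_digit_ok(rfc: str) -> bool:
    """Compare the last character with the Modulo 11 digit of the rest."""
    body = rfc[:-1].rjust(12)  # pad company RFCs with a leading space
    total = sum(DICTIONARY.index(c) * (13 - i) for i, c in enumerate(body))
    digit = 11 - total % 11
    expected = {11: "0", 10: "A"}.get(digit, str(digit))
    return rfc[-1] == expected


def rfc_valido(rfc: str, aceptar_generico: bool = True) -> bool:
    """Validate a cleaned RFC, treating the two generic RFCs as special cases."""
    if rfc in GENERIC_RFCS:
        # Generic RFCs are accepted or rejected purely by the optional flag.
        return aceptar_generico
    return check_digit_ok(rfc)
```

Handling the generic values before the comparison matters in both directions: under the dictionary assumed here, XAXX010101000 does not satisfy the check digit calculation while XEXX010101000 does, so one needs an exception to be accepted and the other needs one to be rejected.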
RFC validation algorithm
An RFC is valid if and only if it meets the following 7 conditions:
Note that by valid we mean an RFC that meets all the rules of the document, that is, there is some combination of name and date of birth or creation that generates that RFC. For an invalid RFC, by contrast, no name and date end up generating it. Just because an RFC is valid doesn't mean it exists.
Check digit calculation.
If the RFC has 12 characters we add a blank space at the beginning.
We use all the characters of the RFC except the last one. That is, we always use 12 characters.
Each character corresponds to a value according to the following table:
The value of the first character is multiplied by 13, the second by 12, the third by 11, and so on until the twelfth character is multiplied by 2.
All these values are added together. From that sum, the document describes some unnecessarily complicated operations that are equivalent to:
Example for the RFC GODE561231GR8:
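As a worked sketch of the steps above for GODE561231GR8, assuming the usual value table (digits are worth 0 to 9, A to N are 10 to 23, & is 24, O to Z are 25 to 36, a space is 37 and Ñ is 38), the first 12 characters GODE561231GR have the values 16, 25, 13, 14, 5, 6, 1, 2, 3, 1, 16 and 28:

```latex
\begin{aligned}
S &= 16\cdot 13 + 25\cdot 12 + 13\cdot 11 + 14\cdot 10 + 5\cdot 9 + 6\cdot 8 \\
  &\quad + 1\cdot 7 + 2\cdot 6 + 3\cdot 5 + 1\cdot 4 + 16\cdot 3 + 28\cdot 2 = 1026 \\
S \bmod 11 &= 3 \\
d &= 11 - 3 = 8
\end{aligned}
```

The result 8 matches the last character of the RFC. The same digit can be obtained in one step as (-S) mod 11, writing 10 as A, although whether that is the exact simplification intended here is an assumption.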
Regular expression
If I were going to use a regular expression as a pre-validation step I would use:
This expression accepts incorrect dates such as 999999. Neither the 99th day nor the 99th month exist.
I don't see any point in complicating the regex to reject, for example, days that start with 9, since some invalid dates would still get past it (unless you use a very complicated regex). It is better to leave the lexical analysis to the regex (checking that the characters are valid) and the semantics to other levels (checking that the date is correct). But this is debatable.
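As an illustration of leaving the semantic check to another level, here is a minimal Python sketch that validates the six date digits with the standard library. The function name `plausible_rfc_date` is hypothetical, and trying both the 1900s and the 2000s is an assumption, since the RFC only stores a two-digit year.

```python
from datetime import date


def plausible_rfc_date(yymmdd: str) -> bool:
    """Return True if the six digits form a real calendar date in the 1900s or 2000s."""
    if len(yymmdd) != 6 or not (yymmdd.isascii() and yymmdd.isdigit()):
        return False
    yy, mm, dd = int(yymmdd[:2]), int(yymmdd[2:4]), int(yymmdd[4:])
    # The century is unknown, so accept the date if it exists in either one.
    for century in (1900, 2000):
        try:
            date(century + yy, mm, dd)
            return True
        except ValueError:
            continue
    return False
```

With this check, `999999` is rejected and `561231` is accepted. A date such as `000229` is accepted because 29 February exists in 2000, even though it does not in 1900.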
Using Python and a library.
Often the best way to implement something is not to implement it at all, and to use a ready-made implementation if one is available and reliable, except for educational purposes or certain very special requirements.
There are several libraries that check an RFC. One example is python-stdnum by Arthur de Jong and others, available at https://github.com/arthurdejong/python-stdnum under the GNU Lesser General Public License version 2.1 or later.
There is little to comment, everything is already done by the library.
Using functional programming in Scala
The function that does the checking is the only public one and is also called esValido. Like the Python one, it returns true if the RFC passed to it is valid.
The code includes comments that explain how it works.
Observations
In the document "Algorithm to generate the RFC with homoclave for natural and legal persons" (.odt), rule 9 of section 2.2 lists 39 inconvenient words that cannot be used in an RFC. However, Annex 4 contains 41 words. The 2 extra words are CAKO and MEAS. I have assumed that the annex is the correct one.
I have tried a million random RFCs, correct and incorrect, and the two implementations have given the same result in all cases, although this is not a guarantee of anything.
I do not accept XEXX010101000 or XAXX010101000 as valid because they are not mentioned in the document. The Python library does the same. If they need to be accepted, the modification is trivial.
By going to the SAT page https://portalsat.plataforma.sat.gob.mx/ConsultaRFC/ I was able to obtain the regular expression and the function that they use. I hope it works for you:
Free API
An open-source, always-free API with a response time of 600 ms that verifies an RFC in 2 steps, to find out whether it is on the SAT and LRFC blacklist: Documentation.
Regular expression
The following regular expression (in the code) checks:
This follows the SAT logic (SAT, SAT 2) from Orlando Alfonso's answer and uses Mariano's example code.
It works in other languages. Here is an example in Go.
I currently use this regular expression; it validates the following: