I have a question. I am writing a program in C++ in which I need to write Unicode characters, but dynamically.
For example, the following code writes a text file containing Unicode characters:
#include <fstream>
#include <iostream>
#include <string>

int main(int argc, const char* argv[]) {
    std::string texto = "Texto con caracteres unicode \u0305";
    std::ofstream salida("./archivo_de_salida.txt");
    if (salida.is_open()) {
        // Write the data
        salida << texto;
    }
    salida.close();
    return 0;
}
But how can I write Unicode characters dynamically? For example:
std::string code = "0305";
std::string caracter_unicode = "\u" + code;
Is that possible?
An escape sequence is not the same as a sequence of characters.
To understand the difference, we need to understand the translation phases of a C++ program:
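Phase 1: characters in the source file that do not belong to the basic source character set are replaced by their universal character name (\uXXXX).
Phase 2: every line that ends with a backslash (\) is put together with the next one on a single line.
Phase 4: the preprocessor runs. Files named in #include directives are added to the file by making them go through phases 1 to 4 recursively; after this phase there are no preprocessor directives left in the file.
Phase 5: characters and escape sequences inside character and string literals are converted to the execution character set.
Phase 7: the resulting tokens are compiled.
So if you have this code: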
In the first phase it will become:
And in the fifth it will be:
Note that the line break (\n) is now an explicit character instead of an escape sequence.
But if you have the following code:
In the first phase it remains unchanged (all the characters belong to the basic set), but in the fifth it becomes:
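Notice that the sequence \u is left as a plain u (some compilers reject an incomplete \u outright instead).
A \u escape cannot be built dynamically: escape sequences are translated in phase 5, before compilation (phase 7), so at run time the string contains only ordinary characters.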
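The difference can be checked at run time. The sketch below is only an illustration: escape_length and glue_at_runtime are hypothetical helper names, not part of the question's code. It shows that a compile-time \n is already a single character, while gluing a backslash, a u and the digits together at run time just produces six ordinary characters.

```cpp
#include <cstddef>
#include <string>

// Length of a string that holds a compile-time escape sequence.
// The compiler has already turned \n into one line-feed character.
std::size_t escape_length() {
    const std::string newline = "\n";
    return newline.size();
}

// Hypothetical helper: joins backslash + 'u' + the given digits at run time.
// Nothing translates the result; it stays a plain sequence of characters.
std::string glue_at_runtime(const std::string& code) {
    const std::string prefix = "\\u";  // two ordinary characters: '\' and 'u'
    return prefix + code;
}
```

Calling glue_at_runtime("0305") yields the six characters \u0305 as text, which is what would end up in the file, rather than the combining overline U+0305.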
A Unicode character, besides being part of a static string (for which you have to use \u within the literal, so it cannot be generated dynamically), can also be stored individually in a variable of type wchar_t (although this is not very portable, since the size of this type depends on the compiler).
In this case, you can assign any numeric value to your wchar_t variable at runtime. The problem is that when you dump that variable to a file (or to standard output) you have to:
- Use wide-character strings (wstring). In literals that means putting an L in front of the opening quote.
- Replace std::ofstream by std::wofstream, or std::cout by std::wcout, etc.
A proof of concept:
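A minimal sketch of this idea, assuming helper names (write_with_code, render) that are made up for illustration. It writes to any std::wostream; an in-memory std::wostringstream is used here so the result can be inspected without depending on the platform's locale setup.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: writes a wide label followed by the character whose
// numeric value is only known at run time.
void write_with_code(std::wostream& out, unsigned long code) {
    const wchar_t ch = static_cast<wchar_t>(code);  // plain integer -> wchar_t
    out << L"Texto con caracteres unicode " << ch;
}

// Hypothetical convenience wrapper: captures the output in a wide string.
std::wstring render(unsigned long code) {
    std::wostringstream buffer;
    write_with_code(buffer, code);
    return buffer.str();
}
```

The same helper should accept a std::wofstream, but on many implementations the stream first needs a locale with a suitable character conversion (set through imbue()), and which locale names are available depends on the platform.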
The file then contains:
Notice that the wchar_t constant is actually specified as an integer (I've used 0x0305 so that, thanks to the 0x prefix, its hexadecimal value can be seen, but I could just as well have put its decimal value, which is 773). It is not a string. If you need to start from a string as in your example ("0305"), you can convert that string to an integer using strtoul().
Proof of concept:
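A minimal sketch of that conversion, assuming the digits are hexadecimal as in the question; char_from_hex is a hypothetical helper name. std::strtoul with base 16 turns the text into a number, which is then stored in a wchar_t.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: parses a string of hexadecimal digits such as "0305"
// and returns the wide character with that numeric value.
// std::strtoul returns 0 if no digits can be parsed; this sketch does not
// check for that case.
wchar_t char_from_hex(const std::string& code) {
    const unsigned long value = std::strtoul(code.c_str(), nullptr, 16);
    return static_cast<wchar_t>(value);
}
```

char_from_hex("0305") gives the same value as the constant 0x0305 (decimal 773), so it can be written to a wide stream in the same way. Real code would want to validate the input before trusting the result.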