I am using C++ regular expressions (std::regex) to strip leading and trailing spaces and to allow only a single space between words.
C++ uses the "same" syntax as JavaScript for regular expressions. My initial expression was this:
"[a-zA-Z0-9_#%./\"-+]+(\\s{1}[a-zA-Z0-9_#%./\"-+]+)*"
With this string, I accept alphanumeric text and the characters #, $, ., /, ", -, +, with no leading or trailing spaces and a single space between words. The function that checks whether a text matches this pattern is as follows:
bool GestionErrores::es_texto_caracteres_especiales(std::string texto)
{
    bool retorno = false;
    try
    {
        std::regex expresion("[a-zA-Z0-9_#%./\"-+]+(\\s{1}[a-zA-Z0-9_#%./\"-+]+)*");
        std::smatch match;
        retorno = std::regex_match(texto, match, expresion);
    }
    catch(std::exception &ex)
    {
        throw ex;
    }
    return retorno;
}
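To make the intent of the check concrete, here is a minimal sketch of the same idea as a free function. The name es_texto_valido is hypothetical, and the character class is reduced to [a-zA-Z0-9_] purely for readability; it is not the exact class used above.

```cpp
#include <regex>
#include <string>

// Hypothetical, simplified version of the validation above:
// one or more words made of [a-zA-Z0-9_], separated by exactly one
// whitespace character, with nothing before the first or after the last word.
bool es_texto_valido(const std::string &texto)
{
    static const std::regex expresion("[a-zA-Z0-9_]+(\\s[a-zA-Z0-9_]+)*");
    // regex_match succeeds only if the whole string matches the pattern.
    return std::regex_match(texto, expresion);
}
```

With this sketch, es_texto_valido("hola mundo") returns true, while " hola", "hola " and "hola  mundo" (two spaces) all return false.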
I then expanded this expression so that it also matches those extra spaces, wrapping each part in parentheses so that I could refer to it in a replacement. It looks like this:
"(\\s*)([a-zA-Z0-9_#%./\"-+]+)((\\s{1})(\\s*)([a-zA-Z0-9_#%./\"-+]+))*(\\s*)"
In JavaScript syntax, parentheses can be used as references when building a new substring. An example in JavaScript from the Mozilla documentation is:
var expresion = /(\w+)\s(\w+)/;
var cadena = "John Smith";
var nuevaCadena = cadena.replace(expresion, "$2, $1");
print(nuevaCadena);
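For comparison, the same swap can be sketched in C++ with std::regex_replace, which accepts $1 and $2 in its default (ECMAScript) format string. The function name intercambiar_palabras is hypothetical and only serves the illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper mirroring the JavaScript example above:
// swaps two words separated by a single whitespace character.
std::string intercambiar_palabras(const std::string &cadena)
{
    const std::regex expresion("(\\w+)\\s(\\w+)");
    // $1 is the first capture group, $2 the second.
    return std::regex_replace(cadena, expresion, "$2, $1");
}
```

Calling intercambiar_palabras("John Smith") yields "Smith, John", the same result as the JavaScript version.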
As far as I can see, it identifies the parentheses sequentially as $1, $2, and so on, but how do I distinguish parentheses nested inside other parentheses? The expression I'm using has three parentheses inside a larger one:
"(\\s*)([a-zA-Z0-9_#%./\"-+]+)((\\s{1})(\\s*)([a-zA-Z0-9_#%./\"-+]+))*(\\s*)"
                                       ^^
// This parenthesis is the problem: it sits inside another, larger parenthesis
How could I do it, so that at the end, the new substring has no leading or trailing spaces and only one space between words?
EDITED 1
I want to be able to enter a text as follows
and get as replacement a text as follows with no leading and trailing spaces and a single space between words:
I wanted to do it using the parentheses as a reference, because this way I could even change the text order if I wanted to, but I'm not sure yet.
First, because it numbers the expressions grouped by parentheses sequentially as $1, $2, $3, and so on, but what happens with parentheses inside parentheses? I already tried referring to them as $1.1, for example, but the replacement simply inserted the text matched by $1 followed by a literal .1 (for example, mario.1), so that doesn't work.
Second, there is the replacement itself. I'm testing with this regular expression:
"(\\s*)([a-zA-Z0-9_]+)(((\\s{1})(\\s*)([a-zA-Z0-9_]+))*)(\\s*)"
and this is the function I'm using to do the replacement:
void GestionErrores::quitar_espacios_inicio_fin(std::string &texto)
{
    std::string subString;
    try
    {
        std::regex expresion("(\\s*)([a-zA-Z0-9_]+)(((\\s{1})(\\s*)([a-zA-Z0-9_]+))*)(\\s*)");
        std::smatch match;
        subString = std::regex_replace(texto, expresion, "$2$3");
        texto = subString;
    }
    catch(std::exception &ex)
    {
        throw ex;
    }
}
Breaking down the regular expression I'm using:
- $1 is (\\s*)
- $2 is ([a-zA-Z0-9_]+)
- $3 is (((\\s{1})(\\s*)([a-zA-Z0-9_]+))*)
- $4 is (\\s*)
In my function quitar_espacios_inicio_fin, I am testing the replacement by passing just $2$3 to std::regex_replace. That means the resulting string should be exactly the same text without $1 and $4, which represent the spaces before and after, and that is indeed what happens: it removes the leading and trailing spaces.
But how do I refer to parentheses that are inside other parentheses? Look at $3: it is full of expressions grouped by parentheses, and I specifically want to drop one of them, the (\\s*), from the replacement string.
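As a way to inspect the numbering directly, the sketch below collects what every group captured. It assumes the usual ECMAScript rule, which std::regex follows by default: groups are numbered by the position of their opening parenthesis, counted left to right, so nested groups get numbers of their own. Under that rule the trailing (\\s*) would be group 8 rather than group 4. The function name capturas is hypothetical.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the text captured by every group of the
// expression under discussion (index 0 is the whole match).
// Returns an empty vector if the text does not match.
std::vector<std::string> capturas(const std::string &texto)
{
    const std::regex expresion(
        "(\\s*)([a-zA-Z0-9_]+)(((\\s{1})(\\s*)([a-zA-Z0-9_]+))*)(\\s*)");
    std::smatch match;
    std::vector<std::string> grupos;
    if (std::regex_match(texto, match, expresion))
    {
        // match.size() is the number of groups plus one (the whole match).
        for (std::size_t i = 0; i < match.size(); ++i)
            grupos.push_back(match[i].str());
    }
    return grupos;
}
```

For the input " uno   dos " (one leading space, three spaces between the words, one trailing space) the vector has 9 entries: index 2 is "uno", indices 3 and 4 are both "   dos", index 5 is the single space, index 6 is the two extra spaces, index 7 is "dos" and index 8 is the trailing space.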
Now, I did manage to fix the extra-spaces problem, using an idea from this question, but not in the way I was proposing. This is the function I ended up with:
void GestionErrores::quitar_espacios_inicio_fin(std::string &texto)
{
    std::string subString;
    try
    {
        std::regex expresion("(\\s*)([a-zA-Z0-9_]+)(((\\s{1})(\\s*)([a-zA-Z0-9_]+))*)(\\s*)");
        std::regex expEspacios("(\\s{2,})");
        std::smatch match;
        subString = std::regex_replace(texto, expresion, "$2$3");
        subString = std::regex_replace(subString, expEspacios, " ");
        texto = subString;
    }
    catch(std::exception &ex)
    {
        throw ex;
    }
}
To explain it briefly: as I just showed, the first replacement gets rid of the leading and trailing spaces. The regular expression (\\s{2,}) matches two or more whitespace characters (not only spaces, but also tabs, line breaks, page breaks and carriage returns), and the second replacement turns each such run into a single space.
It works, and it got me past my problem, but it still doesn't answer my question: how do I refer to grouped expressions (parentheses) that sit inside other grouped expressions? As far as I can tell, only the outer groups get numbered, and the groups nested inside them are simply ignored.
It is a yes-or-no question: can it be done or not? And if it can, how?
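One caveat worth checking when referring to groups inside a repeated group: in ECMAScript-style engines a capture group inside (...)* keeps only what it matched in the last repetition. The sketch below assumes groups are numbered by their opening parenthesis, so that the inner (\\s{1}) is $5 and the inner word is $7; the function name reemplazo_por_grupos is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical experiment: rebuild the text from the first word ($2),
// the single separator ($5) and the inner word ($7), skipping the
// extra spaces captured by the nested (\\s*).
std::string reemplazo_por_grupos(const std::string &texto)
{
    const std::regex expresion(
        "(\\s*)([a-zA-Z0-9_]+)(((\\s{1})(\\s*)([a-zA-Z0-9_]+))*)(\\s*)");
    // $5 and $7 only hold the values from the LAST repetition of the
    // starred group, so words matched in earlier repetitions are lost.
    return std::regex_replace(texto, expresion, "$2$5$7");
}
```

With two words this does what was intended: " uno   dos " becomes "uno dos". With three words it does not: " uno  dos   tres " becomes "uno tres", because "dos" was captured in an earlier repetition and then overwritten. That suggests nested references alone are not enough for an arbitrary number of words, which is consistent with needing a second pass or a different pattern.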
EDITED 2
After updating the function with the answer I was given, it looks like this:
void GestionErrores::quitar_espacios_inicio_fin(std::string &texto)
{
    std::string subString;
    try
    {
        std::regex expEspacios("^ +| +$| +(?!\\S)");
        std::smatch match;
        subString = std::regex_replace(texto, expEspacios, "");
        texto = subString;
    }
    catch(std::exception &ex)
    {
        throw ex;
    }
}
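A quick way to check this pattern is to wrap it in a free function and try a few inputs. This is only a sketch: the name normalizar_espacios is hypothetical, and the pattern handles literal space characters only, not tabs or line breaks.

```cpp
#include <regex>
#include <string>

// Hypothetical free-function version of the replacement above.
// The three alternatives remove:
//   ^ +        spaces at the start of the string
//    +$        spaces at the end of the string
//    +(?!\S)   spaces that are followed by another space (or by the end),
//              which leaves exactly one space before each following word
std::string normalizar_espacios(const std::string &texto)
{
    const std::regex expEspacios("^ +| +$| +(?!\\S)");
    return std::regex_replace(texto, expEspacios, "");
}
```

For example, normalizar_espacios("  hola    mundo  ") returns "hola mundo", and a string made only of spaces comes back empty.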
Try the following:
Search: ^ *| *$| +(?!\S) with the "flags" m and g.
Replace with: (nothing)
You have a demo here.
Explanation:
(note: in the explanation, I denote spaces as [ ] so that they are visible)