I have the following text:
Hi, I was calling to ask you to -b-please- take the -b-kids- to -b-school-
How can I capture, with JavaScript, the text inside -b-...-, even when it is attached to other text, as in hello-b-Sofia-?
I only want what's inside, so that I can make that text bold, italic, or strikethrough.
Something like:
Hi, I was calling to ask you to <b>please</b> take the <b>kids</b> to <b>school</b>.
I found a regular expression that did this, but it only worked in PHP, and I don't know much about regular expressions.
Given that hyphens are allowed inside -b-...-, I would like to know which method is clearer or more efficient, with an explanation of why, of course. I would also like to know in which circumstances indexOf is the better choice and in which RegExp is.
Regarding Mariano's answer
The regular expression works very well, although it only works if the content has no hyphens in it, as can be seen with fav-or.
I think this regular expression is even more complex than it looks. Take the string: "I use many --- because -- I am - rebellious -.-."
Then it would be:
-b-I use many --- because -- I am - rebellious -.-.-
Now, the rule should be to always look for the last hyphen that exists before the next -b-, not the first; the given regular expression matches the first occurrence.
If that "last" hyphen is not found, there is no match and the text stays as normal text. And thanks for being the fastest cowboy in the west :v
Regarding Montoro's answer
It sounds great to "make life more complicated" sometimes or maybe all the fucking time, because I usually find problems with everything.
The solution with indexOf is faster in execution than RegExp, although the code is somewhat harder to handle. I do not understand the use of some of the -1 values (I do not understand much). It sounds crazy, but it really works, even with hyphens inside.
And lol, I usually use jQuery :)
You can use
replace(regexp, replacement)
with the following regular expression:
/-b-([^-]+)-/g
and replacing it with
<b>$1</b>
Description:
-b- - matches the literal text.
([^-]+) - Group 1, which matches:
[^-]+ - 1 or more characters that are not hyphens (-).
- - matches the literal text.
g - find all matches, not just the first one.
Group 1, in addition to matching the text between the hyphens, also creates a capture. When replacing, $1 contains the value of that capture.
Avoid HTML tags within the syntax:
Also, within the syntax -b-...- there should be no HTML tags, so as not to "break" the structure. One possible way around it is to match only structures that contain no <, using a regex that also excludes < from the allowed characters.
Any of these expressions works in any Perl-like regex dialect, so they will work in JavaScript, PHP, or any of the other commonly used languages.
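Both expressions described so far can be tried with a minimal sketch like the one below. The function names boldify and boldifyNoTags are made up for illustration, and the second regex, /-b-([^-<]+)-/g, is an assumption based on the description "match structures that contain no <", not necessarily the exact expression intended.

```javascript
// Basic version: -b- opener, one or more non-hyphen characters (captured), - closer.
function boldify(text) {
  return text.replace(/-b-([^-]+)-/g, '<b>$1</b>');
}

// Assumed variant: "<" is also excluded, so content containing HTML tags is not matched.
function boldifyNoTags(text) {
  return text.replace(/-b-([^-<]+)-/g, '<b>$1</b>');
}

const sample = 'Hi, I was calling to ask you to -b-please- take the -b-kids- to -b-school-';
console.log(boldify(sample));
console.log(boldify('hello-b-Sofia-'));
console.log(boldifyNoTags('-b-ok- and -b-<i>no-'));
```

With the basic version, the last input would become <b><i>no</b>, which is the kind of broken structure the second expression is meant to avoid.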
Include hyphens within the syntax:
And if we wanted to make it a bit more complicated: how would we allow hyphens within the bold text? We could ask for them to be escaped with a \. In that case we would use a structure based on the technique known as unrolling the loop: the \ is added to the characters disallowed as normal, and the expression then matches a backslash followed by any character (\\.) and more normal characters.
Final code:
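As a hedged sketch of how the escaped-hyphen variant could be put together: the pattern below is assembled from the normal and special parts described in this answer, so treat it as an assumption rather than the exact original expression. The names ESCAPED_BOLD and boldifyEscaped are hypothetical, and removing the escaping backslashes from the output is an extra step added here for illustration.

```javascript
// Assumed pattern, following normal* (special normal*)* where
//   normal  = [^-<\\]  (anything except a hyphen, "<" or a backslash)
//   special = \\.      (a backslash followed by any character)
const ESCAPED_BOLD = /-b-([^-<\\]*(?:\\.[^-<\\]*)*)-/g;

function boldifyEscaped(text) {
  return text.replace(ESCAPED_BOLD, (match, inner) => {
    // Assumption: drop the escaping backslash so "\-" is shown as "-".
    const unescaped = inner.replace(/\\(.)/g, '$1');
    return '<b>' + unescaped + '</b>';
  });
}

// In a JavaScript string literal, "\\" is a single backslash character.
console.log(boldifyEscaped('-b-por fav\\-or- and -b-plain-'));
```

A replacer function is used instead of '<b>$1</b>' only because of the unescaping step; without it, the plain replacement string works the same way as in the basic version.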
Answers to edited question:
I'm not going to answer the general question, since it's based on opinions, but I will compare it with @AlvaroMontoro's answer, which is excellent and I recommend giving it a +1 vote. And it is worth clarifying that the proposed implementations seek different results (we are comparing pears with bananas, see the point below).
If we take the general comparison, and for the examples used, differences of approximately 9% (on the order of 6 μs) are observed, something that I would not call relevant for JavaScript. However, it all depends on the text being processed. For example, with a longer text (6 paragraphs), regex can be approximately twice as efficient (comparison in JSPerf). And it is probably also possible to steer the tests toward texts that favor lastIndexOf().
On the claim that the regular expression only works if the content has no hyphens: this is not correct. As discussed in this answer, to allow hyphens within the syntax, they must be escaped with a backslash (\).
Demo on regex101.com
Why do I think it is not a good idea to search for the last occurrence of a hyphen? I think looking for the last occurrence is the wrong decision, since it gives no way to effectively close the syntax. Let's consider this example:
If this were the syntax used in SO posts, we wouldn't be able to use hyphens after the last bold; we would have no way to close it. If it were used in user-entered text, I wouldn't know how to document its use. Instead, I think it's much more effective (and more common) to ask for hyphens to be escaped:
"-b-por fav\-or-"
However, if you are still looking to match the last occurrence, I would ask for clarification in the question as to how a hyphen can be used after the last bold.
It is a myth, and one you hear often, that a longer regular expression is less efficient; it is false, and many times the opposite is true. In fact, the technique used is very common, and you can read about it in more detail at:
Note: I could have presented it in a shorter form,
/-b-(([^-\\]|\\.)*)-/g
but I preferred to include a much more efficient, higher-quality version (here longer is more efficient). It basically consists of using:
normal* (special normal*)*
Where normal is any character except -, \ and <, and special is any character preceded by a backslash, to match \-.
Mechanism:
1. Match normal characters, [^-<\\]*, as many as possible.
2. Match a special character, \\., followed by more normal characters, [^-<\\]*.
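To see that the shorter form and the unrolled form accept the same text, here is a small comparison sketch. The shortForm regex is the one quoted in the note; unrolledForm is an assumed unrolled equivalent that leaves out the < exclusion so both are directly comparable. The helper name captures is hypothetical, and the sketch checks only the matches, not the speed.

```javascript
// Short form from the note: alternation inside a repeated group.
const shortForm = /-b-(([^-\\]|\\.)*)-/g;
// Assumed unrolled equivalent: normal* (special normal*)*
const unrolledForm = /-b-([^-\\]*(?:\\.[^-\\]*)*)-/g;

// Collect the contents of group 1 for every match.
function captures(regex, text) {
  return Array.from(text.matchAll(regex), (m) => m[1]);
}

// "\\" in the literal is one backslash, so the first marker contains an escaped hyphen.
const input = 'a -b-x\\-y- b -b-z- c';
console.log(captures(shortForm, input));
console.log(captures(unrolledForm, input));
```

Both calls should list the same two captures; the difference between the forms is in how much backtracking work the engine does, which is what a benchmark would have to measure.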
I didn't mean to :-) I believe in quality above all other things.
I know that the question cries out for regular expressions, and that using them will simplify your life a lot ( Mariano's solution is very elegant and barely occupies a single line)... but sometimes I like to complicate my life :P
Regular expressions are powerful and flexible... but that also makes them slow. If you're looking for a specific string, indexOf will work too. Based on that, I have made a small algorithm that, inside a loop and sequentially:
1. Finds -b- and replaces it with <b>
2. Finds - and replaces it with </b>
The code isn't very pretty or as clean as Mariano's solution, but testing with JSPerf , its performance seems to be comparable.
This would be the code:
Edit: Mariano told me that the code had a problem if the string was not closed correctly (if there was a -b- without a - after it)... and he was right. So I changed the code a bit so that an additional check is done to avoid an infinite loop. The result looks like this:
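A minimal sketch of this approach, under some assumptions: it is not the answer's original code, it builds a new string instead of replacing in place, and the name boldWithIndexOf is hypothetical. It treats a -b- that is never closed as bold up to the end of the text, and the comments point out where the -1 checks come from.

```javascript
function boldWithIndexOf(text) {
  let result = '';
  let pos = 0; // start of the part of the text not yet copied
  while (true) {
    const open = text.indexOf('-b-', pos);
    if (open === -1) {
      // indexOf returns -1 when nothing is found: no more markers, copy the rest.
      return result + text.slice(pos);
    }
    const contentStart = open + 3; // skip the three characters of "-b-"
    const close = text.indexOf('-', contentStart);
    result += text.slice(pos, open) + '<b>';
    if (close === -1) {
      // Unclosed marker: bold until the end. Returning here also ends the loop.
      return result + text.slice(contentStart) + '</b>';
    }
    result += text.slice(contentStart, close) + '</b>';
    pos = close + 1; // continue after the closing hyphen
  }
}

console.log(boldWithIndexOf('take the -b-kids- to -b-school-'));
console.log(boldWithIndexOf('open -b-never closed'));
```

Because pos always moves forward and both -1 cases return, the loop cannot run forever, which is the additional check mentioned in the edit.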
It assumes that if you have left a -b- without its closing hyphen, the text is bold until the end of the sentence. And the results in JSPerf seem to remain comparable.
Máxima Alekz correctly commented that my code did not allow internal hyphens. A workaround, if you want to allow them, would be to traverse the string backwards instead of forwards. To do this, instead of using indexOf, we would use lastIndexOf.
What the algorithm does now is look for the last -b- in the string and pair it with the last - found after it. If no hyphen is found, the end of the string is considered to be the end of the bold. The code would look like this:
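A rough sketch of the backwards traversal, again as an assumption rather than the answer's original code (the name boldWithLastIndexOf is hypothetical). It works from the end of the string: each pass takes the last -b- in the part not yet processed and pairs it with the last hyphen that follows it inside that part.

```javascript
function boldWithLastIndexOf(text) {
  let result = '';        // processed tail of the text, built from right to left
  let end = text.length;  // exclusive end of the part not yet processed
  while (true) {
    const head = text.slice(0, end);
    const open = head.lastIndexOf('-b-');
    if (open === -1) {
      // lastIndexOf returns -1 when there is no marker left.
      return head + result;
    }
    const segment = head.slice(open + 3); // everything after this "-b-"
    const close = segment.lastIndexOf('-');
    let piece;
    if (close === -1) {
      // No closing hyphen: the whole segment is bold.
      piece = '<b>' + segment + '</b>';
    } else {
      // Last hyphen closes the bold; earlier hyphens stay inside it.
      piece = '<b>' + segment.slice(0, close) + '</b>' + segment.slice(close + 1);
    }
    result = piece + result;
    end = open; // continue with the text before this marker
  }
}

console.log(boldWithLastIndexOf('-b-por fav-or- thanks'));
console.log(boldWithLastIndexOf('x -b-a- y -b-c-'));
```

Since the last hyphen always wins, a hyphen in ordinary text after a bold section is taken as the closing one, for example '-b-a- b - c' becomes '<b>a- b </b> c'.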
And here are the results in JSPerf, which are still similar to the ones above.