I was writing a regex that strips the HTML tags from a string, leaving only the text between them. Examples:
"<a href='#'>go to <b>start</b> page</a>"
matches: <a href='#'>, <b>, </b> and </a>
result = go to start page
"<div>prueba</div>"
matches: <div> and </div>
result = prueba
My regex is the following:
var reg = /(\<.+\>)|(\<\/.+\>)|(\<.+\/>)/g;
It is designed so that a tag of the form <tag>, </tag> or <tag/> matches, and then I do a replace with my regex so that only the text is left. But it also matches the characters in between. I have tried several things using (?:) so that it does not capture the characters between the two tags, but it doesn't work for me.
I also tried:
\<.+\>(?:.)+\<\/.+\>
If possible, I would like to know how to avoid matching the characters in the middle of a regex.
...
I had to do something similar; if I remember correctly, I used something like this.
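One pattern that fits this description is sketched below. It is an assumption, not necessarily the exact regex used back then. The in-between characters get matched in the question because .+ is greedy: it runs to the last > in the string. Replacing it with [^>]+ (any character except >) stops each match at the end of its own tag. The function name stripTags is hypothetical.

```javascript
// Hypothetical helper: removes anything that looks like <...> from a string.
function stripTags(html) {
  // "<", then one or more characters that are not ">", then ">".
  // [^>]+ cannot run past the end of a tag, unlike the greedy .+
  return html.replace(/<[^>]+>/g, "");
}

console.log(stripTags("<a href='#'>go to <b>start</b> page</a>")); // go to start page
console.log(stripTags("<div>prueba</div>"));                       // prueba
```

This covers <tag>, </tag> and <tag/> with a single alternative. It is only a sketch: an attribute value that contains a > character, for example, would cut a match short, which is one reason a regex is a fragile tool for HTML.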
EDIT:
If what you are looking for is the content of an HTML element as text, that is, with the tags removed, you can get it from the innerText or textContent property. You could even create an element in memory, assign the content to its innerHTML, and then read one of those properties.
innerText
One drawback of this property is that it returns a single space when it finds several in a row. I advise using textContent instead.
NOTE
The example code above operates on an existing element. The following example builds everything in memory from a text string.
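A minimal sketch of the in-memory approach, assuming a browser environment where the global document is available. The names htmlToText and doc are hypothetical; the doc parameter exists only so a different document object can be passed in.

```javascript
// Hypothetical helper: lets the browser parse the markup and returns only its text.
// doc defaults to the browser's global document.
function htmlToText(html, doc = document) {
  const el = doc.createElement("div"); // lives only in memory, never attached to the page
  el.innerHTML = html;                 // the browser parses the tags
  return el.textContent;               // concatenated text of all descendant nodes
}
```

In a browser, htmlToText("<a href='#'>go to <b>start</b> page</a>") returns "go to start page". Because the browser does the parsing, nested tags and attributes need no special handling. This sketch assumes trusted input: assigning untrusted markup to innerHTML can still trigger resource loads and their event handlers, even on a detached element.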