Good morning everyone.
I'm trying to get the N words before and after a match, across a large number of plain-text histories.
Details to take into account: the text is normalized, so it has no accents or tildes and everything is in lowercase.
I am trying to search for the word "xanthoma" or "xanthomas" and get the 7 words before and after the match, stopping at a line break.
After several attempts, this is the closest regular expression I have, but I don't understand how to limit it to 7 words, where the words are separated from each other by spaces, punctuation characters, numbers, etc.
([ ,:[\w]*]*\bxanthoma[s]?\b[ ,:[\w]*]*)
Example text:
- chronic corporoantral gastritis, corporoantral lesions suggestive of xanthomas
10/24/2009 evda hiatal hernia. gc corporeal antrum, fundic nodular erosive gastritis, gastric xanthoma . background bx: moderate non-atrophic chronic gastrotus without activity. h. pylori: x/xxx.
- chronic corporoantral gastritis, corporoantral lesions suggestive of xanthoma primary hypothyroidism
I hope to get:
chronic corporoantral gastritis, corporoantral lesions suggestive of xanthomas
gc corporeal antrum, fundic nodular erosive gastritis, gastric xanthoma . fundus bx: chronic non-atrophic gastrotus
chronic corporoantral gastritis, corporoantral lesions suggestive of xanthoma primary hypothyroidism
You are looking for a regular expression that repeats a "word plus separator" group a limited number of times on each side of the match. Two details matter:
Use a-z instead of \w, because \w includes, in addition to letters, digits and the underscore, which is not appropriate in many cases.
{0,7} causes the regular expression to accept whatever is to its left repeated between 0 and 7 times (in this case).
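As a minimal sketch of how a-z and {0,7} could fit together, here is one possible expression in Python. It is an illustration under assumptions, not necessarily the exact expression intended above: the function name words_around is hypothetical, and treating every character that is neither a letter nor a line break as a separator is my own choice.

```python
import re


def words_around(text, n=7):
    """Return each 'xanthoma(s)' match with up to n words on either side."""
    word = r"[a-z]+"      # a word is a run of letters only (a-z, not \w)
    sep = r"[^a-z\n]+"    # separators: anything but letters and line breaks
    pattern = re.compile(
        rf"(?<![a-z])"                 # do not start in the middle of a word
        rf"(?:{word}{sep}){{0,{n}}}"   # up to n words before the match
        rf"xanthomas?(?![a-z])"        # the keyword, singular or plural
        rf"(?:{sep}{word}){{0,{n}}}"   # up to n words after the match
    )
    return [m.group(0) for m in pattern.finditer(text)]


# Sample input copied from the example text in the question.
SAMPLE = (
    "- chronic corporoantral gastritis, corporoantral lesions suggestive of xanthomas\n"
    "10/24/2009 evda hiatal hernia. gc corporeal antrum, fundic nodular erosive "
    "gastritis, gastric xanthoma . background bx: moderate non-atrophic chronic "
    "gastrotus without activity. h. pylori: x/xxx.\n"
    "- chronic corporoantral gastritis, corporoantral lesions suggestive of "
    "xanthoma primary hypothyroidism"
)

for hit in words_around(SAMPLE):
    print(hit)
```

Because the separator class excludes the line break, a match never crosses into the next line. Note that with this definition digits act as separators and a hyphenated term such as "non-atrophic" counts as two words, so the window may be narrower than a count done by eye.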