
Asked: 2020-06-12 06:59:11 +0800 CST

Extract part of a string between two specified points


I recently found myself needing to extract all the values between two specified points in a string, in this case everything inside the parentheses "()".

What would be the most optimal or adequate way to do this?

string cadena = string.Empty, resultado = string.Empty;

I have an email with a predefined format, from which I only need the values between the parentheses ().

Example of cadena:

Hello, friend X, ..........

bla bla bla bla
.......
('A','B','valorX','valorY',N...) // what I want to extract.
.......
more text...
....

Best regards, Pedro...

While looking into different ways to do it, I solved it with each of the approaches below:

1- Using Split:

resultado = cadena.Split('(', ')')[1];

or

resultado = cadena.Split("()".ToCharArray())[1];

2- With regular expressions, Regex.Match:

resultado = Regex.Match(cadena, @"\(([^)]*)\)").Groups[1].Value;

3- With Substring and a bit of index arithmetic:

int posInicial = cadena.IndexOf('(') + 1;                    // just after the first '('
int longitud = cadena.IndexOf(')', posInicial) - posInicial; // first ')' after that point

resultado = cadena.Substring(posInicial, longitud);

Each of these approaches yields the same result:

resultado: 'A','B','valorX','valorY',N...

Honestly, I find it hard to understand how regular expressions work; they always look to me like a bunch of indecipherable hieroglyphics...

So: What would be the most optimal or appropriate way to do this?
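The pattern from method 2 is less cryptic when each piece is annotated. Below is a minimal sketch of the same pattern written with `RegexOptions.IgnorePatternWhitespace`, which lets whitespace and `#` comments appear inside the pattern; the class name `ParenRegex` and method `FirstInside` are hypothetical, chosen only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class ParenRegex
{
    // Same pattern as @"\(([^)]*)\)", spread out so each part has a comment.
    static readonly Regex Pattern = new Regex(@"
        \(        # literal opening parenthesis, escaped because a bare one opens a group
        (         # start of capture group 1, the text we want to keep
          [^)]*   # zero or more characters that are anything except a closing parenthesis
        )         # end of capture group 1
        \)        # literal closing parenthesis
        ", RegexOptions.IgnorePatternWhitespace);

    // Returns the content of the first pair of parentheses,
    // or an empty string when the input has no match.
    public static string FirstInside(string input) =>
        Pattern.Match(input).Groups[1].Value;
}
```

Because `[^)]*` cannot cross a `)`, the match always stops at the first closing parenthesis after the opening one.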


Andrespengineer · Answer 1 · 2020-06-12T08:09:32+08:00

Just do a complexity analysis.

The most efficient algorithm in terms of memory and speed would be the fourth (counting the two Split variants separately, that is the IndexOf/Substring approach). Basically, you have to look at the running time and memory consumption of each algorithm.

In the first algorithm:

cadena.Split('(', ')')[1];

The string is traversed in linear time: for each of its N characters, Split checks whether it matches any of the separator characters passed as parameters. It then has to create a new temporary string for every segment between separators and collect them into an array, from which the value is read by index in constant time, O(1).

As a result you get O((N * M) + 1), where N is the length of the string and M is the number of separator characters passed to Split. Since M is a small constant here (two), this is effectively linear, but Split also allocates every segment, even though only one of them is used.
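The allocation cost described above becomes visible when you look at the whole array Split returns instead of only index 1. A small sketch with a hypothetical input string; the second method uses the count-limited overload `Split(char[], int)`, a variation not mentioned in the answer, which stops after three segments so fewer strings are created, although the last one still holds the rest of the input.

```csharp
using System;

static class SplitDemo
{
    // Every segment between separators becomes a new string in the array.
    public static string[] AllSegments(string input) =>
        input.Split('(', ')');

    // At most 3 elements: the first two segments, then the unsplit remainder.
    public static string[] FirstThreeSegments(string input) =>
        input.Split(new[] { '(', ')' }, 3);
}
```

For the input `"Hola (A,B) mas (C) fin"`, `AllSegments` returns five strings, while `FirstThreeSegments` returns `"Hola "`, `"A,B"` and `" mas (C) fin"`; in both cases index 1 is the value being extracted.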

The second algorithm:

cadena.Split("()".ToCharArray())[1];

It is basically the same procedure as the first algorithm, except that it may consume slightly more memory, because ToCharArray first has to iterate over the string "()" and allocate a new character array from it.

The third algorithm:

Regex.Match(cadena, @"\(([^)]*)\)").Groups[1].Value;

It is a double-edged sword. Its cost depends on the length and complexity of the pattern. It should only be used when the rule is genuinely complex: validating emails, addresses, number formats, mentions and hashtags, etc. For example, if you did not use Regex to find mentions or hashtags in a string, you would have to write a large algorithm, possibly with an interval tree, to obtain the indices where each mention or hashtag appears, and with very large strings you would spend a lot of memory extracting every substring that is a mention or hashtag. Regular expressions are best used for validating complex strings, because they save you from writing that kind of algorithm.

In this case, though, it is the option with the most overhead and memory consumption: the pattern has to be parsed (the static Regex methods cache it after the first call), and Match and Group objects have to be created, even for a pattern this simple.
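To illustrate the mentions and hashtags case, here is a minimal sketch using `Regex.Matches`, which returns each match together with its index, so no hand-written index bookkeeping is needed. The pattern `[@#]\w+` is an assumption made for this example (an `@` or `#` followed by word characters); real platforms apply stricter rules, and `TagFinder` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class TagFinder
{
    // Assumed rule: '@' or '#' followed by one or more word characters,
    // i.e. letters, digits or underscore.
    static readonly Regex TagPattern = new Regex(@"[@#]\w+");

    // Returns each mention or hashtag with the index where it starts.
    public static List<(string Text, int Index)> Find(string input)
    {
        var result = new List<(string Text, int Index)>();
        foreach (Match m in TagPattern.Matches(input))
            result.Add((m.Value, m.Index));
        return result;
    }
}
```

For `"Hi @pedro, see #regex and #csharp"` this yields `@pedro` at index 3, `#regex` at 15 and `#csharp` at 26.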

For the fourth algorithm:

int posInicial = cadena.IndexOf('(') + 1;                    // just after the first '('
int longitud = cadena.IndexOf(')', posInicial) - posInicial; // first ')' after that point
resultado = cadena.Substring(posInicial, longitud);

In the worst case you iterate over the N characters of the string twice (once for each search call) and then copy up to N characters into the result, so the complexity is O((2 * N) + N), which is still linear: O(N).
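The IndexOf/Substring approach does not guard against a missing parenthesis: when one is absent, IndexOf returns -1 and Substring ends up with an invalid start or length and throws. A minimal sketch of a guarded version, assuming that returning null is an acceptable way to signal "not found"; `ParenSubstring.FirstInside` is a hypothetical name. It uses the `char` overloads of IndexOf, which perform an ordinal search, whereas the `string` overloads such as `IndexOf("(")` are culture-sensitive in .NET.

```csharp
#nullable enable

static class ParenSubstring
{
    // Returns the text between the first '(' and the next ')' after it,
    // or null when either parenthesis is missing.
    public static string? FirstInside(string input)
    {
        int open = input.IndexOf('(');            // first scan, up to N characters
        if (open < 0) return null;

        int close = input.IndexOf(')', open + 1); // resumes right after the '('
        if (close < 0) return null;

        // Copy only the characters between the two positions.
        return input.Substring(open + 1, close - open - 1);
    }
}
```

Because the second search starts after the opening parenthesis, the two scans together touch each character at most once, and a `)` that appears before the `(` is ignored.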

So, ranked from most to least efficient:

O((2 * N) + N), i.e. O(N): the fourth algorithm (IndexOf + Substring).
O((N * M) + 1): the first algorithm (Split).
O((N * M) + 1): the second algorithm (Split with ToCharArray), which consumes slightly more memory than the first.
O(?): the third algorithm (Regex). Regex is the most complicated option and the one that consumes the most memory; even without measuring, you can tell it carries the greatest overhead because of the work it involves.

Note that in your example these times are insignificant (none of them reaches 1 ms of processing time), so to see the differences more clearly you would have to test with a very long string. This answer is based on my experience with algorithms; if anyone wants to document a counterpoint, contradict me, or point out an error, I am happy to discuss it.
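A rough sketch of that experiment with `Stopwatch`, assuming a synthetic input built by the hypothetical helper `BuildInput`: a long run of filler characters followed by one parenthesized list, so every method has to scan far before finding it. It makes one warm-up call per method and then times a plain loop; it does not control for GC or other noise, so a dedicated tool such as BenchmarkDotNet is a better choice for trustworthy numbers. No timings are given here because they depend on the machine and runtime.

```csharp
using System;
using System.Diagnostics;
using System.Text;
using System.Text.RegularExpressions;

static class ExtractBenchmark
{
    // Hypothetical test input: filler text, then the list we want, then more text.
    public static string BuildInput(int fillerLength)
    {
        var sb = new StringBuilder(fillerLength + 64);
        sb.Append('x', fillerLength);
        sb.Append("('A','B','valorX','valorY')");
        sb.Append(" more text");
        return sb.ToString();
    }

    public static string BySplit(string s) => s.Split('(', ')')[1];

    public static string ByRegex(string s) =>
        Regex.Match(s, @"\(([^)]*)\)").Groups[1].Value;

    public static string ByIndexOf(string s)
    {
        int start = s.IndexOf('(') + 1;
        int length = s.IndexOf(')', start) - start;
        return s.Substring(start, length);
    }

    // One warm-up call so JIT compilation is not timed, then a timed loop.
    // Returns the total elapsed milliseconds for all iterations.
    public static long Time(Func<string, string> method, string input, int iterations)
    {
        method(input);
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            method(input);
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }
}
```

For example, `ExtractBenchmark.Time(ExtractBenchmark.BySplit, ExtractBenchmark.BuildInput(10_000_000), 20)` can be compared with the same call for `ByRegex` and `ByIndexOf`; all three return the same string for this input.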

For background on algorithm analysis, you can read Understanding Big O Notation, or this link, which is more complete.
