Question

Jorge Ponti

Asked: 2020-07-28 05:57:38 +0800 CST

String Matches in Python 3

Hello, does anyone know a method to find non-exact (fuzzy) matches between text strings?

For example:

Given the text "STATUS MSG PACK ACM L" (column 1), it should return "PACK L" (column 2).

I have 2 lists: one written by a person, which contains longer texts, and another that contains the correct messages to search for.

I attach an example of the two lists. Each entry in column 1 should be looked up in column 2, returning the most closely related element of column 2:

https://drive.google.com/file/d/0B11sJdX_AaJBd2lvWGszaFpXM2c/view?usp=sharing

1 Answer

Patricio Moracho · Answer 1 · 2020-07-28T07:24:18+08:00

For fuzzy searches there are many tools and methods, but out of the box Python already ships with the standard library module difflib, which lets us obtain a similarity ratio between two strings. For example:

from difflib import SequenceMatcher as SM

s1 = 'Hola Mundo'
s2 = 'Hola Mundo cruel'
print(SM(None, s1, s2).ratio())

s1 = 'Hola Mundo'
s2 = 'Hola Mundo!'
print(SM(None, s1, s2).ratio())
> 0.7692307692307693
> 0.9523809523809523

In this example we measure the similarity of Hola Mundo to two other strings and see that, as expected, Hola Mundo! obtains a higher similarity ratio than Hola Mundo cruel. The idea, then, is to go through the first list and, for each element, compute its ratio against every element of the second list; the element with the largest ratio is the most similar. Something like this:

import difflib

lista1 = ["STATUS MSG PACK ACM L"]
lista2 = ["LOW LIMIT VALVE L",
          "LOW LIMIT VALVE R",
          "PACK ACM L",
          "PACK ACM R",
          "PACK L",
          "PACK MODE L"]

for search in lista1:
    # Sort the candidates from most to least similar to the search text
    matches = sorted(lista2, key=lambda x: difflib.SequenceMatcher(None, x, search).ratio(), reverse=True)
    print("{0} compared with {1}: the closest match is {2}".format(search, matches, matches[0]))

In matches we end up with the elements of the second list ordered from most to least similar; the first element should be the best match.

Important: this approach always finds some "similarity". As an improvement, you may want to require a minimum similarity ratio before accepting that a match has been found; the right value can only be determined by experimenting.
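A minimal sketch of that minimum-ratio check, assuming a threshold of 0.6 as a starting point. The helper name best_match and the parameter min_ratio are hypothetical, introduced only for illustration; the candidate list is the one from the example above.

```python
from difflib import SequenceMatcher

def best_match(search, candidates, min_ratio=0.6):
    """Return (candidate, ratio) for the most similar candidate,
    or None when nothing reaches min_ratio (hypothetical helper)."""
    if not candidates:
        return None
    # Score every candidate against the search text
    scored = [(SequenceMatcher(None, c, search).ratio(), c) for c in candidates]
    ratio, candidate = max(scored, key=lambda pair: pair[0])
    if ratio < min_ratio:
        return None  # the best candidate is still too different
    return candidate, ratio

lista2 = ["LOW LIMIT VALVE L", "LOW LIMIT VALVE R", "PACK ACM L",
          "PACK ACM R", "PACK L", "PACK MODE L"]

print(best_match("STATUS MSG PACK ACM L", lista2))
print(best_match("STATUS MSG PACK ACM L", lista2, min_ratio=0.9))
```

With a strict threshold such as 0.9 the function returns None instead of forcing a weak match, which is the behavior you would tune by experimenting.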

Even better is the approach suggested by FjSevilla, because it is more compact and already includes the logic for the minimum ratio:

matches = difflib.get_close_matches(search, possibilities=lista2, n=1, cutoff=0.6)
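For completeness, here is how that call could be applied to the two example lists. This is only a sketch: the results dictionary is an assumption added for illustration, and get_close_matches returns an empty list when no candidate reaches the cutoff, so that case is handled explicitly.

```python
import difflib

lista1 = ["STATUS MSG PACK ACM L"]
lista2 = ["LOW LIMIT VALVE L", "LOW LIMIT VALVE R", "PACK ACM L",
          "PACK ACM R", "PACK L", "PACK MODE L"]

results = {}  # hypothetical container: search text -> best match or None
for search in lista1:
    # n=1 keeps only the best candidate; cutoff discards weak matches
    matches = difflib.get_close_matches(search, lista2, n=1, cutoff=0.6)
    results[search] = matches[0] if matches else None
    print("{0} -> {1}".format(search, results[search]))
```

Note that for this data the closest candidate by ratio is "PACK ACM L", so if a different entry is expected, the cutoff or the scoring method would need to be adjusted.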

As a curiosity, the matching algorithm in difflib is closely related to the "gestalt pattern matching" algorithm published by Ratcliff and Obershelp in the late 1980s.
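To make the ratio less of a black box: the difflib documentation defines ratio() as 2.0*M/T, where M is the number of matched characters and T is the combined length of both strings. The sketch below recomputes it from the matching blocks; ratio_from_blocks is a hypothetical helper name used only for illustration.

```python
from difflib import SequenceMatcher

def ratio_from_blocks(a, b):
    """Recompute the similarity ratio as 2.0 * M / T (hypothetical helper)."""
    sm = SequenceMatcher(None, a, b)
    # M: total size of all matching blocks (the final dummy block has size 0)
    matched = sum(block.size for block in sm.get_matching_blocks())
    total = len(a) + len(b)  # T: combined length of both strings
    return 2.0 * matched / total if total else 1.0

s1 = 'Hola Mundo'
s2 = 'Hola Mundo cruel'
print(ratio_from_blocks(s1, s2))           # 10 matched characters out of 26
print(SequenceMatcher(None, s1, s2).ratio())
```

Both lines should print the same value, since all 10 characters of the shorter string are matched and the combined length is 26.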
