
Question

Rafael Monroy Rodriguez

Asked: 2021-11-13 19:52:53 +0800 CST

Trouble counting words in a text


I am trying to write a program that counts the words in a text of 4 sentences, but each word should be counted only once per sentence. What happens instead is that every word ends up counted only once in total. How can I solve this?

Here is the text I have to run this check on.

Podador que podas la parra, que parra podas?
Podas mi parra o tu parra podas?
Ni podo tu parra, ni mi parra podo,
que podo la parra de mi tio Bartolo que apodase tolo.

In my case "parra" is counted 1 time, when it should be 4 times (repetitions within the same sentence are not counted).

Thanks.

Attached code.

def lim(x):
    x=x.lower()
    x=x.rstrip('?')
    x=x.rstrip('.')
    x=x.rstrip(',')
    x=x.rstrip(':')
    x=x.rstrip(';')
    return x

a=open('discurso.txt','r')
palabras={}
for i in a:
  p=i.split()
  print(p)
  for j in p:
    for k in p:
        j=lim(j)
        if len(j)>4 and j!=k:
            if j not in palabras:
                palabras[j]=1
print(palabras)

2 Answers


David JP · Answer 1 · 2021-11-13T21:56:24+08:00

Reading your code I can't understand what you are trying to do, but I think you are looking for something like this:

def lim(x):
    x=x.lower()
    x=x.rstrip('?')
    x=x.rstrip('.')
    x=x.rstrip(',')
    x=x.rstrip(':')
    x=x.rstrip(';')
    return x

a=open('discurso.txt','r')
palabras={}
for i in a:
    p=i.split()
    print(p)
    for j in range(len(p)):
        # clean each element:
        # p[j] is the word and
        # j is its position in the line
        p[j]=lim(p[j])
        # p.index(p[j]) is the position of the
        # first occurrence of the word in the line;
        # if it differs from j, the word is a repeat
        if p.index(p[j])==j and len(p[j])>4:
            try:
                palabras[p[j]]+=1
            except KeyError:
                palabras[p[j]]=1
print(palabras)

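You are not really counting: you are missing the +=1, so you only assign a 1 to each word found with more than 4 characters. The j!=k check will not work as expected either. Since k runs over every word on the line, j!=k is true as soon as the line contains any other word, so it never filters out repeats. And what if a word appears three times on the same line?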

I hope this is the solution you are looking for, the result I get is:

{'podador': 1, 'podas': 2, 'parra': 4, 'bartolo': 1, 'apodase': 1}
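As a complement, the two points above (actually incrementing the counter, and ignoring repeats within a line) can also be sketched with a per-line set instead of p.index. This is only a minimal sketch under some assumptions: the four lines are held in memory rather than read from discurso.txt, the helper name contar_por_linea and its minimo parameter are made up for illustration, and str.strip(string.punctuation) replaces the chain of rstrip calls.

```python
import string

# The four lines from the question, held in memory instead of discurso.txt
# (an assumption made so the sketch runs without the file).
LINEAS = [
    "Podador que podas la parra, que parra podas?",
    "Podas mi parra o tu parra podas?",
    "Ni podo tu parra, ni mi parra podo,",
    "que podo la parra de mi tio Bartolo que apodase tolo.",
]

def contar_por_linea(lineas, minimo=5):
    """Count each word at most once per line (hypothetical helper)."""
    palabras = {}
    for linea in lineas:
        vistas = set()  # words already counted on this line
        for palabra in linea.split():
            # lowercase and drop punctuation from both ends of the word
            palabra = palabra.lower().strip(string.punctuation)
            if len(palabra) >= minimo and palabra not in vistas:
                vistas.add(palabra)
                # get() supplies 0 the first time, so this really counts
                palabras[palabra] = palabras.get(palabra, 0) + 1
    return palabras

print(contar_por_linea(LINEAS))
# {'podador': 1, 'podas': 2, 'parra': 4, 'bartolo': 1, 'apodase': 1}
```

The set vistas is rebuilt for every line, so a word repeated three times on one line is still counted once for that line.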

Candid Moe · Answer 2 · 2021-11-14T09:00:45+08:00

This solution uses regular expressions to extract the words, removing the punctuation marks. It does this by defining a pattern with a capture group, ([a-zA-Z]{5,}), that recognizes only words of five letters or more.

We compile this pattern

patron = re.compile("([a-zA-Z]{5,})")

and then we use it to break a phrase into its words:

patron.findall(frase)

That produces a list of words with repetitions and mixed upper/lower case. We use a list comprehension to convert everything to lowercase and then build a set to remove duplicates from each phrase:

palabras = set([x.lower() for x in patron.findall(frase)])
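To make these two steps concrete, here is a small sketch run on the first line of the text from the question. The variable encontradas is introduced only for illustration, and sorted() is used just to get a stable printout, since a set has no defined order.

```python
import re

patron = re.compile("([a-zA-Z]{5,})")
frase = "Podador que podas la parra, que parra podas?"

# findall returns every match, repeats and capitals included
encontradas = patron.findall(frase)
print(encontradas)
# ['Podador', 'podas', 'parra', 'parra', 'podas']

# lowercase each word, then let the set drop the duplicates
palabras = set(x.lower() for x in encontradas)
print(sorted(palabras))  # sorted only because set order is arbitrary
# ['parra', 'podador', 'podas']
```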

Then all that is left is to loop through the set palabras and update the counters. For that we use a defaultdict. It's just like a standard dictionary, except that it automatically creates the entry when the key doesn't exist.

This dictionary uses the word as the key and keeps track of how many times it appears in total:

for pal in palabras:
    cuenta[pal] += 1
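The behaviour of defaultdict(int) can be checked in isolation with a short sketch. The list of words here is made up and stands in for words collected from several phrases.

```python
from collections import defaultdict

cuenta = defaultdict(int)  # a missing key starts at int(), which is 0

# hypothetical input: words gathered from more than one phrase
for pal in ["parra", "podas", "parra"]:
    cuenta[pal] += 1  # no KeyError the first time a word is seen

print(dict(cuenta))
# {'parra': 2, 'podas': 1}
```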

In short, the complete code is:

from collections import defaultdict
import re
patron = re.compile("([a-zA-Z]{5,})")

cuenta = defaultdict(int)
with open("palabras.txt", "r") as archivo:
    for frase in archivo:
        palabras = set([x.lower() for x in patron.findall(frase)])
        for pal in palabras:
            cuenta[pal] += 1
            
for k, v in cuenta.items():
    print(k, v)

produces:

podador 1
parra 4
podas 2
apodase 1
bartolo 1
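One caveat, in case the text contains accents: [a-zA-Z] only matches unaccented ASCII letters, so a word with an accent or an ñ gets cut at that character. The sample text has none, so the result above is unaffected. The sketch below compares the pattern with a Unicode-aware variant. Both the variant and the sample phrase are my own assumptions, not part of the original answer.

```python
import re

# The character class from the answer, and a Unicode-aware variant
# (an assumption): [^\W\d_] matches any letter, accented or not.
patron_ascii = re.compile(r"[a-zA-Z]{5,}")
patron_unicode = re.compile(r"[^\W\d_]{5,}")

frase = "También podó señores"  # hypothetical sample with accents and ñ

print(patron_ascii.findall(frase))
# ['Tambi']
print(patron_unicode.findall(frase))
# ['También', 'señores']
```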
