
Question

Asked: 2022-02-10 18:07:19 +0800 CST

Create text files and store words


I have updated my idea. Now I want to create a txt file named after the first word of the sentence and store the second word inside it, then drop the first word from the sentence, create another txt file named after the new first word, store the word that follows it, and so on until the sentence ends.

Example:

(Yo soy Lola.)

Yo.txt=soy
soy.txt=Lola.
Lola..txt=(stays empty, because the sentence has ended)

If a later sentence contains words whose files already exist, only the following word is appended to the existing file; and if that word is already in the file, it is not added again.

Example:

(Yo seré Lola.)
Yo.txt=soy seré
seré.txt=Lola.
Lola..txt=(stays empty, because the sentence has ended)

With this function I get the first word of the sentence.

def primera_pal(oracion):
    for palabra in oracion.split():
        print("llege a la funcion: ",palabra)
        return palabra

You can ignore this part:

def procesar_parrafo(parrafo):

    completo = ' '.join(parrafo)
    #completo = completo.replace(",", ".")
    completo = completo.replace(";", ".")
    completo = completo.replace("—","")
    completo = completo.replace("«", "")
    completo = completo.replace("»", "")
    lista_punto = completo.split(".")

    return [x.strip() for x in lista_punto]
parrafo=[]
activar_af=0
with open(ruta_libros.format("quijote"), "r", encoding="utf-8") as libro:
    parrafo = []
    for line in libro:
        line = line.strip()  # Strip leading and trailing whitespace.
        if line == '':
            for oracion in procesar_parrafo(parrafo):
                #print(oracion)
                with open(ruta_libros.format("quijote2"), "a", encoding="utf-8") as librox:

And here is my attempt (I need to append the sentences to their respective files, each followed by a line break):

                    ### FOCUS FROM HERE DOWN #####
                pal_en1=oracion
                pal_en2=pal_en1
                print("-----Pal 2 Antes: ",pal_en2)
                activar_af=0
                for oracionx in pal_en2.split():
                    #print("Oracionx: ", oracionx)

                    pr_pal = primera_pal(pal_en2)

                    #pr_pal=' '.join(pal_en2.split()[1:])
                    with open(ruta_conocimientos.format(pr_pal), "a", encoding="utf-8") as datox:
                        if oracionx not in "" and activar_af <=2:
                            print("La oracionx: ",oracionx)
                            print("Dentro-----------------------------------------")
                            print("primera_palabra: ",pr_pal)
                            datox.write(oracionx+" ")
                            pal_en2=pal_en2.replace(pr_pal,"",1)

                            activar_af+=1
                            if activar_af>=2:
                                datox.write("\n")
                                datox.close()
                            print("Pal 2 despues: ",pal_en2)
                        #if oracion not in "":
                    #    librox.write(oracion+".")

            parrafo = []
        else:
            #print(line)
            parrafo.append(line)

1 Answer


abulafia · Answer 1 · 2022-02-11T12:37:26+08:00

The data structure you ultimately want to store is a list of words (the files), each of which refers to another list of words (the contents of the files).

I think you can store all that information much more efficiently if you create a Python dictionary whose keys are the words and whose values are the lists of following words.

Following your own example, instead of creating files called Yo.txt, soy.txt, seré.txt, and Lola..txt that contain words, you would have the following dictionary:

diccionario = {
 "Yo": ["soy", "seré"],
 "soy": ["Lola."],
 "seré": ["Lola."],
 "Lola.": [],
}

Once you have built this dictionary in memory (which will be much faster than creating the equivalent structure on disk), you can also save it to a file if you are concerned about persistence (that is, about the data continuing to exist on disk once the program has finished).

Saving it to a file can be extremely simple if you use the pickle module:

import pickle

with open("vocabulario.data", "wb") as f:
  pickle.dump(diccionario, f)

And to retrieve it would be like this:

with open("vocabulario.data", "rb") as f:
  diccionario = pickle.load(f)

Using pickle is, as you can see, quite simple. The drawback is that the resulting file is not human-editable: if you open it in a text editor you will see "garbage" mixed in with your data (that "garbage" is actually what tells Python what kind of data is stored there, which allows it to be restored later).

If you prefer an "editable" format (if only to read it from an editor, without needing to load it into Python) you can use json. In this case you would save it like this:

import json

with open("vocabulario.json", "w") as f:
  json.dump(diccionario, f)

And you would retrieve it like this:

import json

with open("vocabulario.json", "r") as f:
  diccionario = json.load(f)

As you can see, the mechanics are practically the same, but the content of the file is now readable; in fact it looks just like the Python dictionary I wrote above. JSON is not exactly the same as Python's literal syntax, but it is very close in many cases, and in this particular case, where all the data are strings, lists, and dictionaries, the syntax is practically identical.
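One detail worth checking when the words contain accented characters (as "seré" does): by default `json.dump` escapes non-ASCII characters, so the file would show `ser\u00e9` rather than `seré`. The following is a minimal sketch, assuming a UTF-8 file and using a temporary directory as a stand-in location; it passes `ensure_ascii=False` so the saved text stays readable. The names `ruta`, `texto` and `recuperado` are only illustrative.

```python
import json
import os
import tempfile

diccionario = {
    "Yo": ["soy", "seré"],
    "soy": ["Lola."],
    "seré": ["Lola."],
    "Lola.": [],
}

# Hypothetical location: a temporary directory, used here only for the demo.
ruta = os.path.join(tempfile.mkdtemp(), "vocabulario.json")

# ensure_ascii=False writes "seré" literally instead of "ser\u00e9";
# indent=1 pretty-prints the file over several lines.
with open(ruta, "w", encoding="utf-8") as f:
    json.dump(diccionario, f, ensure_ascii=False, indent=1)

# Read the raw text back to see what an editor would show.
with open(ruta, "r", encoding="utf-8") as f:
    texto = f.read()

# Parsing the text gives back an equal dictionary.
recuperado = json.loads(texto)
print(texto)
```

Opening both files with an explicit `encoding="utf-8"` avoids depending on the platform's default encoding.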

Note

Since the purpose of all this is to store a data structure that captures the pairs of words that can appear one after the other, I think you need some kind of "start of sentence" and "end of sentence" markers as additional pseudowords. The start marker would then be one more key in the dictionary, and its associated list would give the words that can start a sentence. Similarly, if a word can be the last of a sentence, the end-of-sentence marker would appear among the items in its list.

For example, suppose the start flag is "START" and the end flag is "END" (any other string that can't appear as a word would do). So the dictionary corresponding to your example would be more like this:

diccionario = {
 "START": ["Yo"],
 "Yo": ["soy", "seré"],
 "soy": ["Lola."],
 "seré": ["Lola."],
 "Lola.": ["END"],
}

This allows you to know where to start a sentence, and also allows a word to appear both at the end of a sentence and followed by other words (in which case the special "END" marker appears in its list of successors alongside the ordinary words).
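To illustrate why the markers help, here is a minimal sketch that walks the dictionary from "START" until it reaches "END", choosing one successor at random at each step. The function name `generar_oracion` and the `max_palabras` limit are assumptions added for the example, not part of the original code; the limit is there because a larger dictionary can contain cycles.

```python
import random

START, END = "START", "END"

diccionario = {
    "START": ["Yo"],
    "Yo": ["soy", "seré"],
    "soy": ["Lola."],
    "seré": ["Lola."],
    "Lola.": ["END"],
}

def generar_oracion(diccionario, rng, max_palabras=50):
    """Walk from START towards END, picking a random successor each step."""
    palabras = []
    actual = START
    while len(palabras) < max_palabras:
        # A word with no recorded successors is treated as a sentence end.
        actual = rng.choice(diccionario.get(actual) or [END])
        if actual == END:
            break
        palabras.append(actual)
    return " ".join(palabras)

rng = random.Random(0)  # seeded only so repeated runs behave the same
oracion = generar_oracion(diccionario, rng)
print(oracion)
```

With the example data the result is always one of the two original sentences, since those are the only paths from "START" to "END".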

Bonus

A dictionary like the one above can be built with a few lines of code, assuming the list oraciones contains a series of sentences, such as:

oraciones = [
 'En un lugar de la Mancha, de cuyo nombre no quiero acordarme, no ha mucho tiempo que vivía un hidalgo de los de lanza en astillero, adarga antigua, rocín flaco y galgo corredor',
 'Una olla de algo más vaca que carnero, salpicón las más noches, duelos y quebrantos los sábados, lantejas los viernes, algún palomino de añadidura los domingos, consumían las tres partes de su hacienda'
]

The following code builds the desired dictionary. During construction I use sets (set()) as an efficient way to avoid putting repeated words in the lists. At the end of the loop I convert those sets to alphabetically sorted lists to make them easier to browse:

from collections import defaultdict

d = defaultdict(set)

# Special words marking the start and end of a sentence
START = " START "
END = " END "

for linea in oraciones:
  anterior = START
  for palabra in linea.split():
    d[anterior].add(palabra)
    anterior = palabra
  d[anterior].add(END)

# Once the whole book is processed, turn the sets into sorted lists
d = {k: sorted(v) for k, v in d.items()}

As a curiosity, for the two phrases from Quixote shown above, the resulting dictionary will be:

{' START ': ['En', 'Una'],
 'En': ['un'],
 'Mancha,': ['de'],
 'Una': ['olla'],
 'acordarme,': ['no'],
 'adarga': ['antigua,'],
 'algo': ['más'],
 'algún': ['palomino'],
 'antigua,': ['rocín'],
 'astillero,': ['adarga'],
 'añadidura': ['los'],
 'carnero,': ['salpicón'],
 'consumían': ['las'],
 'corredor': [' END '],
 'cuyo': ['nombre'],
 'de': ['algo', 'añadidura', 'cuyo', 'la', 'lanza', 'los', 'su'],
 'domingos,': ['consumían'],
 'duelos': ['y'],
 'en': ['astillero,'],
 'flaco': ['y'],
 'galgo': ['corredor'],
 'ha': ['mucho'],
 'hacienda': [' END '],
 'hidalgo': ['de'],
 'la': ['Mancha,'],
 'lantejas': ['los'],
 'lanza': ['en'],
 'las': ['más', 'tres'],
 'los': ['de', 'domingos,', 'sábados,', 'viernes,'],
 'lugar': ['de'],
 'mucho': ['tiempo'],
 'más': ['noches,', 'vaca'],
 'no': ['ha', 'quiero'],
 'noches,': ['duelos'],
 'nombre': ['no'],
 'olla': ['de'],
 'palomino': ['de'],
 'partes': ['de'],
 'que': ['carnero,', 'vivía'],
 'quebrantos': ['los'],
 'quiero': ['acordarme,'],
 'rocín': ['flaco'],
 'salpicón': ['las'],
 'su': ['hacienda'],
 'sábados,': ['lantejas'],
 'tiempo': ['que'],
 'tres': ['partes'],
 'un': ['hidalgo', 'lugar'],
 'vaca': ['que'],
 'viernes,': ['algún'],
 'vivía': ['un'],
 'y': ['galgo', 'quebrantos']}
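If the one-file-per-word layout from the question is still needed, the in-memory dictionary can be written out in a single pass at the end. This is a minimal sketch under some assumptions: the helper `guardar_en_archivos` is hypothetical, a temporary directory stands in for the real folder, and it uses the small dictionary from the question's first example (without markers). Words containing characters that are not valid in file names, such as a path separator, would need to be sanitized first.

```python
import os
import tempfile

def guardar_en_archivos(diccionario, carpeta):
    """Write one '<word>.txt' file per key, with its successors separated by spaces."""
    os.makedirs(carpeta, exist_ok=True)
    for palabra, siguientes in diccionario.items():
        ruta = os.path.join(carpeta, palabra + ".txt")
        # Mode "w" rewrites each file once, so nothing is appended twice.
        with open(ruta, "w", encoding="utf-8") as f:
            f.write(" ".join(siguientes))

diccionario = {
    "Yo": ["soy", "seré"],
    "soy": ["Lola."],
    "seré": ["Lola."],
    "Lola.": [],
}

carpeta = tempfile.mkdtemp()  # stand-in for the real output folder
guardar_en_archivos(diccionario, carpeta)

# Read one file back to check its contents.
with open(os.path.join(carpeta, "Yo.txt"), encoding="utf-8") as f:
    contenido_yo = f.read()
print(contenido_yo)
```

Each file is opened exactly once, which avoids the repeated open-and-append cycle inside the word loop of the original attempt.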
