
Question

Asked: 2022-02-10 11:01:30 +0800 CST

Sort text file


I have a text file containing the book Don Quixote. My goal is to read the book, remove the line breaks, and split the text at periods, commas, and semicolons, so that I end up with a list of all the phrases in the book. Example input:

Yo, Juan Gallo de Andrada, escribano de Cámara del Rey nuestro señor, de
los que residen en su Consejo, certifico y doy fe que, habiendo visto por
los señores dél un libro intitulado El ingenioso hidalgo de la Mancha,
compuesto por Miguel de Cervantes Saavedra, tasaron cada pliego del dicho
libro a tres maravedís y medio; el cual tiene ochenta y tres pliegos, que
al dicho precio monta el dicho libro docientos y noventa maravedís y medio,
en que se ha de vender en papel; y dieron licencia para que a este precio
se pueda vender, y mandaron que esta tasa se ponga al principio del dicho
libro, y no se pueda vender sin ella. Y, para que dello conste, di la
presente en Valladolid, a veinte días del mes de deciembre de mil y
seiscientos y cuatro años.

Example of what the output should look like:

Yo
Juan Gallo de Andrada
escribano de Cámara del Rey nuestro señor
de los que residen en su Consejo

This is what I have so far, but it doesn't behave the way I want:

with open(ruta_libros.format("quijote"), "r", encoding="utf-8") as libro:
    large_str=""
    for line in libro:
        large_str+=line.rstrip()
    

lista=large_str.split(".")
print(lista)

The book contains some characters that may be unnecessary, such as - « » ... and "". I don't know whether they really affect the process, but the loop doesn't seem to store the whole book; at least the first part is missing.

Here is the book online as a TXT file, in case it helps you spot something Python is incompatible with: Don Quixote TXT

ingo Revulgo
Estos dos príncipes, sin que los solicite adulación mía ni otro género de aplauso, por sola su bondad, han tomado a su cargo el hacerme merced y favorecerme
en lo que me tengo por más dichoso y más rico que si la fortuna por camino ordinario me hubiera puesto en su cumbre
La honra puédela tener el pobre, pero no el vicioso

This is part of the text that comes before "ingo Revulgo", where "Mingo" appears to be cut off. (It is not what is printed; it is part of what is not processed.)

Dile también que de la amenaza que me hace, que me ha de quitar la ganancia
con su libro, no se me da un ardite, que, acomodándome al entremés famoso
de La Perendenga, le respondo que me viva el Veinte y cuatro, mi señor, y
Cristo con todos. Viva el gran conde de Lemos, cuya cristiandad y
liberalidad, bien conocida, contra todos los golpes de mi corta fortuna me
tiene en pie, y vívame la suma caridad del ilustrísimo de Toledo, don
Bernardo de Sandoval y Rojas, y siquiera no haya emprentas en el mundo, y
siquiera se impriman contra mí más libros que tienen letras las Coplas de
Mingo Revulgo.

1 Answer


Candid Moe · Answer 1 · 2022-02-10T12:07:51+08:00

Since the text is long, it is convenient to process it in parts. Here we process it paragraph by paragraph, so that sentences spanning more than one line are not cut off.

The end of a paragraph is recognized by an empty line. At the end of a paragraph, we pass the accumulated list of lines to a function that processes it. Otherwise, the line is appended to the list parrafo.

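# procesar_parrafo (shown further down) must be defined before this runs.
with open("quijote.txt", "r", encoding="utf-8") as libro:
    parrafo = []
    for line in libro:
        line = line.strip()   # Strip whitespace from both ends of the line.
        if line == '':
            # Blank line: the paragraph is complete, so process it.
            for oracion in procesar_parrafo(parrafo):
                print(oracion)
            parrafo = []
        else:
            parrafo.append(line)
    # Process the last paragraph if the file does not end with a blank line.
    if parrafo:
        for oracion in procesar_parrafo(parrafo):
            print(oracion)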
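The accumulation step can also be isolated from the file handling, which makes it easy to try on a few lines. The following is only a sketch: the name iter_paragraphs and the sample lines are hypothetical and not part of the original answer. It yields each paragraph as a list of lines, skips runs of consecutive blank lines, and also yields the last paragraph when the input does not end with a blank line.

```python
from typing import Iterable, Iterator, List


def iter_paragraphs(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield each paragraph as a list of stripped, non-empty lines."""
    paragraph: List[str] = []
    for line in lines:
        line = line.strip()
        if line:
            paragraph.append(line)
        elif paragraph:
            # Blank line after some text: the paragraph is complete.
            yield paragraph
            paragraph = []
    if paragraph:
        # The input ended without a trailing blank line.
        yield paragraph


# Hypothetical sample standing in for the lines of the file.
sample = [
    "Yo, Juan Gallo de Andrada,\n",
    "escribano de Cámara.\n",
    "\n",
    "\n",
    "Mingo Revulgo.\n",
]
paragraphs = list(iter_paragraphs(sample))
```

Because an open file is itself an iterable of lines, the same function could be called as iter_paragraphs(libro) inside the with block.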

Splitting the paragraph apart would properly be done with regular expressions. I will not use them here, to keep the code simple.

The function receives a list of lines. The first step is to build a single string using join.

The string method split cannot divide by several separators at once. Instead, I replace "," and ";" with "." (using replace), then call split(".") and return the list of sentences.

The final step, in the return, is to remove the whitespace at the beginning and end of each sentence using a list comprehension.

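def procesar_parrafo(parrafo):
    # Join the lines of the paragraph into a single string.
    completo = ' '.join(parrafo)
    # Turn every separator into "." so that a single split is enough.
    completo = completo.replace(",", ".")
    completo = completo.replace(";", ".")
    lista_punto = completo.split(".")

    # Strip each piece and drop the empty ones (e.g. after the final ".").
    return [x.strip() for x in lista_punto if x.strip()]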

produces:

El ingenioso hidalgo don Quijote de la Mancha
TASA
Yo
Juan Gallo de Andrada
escribano de Cámara del Rey nuestro señor
de los que residen en su Consejo
certifico y doy fe que
habiendo visto por los señores dél un libro intitulado El ingenioso hidalgo de la Mancha
compuesto por Miguel de Cervantes Saavedra
tasaron cada pliego del dicho libro a tres maravedís y medio
el cual tiene ochenta y tres pliegos
que al dicho precio monta el dicho libro docientos y noventa maravedís y medio
en que se ha de vender en papel
y dieron licencia para que a este precio se pueda vender
y mandaron que esta tasa se ponga al principio del dicho libro
y no se pueda vender sin ella
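For comparison, here is a minimal sketch of the regular-expression approach mentioned earlier, using re.split from the standard library. The names SEPARATORS and split_sentences are hypothetical, and the sample lines are an abbreviated version of the text in the question. The pattern treats any run of periods, commas, or semicolons as a single separator, so an ellipsis ("...") does not produce empty entries.

```python
import re
from typing import List

# One or more of ".", "," or ";" in a row count as a single separator.
SEPARATORS = re.compile(r"[.,;]+")


def split_sentences(paragraph_lines: List[str]) -> List[str]:
    """Join the lines of a paragraph and split them at the separators."""
    text = " ".join(paragraph_lines)
    pieces = SEPARATORS.split(text)
    # Strip surrounding whitespace and drop empty pieces.
    return [piece.strip() for piece in pieces if piece.strip()]


# Abbreviated sample based on the text in the question.
lines = [
    "Yo, Juan Gallo de Andrada, escribano de Cámara del Rey nuestro señor, de",
    "los que residen en su Consejo...",
]
result = split_sentences(lines)
```

This function takes the same kind of input as procesar_parrafo, so it could be swapped in without changing the reading loop.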
