I have the following code in Pyspark:
%%time
lst_tablas = [tabla_pivote_0_15, tabla_pivote_15_30, tabla_pivote_30_60, tabla_pivote_60_90, tabla_pivote_90_120,
tabla_pivote_120_150, tabla_pivote_150_180,
tabla_pivote_180_210, tabla_pivote_210_240, tabla_pivote_240_270,
tabla_pivote_270_300, tabla_pivote_300_330, tabla_pivote_330_360]
trans = ['COMPRAS_CREDITO', 'RETIROS_CREDITO','RETIROS_DEBITO','COMPRAS_MADRUGADA','COMPRAS_MANANA','COMPRAS_TARDE','COMPRAS_NOCHE','RETIROS_MADRUGADA','RETIROS_MANANA','RETIROS_TARDE','RETIROS_NOCHE','COMPRAS_ENTRE_SEMANA','RETIROS_ENTRE_SEMANA','MAX_DEBITO_DISPONIBLE','MAX_PORC_ENDEUDAMIENTO','PORCENTAJE_COMPRAS_ECOMMERCE','razon_comercios','razon_transacciones','razon_compras','razon_retiros','razon_ecomerce']
i = 1
for t in lst_tablas:
    for var in trans:
        nombre = var + '_m' + str(i)
        t = t.withColumnRenamed(var, nombre)
    i += 1
    #display(t.show())

tabla_pivote_30_60.show()
The list holds the tables with the information for the 12 months of the year, with the first month split into 0 to 15 days and 15 to 30 days.
What I am trying to do is add a suffix to each of these tables (they have different fields), but only to the fields that appear in the list trans; that is the purpose of var + '_m' + str(i).
When I check the results and do a show() on the table tabla_pivote_30_60, I see that the variable names I wanted to change were not modified.
However, if I uncomment the line display(t.show()) to check what is happening inside the for loop, I see that within the loop the renaming works, but once the loop finishes the tables keep their old column names, as if the changes were only temporary.
Could someone tell me what is going on and help me modify the tables permanently? Thanks in advance.
Your bug has nothing to do with pyspark; it has to do with how Python works. I will first explain it with pure Python, without using pyspark.

Explanation
In Python, the = operator binds a name to an object. Two things can happen:

If the name does not exist yet, it is created and bound to the new object, so it is completely new and unrelated to anything that existed before.
If the name already exists, it is rebound: the name now points at the new object, and the old object itself is left untouched (it is only destroyed if nothing else references it).
In your case, on each pass through the loop you overwrite the variable t: Python rebinds the name t to a new object, which has nothing to do with the PySpark DataFrame stored in your list lst_tablas. Let's demonstrate this:
Great, now let's reproduce the mapping you are doing with pyspark, using a plain list:
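Again, the original snippet seems to have been stripped out; a reconstruction of the loop that mirrors your pyspark code:

```python
lista_de_listas = [[1], [1]]

for lista in lista_de_listas:
    # 'lista + [2]' builds a NEW list; '=' rebinds the name 'lista' to it,
    # leaving the element inside lista_de_listas unchanged
    lista = lista + [2]

print(lista_de_listas)  # [[1], [1]]
```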
Exactly what was described above has happened: when iterating with for, the name lista is bound to each element of lista_de_listas in turn, and when I overwrite lista it is rebound to a brand-new object, so the items inside lista_de_listas are never touched.
Solution
There are several solutions; here are a couple, following the previous example:
use enumerate
The idea is to index into the list when assigning, so that instead of rebinding a new variable we overwrite the element inside the existing list.
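The original snippet was lost in formatting; a reconstruction that produces the output shown below:

```python
lista_de_listas = [[1], [1]]

for i, lista in enumerate(lista_de_listas):
    # assigning through the index overwrites the element INSIDE the list
    lista_de_listas[i] = lista + [2]

print(lista_de_listas)
```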
Output:
[[1, 2], [1, 2]]
enumerate returns an iterator over pairs of (position, element), in that order. We could do the same with range() and len().
Your case would be:
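A sketch of the fix applied to your case. Note that a tiny FakeDF class (my invention, not part of PySpark) stands in for a DataFrame so the snippet runs without Spark; a real DataFrame.withColumnRenamed also returns a new DataFrame, so with PySpark the loop body is identical:

```python
# Minimal stand-in for a PySpark DataFrame, for illustration only.
class FakeDF:
    def __init__(self, cols):
        self.columns = list(cols)

    def withColumnRenamed(self, old, new):
        # Returns a NEW object, just like PySpark does
        return FakeDF([new if c == old else c for c in self.columns])

trans = ['COMPRAS_CREDITO', 'RETIROS_CREDITO']        # shortened list
lst_tablas = [FakeDF(trans + ['ID']), FakeDF(trans)]  # two sample tables

i = 1
for idx, t in enumerate(lst_tablas):
    for var in trans:
        t = t.withColumnRenamed(var, var + '_m' + str(i))
    lst_tablas[idx] = t   # the key line: store the renamed table back
    i += 1

print(lst_tablas[0].columns)  # ['COMPRAS_CREDITO_m1', 'RETIROS_CREDITO_m1', 'ID']
print(lst_tablas[1].columns)  # ['COMPRAS_CREDITO_m2', 'RETIROS_CREDITO_m2']
```

One caveat: this updates the entries inside lst_tablas, but the variables you built the list from (tabla_pivote_30_60, etc.) still point at the original, unrenamed DataFrames, which is why tabla_pivote_30_60.show() keeps showing the old names. Read the renamed tables from the list, or reassign those variables afterwards.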
use range
range() creates a sequence of integers up to (but not including) the number you give it, and len() returns the length of the list.
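A reconstruction of the lost snippet, producing the output below:

```python
lista_de_listas = [[1], [1]]

for i in range(len(lista_de_listas)):
    # index into the list on BOTH sides, so the element itself is replaced
    lista_de_listas[i] = lista_de_listas[i] + [2]

print(lista_de_listas)
```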
Output:

[[1, 2], [1, 2]]
Your case would be:
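A sketch with range()/len() applied to your case; as above, a FakeDF class (not a real PySpark API, just a stand-in for illustration) lets the snippet run without Spark:

```python
# Minimal stand-in for a PySpark DataFrame, for illustration only.
class FakeDF:
    def __init__(self, cols):
        self.columns = list(cols)

    def withColumnRenamed(self, old, new):
        # Returns a NEW object, just like PySpark does
        return FakeDF([new if c == old else c for c in self.columns])

trans = ['COMPRAS_CREDITO', 'RETIROS_CREDITO']   # shortened list
lst_tablas = [FakeDF(trans), FakeDF(trans)]      # two sample tables

i = 1
for idx in range(len(lst_tablas)):
    for var in trans:
        # rename through the index, so the list entry itself is replaced
        lst_tablas[idx] = lst_tablas[idx].withColumnRenamed(var, var + '_m' + str(i))
    i += 1

print(lst_tablas[0].columns)  # ['COMPRAS_CREDITO_m1', 'RETIROS_CREDITO_m1']
print(lst_tablas[1].columns)  # ['COMPRAS_CREDITO_m2', 'RETIROS_CREDITO_m2']
```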