Question

Juan

Asked: 2020-04-02 21:03:03 +0800 CST

Replace cell values by cells within a for in pandas

I need to fix an error that appears when I replace cell values one at a time inside a loop in pandas.

Using the following data (reduced for the example):

import pandas as pd
data = pd.read_csv('iEC7R76C.txt', sep=",")
data.head(2)

I want to complete the year based on the title and I get it like this:

import re
cadena = data.title[0]
cadena
# 'Nicosia 2013 VulkÃ\xa0 Bianco  (Etna)'
cadena = re.sub(r"\D", "", cadena)
cadena
# '2013'

Now, if I assign that string to the year at position 0, the value does change, but I get a warning that I don't understand, even after searching Google quite a bit:

data.anio[0] = cadena
data.head(2)

With something more or less like this I can go through all the data and fill it in, but the warning keeps coming up:

for i in range(0, len(data)):
    cadena = data.title[i]
    cadena = re.sub(r"\D", "", cadena)
    if cadena != '':
        if 1900 <= int(cadena) <= 2020:
            data['anio'][i] = cadena

I added the empty-string check because sometimes the title does not contain a year, and the check that the year is between 1900 and 2020 because some titles look like "Name of wine 2001 batch 2", and when I extract the digits from that string I get 20012.

I followed the link shown in the warning, but I do not understand it.

1 Answer

abulafia · Answer 1 · 2020-04-02T23:40:44+08:00

First, what does the error mean?

It's a warning that when you do something like:

dataframe[columna][indice] = "otra cosa"

the value may not actually be assigned. The selection dataframe[columna] sometimes returns a reference to the actual column of your dataframe, in which case the subsequent access to [indice] does modify the column. In other circumstances dataframe[columna] may return a copy of the column, in which case the assignment modifies that copy and not the original dataframe.

It seems that Pandas sometimes returns a copy, although I'm not clear on the circumstances. I guess it will depend on what kind of expression you put to select the column. In any case, just in case, it warns you not to do that.
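As an illustration, here is a small sketch of one case where a copy is certain: selecting rows with a boolean mask. The toy frame and its column names are made up for the example, and the warning is silenced only to keep the output short.

```python
import warnings

import pandas as pd

# Hypothetical toy data, not the asker's file.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # hide the chained-assignment warning
    # df[df["a"] > 1] builds a new object (a copy), so the assignment
    # lands on that temporary copy, which is then discarded.
    df[df["a"] > 1]["b"] = 0

print(df["b"].tolist())  # [10, 20, 30]: the original column is unchanged
```

The assignment runs without an exception, yet the original dataframe keeps its values, which is exactly the situation the warning is trying to flag.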

What to do then?

The correct way to reference a cell to change its value would be:

dataframe.loc[indice, columna] = "otra cosa"

This way of accessing the cell will always modify the original dataframe, and therefore will not generate the warning.
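A minimal sketch with made-up data, showing that a single .loc call that receives both the row selector and the column label writes into the original frame:

```python
import pandas as pd

# Hypothetical toy data for illustration only.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Assign to a single cell: row label 0, column "b".
df.loc[0, "b"] = 99

# Assign to several rows at once using a boolean mask.
df.loc[df["a"] > 1, "b"] = 0

print(df["b"].tolist())  # [99, 0, 0]
```

Keep in mind that .loc selects rows by index label, not by position. With the default 0, 1, 2, ... index that read_csv creates, labels and positions coincide.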

In your code it would translate to:

for i in range(0, len(data)):
    cadena = data.title[i]
    cadena = re.sub(r"\D", "", cadena)
    if cadena != '':
        if 1900 <= int(cadena) <= 2020:
            data.loc[i,'anio'] = cadena

A better way

Whenever possible, avoid loops when working with pandas and replace them with vectorized operations that act on the entire dataframe in a single line. Internally pandas still loops, of course, but those loops run in C and are much more efficient than anything you can write in Python.

In this case all of your previous loop can be reduced to one line:

data["anio"] = data.title.str.extract(r"(\d{4})", expand=False)

The .str accessor on a column exposes vectorized string methods such as .extract() (and many more, like .startswith(), .strip(), etc.) that work on all the values of the column "at once", so to speak.

The extract() method expects a regular expression with at least one capturing group indicating which part of the text you want to extract. Here I have asked for a group of 4 digits, which avoids the problem of the result including extra digits that are not part of the year.
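A short sketch of what that call does, using hypothetical titles modeled on the ones mentioned in the question rather than the real file:

```python
import pandas as pd

# Hypothetical titles; the third one has no year at all.
titles = pd.Series([
    "Nicosia 2013 Vulkà Bianco (Etna)",
    "Name of wine 2001 batch 2",
    "Wine with no year",
])

# With a single capturing group, expand=False returns a Series
# instead of a one-column DataFrame.
years = titles.str.extract(r"(\d{4})", expand=False)

print(years.tolist())  # first two entries are '2013' and '2001'; the last is missing (NaN)
```

Rows without a match simply come back as missing values, so the explicit empty-string check from the loop is no longer needed. Note that the extracted years are strings, as in the original loop.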

Update

The user points out in a comment that some rows contain other four-digit numbers, such as 1840, in addition to the year sought. My solution with .extract() returns only the first match, so if there are two it keeps the first one. The user asks whether it would be possible to check that the extracted year is between 1900 and 2019 and, if not, leave the result empty.

I have a better solution. You can refine the regular expression so that instead of saying "any sequence of four digits" it says "the digits 19 followed by two more digits, or the digits 20 followed by a 0 or a 1 and then another digit".

The regular expression that says that would be:

(19\d\d|20[01]\d)

Note the use of | to separate the two alternatives. The first is 19\d\d, which matches any year of the last century, and the second is 20[01]\d, which matches a year whose first two digits are "20", whose third digit is a 0 or a 1, and whose last digit is any digit, that is, the years from 2000 to 2019.

So now you would put:

data["anio"] = data.title.str.extract(r"(19\d\d|20[01]\d)", expand=False)

Thanks to this, in cells where you had two possible groups of four digits, such as "18401 Cellars 2013 Proprietary Red (Walla Walla Valley (OR))" only one of the groups will match the regular expression, and that will be the one returned.

Note that a case like "19235 Cellars 2013 Proprietary Red (Walla Walla Valley (OR))" could still appear, in which case 1923 would match, which would be wrong. You can make the regular expression even more precise and force the group of digits to be a "whole word", that is, to have a word boundary on both sides. Then "19235" no longer matches, because there is no word boundary after the 3. In a regular expression \b means exactly that (word boundary), so the following regular expression is even safer against such cases:

data["anio"] = data.title.str.extract(r"\b(19\d\d|20[01]\d)\b", expand=False)
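To compare the three regular expressions side by side, here is a small sketch. The two titles are hypothetical, shortened variants of the ones discussed above, and the dictionary keys are just labels chosen for this example.

```python
import pandas as pd

# Hypothetical titles: one with an extra four-digit number,
# one with a five-digit number that starts like a year.
titles = pd.Series([
    "1840 Cellars 2013 Proprietary Red",
    "19235 Cellars 2013 Proprietary Red",
])

patterns = {
    "four_digits": r"(\d{4})",
    "year_range": r"(19\d\d|20[01]\d)",
    "year_range_whole_word": r"\b(19\d\d|20[01]\d)\b",
}

# Apply each pattern to every title and collect the first match per row.
results = {
    name: titles.str.extract(rx, expand=False).tolist()
    for name, rx in patterns.items()
}

for name, found in results.items():
    print(name, found)
# four_digits ['1840', '1923']
# year_range ['2013', '1923']
# year_range_whole_word ['2013', '2013']
```

Only the version with \b on both sides returns 2013 for both titles.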
