Question

Asked: 2020-08-23 04:56:49 +0800 CST

Filter data when loading a CSV using Pandas

I am loading very large CSV files that contain information I do not use. What I currently do is convert the columns to lists and then filter them with conditions.

For example, one column contains "A" and "B" but I only want the rows that have "A", so I filter out the "B" rows and repeat this for every other list I load from Pandas. Loading with Pandas and then converting the columns to lists in order to filter them seems very inefficient.

Is there a more efficient way to filter the data? I load the data like this in Pandas:

   import pandas as pd
   df = pd.read_csv('C:\\Users\\4209456\\Downloads\\TECNICO\\BBDD FD.csv', header=0, sep=';',parse_dates = ['Date'],dayfirst = True)

and then I create lists of some columns:

   FD=df["FD Text"]
   Fecha=pd.to_datetime(df["Date"],dayfirst = True)

I then filter both lists according to a condition on the list "FD", creating another list "FD2"; within the same condition I also filter the list "Date", creating a corrected list "Date2". This way I have the lists I need to start my code, "FD2" and "Date2", with the data that survives the filter keeping its original positions.

1 Answer

FJSevilla · Answer 1 · 2020-08-23T05:39:35+08:00

There are several ways to filter rows in Pandas. The simplest is to create a mask (a boolean array) by applying a condition to the column in question and then use it to filter the rows of the DataFrame. It is essentially the same procedure used in NumPy to filter arrays, and it is known as boolean indexing.

We can create a small example to see it:

import pandas as pd
from io import StringIO

data = StringIO('''
Nombre,Edad,Sexo,Fecha
Juan,12,M,20/08/2017
Laura,21,F,24/07/2017
Pedro,53,M,13/01/2017
María,17,F,15/03/2017
Luís,19,M,15/07/2017
Miguel,23,M,14/08/2017
''')

df = pd.read_csv(data, header=0, parse_dates = ['Fecha'], dayfirst = True)

We therefore have the following DataFrame:

>>> df

   Nombre  Edad Sexo      Fecha
0    Juan    12    M 2017-08-20
1   Laura    21    F 2017-07-24
2   Pedro    53    M 2017-01-13
3   María    17    F 2017-03-15
4    Luís    19    M 2017-07-15
5  Miguel    23    M 2017-08-14

We can filter using the column Sexo to obtain another DataFrame that contains only the rows corresponding to women, simply by doing:

>>> df2 = df[df['Sexo'] == 'F']
>>> df2

  Nombre  Edad Sexo      Fecha
1  Laura    21    F 2017-07-24
3  María    17    F 2017-03-15

The expression df['Sexo'] == 'F' simply creates a mask, in this case a Pandas Series of booleans, [False, True, False, True, False, False], obtained by comparing each value of the column with 'F'. We can filter with any other iterable of boolean values of the same length, for example a NumPy array, a list, a column from another DataFrame, etc.
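As a small illustration of that last point, the sketch below filters the same rows with three kinds of mask: the Series produced by the comparison, a plain list, and a NumPy array. The DataFrame is a reduced, hand-built copy of the example data, and the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

# Reduced copy of the example data (only the columns needed here).
df = pd.DataFrame({
    "Nombre": ["Juan", "Laura", "Pedro", "María", "Luís", "Miguel"],
    "Sexo": ["M", "F", "M", "F", "M", "M"],
})

# The comparison returns a boolean Series aligned with df's index.
mask_series = df["Sexo"] == "F"

# The same mask written by hand as a list and as a NumPy array.
mask_list = [False, True, False, True, False, False]
mask_array = np.array(mask_list)

by_series = df[mask_series]
by_list = df[mask_list]
by_array = df[mask_array]

print(by_series)
# All three selections contain the same rows.
print(by_series.equals(by_list) and by_series.equals(by_array))
```

A list or array mask must have one boolean per row, whereas a Series mask is matched against the DataFrame by index label.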

Another example, keeping only those who are 18 or older:

>>> df2 = df[df['Edad'] >= 18]

>>> df2

   Nombre  Edad Sexo      Fecha
1   Laura    21    F 2017-07-24
2   Pedro    53    M 2017-01-13
4    Luís    19    M 2017-07-15
5  Miguel    23    M 2017-08-14
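For the case described in the question, the same kind of mask can be applied while the file is being read, so the unwanted rows never accumulate in memory. The following is only a sketch under stated assumptions: the in-memory CSV and its `Other` column are made up for illustration, the column names `FD Text` and `Date` and the `;` separator are taken from the question, and the chunk size of 2 is deliberately tiny (a real file would use a much larger value).

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the large file described in the question.
csv_data = StringIO(
    "Date;FD Text;Other\n"
    "01/02/2020;A;x\n"
    "02/02/2020;B;y\n"
    "03/02/2020;A;z\n"
    "04/02/2020;B;w\n"
    "05/02/2020;A;v\n"
)

# usecols skips the columns that are never used;
# chunksize makes read_csv return an iterator of DataFrames.
reader = pd.read_csv(
    csv_data,
    sep=";",
    usecols=["Date", "FD Text"],
    chunksize=2,
)

# Keep only the rows with "A" from each chunk.
filtered_chunks = [chunk[chunk["FD Text"] == "A"] for chunk in reader]
df_a = pd.concat(filtered_chunks)

# Parse the dates once, on the reduced data (day first, as in the question).
df_a["Date"] = pd.to_datetime(df_a["Date"], format="%d/%m/%Y")

# Both columns come from the same filtered rows, so they stay aligned.
FD2 = df_a["FD Text"]
Date2 = df_a["Date"]
print(df_a)
```

Because the whole DataFrame is filtered at once, `FD2` and `Date2` always refer to the same rows, and those rows keep the index labels they had in the file (0, 2 and 4 here). If the file fits in memory, the chunking can be dropped and the mask applied directly to the result of `read_csv`.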

Edit:

You can filter by dates following the same idea. For example, we can keep the rows whose Fecha falls within the last 30 days (the output shown is what the sample data yields when run in mid-August 2017, so it changes with the current date):

>>> fecha_limite = pd.Timestamp.now().normalize() - pd.Timedelta(days=30)
>>> df2 = df[(df['Fecha'] > fecha_limite)]
>>> df2

   Nombre  Edad Sexo      Fecha
0    Juan    12    M 2017-08-20
1   Laura    21    F 2017-07-24
5  Miguel    23    M 2017-08-14

Several conditions can also be combined, for example the previous condition plus being a woman:

>>> fecha_limite = pd.Timestamp.now().normalize() - pd.Timedelta(days=30)
>>> df2 = df[(df['Fecha'] > fecha_limite) & (df['Sexo'] == 'F')]
>>> df2

  Nombre  Edad Sexo      Fecha
1  Laura    21    F 2017-07-24
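When combining masks, the element-wise operators `&` (and), `|` (or) and `~` (not) are used instead of Python's `and`, `or` and `not`, and each comparison needs its own parentheses because these operators bind more tightly than `==` or `>=`. A minimal sketch on a hand-built copy of the example data (the variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "Nombre": ["Juan", "Laura", "Pedro", "María", "Luís", "Miguel"],
    "Edad": [12, 21, 53, 17, 19, 23],
    "Sexo": ["M", "F", "M", "F", "M", "M"],
})

# Naming the masks keeps the combined expressions readable.
es_mujer = df["Sexo"] == "F"
es_adulto = df["Edad"] >= 18

mujeres_adultas = df[es_mujer & es_adulto]    # both conditions hold
mujer_o_adulto = df[es_mujer | es_adulto]     # at least one holds
hombres_menores = df[~es_mujer & ~es_adulto]  # neither holds

# Inline form: note the parentheses around each comparison.
inline = df[(df["Sexo"] == "F") & (df["Edad"] >= 18)]

print(mujeres_adultas)
print(mujer_o_adulto)
print(hombres_menores)
```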

You can filter on the index in the same way, or use loc. For example, if our column Fecha were the DataFrame index (a DatetimeIndex), we could do:

>>> fecha_limite = pd.Timestamp.now().normalize() - pd.Timedelta(days=30)
>>> df2 = df[df.index > fecha_limite]
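The loc variant mentioned above could look like the following sketch. To keep the result reproducible it uses a fixed reference date instead of the current date, which is an assumption made only for this example, and the DataFrame is a reduced, hand-built copy of the sample data.

```python
import pandas as pd

# Reduced copy of the example data with Fecha as a DatetimeIndex.
df = pd.DataFrame(
    {
        "Nombre": ["Juan", "Laura", "Pedro", "María"],
        "Edad": [12, 21, 53, 17],
    },
    index=pd.to_datetime(
        ["2017-08-20", "2017-07-24", "2017-01-13", "2017-03-15"]
    ),
)
df.index.name = "Fecha"

# Fixed reference date (assumption) instead of pd.Timestamp.now().
fecha_limite = pd.Timestamp("2017-08-21") - pd.Timedelta(days=30)

# loc accepts a boolean mask built from the index ...
recientes = df.loc[df.index > fecha_limite]

# ... and can select columns in the same step.
nombres = df.loc[df.index > fecha_limite, "Nombre"]

print(recientes)
print(nombres.tolist())
```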
