
Question

LopezAi

Asked: 2022-04-19 07:26:43 +0800 CST

Nested For to compare the records of a dataframe


I want to write a nested for loop in Python that compares every record of a dataframe with every other record. This is what I have tried so far (the condition checks whether the records from the outer and inner loops have the same idDCL and both have a null fechaFin date):

import pandas as pd

for i in df.index:
    for j in df.index:
        if i != j and df['idDCL'][i] == df['idDCL'][j] and pd.isnull(df['fechaFin'][i]) and pd.isnull(df['fechaFin'][j]):
            pass  # the body of the if was not shown in the question

1 Answer


abulafia · Answer 1 · 2022-04-20T04:52:08+08:00

The data

There is a dataframe with several columns, two of which are of interest: "idDCL" and "fechaFin". An excerpt of this dataframe:

     idCDU   idDCL  ...      fechaActivacion             fechaFin
0  AAA-018  557DGQ  ...  2021-02-18 16:36:40  2021-02-28 21:04:51
1  AAB-200  097BAS  ...  2021-04-22 13:30:07  2021-08-17 22:35:59
2  AAC-013  090CND  ...  2021-09-12 11:51:37  2021-09-20 12:22:59
3  AAC-371  622QQD  ...                  NaN   2021-06-14 9:46:50
4  AAD-466  638HUB  ...  2021-06-21 18:54:51                  NaN

The same value can appear many times in the "idDCL" column. For example, the value "557DGQ" appears ten times:

>>> print(df[df.idDCL=="557DGQ"])

         idCDU   idDCL  ...      fechaActivacion             fechaFin
0      AAA-018  557DGQ  ...  2021-02-18 16:36:40  2021-02-28 21:04:51
4387   EGC-712  557DGQ  ...  2021-03-09 19:38:08  2021-04-26 13:00:53
5504   FJH-821  557DGQ  ...   2021-05-11 9:18:08   2021-05-11 9:32:50
6800   GRG-384  557DGQ  ...  2021-05-19 15:34:21   2021-05-24 8:54:50
8019   HXM-326  557DGQ  ...  2021-03-04 16:32:08  2021-03-04 21:08:59
8023   HXQ-144  557DGQ  ...   2021-05-11 9:17:09   2021-05-11 9:18:07
12238  MDG-186  557DGQ  ...   2021-05-03 9:06:23   2021-05-03 9:54:15
14931  OTS-052  557DGQ  ...   2021-05-19 7:53:51   2021-05-19 9:55:28
16662  QMM-242  557DGQ  ...   2021-05-11 9:33:01  2021-05-17 11:52:29
23961  XRW-652  557DGQ  ...   2021-04-29 9:32:31   2021-05-03 9:06:22
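A quick way to see how often each idDCL value repeats, without filtering value by value, is `Series.value_counts()`. The sketch below uses a small hypothetical dataframe (the ids and dates are made up for illustration and are not taken from the real data):

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataframe
df = pd.DataFrame({
    "idDCL": ["557DGQ", "557DGQ", "557DGQ", "534OUT", "534OUT", "090CND"],
    "fechaFin": ["2021-02-28 21:04:51", None, "2021-05-11 9:32:50",
                 None, None, "2021-09-20 12:22:59"],
})

# Number of rows for each idDCL value, most frequent first
counts = df["idDCL"].value_counts()
print(counts)
```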

In some rows, fechaFin is NaN. Some idDCL values have a single NaN and others have more than one. For example, "534OUT" appears 13 times, and two of those rows have NaN in fechaFin:

>>> print(df[df.idDCL=="534OUT"])

         idCDU   idDCL  ...      fechaActivacion             fechaFin
142    ADO-478  534OUT  ...  2020-10-14 15:58:04  2020-11-01 16:44:32
4225   EBR-429  534OUT  ...   2021-05-12 9:08:10  2021-05-12 10:31:34
6068   FXT-010  534OUT  ...   2021-05-19 7:08:17   2021-05-19 7:12:23
8734   IPS-343  534OUT  ...   2021-05-14 8:04:34   2021-05-14 8:07:24
9277   JCU-926  534OUT  ...  2021-01-05 21:26:19  2021-01-06 15:04:56
11704  LOT-549  534OUT  ...   2021-05-14 8:50:55   2021-05-14 9:32:16
14642  OMH-138  534OUT  ...  2021-01-11 14:18:00  2021-03-10 15:45:40
15799  PPO-756  534OUT  ...   2021-05-12 9:03:09   2021-05-12 9:06:28
19128  SXS-454  534OUT  ...   2021-05-19 7:47:41                  NaN
19533  THQ-267  534OUT  ...  2021-05-12 10:32:35  2021-05-13 12:55:42
21814  VOF-283  534OUT  ...                  NaN                  NaN
22148  VWN-535  534OUT  ...  2020-07-23 10:48:24  2020-07-23 11:00:07
26263  ZZT-611  534OUT  ...  2021-05-11 15:17:20  2021-05-11 15:58:42
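To count how many NaNs each idDCL has in fechaFin, one option is to turn the column into a boolean mask with `isna()` and sum that mask per group (each True counts as 1). A minimal sketch on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy data; missing dates become NaT after to_datetime
df = pd.DataFrame({
    "idDCL": ["557DGQ", "557DGQ", "534OUT", "534OUT", "534OUT", "090CND"],
    "fechaFin": pd.to_datetime(["2021-02-28 21:04:51", "2021-03-04 21:08:59",
                                "2020-11-01 16:44:32", None, None, None]),
})

# True where fechaFin is missing; summing the booleans per idDCL counts NaNs
nan_counts = df["fechaFin"].isna().groupby(df["idDCL"]).sum()
print(nan_counts)
```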

The problem posed

(As I understood it, correct me in comments if I'm wrong)

Find all values of idDCL for which there are two or more NaNs in the fechaFin column.

The solution

Group by idDCL and, for each group, count how many NaNs there are, keeping only (filtering) the groups that have more than one NaN. From the result we take the idDCL column and convert it to a set to remove duplicates:

ids = set(df.groupby("idDCL").filter(lambda x: x.fechaFin.isna().sum()>1).idDCL)
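The same set can also be obtained without `filter`, by counting NaNs per group with `groupby(...).sum()` and keeping the ids whose count exceeds 1. The sketch below, on hypothetical toy data, checks that both approaches agree; the counting variant avoids calling a Python lambda once per group, which may help when there are many distinct ids.

```python
import pandas as pd

# Hypothetical toy data: "A" has two NaNs, "B" and "C" have one each
df = pd.DataFrame({
    "idDCL": ["A", "A", "A", "B", "B", "C"],
    "fechaFin": [None, None, "2021-05-11", None, "2021-05-12", None],
})

# Approach from the answer: keep whole groups with more than one NaN
ids_filter = set(
    df.groupby("idDCL").filter(lambda x: x.fechaFin.isna().sum() > 1).idDCL
)

# Alternative: count NaNs per id, then select the ids with count > 1
nan_counts = df["fechaFin"].isna().groupby(df["idDCL"]).sum()
ids_counts = set(nan_counts[nan_counts > 1].index)

print(ids_filter, ids_counts)
```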

This gives us the set of ids (idDCL) that have two or more NaNs in fechaFin. We can take a look at the result:

>>> print(ids)
{'206NIO', '999EBF', '517QMS', '130VTQ', '406LWW', '529KFZ', '389LCG', 
'753NND', '738WSS', '709RAP', '102BKR', '421LMV', '648RIP', '931FUO', 
'823TCA', '057EFS', '759CRI', '401KRY', '042LAD', '502SFJ', '427UBT', 
'322UTU', '047IYT', '053PHC', '819FTL', '431FMH', '784ITU', '093GFK', 
'815PXN', '224VUQ', '251NLF', '874SPB', '053CFW', '512LTV', '716XGW', 
'516FDE', '702RGE', '401WHA', '025ITJ', 'DCL_034', '583AEK', 'ERROR', 
'478LZR', '132ZJL', '534OUT', '572QXO', '434TPX', '966QUX', '517DBK', 
'001ITJ', '381HXF', '034EPI', '729DSI', '247UUV', '593YMN', '181RFD', 
'619EMC', '441IDT'}

There are 58 cases.

If we now want to see the complete rows where the problem occurs, we can use these ids to filter the entire dataframe (together with the condition that fechaFin is NaN):

problemas = df[df.idDCL.isin(ids) & df.fechaFin.isna()]
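The filter above combines two boolean masks with `&`. An alternative sketch builds the mask in a single pass with `groupby(...).transform("sum")`, which broadcasts each group's NaN count back to every row of that group, so no intermediate set of ids is needed. The toy data below is hypothetical:

```python
import pandas as pd

# Hypothetical toy data: only "A" has two or more missing fechaFin values
df = pd.DataFrame({
    "idDCL": ["A", "A", "A", "B", "B", "C"],
    "fechaFin": [None, None, "2021-05-11", None, "2021-05-12", None],
})

is_nan = df["fechaFin"].isna()
# NaN count of each row's idDCL group, aligned with the original rows
nan_in_group = is_nan.astype(int).groupby(df["idDCL"]).transform("sum")

# Rows with a missing fechaFin whose idDCL has two or more missing values
problemas = df[is_nan & (nan_in_group > 1)]
print(problemas)
```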

This is the dataframe you were looking for. You can dump it to CSV or do whatever you want with it. For example, let's look at its first rows (sorted by the idDCL column so that rows with the same idDCL appear together):

>>> print(problemas.sort_values(by="idDCL").head(10))

         idCDU   idDCL  ...      fechaActivacion fechaFin
8014   HXI-850  001ITJ  ...  2021-10-15 20:36:27      NaN
20752  UNI-390  001ITJ  ...  2021-05-12 13:57:15      NaN
11292  LEC-358  025ITJ  ...                  NaN      NaN
9606   JLO-450  025ITJ  ...  2021-09-09 19:24:27      NaN
16210  QAJ-307  034EPI  ...  2021-11-12 14:31:23      NaN
18341  SCW-028  034EPI  ...  2021-05-13 10:35:23      NaN
24676  YKX-045  042LAD  ...  2021-06-30 15:57:36      NaN
7860   HTI-370  042LAD  ...  2021-09-01 15:24:48      NaN
22426  WEA-371  047IYT  ...   2021-05-19 7:47:40      NaN
11988  LWK-060  047IYT  ...                  NaN      NaN

As you can see, no for loop was needed, let alone two nested for loops, which would have O(n^2) complexity and, on a dataframe as large as this one, would mean several seconds of processing (my solution finishes in under 1 s).
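To check that the vectorized filter selects the same rows as the pairwise comparison the question was attempting, the sketch below runs both on hypothetical toy data. The loop version is a reconstruction of the question's condition (flagging row i whenever some other row j shares its idDCL and both have a null fechaFin); it performs on the order of n² comparisons, so it is only practical on tiny frames like this one.

```python
import pandas as pd

# Hypothetical toy data: "A" has NaNs at rows 0 and 5, "B" and "C" have one NaN each
df = pd.DataFrame({
    "idDCL": ["A", "B", "A", "C", "B", "A"],
    "fechaFin": [None, None, "2021-05-11", None, "2021-05-12", None],
})

# Nested-loop version: flag every row that has a partner row (i != j)
# with the same idDCL where both fechaFin values are null
flagged = set()
for i in df.index:
    for j in df.index:
        if (i != j
                and df.at[i, "idDCL"] == df.at[j, "idDCL"]
                and pd.isnull(df.at[i, "fechaFin"])
                and pd.isnull(df.at[j, "fechaFin"])):
            flagged.add(i)

# Vectorized version from the answer
ids = set(df.groupby("idDCL").filter(lambda x: x.fechaFin.isna().sum() > 1).idDCL)
problemas = df[df.idDCL.isin(ids) & df.fechaFin.isna()]

print(sorted(flagged), sorted(problemas.index))
```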
