Question

Asked: 2022-07-25 11:38:13 +0800 CST

Cumulative Sales of a Past Period

I have the following code that starts like this:

# Import libraries
import numpy as np
import pandas as pd
import datetime as dt

# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

ruta = '/content/drive/MyDrive/example.csv'
df = pd.read_csv(ruta)
df.head(10)

The file that I imported can be downloaded from here: Data

And it looks like this:

Then I group the values and create two metrics, Rolling Year actual (RY_ACTUAL) and Rolling Year last (RY_LAST). They tell me how much each category (for example, the Blue category) sold over the last twelve months and over the twelve months before that. This metric works fine:

# ROLLING YEAR
# I want a rolling year for each category, i.e. how much each category sold from 12 months ago to the current month

# RY_ACTUAL: a year has 12 months, so I pass 12 as the rolling window
f = lambda x:x.rolling(12).sum()
df_group["RY_ACTUAL"]  = df_group.groupby(["CATEGORY"])['Sales'].apply(f)

# RY_24: a rolling window of 24 months, used to compare the current RY against the previous RY
f_1 = lambda x:x.rolling(24).sum()
df_group["RY_24"]  = df_group.groupby(["CATEGORY"])['Sales'].apply(f_1)

# RY_LAST: subtract RY_ACTUAL from RY_24 to get the amount of the previous rolling year (RY-1)
df_group["RY_LAST"]  = df_group["RY_24"] - df_group["RY_ACTUAL"]
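As a rough illustration of what these three columns contain, here is a minimal sketch on synthetic data. The toy frame and its values are hypothetical, not the data from the question, and it uses `transform` rather than `apply` so that the result keeps the index of the original frame; that is a substitution, not the code above.

```python
import pandas as pd

# Hypothetical toy data: two categories with 36 consecutive months each.
dates = pd.date_range("2013-01-01", periods=36, freq="MS")
df_group = pd.concat(
    [
        pd.DataFrame({"DATE": dates, "CATEGORY": "Blue", "Sales": range(1, 37)}),
        pd.DataFrame({"DATE": dates, "CATEGORY": "Red", "Sales": 10}),
    ],
    ignore_index=True,
)

# Trailing 12-month and 24-month sums, computed separately for each category.
df_group["RY_ACTUAL"] = df_group.groupby("CATEGORY")["Sales"].transform(
    lambda s: s.rolling(12).sum()
)
df_group["RY_24"] = df_group.groupby("CATEGORY")["Sales"].transform(
    lambda s: s.rolling(24).sum()
)

# The 24-month sum minus the last 12 months leaves the 12 months before those.
df_group["RY_LAST"] = df_group["RY_24"] - df_group["RY_ACTUAL"]

last_blue = df_group[df_group["CATEGORY"] == "Blue"].iloc[-1]
print(last_blue[["RY_ACTUAL", "RY_LAST"]])  # RY_ACTUAL 366.0, RY_LAST 222.0
```

The first 11 rows of each category have no RY_ACTUAL and the first 23 have no RY_LAST, because the rolling window is not yet full.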

My problem is with the metric called Year To Date, which is simply the accumulated sales of each category from January up to the month in which you read the table. For example, if I stop at March 2015, I want to know how much each category sold from January to March. The column I created, YTD_ACTUAL, does just that, and I compute it like this:

# YTD_ACTUAL
df_group['YTD_ACTUAL'] = df_group.groupby(["CATEGORY","DATE"]).Sales.cumsum()

However, what I have not been able to build is the column YTD_LAST, that is, the same figure for the previous period. Following the previous example, where I am at March 2015, for the Blue category it should return the accumulated sales of the Blue category from January to March of 2014.

My attempt >.<

#YTD_LAST
df_group['YTD_LAST'] = df_group.groupby(["CATEGORY", "DATE"]).Sales.apply(f)

Could someone help me to make this column correctly?

Thank you in advance, community!

2 Answers

HeytalePazguato · Answer 1 · 2022-08-03T05:07:19+08:00

Good day,

Solving your question was a good exercise.

First of all, it seems to me that your YTD_ACTUAL calculation is not entirely correct. I ran it as written in the question but it did not work for me (it calculates the accumulated total by category regardless of the year). To calculate the accumulated sum by category per year, I did the following:

df_group['YTD_ACTUAL'] = df_group.groupby(['CATEGORY', df_group['DATE'].dt.year]).Sales.cumsum()

It is important to group by category and by the year of your date (df_group['DATE'].dt.year); otherwise the accumulated sum is not calculated correctly.

Now, to calculate YTD_LAST you have to do a shift(), but you have to be careful to match the correct category and the correct month so that the shifted values land in the correct row.

For that, group by category and by month (df['DATE'].dt.month) and then move the values with shift():

df_group['YTD_LAST'] = df_group.groupby(['CATEGORY', df['DATE'].dt.month])['YTD_ACTUAL'].shift()
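To see the two groupings side by side, here is a minimal sketch on synthetic data. The toy frame is hypothetical, and it assumes the rows are sorted by date and that every category has a row for every month; if a month is missing in some year, shift() would pull the value from an earlier year instead.

```python
import pandas as pd

# Hypothetical toy data: two categories, 24 consecutive months each, no gaps.
dates = pd.date_range("2014-01-01", periods=24, freq="MS")
df_group = pd.concat(
    [
        pd.DataFrame({"DATE": dates, "CATEGORY": "Blue", "Sales": range(1, 25)}),
        pd.DataFrame({"DATE": dates, "CATEGORY": "Red", "Sales": 100}),
    ],
    ignore_index=True,
)

year = df_group["DATE"].dt.year
month = df_group["DATE"].dt.month

# Running total that restarts every January, separately for each category.
df_group["YTD_ACTUAL"] = df_group.groupby(["CATEGORY", year])["Sales"].cumsum()

# Inside each (category, month) group the rows are the successive years,
# so shift() moves the previous year's value onto the current year's row.
df_group["YTD_LAST"] = df_group.groupby(["CATEGORY", month])["YTD_ACTUAL"].shift()

is_blue = df_group["CATEGORY"] == "Blue"
is_mar_2015 = df_group["DATE"] == pd.Timestamp("2015-03-01")
blue_mar_2015 = df_group[is_blue & is_mar_2015].iloc[0]
print(blue_mar_2015[["YTD_ACTUAL", "YTD_LAST"]])  # YTD_ACTUAL 42, YTD_LAST 6.0
```

In this toy frame, Blue sold 1 + 2 + 3 = 6 from January to March 2014 and 13 + 14 + 15 = 42 over the same months of 2015, which is what the two columns show for the March 2015 row. All 2014 rows have no YTD_LAST because there is no earlier year to shift from.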

Edit:

After reading your comment I checked the results and it works correctly; I attach an image. Maybe there are other formulas in your process for getting the values that are not written in your question.

I attach the complete code that I used for the tests:

# Import libraries
import numpy as np
import pandas as pd
import datetime as dt

df = pd.read_csv('example.csv')
df['DATE'] = pd.to_datetime(df['DATE'], format='%d/%m/%Y %H:%M')
df_group = df

# ROLLING YEAR
# I want a rolling year for each category, i.e. how much each category sold from 12 months ago to the current month

# RY_ACTUAL: a year has 12 months, so I pass 12 as the rolling window
f = lambda x:x.rolling(12).sum()
df_group["RY_ACTUAL"]  = df_group.groupby(["CATEGORY"])['Sales'].apply(f)

# RY_24: a rolling window of 24 months, used to compare the current RY against the previous RY
f_1 = lambda x:x.rolling(24).sum()
df_group["RY_24"]  = df_group.groupby(["CATEGORY"])['Sales'].apply(f_1)

# RY_LAST: subtract RY_ACTUAL from RY_24 to get the amount of the previous rolling year (RY-1)
df_group["RY_LAST"]  = df_group["RY_24"] - df_group["RY_ACTUAL"]

# YTD_ACTUAL: cumulative sum that restarts each year, per category
df_group['YTD_ACTUAL'] = df_group.groupby(['CATEGORY', df_group['DATE'].dt.year]).Sales.cumsum()

# YTD_LAST: previous row of the same category and month
df_group['YTD_LAST'] = df_group.groupby(['CATEGORY', df['DATE'].dt.month])['YTD_ACTUAL'].shift()

Ricardo J. Martínez Suástegui · Answer 2 · 2022-08-03T11:36:03+08:00

Good day,

First of all, many thanks to the person who took the time to understand this exercise. I think no one else did, so I will accept your answer as the correct one.

However, I am also posting my own answer, which I arrived at after a lot of head-scratching, because there is something that your code does not do.

Let's take it step by step. It is true that there are gaps between the dates, and for shift to work correctly I built the following DataFrame of dates and joined it with a merge:

d = pd.date_range(start="2015-01-01",end="2022-01-01", freq='MS')
dates = pd.DataFrame({"DATE":d})
df_merge = pd.merge(dates, df, how='outer', on='DATE')
df_merge.head(5)
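Note that an outer merge on DATE alone only adds the months that are missing from every category at once. If a month can be missing for a single category, one possible way to fill it is to reindex against a complete category-by-month grid. This is a minimal sketch on hypothetical toy data, and treating a missing month as zero sales is an assumption:

```python
import pandas as pd

# Hypothetical toy data: "Blue" has no row for 2015-02-01.
df = pd.DataFrame({
    "DATE": pd.to_datetime(
        ["2015-01-01", "2015-03-01", "2015-01-01", "2015-02-01", "2015-03-01"]
    ),
    "CATEGORY": ["Blue", "Blue", "Red", "Red", "Red"],
    "Sales": [5.0, 7.0, 1.0, 2.0, 3.0],
})

# Every month start between the first and last date in the data.
months = pd.date_range(df["DATE"].min(), df["DATE"].max(), freq="MS")

# One (category, month) pair for every combination, whether or not it has data.
full_index = pd.MultiIndex.from_product(
    [df["CATEGORY"].unique(), months], names=["CATEGORY", "DATE"]
)

df_full = (
    df.set_index(["CATEGORY", "DATE"])
      .reindex(full_index)
      .fillna({"Sales": 0.0})  # assumption: a missing month means zero sales
      .reset_index()
)
print(df_full)
```

After this step every category has the same number of rows per year, so a fixed shift of 12 rows lines up with the same month of the previous year.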

Then, to obtain the required column YTD_LAST, I followed a much longer and more complex procedure than the previous answer:

# YEAR TO DATE
df_merge["YEAR"] = pd.to_datetime(df_merge["DATE"], format = '%Y-%m-%d').dt.year

df_merge = df_merge.sort_values(by=["DATE","YEAR","MONTH"], ascending = True)

# YTD_ACTUAL: cumulative sum that restarts each year, per category
df_merge['YTD_ACTUAL'] = df_merge.groupby(["YEAR","CATEGORY"]).Sales.cumsum()

# YTD_LAST
allDataframes = []
for cat in df_merge['CATEGORY'].dropna().unique():
  print(cat)
  fil_cat = df_merge['CATEGORY'] == cat
  # Copy the slice so the assignment below does not trigger SettingWithCopyWarning
  cate = df_merge[fil_cat].copy()
  # Value from 12 rows earlier: the same month of the previous year when no month is missing
  cate["YTD_LAST"] = cate.YTD_ACTUAL.shift(12)
  allDataframes.append(cate)

# Stack the per-category frames, each one exactly once
cate_fin = pd.concat(allDataframes, axis = 0)

cate_fin.head(50)
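The per-category loop can also be written as a single grouped shift. The sketch below is an alternative formulation on hypothetical toy data, not the code I ran on the real file, and it assumes every category has one row per month with no gaps:

```python
import pandas as pd

# Hypothetical toy data: two categories, 24 gap-free months each.
dates = pd.date_range("2014-01-01", periods=24, freq="MS")
df_merge = pd.concat(
    [
        pd.DataFrame({"DATE": dates, "CATEGORY": "Blue", "Sales": range(1, 25)}),
        pd.DataFrame({"DATE": dates, "CATEGORY": "Red", "Sales": 100}),
    ],
    ignore_index=True,
).sort_values(["CATEGORY", "DATE"], ignore_index=True)

df_merge["YEAR"] = df_merge["DATE"].dt.year
df_merge["YTD_ACTUAL"] = df_merge.groupby(["YEAR", "CATEGORY"])["Sales"].cumsum()

# Shift by 12 rows inside each category: the same month, one year earlier.
df_merge["YTD_LAST"] = df_merge.groupby("CATEGORY")["YTD_ACTUAL"].shift(12)

is_blue = df_merge["CATEGORY"] == "Blue"
is_dec_2015 = df_merge["DATE"] == pd.Timestamp("2015-12-01")
blue_dec_2015 = df_merge[is_blue & is_dec_2015].iloc[0]
print(blue_dec_2015[["YTD_ACTUAL", "YTD_LAST"]])  # YTD_ACTUAL 222, YTD_LAST 78.0
```

Because the shift happens inside each category group, values never leak from one category into the next, and no concatenation step is needed afterwards.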

All of this was necessary for my problem, rather than just the approach in the previous answer, because what I needed in the YTD_LAST column was to compare the accumulated total for a given year and month, say December 2015, against the accumulated total of the previous year for that same period, i.e. December 2014. That is exactly what I get in the final dataframe cate_fin:

Again, many thanks to @HeytalePazguato for taking the time to read and tackle the case, bravo! I think your solution can be useful for another similar problem, but what it gives me is the accumulated sales of the previous period: if I am again at December 2015, what I get in the YTD_LAST column is the value for November 2015 for each category:
