I have the following code that starts like this:
# Import libraries
import numpy as np
import pandas as pd
import datetime as dt
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')
ruta = '/content/drive/MyDrive/example.csv'
df = pd.read_csv(ruta)
df.head(10)
The file that I imported can be downloaded from here: Data
And it looks like this:
Then I group the values and create two metrics called Rolling Year, RY_ACTUAL and RY_LAST. These tell me the sales of each category (for example, the Blue category) over the last twelve months and over the twelve months before that. This metric works fine:
# ROLLING YEAR
# I want a Rolling Year for each category, i.e. how much each category sold from 12 months ago to the current month
# RY_ACTUAL: one year has 12 months, so I pass 12 as the rolling window
f = lambda x:x.rolling(12).sum()
df_group["RY_ACTUAL"] = df_group.groupby(["CATEGORY"])['Sales'].apply(f)
# RY_24: a rolling sum with a 24-month window, used to compare the current RY with the previous RY
f_1 = lambda x:x.rolling(24).sum()
df_group["RY_24"] = df_group.groupby(["CATEGORY"])['Sales'].apply(f_1)
# RY_LAST: subtract RY_ACTUAL from RY_24 to get the rolling year before the current one (RY-1)
df_group["RY_LAST"] = df_group["RY_24"] - df_group["RY_ACTUAL"]
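As a quick check of the subtraction above, here is a minimal sketch on toy data (the values and the 36-row frame are hypothetical, not the real file). It uses transform instead of apply, which is a substitution made here so the result stays aligned with the original rows, and it assumes one row per month with no gaps:

```python
import pandas as pd

# Toy data (hypothetical): one category, 36 monthly rows with sales 1..36.
df_group = pd.DataFrame({
    "CATEGORY": ["Blue"] * 36,
    "DATE": pd.date_range("2013-01-01", periods=36, freq="MS"),
    "Sales": list(range(1, 37)),
})

sales = df_group.groupby("CATEGORY")["Sales"]
df_group["RY_ACTUAL"] = sales.transform(lambda s: s.rolling(12).sum())
df_group["RY_24"] = sales.transform(lambda s: s.rolling(24).sum())
df_group["RY_LAST"] = df_group["RY_24"] - df_group["RY_ACTUAL"]

# The same quantity obtained by looking 12 rows back within each category.
shifted = df_group.groupby("CATEGORY")["RY_ACTUAL"].shift(12)
```

In this toy frame RY_LAST and shifted hold the same values, so the 24-month window is only needed if you prefer the subtraction form.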
My problem is with the metric called Year To Date, which is simply the accumulated sales of each category from January up to the month in which you read the table. For example, if I stop at March 2015, it tells me how much each category sold from January to March. The column I created, YTD_ACTUAL, does just that, and I compute it like this:
# YTD_ACTUAL
df_group['YTD_ACTUAL'] = df_group.groupby(["CATEGORY","DATE"]).Sales.cumsum()
However, what I have not been able to build is the column YTD_LAST, that is, the same figure for the previous period. Following the previous example, where I am standing in March 2015, for the Blue category it should return the accumulated sales of the Blue category from January to March of 2014.
My try >.<
#YTD_LAST
df_group['YTD_LAST'] = df_group.groupby(["CATEGORY", "DATE"]).Sales.apply(f)
Could someone help me to make this column correctly?
Thank you in advance, community!
Good day,
It was a good exercise to solve your question.
First of all, it seems to me that your calculation of YTD_ACTUAL is not entirely correct. I did it as you wrote it in the question, but it did not work for me (it calculates the accumulated total by category regardless of the year). To calculate the accumulated sum by category per year, it is important to group by category and by the year of your date (df_group['DATE'].dt.year); otherwise the accumulated sum is not calculated correctly.
Now, to calculate YTD_LAST you have to do a shift(), but you have to be careful to match the correct category and the correct month so that the shifted values land in the correct row. For that you have to group by category and by month (df['DATE'].dt.month) and then move the values with shift().
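The two steps can be sketched as follows. This is only a minimal illustration on toy data (the values are hypothetical), assuming df_group has the columns CATEGORY, DATE and Sales, that DATE is already a datetime column, and that the rows are sorted by category and date:

```python
import pandas as pd

# Toy data (hypothetical): two categories, monthly sales for 2014 and 2015.
dates = pd.date_range("2014-01-01", "2015-12-01", freq="MS")
df_group = pd.DataFrame({
    "CATEGORY": ["Blue"] * len(dates) + ["Red"] * len(dates),
    "DATE": list(dates) * 2,
    "Sales": list(range(1, 25)) + list(range(101, 125)),
}).sort_values(["CATEGORY", "DATE"]).reset_index(drop=True)

year = df_group["DATE"].dt.year
month = df_group["DATE"].dt.month

# YTD_ACTUAL: cumulative sales restarting each January, per category.
df_group["YTD_ACTUAL"] = df_group.groupby(["CATEGORY", year])["Sales"].cumsum()

# YTD_LAST: previous row within the same category and calendar month.
# This is the same month of the previous year only if no year is missing.
df_group["YTD_LAST"] = df_group.groupby(["CATEGORY", month])["YTD_ACTUAL"].shift()
```

Because shift() moves by one row inside each (category, month) group, a category that skips a month in some year would pick up the value from an earlier year, so the result depends on the data having no such gaps.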
Edit:
After reading your comment I checked the results and it works correctly; I attach an image. Maybe there are other steps in your process for obtaining your values that are not written in your question.
I attach the complete code that I made for the tests.
Good day,
First of all, many thanks to the person who took the time to understand this exercise. I think no one else did, so I will accept your answer as the correct one.
However, I am also posting my own answer, which I reached after many headaches, because there is something that your code does not do.
Let's take it step by step. It is true that there are gaps between dates, so for the shift function to work correctly I built an additional df and joined it with a merge.
Then, to obtain the required column YTD_LAST, I followed a much longer and more complex procedure than the one in the previous answer. For my problem the whole procedure was necessary, because what I needed in YTD_LAST was to compare the accumulated sales of a specific year and month, say December 2015, with the accumulated sales of the previous year for that same period, i.e. December 2014. That is exactly what I get in the final dataframe cate_fin.
Again, many thanks to @HeytalePazguato for taking the time to read and tackle the case, bravo! I think your solution can be useful in another similar problem, but what it gives me is the accumulated sales of the previous period: if I am again in December 2015, the column YTD_LAST contains what was in November 2015 for each category.
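A minimal sketch of the idea, not the exact procedure used for cate_fin: one way to make the year-over-year comparison robust to gaps is to merge the sales onto a complete category-by-month grid before shifting. The toy values and the names months, grid and full are hypothetical, and the sketch assumes every DATE falls on the first day of its month:

```python
import pandas as pd

# Toy data (hypothetical): Blue has no row for February 2014.
df_group = pd.DataFrame({
    "CATEGORY": ["Blue"] * 5,
    "DATE": pd.to_datetime(["2014-01-01", "2014-03-01",
                            "2015-01-01", "2015-02-01", "2015-03-01"]),
    "Sales": [10, 30, 100, 200, 300],
})

# Full grid of every category x every month in the observed range.
months = pd.date_range(df_group["DATE"].min(), df_group["DATE"].max(), freq="MS")
grid = pd.MultiIndex.from_product(
    [df_group["CATEGORY"].unique(), months], names=["CATEGORY", "DATE"]
).to_frame(index=False)

# Left-merge the sales onto the grid; months without sales become 0.
full = grid.merge(df_group, on=["CATEGORY", "DATE"], how="left")
full["Sales"] = full["Sales"].fillna(0)

# Year-to-date per category and calendar year.
full["YTD_ACTUAL"] = full.groupby(["CATEGORY", full["DATE"].dt.year])["Sales"].cumsum()

# With no gaps left, 12 rows back within a category is the same month last year.
full["YTD_LAST"] = full.groupby("CATEGORY")["YTD_ACTUAL"].shift(12)
```

In this toy frame the March 2015 row ends up with the accumulated sales of January to March 2014 in YTD_LAST, even though February 2014 was missing from the input.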