I have a df with the rating, title and gender of each user who rated a movie. (The real data is much larger, and movies repeat because each one has many user ratings; I have simplified it here for readability.)
df
     Rating  Title                                   Gender
0    5       One Flew Over the Cuckoo's Nest (1975)  F
1    3       James and the Giant Peach (1996)        M
2    3       My Fair Lady (1964)                     F
3    4       Erin Brockovich (2000)                  F
4    5       Bug's Life, A (1998)                    M
5    3       One Flew Over the Cuckoo's Nest (1975)  M
...  ...     ...                                     ...
I need to write a function that returns the average user rating per movie, split by gender. The result should look something like this (it does not have to be a DataFrame, this is just an example):
media_valoraciones():
"Computes the average rating of each movie by user gender"
media_por_sexo                          Media_mujer  Media_hombre
titulo
One Flew Over the Cuckoo's Nest (1975)  3.375000     2.761905
James and the Giant Peach (1996)        3.388889     3.352941
My Fair Lady (1964)                     2.675676     2.733333
...                                     ...          ...
I'm trying to do this with for loops, along these lines:
for i in df.Title.unique():
    df[df.Title == i].Rating.sum() / len(df[df.Title == i])
I read this as: for each unique title, filter the rows for that title, sum the ratings, and divide by the number of ratings. It returns nothing, and I don't know if I'm on the right track.
I also need to split by gender. My plan is to apply the gender filter first and then run the loop above; the gender filter itself is no problem.
You don't need a loop. Use pandas.DataFrame.groupby: group by title and gender, take the mean of each group, and unstack to move the gender level of the MultiIndex into columns.
A complete example:
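What follows is a minimal sketch of that approach, not a definitive implementation. It assumes the column names Rating, Title and Gender and reuses the six sample rows shown in the question; the function name media_valoraciones and the output column names Media_mujer and Media_hombre are borrowed from the question's expected output.

```python
import pandas as pd

# The six sample rows from the question; the real df has many more ratings.
df = pd.DataFrame({
    "Rating": [5, 3, 3, 4, 5, 3],
    "Title": [
        "One Flew Over the Cuckoo's Nest (1975)",
        "James and the Giant Peach (1996)",
        "My Fair Lady (1964)",
        "Erin Brockovich (2000)",
        "Bug's Life, A (1998)",
        "One Flew Over the Cuckoo's Nest (1975)",
    ],
    "Gender": ["F", "M", "F", "F", "M", "M"],
})


def media_valoraciones(df):
    """Return the mean rating of each movie, with one column per gender."""
    return (
        df.groupby(["Title", "Gender"])["Rating"]  # one group per (title, gender) pair
        .mean()                                    # Series indexed by (Title, Gender)
        .unstack("Gender")                         # move the Gender level into columns
        .rename(columns={"F": "Media_mujer", "M": "Media_hombre"})
    )


result = media_valoraciones(df)
print(result)
```

With only these six rows, a movie rated by a single gender shows NaN in the other column; on the full data most movies should have a value in both. The original loop printed nothing because the value computed on each iteration was never stored or printed.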