I am building a sentiment analysis of tweets using MapReduce and mrjob. Computing the sentiment score per geographical area works correctly, but while computing the score of each area I also want to count the tweets published in that area. Briefly, the functions are as follows:
Mapper function example:
def tweets_mapper(self, _, line):
    weights = self.Dictionary("/Data/Redondo_words.csv")
    try:
        jsonLine = json.loads(line)
        # Encode the country name once; a second .encode() call is redundant
        place = jsonLine["place"].get('country').encode('utf-8')
        text = jsonLine["text"]
        score = self.tweet_Score(text, weights)
        yield (place, score)
    except Exception:
        # Skip tweets that are malformed or have no place information
        pass
Reducer function example:
def tweets_reducer_scores(self, place, scores):
    # Sum all scores emitted for the same area
    yield place, sum(scores)
Example steps function:
def steps(self):
    return [MRStep(mapper=self.tweets_mapper,
                   reducer=self.tweets_reducer_scores)
            ]
With this MapReduce structure the scores per geographical area are computed correctly. To count the tweets per geographical area instead, I only change the following line of the mapper:
yield (place, 1)
The problem is that I need both results at the same time and I cannot get them from the same run.
I tried writing one mapper and reducer for the sentiment scores per geographical area and a second mapper and reducer for the tweet counts per geographical area, and changed the steps function like this:
def steps(self):
    return [MRStep(mapper=self.tweets_mapper_scores,
                   reducer=self.tweets_reducer_scores),
            MRStep(mapper=self.tweets_mapper_counts,
                   reducer=self.tweets_reducer_counts)
            ]
When I run the code only for the scores it works, and when I run it only for the counts it works, but when I try to perform the two calculations in the same job it stops producing any results.
Can you tell me how to run several MapReduce processes with mrjob using yield in the same class?
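A likely explanation for the empty output: in mrjob, the steps returned by steps() run one after another, and each step reads the output of the previous step instead of the original input file. The second mapper therefore receives (place, total score) pairs rather than raw tweet lines, json.loads fails on them, and the except clause silently discards every record. The following is only a plain-Python simulation of that chaining, not mrjob itself; run_step, the sample tweets, and the precomputed "score" field (standing in for tweet_Score) are hypothetical names introduced for illustration.

```python
import json
from collections import defaultdict


def run_step(mapper, reducer, records):
    """Simulate one MapReduce step: map, group by key, then reduce."""
    grouped = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            grouped[out_key].append(out_value)
    output = []
    for key in sorted(grouped):
        output.extend(reducer(key, grouped[key]))
    return output


def mapper_scores(_, line):
    try:
        tweet = json.loads(line)
        # "score" is a hypothetical precomputed field standing in for tweet_Score
        yield tweet["place"]["country"], tweet["score"]
    except Exception:
        pass


def mapper_counts(_, line):
    try:
        tweet = json.loads(line)  # fails when `line` is not a raw JSON string
        yield tweet["place"]["country"], 1
    except Exception:
        pass  # the error is swallowed, so nothing is emitted


def reducer_sum(key, values):
    yield key, sum(values)


# Hypothetical raw input: (None, JSON line) pairs, as a first-step mapper sees them
raw = [
    (None, json.dumps({"place": {"country": "Spain"}, "score": 2})),
    (None, json.dumps({"place": {"country": "Spain"}, "score": 1})),
    (None, json.dumps({"place": {"country": "France"}, "score": -1})),
]

# Step 1 reads the raw lines; step 2 reads step 1's output, not the raw lines
step1 = run_step(mapper_scores, reducer_sum, raw)
step2 = run_step(mapper_counts, reducer_sum, step1)

print(step1)
print(step2)
```

In this simulation the first step produces the per-country totals, while the second step produces nothing, because its mapper is handed a number instead of a tweet line. Under that reading, two independent aggregations over the same input do not fit into two chained steps as written; they need either a single step that emits both values or two separate jobs.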
As a workaround that gives me what I was looking for (the score per area, the number of tweets published in each area, and from those the average per area), I found the following option with a single mapper and a single reducer: