I have a method that uses Twython to save tweets to MongoDB, as shown in my question "Maintaining a mongodb with tweets that match a given tag":
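```python
import time

def getSearchTagTwitter(hashtag):
    db = connexMongoDB()   # my own helper (not shown): returns the pymongo database
    t = loginTwython()     # my own helper (not shown): returns an authenticated Twython client
    search = t.search(q=hashtag, count=100)
    data = search['statuses']
    try:
        db.twittersearch.create_index('id_str')
        for tweet in data:
            try:
                db.twittersearch.insert_one(tweet)
            except Exception:
                # If the insert fails, update the tweet that is already stored
                db.twittersearch.update_one({"id_str": tweet['id_str']}, tweet)
    except Exception:
        print("Error searching for hashtag")
    time.sleep(60 * 15)  # 15 minutes
    getSearchTagTwitter(hashtag)
```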
I think it's not working correctly, and I want to check, from the MongoDB shell and/or from Python, whether any `id_str` value is duplicated. I've tried the following, but it doesn't return any results:
```javascript
db.twittersearch.find({'id_str': {$in: ["numerodeid_str"]}})
```
Edit: To simplify the question: from Python, how can I check whether an existing MongoDB collection contains duplicates? I'm currently connecting with pymongo, and I can see that the collection has been created.
To write to your MongoDB collection you are using the field `id_str`, but in your query you are using the wrong field, `str_id` (`id_str` reversed). The correct query has to use `id_str`, the same field you write to.
Unless, of course, it's just a typo or copy/paste error.
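For reference, a minimal sketch of the same check from Python with pymongo, assuming `collection` is your `db.twittersearch` collection object; the helper name `is_duplicated` is hypothetical. It counts the documents with a given `id_str` using `count_documents` (available in pymongo 3.7 and later); a count above 1 means the tweet was stored more than once.

```python
def is_duplicated(collection, id_str):
    """Return True if more than one document in `collection` has this id_str.

    `collection` is assumed to be a pymongo Collection (e.g. db.twittersearch);
    count_documents requires pymongo 3.7 or newer.
    """
    # Query the same field the tweets were written with: id_str, not str_id
    return collection.count_documents({"id_str": id_str}) > 1
```

For example, `is_duplicated(db.twittersearch, tweet['id_str'])` for a tweet taken from `search['statuses']`.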
Update after your edit
I created a simple script that replicates your case using the hashtag `python` and fetching only 10 tweets, and it gives me no problems when I run some test queries in the MongoDB console.
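For the simplified question (checking from Python whether an existing collection already contains duplicates), one option is an aggregation that groups documents by `id_str` and keeps only the groups that appear more than once. This is a sketch assuming pymongo and the `twittersearch` collection from the question; `duplicates_pipeline` and `find_duplicates` are hypothetical helper names.

```python
def duplicates_pipeline(field="id_str"):
    """Aggregation pipeline listing each value of `field` stored more than once."""
    return [
        # One group per distinct value, counting the documents in it
        {"$group": {"_id": "$" + field, "count": {"$sum": 1}}},
        # Keep only the values that appear in more than one document
        {"$match": {"count": {"$gt": 1}}},
    ]


def find_duplicates(collection, field="id_str"):
    """Run the pipeline on a pymongo Collection and return {value: count}."""
    return {doc["_id"]: doc["count"] for doc in collection.aggregate(duplicates_pipeline(field))}
```

`print(find_duplicates(db.twittersearch))` prints an empty dict when no `id_str` is repeated. The same pipeline can be pasted into the MongoDB shell as `db.twittersearch.aggregate([{$group: {_id: "$id_str", count: {$sum: 1}}}, {$match: {count: {$gt: 1}}}])`.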
I think the problem is somewhere else; maybe there is something else in your code that we are missing.
I'm adding another solution I have found: use an update with `upsert` set to `True`. This overwrites the existing record if there is a duplicate and creates a new record if there is none.
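A minimal sketch of that upsert approach with pymongo, assuming the `twittersearch` collection from the question; `save_tweets` is a hypothetical name. It also creates the index with `unique=True`: a plain `create_index('id_str')` is not unique, so `insert_one` never fails because of a repeated `id_str`. With a unique index, MongoDB itself rejects a second document with the same `id_str` (creating that index fails if the collection already contains duplicates, so remove them first).

```python
def save_tweets(collection, tweets):
    """Upsert tweets keyed by id_str; return how many new documents were created.

    `collection` is assumed to be a pymongo Collection such as db.twittersearch.
    """
    # A unique index makes MongoDB reject a second document with the same id_str
    collection.create_index("id_str", unique=True)
    created = 0
    for tweet in tweets:
        # Drop any _id (e.g. one added by a failed insert_one) so $set does not try to change it
        doc = {k: v for k, v in tweet.items() if k != "_id"}
        result = collection.update_one(
            {"id_str": doc["id_str"]},  # match an existing tweet by its id
            {"$set": doc},              # overwrite its fields with the new data
            upsert=True,                # insert a new document if none matches
        )
        if result.upserted_id is not None:
            created += 1
    return created
```

Inside `getSearchTagTwitter`, `save_tweets(db.twittersearch, data)` would replace the `insert_one`/`update_one` loop. The update uses `$set` because `update_one` only accepts update operators; passing the whole tweet as the update document raises a `ValueError` in pymongo. If you want the stored document replaced wholesale instead, `replace_one(filter, doc, upsert=True)` is the alternative.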