I want to extract the description (bio) text of a Twitter account, but I can't get to it. So far the only thing I have managed to extract is the number of followers.
The script below prints the number of followers, but it does not print the description.
Code:
import requests
from bs4 import BeautifulSoup

USER_AGENT = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'}


def obtener_resultados(termino_busqueda, numero_resultados, codigo_lenguaje):
    # Fetch the Google results page for the search term
    url_google = 'https://www.google.com/search?q={}&num={}&hl={}'.format(termino_busqueda, numero_resultados, codigo_lenguaje)
    respuesta = requests.get(url_google, headers=USER_AGENT)
    respuesta.raise_for_status()
    return termino_busqueda, respuesta.text


def procesar_resultados(html, palabra):
    # Collect the title of every result block
    soup = BeautifulSoup(html, 'html.parser')
    resultados_encontrados = []
    bloque = soup.find_all("div", class_="g")
    for resultado in bloque:
        titulo = resultado.find('h3').string
        resultados_encontrados.append(titulo)
    return resultados_encontrados


def scrap(termino_busqueda, numero_resultados, codigo_lenguaje):
    palabra, html = obtener_resultados(termino_busqueda, numero_resultados, codigo_lenguaje)
    resultados = procesar_resultados(html, palabra)
    return resultados


if __name__ == '__main__':
    palabra = 'Quantika14'
    h5 = (palabra, 1, "es")
    h6 = h5[0]
    username = h6
    url = 'https://www.twitter.com/' + username
    r = requests.get(url)
    soup = BeautifulSoup(r.content, 'html.parser')
    # Follower count: this part works
    f = soup.find('li', class_="ProfileNav-item--followers")
    title = f.find('a')['title']
    print(title)
    g = soup.find_all('title', limit=1)
    # Attempt to get the description: this prints None
    h = soup.find('data-testid', {'UserDescription': 'textContent'})
    title2 = g
    print(title2)
    title3 = h
    print(title3)
This is what I get from the "Copy selector" option in the inspector:
#react-root > div > div > div > main > div > div > div > div.css-1dbjc4n.r-14lw9ot.r-1tlfku8.r-1ljd8xs.r-13l2t4g.r-1phboty.r-1jgb5lz.r-11wrixw.r-61z16t.r-1ye8kvj.r-13qz1uu.r-184en5c > div > div:nth-child(2) > div > div > div:nth-child(2) > div > div > div:nth-child(1) > div > div:nth-child(3) > div > span
And this is what I get from "Copy outerHTML" in the inspector:
<span class="css-901oao css-16my406 r-1qd0xha r-ad9z0x r-bcqeeo r-qvutc0">Damos soluciones a través de nuestras aplicaciones y peritajes informáticos. Encontramos a los autores de crímenes usando nuevas tecnologías.</span>
Even with this information, I can't extract the description from the Twitter page using BeautifulSoup; the best I get is None.
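As a sanity check, here is a minimal sketch that runs the lookup against the copied outerHTML directly instead of the live page. It assumes the span is already present in the parsed HTML; the body returned by requests.get may not contain it if the profile is filled in by JavaScript under #react-root, which would also produce None. The variable names are made up for illustration.

```python
from bs4 import BeautifulSoup

# The span copied from the inspector ("Copy outerHTML"), used as static input.
html = (
    '<span class="css-901oao css-16my406 r-1qd0xha r-ad9z0x r-bcqeeo r-qvutc0">'
    'Damos soluciones a través de nuestras aplicaciones y peritajes informáticos. '
    'Encontramos a los autores de crímenes usando nuevas tecnologías.</span>'
)
soup = BeautifulSoup(html, 'html.parser')

# The first positional argument of find() is a tag name, so this call searches
# for a <data-testid> element and finds nothing.
intento = soup.find('data-testid', {'UserDescription': 'textContent'})
print(intento)

# Searching by tag name plus one of its CSS classes matches the span.
span = soup.find('span', class_='css-901oao')
descripcion = span.get_text() if span else None
print(descripcion)
```

The class names look auto-generated, so a selector built on them may stop matching when the site changes.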
I now have a solution. The question received downvotes for showing little research, but since I did reach an answer, I don't think the negative score is justified.
The most reliable option is to use the Twitter API. It returns a lot of information, but that information is easy to work with using the pandas library.
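For the original goal (the account description), a minimal sketch of reading the profile fields from an API response could look like this. It assumes the response is a JSON user object with `screen_name`, `followers_count` and `description` keys, as in Twitter API v1.1 user objects; the sample payload and the function name `profile_summary` are hypothetical, and the authenticated request itself is left out.

```python
import json

# Hypothetical, trimmed payload shaped like a Twitter API v1.1 user object.
# The follower count is a placeholder, not the real value.
SAMPLE_USER_JSON = '''
{
  "screen_name": "Quantika14",
  "followers_count": 0,
  "description": "Damos soluciones a través de nuestras aplicaciones y peritajes informáticos."
}
'''


def profile_summary(raw_json):
    """Pick the fields of interest out of a user object given as JSON text."""
    user = json.loads(raw_json)
    return {
        "username": user.get("screen_name"),
        "followers": user.get("followers_count"),
        # Accounts without a bio yield an empty string here.
        "description": user.get("description", ""),
    }


summary = profile_summary(SAMPLE_USER_JSON)
print(summary["description"])
```

Reading the description from structured JSON avoids depending on the page markup at all.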
We worked through the following example in a Big Data master's course, analyzing the tweets word by word in detail.
Starting from tweets that you either fetch through the API or load dynamically, try this:
Result:
We keep only the text column of the tweets, limited to 30 rows,
and, if you wish, you can split the text word by word.
As you can see in the previous code, several functions are applied to each tweet. We select only the 'text' column and break it into words with split(" ") so that each word can be looked up one by one in the sentiment file. We then force all words to lowercase, because otherwise "Hellow" and "hellow" would be treated as two different words. The result shows each tweet split on spaces into a list of words.
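A minimal sketch of those steps, assuming the tweets are already in a pandas DataFrame with a 'text' column. The sample rows and the function name `tweet_words` are made up for illustration and are not the course code.

```python
import pandas as pd

# Hypothetical sample rows standing in for tweets fetched from the API.
tweets = pd.DataFrame({
    "id": [1, 2, 3],
    "text": ["Hellow world", "hellow again Twitter", "Pandas makes this easy"],
})


def tweet_words(df, limit=30):
    """Return one lowercase word list per tweet for the first `limit` rows."""
    texts = df["text"].head(limit)            # keep only the text column
    return texts.str.lower().str.split(" ")   # lowercase, then split on spaces


words = tweet_words(tweets)
print(words.tolist())
```

Lowercasing before the split means "Hellow" and "hellow" end up as the same token when they are looked up later.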
Try it and let me know how it goes. I hope you find it useful.