
Question

Asked: 2020-02-21 12:45:29 +0800 CST

Scraping: extract a Twitter account's description (not its tweets or its counts)


I want to extract the text that describes a Twitter account, but I can't get to it. The only thing I've managed to extract is the number of followers; I can't figure out how to reach the text that describes the account.

It prints the number of followers, but I can't print the description.

Code:

import requests
from bs4 import BeautifulSoup

USER_AGENT = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'}

def obtener_resultados(termino_busqueda, numero_resultados, codigo_lenguaje):
    url_google = 'https://www.google.com/search?q={}&num={}&hl={}'.format(termino_busqueda, numero_resultados, codigo_lenguaje)
    respuesta = requests.get(url_google, headers=USER_AGENT)
    respuesta.raise_for_status()
    return termino_busqueda, respuesta.text

def procesar_resultados(html, palabra):
    soup = BeautifulSoup(html, 'html.parser')
    resultados_encontrados = []
    bloque = soup.find_all("div", class_="g")
    for resultado in bloque:
        titulo = resultado.find('h3').string
        resultados_encontrados.append(titulo)
    return resultados_encontrados

def scrap(termino_busqueda, numero_resultados, codigo_lenguaje):
    palabra, html = obtener_resultados(termino_busqueda, numero_resultados, codigo_lenguaje)
    resultados = procesar_resultados(html, palabra)
    return resultados

if __name__ == '__main__':
    palabra = 'Quantika14'
    h5 = (palabra, 1, "es")
    h6 = (h5[0])

    username = h6
    url = 'https://www.twitter.com/' + username
    r = requests.get(url)
    soup = BeautifulSoup(r.content, 'html.parser')

    # Follower count: this part works
    f = soup.find('li', class_="ProfileNav-item--followers")
    title = f.find('a')['title']
    print(title)

    # Page <title>
    g = soup.find_all('title', limit=1)

    # Account description: this prints None
    h = soup.find('data-testid', {'UserDescription': 'textContent'})

    title2 = g
    print(title2)
    title3 = h
    print(title3)

This is what I get when I use the inspector's "Copy selector" option:

#react-root > div > div > div > main > div > div > div > div.css-1dbjc4n.r-14lw9ot.r-1tlfku8.r-1ljd8xs.r-13l2t4g.r-1phboty.r-1jgb5lz.r-11wrixw.r-61z16t.r-1ye8kvj.r-13qz1uu.r-184en5c > div > div:nth-child(2) > div > div > div:nth-child(2) > div > div > div:nth-child(1) > div > div:nth-child(3) > div > span

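And this is what I get with the inspector's "Copy outerHTML" option (in English, the description reads: "We provide solutions through our applications and computer forensics reports. We find the perpetrators of crimes using new technologies."):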

<span class="css-901oao css-16my406 r-1qd0xha r-ad9z0x r-bcqeeo r-qvutc0">Damos soluciones a través de nuestras aplicaciones y peritajes informáticos. Encontramos a los autores de crímenes usando nuevas tecnologías.</span>

Even with this information, I can't extract the description from the Twitter page with BeautifulSoup; the best I get is None.
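A likely reason for the None: `soup.find('data-testid', {...})` asks BeautifulSoup for a tag *named* `data-testid`, whereas `data-testid` is an attribute. The sketch below runs on a hypothetical HTML fragment (an assumption modeled on the question, not Twitter's real markup) and shows the attribute lookup with `attrs=`. It also shows that even a correct lookup returns None when the element is simply not in the HTML that `requests` receives, which can happen when a page builds its content with JavaScript in the browser: the inspector shows the rendered page, not necessarily the raw response.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a description <div> carrying a data-testid attribute,
# as the question's lookup assumes.
RENDERED_HTML = """
<div data-testid="UserDescription">
  <span>Damos soluciones a través de nuestras aplicaciones.</span>
</div>
"""

# Hypothetical "page shell": HTML returned before any JavaScript has run.
SHELL_HTML = '<div id="react-root"></div>'


def get_description(html):
    """Return the text of the element whose data-testid is UserDescription, or None."""
    soup = BeautifulSoup(html, "html.parser")
    # Attributes are matched through the attrs dictionary, not the tag name.
    node = soup.find(attrs={"data-testid": "UserDescription"})
    return node.get_text(strip=True) if node is not None else None


print(get_description(RENDERED_HTML))  # Damos soluciones a través de nuestras aplicaciones.
print(get_description(SHELL_HTML))     # None: the element is not in this HTML

# The question's call looks for a tag named "data-testid", so it finds nothing.
soup = BeautifulSoup(RENDERED_HTML, "html.parser")
print(soup.find("data-testid", {"UserDescription": "textContent"}))  # None
```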

2 Answers


Alberto Bedmar Montaño · Answer 1 · 2020-02-23T14:33:10+08:00

I already have the solution. I received downvotes for showing too little research, but I found the answer myself, so I don't think the negative score is justified.

import requests
from bs4 import BeautifulSoup


USER_AGENT = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'}

def obtener_resultados(termino_busqueda, numero_resultados, codigo_lenguaje):
    url_google = 'https://www.google.com/search?q={}&num={}&hl={}'.format(termino_busqueda, numero_resultados, codigo_lenguaje)
    respuesta = requests.get(url_google, headers=USER_AGENT)
    respuesta.raise_for_status()
    return termino_busqueda, respuesta.text

def procesar_resultados(html, palabra):
    soup = BeautifulSoup(html, 'html.parser')
    resultados_encontrados = []
    bloque = soup.find_all("div", class_="g")
    for resultado in bloque:
        titulo = resultado.find('h3').string
        resultados_encontrados.append(titulo)
    return resultados_encontrados

def scrap(termino_busqueda, numero_resultados, codigo_lenguaje):
    palabra, html = obtener_resultados(termino_busqueda, numero_resultados, codigo_lenguaje)
    resultados = procesar_resultados(html, palabra)
    return resultados

if __name__ == '__main__':
    palabra = 'Quantika14'
    h5 = (palabra, 1, "es")
    h6 = (h5[0])

    username = h6
    url = 'https://www.twitter.com/' + username
    r = requests.get(url)
    soup = BeautifulSoup(r.content, 'html.parser')

    f = soup.find('li', class_="ProfileNav-item--followers")
    title = f.find('a')['title']
    print(title)

    g = soup.find_all('title', limit=1)
    # The change: select the element with the CSS class "bio"
    h = soup.select('.bio', limit=1)

    title2 = g
    print(title2)
    title3 = h
    print(title3)
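Note that `soup.select('.bio', limit=1)` returns a *list* of tags, so `print(title3)` also prints the surrounding HTML. A minimal sketch of printing only the text with `select_one` and `get_text`, on a hypothetical fragment that assumes the bio sits in an element with class `bio`:

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: assumes the profile bio is an element with class "bio".
html = '<p class="bio">Damos soluciones a través de nuestras aplicaciones.</p>'
soup = BeautifulSoup(html, "html.parser")

matches = soup.select(".bio", limit=1)  # a list of Tag objects (here, one element)
first = soup.select_one(".bio")         # the first matching Tag, or None

print(matches)                          # a one-element list, printed with its HTML
if first is not None:
    print(first.get_text(strip=True))   # Damos soluciones a través de nuestras aplicaciones.
```

`select_one` avoids indexing into a possibly empty list, and the `None` check covers the case where the page does not contain the element at all.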

hispasofttv · Answer 2 · 2020-02-22T03:16:46+08:00

The most reliable option is to use the Twitter API. It returns a lot of information, but that information is easy to work with using the pandas library.

We worked through the following example, analyzing the tweets word by word, in a Big Data master's program.

Assuming you already have the tweets as output from the API (saved to a file or loaded dynamically), try this:

import pandas as pd

df = pd.read_json("salida_tweets.txt", lines=True)

df.head(20)  # shows the first 20 rows
df.columns   # gets the column names

Result:

Index(['contributors', 'coordinates', 'created_at', 'delete', 'entities',
       'extended_entities', 'favorite_count', 'favorited', 'filter_level',
       'geo', 'id', 'id_str', 'in_reply_to_screen_name',
       'in_reply_to_status_id', 'in_reply_to_status_id_str',
       'in_reply_to_user_id', 'in_reply_to_user_id_str', 'lang', 'place',
       'possibly_sensitive', 'retweet_count', 'retweeted', 'retweeted_status',
       'source', 'text', 'timestamp_ms', 'truncated', 'user'],
      dtype='object')
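Since the question asks for the account description, the `user` column above is the relevant one: in Twitter's v1.1-style tweet JSON, each tweet carries its author's profile as a nested `user` object, which includes a `description` field (the bio). A minimal sketch with the standard `json` module, assuming the file holds one such tweet per line; the sample lines and the helper `descriptions_by_user` are hypothetical:

```python
import io
import json

# Hypothetical two-line sample in the same JSON-lines format as salida_tweets.txt:
# one tweet with a nested "user" object and one deletion notice without "user".
SAMPLE = io.StringIO(
    '{"text": "Hello", "user": {"screen_name": "Quantika14", "description": "Damos soluciones"}}\n'
    '{"delete": {"status": {"id": 1}}}\n'
)


def descriptions_by_user(lines):
    """Map screen_name -> profile description for every tweet that has a user object."""
    result = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        user = tweet.get("user")
        if isinstance(user, dict):  # deletion notices have no "user" entry
            result[user.get("screen_name")] = user.get("description")
    return result


print(descriptions_by_user(SAMPLE))  # {'Quantika14': 'Damos soluciones'}
```

With the real file, open it with `with open("salida_tweets.txt") as tw:` and call `descriptions_by_user(tw)` inside that block.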

To get only the text column of the tweets, limited to the first 30 rows:

df["text"].head(30)

And if you wish, you can split the text into individual words:

import json

tweets = []
with open("salida_tweets.txt") as tw:
    for linea in tw:
        tweet_en_bruto = json.loads(linea)
        # Keep only entries that have a 'text' field (skips e.g. deletion notices)
        if 'text' in tweet_en_bruto:
            tweets.append(tweet_en_bruto['text'].lower().split(" "))

As the code above shows, several operations are applied to each tweet. We keep only the 'text' field and break it into words with split(" "), so that each word can be looked up individually in the sentiment file. We also convert everything to lowercase first, so that words such as "Hello" and "hello" in the tweets are treated as the same word. Finally, we display the tweets, each one split on spaces into a list of words:

tweets[0:20]
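The point of tokenizing is to look each word up in the sentiment file. That file is not shown, so the sketch below assumes a hypothetical format (a word-to-score dictionary) and hypothetical sample tweets; it applies the same lowercase-then-split step and sums the scores of the words found:

```python
# Hypothetical sentiment lexicon: word -> score. The answer's sentiment file
# is not shown, so this format is an assumption.
LEXICON = {"good": 1, "great": 2, "bad": -1}

tweets_text = ["Good morning", "This is GREAT, really great", "bad day"]


def tokenize(text):
    """Lowercase first so "Hello" and "hello" become the same token, then split on spaces."""
    return text.lower().split(" ")


def score(tokens, lexicon):
    """Sum the lexicon score of every token; words not in the lexicon count as 0."""
    return sum(lexicon.get(token, 0) for token in tokens)


tweets = [tokenize(t) for t in tweets_text]
scores = [score(tokens, LEXICON) for tokens in tweets]
print(tweets[0])  # ['good', 'morning']
print(scores)     # [1, 2, -1]
```

Note that `split(" ")` keeps punctuation attached, so "great," does not match "great"; that is why the second tweet scores 2 rather than 4.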

Try it and let me know how it goes. I hope it is useful to you.
