I want to get information from a website. It is stored in a div with a given class, and there are many divs with that class name.
This is my code at the moment:
from bs4 import BeautifulSoup
import requests
r = requests.get("https://fortnitetracker.com/profile/all/d1ego-fraggerツ/competitive")
soup = BeautifulSoup(r.content, "lxml")
puntos = soup.find("div", class_="trn-defstat__value").text
print(puntos.strip())
My problem is that it doesn't print anything, because there are more than two divs with the same class name on the page.
I'm working on a project and this problem has stalled me. I'd really appreciate your help.
The problem you have is not that there are several divs with the same class. That is not a big deal: if you use find_all() instead of find(), you get a list with every one of those divs. The problem is that none of those divs has the values you're interested in. If you loop over the result of find_all() and print the text of each div, everything comes out blank, because none of the divs contain any text. Instead, they contain what looks like a template, which a JavaScript engine inside the browser fills in with data it gets from a call to an API on the server.
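A minimal sketch of that find_all() check. It runs against a hypothetical HTML fragment instead of the live page (the sample markup is an assumption that only imitates the empty divs the server sends), and it uses the built-in "html.parser" instead of lxml so BeautifulSoup is the only dependency:

```python
from bs4 import BeautifulSoup


def stat_texts(html):
    """Return the stripped text of every div with class trn-defstat__value."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all() returns a list with every matching div, not only the first one
    divs = soup.find_all("div", class_="trn-defstat__value")
    return [div.text.strip() for div in divs]


# Hypothetical markup imitating the server's response: the divs exist,
# but they hold no text because the browser's JavaScript fills them in later.
sample_html = """
<div class="trn-defstat__value"></div>
<div class="trn-defstat__value">
</div>
"""

print(stat_texts(sample_html))  # → ['', '']
```

Calling stat_texts(r.text) on the response from the question's code should give the same kind of result: one entry per div, all of them blank.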
Why is this happening?
This is what I was trying to explain to you in the comments. The page you get from the server is, in a sense, incomplete. It contains JavaScript code that the browser runs once the page loads, and that code makes further requests and fills the HTML with the results, in the places marked in that template.
From Python you can't do that, because neither requests nor BeautifulSoup runs JavaScript. The general solution to this problem is Selenium, a Python library that starts a real browser (for example Firefox or Chrome) and controls it from Python: it tells the browser to load the page, and once the page has fully loaded and its JavaScript has run, you can "remote control" the browser from Python to request the resulting HTML or to scrape it.
This solution is general and works for any page, but it is considerably more complex: it requires installing a specific driver for your browser and learning how to use the Selenium library.
Luckily, there is sometimes a simpler alternative...
Is there a simpler alternative?
In this case, yes! If you open your web browser's developer tools and inspect the network panel, you'll see that the browser makes lots of requests to complete the page shown to the user. Many of them download images, fonts, CSS styles or JavaScript code, but one of them downloads the player's statistics, which are later used to fill in the template. That request goes to an API URL on the server, and the response is a very long JSON document with all kinds of statistics.
We see that this URL is an API call, as I had assumed, and that the player's profile is identified in it by a hexadecimal string, 3df75048-eb63-41e0-95d2-753db1dc28a3 in this case. To make things more generic, we can first extract that profile identifier from the downloaded HTML page itself, since it appears there as the accountId.
So the plan is:
1. Download the player's HTML page.
2. Extract the accountId from it through a regular expression.
3. With that accountId, make another GET request to the API and download the JSON with all the statistics.

We don't need BeautifulSoup for any of this.
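Step 2 can be sketched with the standard re module. The exact markup around the ID is not reproduced here, so the pattern is an assumption: it expects the word accountId followed by a few punctuation characters (quotes, a colon or "=") and then a UUID-shaped hexadecimal string like the one above. The name extract_account_id is hypothetical:

```python
import re

# Assumption: in the page source, the ID appears right after the word accountId,
# separated only by a few punctuation characters such as quotes, a colon or "=".
ACCOUNT_ID_RE = re.compile(
    r"accountId\W{1,5}"
    r"([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})",
    re.IGNORECASE,
)


def extract_account_id(html):
    """Return the first accountId found in the page HTML, or None if absent."""
    match = ACCOUNT_ID_RE.search(html)
    return match.group(1) if match else None


# Hypothetical fragment; the real page's markup may look different.
snippet = '<script>var profile = {"accountId":"3df75048-eb63-41e0-95d2-753db1dc28a3"};</script>'
print(extract_account_id(snippet))  # → 3df75048-eb63-41e0-95d2-753db1dc28a3
```

In the real script, the HTML would come from requests.get(profile_url).text, and the returned ID would then be substituted into the API URL seen in the network panel before calling requests.get(...).json() for step 3.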
The JSON is pretty huge. From inspecting it, I found that it is a list of three dictionaries. Each dictionary corresponds to a platform: the first one is for "kbm", the second for "gamepad" and the third for "None". Apparently the one shown on the website is the third one.
Each of these dictionaries has a field "stats" with the keys 'trios', 'solo', 'duos' and 'all'. The one shown on the website would be 'all'.
Thus, the statistics we are looking for come from the third dictionary of the list, under ["stats"]["all"].
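A small sketch of that lookup, assuming the JSON has exactly the shape described (a list of three platform dictionaries, each with a "stats" dictionary keyed by mode). The function name and the tiny sample JSON are made up for illustration:

```python
import json


def all_mode_stats(json_text):
    """Return the list of 'all' stats for the entry the website displays.

    Assumes the JSON is a list of three dicts, one per platform
    ("kbm", "gamepad", None), each with a "stats" dict keyed by mode.
    """
    data = json.loads(json_text)
    website_entry = data[2]  # third dictionary: the one shown on the website
    return website_entry["stats"]["all"]


# Tiny made-up JSON with the same shape, for illustration only
fake = json.dumps([
    {"stats": {"all": []}},
    {"stats": {"all": []}},
    {"stats": {"all": [{"displayValue": "75,663"}]}},
])
print(all_mode_stats(fake))  # → [{'displayValue': '75,663'}]
```

With requests, response.json() already returns the parsed list, so you can index it directly (data[2]["stats"]["all"]) instead of calling json.loads.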
This is itself a list of dictionaries, all with the same structure. Each one contains the name of a statistic (for example, Score), its value (75663.0), and also the way it should be displayed on the web, its displayValue ('75,663'). We can iterate through all of these dictionaries to display the name and value of each statistic.
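One way to write that loop, as a sketch. Only the displayValue key is confirmed by the JSON example; the keys for the statistic's name and raw value are hypothetical ("name" and "value" by default), so they are parameters you can adjust after checking the real data:

```python
def format_stats(stats, name_key="name", value_key="value"):
    """Build one 'name: value (displayValue)' line per statistic dict.

    name_key and value_key are hypothetical key names: only displayValue is
    confirmed, so check the real JSON and adjust them if needed.
    """
    lines = []
    for stat in stats:
        lines.append(
            f"{stat.get(name_key)}: {stat.get(value_key)} ({stat.get('displayValue')})"
        )
    return lines


# Made-up entry built from the example values (Score, 75663.0, '75,663')
example = [{"name": "Score", "value": 75663.0, "displayValue": "75,663"}]
for line in format_stats(example):
    print(line)  # → Score: 75663.0 (75,663)
```

Using dict.get() means a wrong key guess prints None instead of raising KeyError, which makes it easy to spot and fix the key names.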