Asked: 2022-04-23 07:14:05 +0800 CST

How can you get the information of a class div, when there are more than two with the same name?

I want to get information from a website. It is stored in a div with a particular class, and there are many divs with that same class.

This is the code so far:

from bs4 import BeautifulSoup
import requests

r = requests.get("https://fortnitetracker.com/profile/all/d1ego-fraggerツ/competitive")
soup = BeautifulSoup(r.content, "lxml")

puntos = soup.find("div", class_="trn-defstat__value").text
print(puntos.strip())

The problem is that it doesn't print anything, since more than two divs on the page share the same class.

I'm working on a project and I'm stuck because of this problem. I'd really appreciate your help.

1 Answer

abulafia · Answer 1 · 2022-04-23T12:41:53+08:00

The problem you have is not that there are several divs with the same class. That is not a big problem, because if you use find_all() instead of find() you will get a list with each of those divs. The problem is that none of those divs has the values you're interested in seeing:

from bs4 import BeautifulSoup
import requests

r = requests.get("https://fortnitetracker.com/profile/all/d1ego-fraggerツ/competitive")
soup = BeautifulSoup(r.content, "lxml")

divs = soup.find_all("div", class_="trn-defstat__value")
for div in divs:
    print(div.text)

If you run that, it comes out all blank, because none of the divs contain any text. They all contain things like this:

<div class="trn-defstat__value">
{{ getStat(activeStats['all'], 'TRNRating').displayValue }}
<span class="trn-defstat__value-label" :style="{ color: getStat(activeStats['all'], 'TRNRating').metadata.color }">{{ getStat(activeStats['all'], 'TRNRating').metadata.description }}</span>
</div>

which looks like a template that a JavaScript engine in the browser fills in with data obtained from an API call to the server.
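
A quick way to confirm this from Python is to check whether the text you scraped still contains {{ ... }} placeholders. This is only a minimal sketch: the helper name looks_unrendered is made up for illustration, and it assumes the page uses double-brace placeholders like the ones shown above.

```python
import re

# Matches double-brace placeholders such as {{ expression }}.
PLACEHOLDER = re.compile(r"\{\{.*?\}\}", re.DOTALL)


def looks_unrendered(text: str) -> bool:
    """Return True if the text still contains a {{ ... }} placeholder."""
    return PLACEHOLDER.search(text) is not None


# Text taken from the div shown above, and a value as the browser would show it.
sample = "{{ getStat(activeStats['all'], 'TRNRating').displayValue }}"
print(looks_unrendered(sample))
print(looks_unrendered("75,663"))
```

If the check is true for the text of your divs, the data is not in the HTML and has to come from somewhere else.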

Why is this happening?

This is what I was trying to explain to you in the comments. The page you get from the server is, in a sense, incomplete. It contains scripts written in JavaScript that the browser executes once the page loads, and those scripts make further requests and fill the HTML with the results, in the places marked in that template.

From Python you can't perform those actions directly, because neither requests nor BeautifulSoup executes JavaScript. The general solution to this problem is to use Selenium with Python. Selenium is a library that starts a real browser (for example Firefox or Chrome) and tells it to load the page. Once the page is fully loaded and the JavaScript has run in the browser, you can drive the browser "by remote control" from Python, to request the resulting HTML or to scrape it.

This solution is general and would work for any page, but it is quite complex: it requires installing specific drivers for your browser and learning how to use the Selenium library.
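
For reference, the Selenium route might look roughly like the sketch below. The function name fetch_rendered_html and its parameters are hypothetical, it assumes Firefox and a matching driver are available, and I have not run it against this particular site. The imports are placed inside the function only so that the snippet loads on machines without Selenium.

```python
def fetch_rendered_html(url: str, css_selector: str, timeout: float = 15.0) -> str:
    """Load url in a real browser and return the HTML after the scripts have run."""
    # Imported here so the snippet can be loaded where Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    def value_is_rendered(drv) -> bool:
        # The div already exists in the raw template, so waiting for it to
        # appear is not enough: wait until its text is no longer a placeholder.
        text = drv.find_element(By.CSS_SELECTOR, css_selector).text.strip()
        return bool(text) and "{{" not in text

    driver = webdriver.Firefox()  # assumption: Firefox and its driver are installed
    try:
        driver.get(url)
        WebDriverWait(driver, timeout).until(value_is_rendered)
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even if the wait times out
```

The returned HTML could then be passed to BeautifulSoup exactly as in your original code, for example with "div.trn-defstat__value" as the selector.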

Luckily sometimes there are simpler alternatives...

Is there a simpler alternative?

In this case, yes! If you open your web browser's developer tools and inspect the network panel, you'll see that the browser makes lots of requests to complete the view shown to the user. Many of these requests download images, fonts, CSS styles or JavaScript code, but one of them downloads the player's statistics that are later used to fill in the template. That request goes to the following URL:

https://fortnitetracker.com/api/v0/profile/3df75048-eb63-41e0-95d2-753db1dc28a3/stats?season=16&isCompetitive=true

and the response is a very long JSON with all kinds of statistics.

We see that this URL is an API call, as I had assumed, and that the player's profile is identified by a hexadecimal string in UUID format, 3df75048-eb63-41e0-95d2-753db1dc28a3 in this case.

To make things more generic, we can first extract that profile identifier from the downloaded HTML page itself, since it appears in this fragment of the page:

  "playerInfo": {
    "accountId": "3df75048-eb63-41e0-95d2-753db1dc28a3",
    "playerName": "d1ego-fraggerツ",
    "displayName": "d1ego-fraggerツ",
    "platformSlug": "epic",
    "externalAccounts": {},
    "socialAccounts": {},
    "isPremium": false,
    "avatarUrl": "https://trackercdn.com/legacycdn/fortnite/ABED988_small.png",
    "epicVerified": false
  },

So the plan is:

1. Download the original HTML. It is not very useful in itself, because the div does not contain the information we are looking for, but it lets us obtain the accountId through a regular expression.
2. With that accountId, make another GET request to the API and download the JSON with all the statistics.
3. Study the structure of that JSON to extract the information of interest.

We don't need BeautifulSoup for any of this.

# 1. Download the HTML and look for the accountId
import requests
import re

r = requests.get("https://fortnitetracker.com/profile/all/d1ego-fraggerツ/competitive")
uid = re.search(r'"accountId": "([0-9a-f-]+)"', r.text).group(1)

# 2. Call the API and get the JSON with the statistics
r = requests.get(f"https://fortnitetracker.com/api/v0/profile/{uid}/stats?season=16&isCompetitive=true")
data = r.json()
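
If you want something a bit more defensive, the two steps above could be wrapped in small functions that fail with a clear message when the accountId is missing. This is just a sketch: the names extract_account_id and build_stats_url are mine, and the season and isCompetitive parameters are simply the ones seen in the URL captured from the network panel. It runs offline on a fragment like the playerInfo block shown earlier.

```python
import re
from urllib.parse import urlencode

# Same pattern as in the answer, but tolerant of extra whitespace after the colon.
ACCOUNT_ID_RE = re.compile(r'"accountId":\s*"([0-9a-f-]+)"')


def extract_account_id(html: str) -> str:
    """Return the accountId embedded in the page, or raise if it is missing."""
    match = ACCOUNT_ID_RE.search(html)
    if match is None:
        raise ValueError("accountId not found in the page")
    return match.group(1)


def build_stats_url(account_id: str, season: int = 16) -> str:
    """Build the stats URL observed in the browser's network panel."""
    query = urlencode({"season": season, "isCompetitive": "true"})
    return f"https://fortnitetracker.com/api/v0/profile/{account_id}/stats?{query}"


# Offline check using a fragment like the playerInfo block from the page.
sample = '"playerInfo": {"accountId": "3df75048-eb63-41e0-95d2-753db1dc28a3", "platformSlug": "epic"}'
uid = extract_account_id(sample)
print(build_stats_url(uid))
```

In real use you would pass r.text from the first request to extract_account_id and then call requests.get() on the URL it builds.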

The JSON is pretty huge. I have found that it is a list with three dictionaries. Each dictionary corresponds to a platform: the first is for "kbm", the second for "gamepad" and the third for "None". Apparently the one shown on the website is the third one.

Each of these dictionaries has a "stats" field with the keys 'trios', 'solo', 'duos' and 'all'. The one shown on the website would be 'all'.

Thus, the statistics we are looking for would come from:

stats = data[2]["stats"]["all"]

This is itself a list of dictionaries, all with the same structure. For example:

>>> stats[0]
{'displayValue': '75,663',
 'metadata': {'categoryKey': 'general',
  'categoryName': 'General',
  'isReversed': False,
  'key': 'Score',
  'name': 'Score'},
 'percentile': 0.0,
 'value': 75663.0}

We see that it contains the name of the statistic (Score) and its value (75663.0), and also the way it should be displayed on the web, its displayValue ('75,663').

We can iterate through all of these dictionaries to display their name and value:

for e in stats:
  print(f"{e['metadata']['name']:>16s}: {e['displayValue']}")

And what we see is:

           Score: 75,663
            Wins: 6
           Top 3: 16
           Top 5: 41
           Top 6: 37
          Top 10: 7
          Top 12: 103
          Top 25: 19
             K/d: 0.41
           Win %: 1.50
         Matches: 389
           Kills: 158
     Time Played: 2d 21h 24m 
       Kills/Min: 0.04
     Kills/Match: 0.41
 Avg. Match Time: 10m 42s
     Score/Match: 194.51
       Score/Min: 18.17
      Top 3/5/10: 64
     Top 6/12/25: 159
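
Since your original code only wanted a single value (puntos), it may be handier to index the statistics by name than to print them all. A minimal sketch follows. The helper stats_by_name is hypothetical, and the sample list is a shortened stand-in for the real stats list: the first entry is the one shown above, and the second is cut down to just the fields the helper reads.

```python
def stats_by_name(stats):
    """Map each statistic's name to its display value."""
    return {entry["metadata"]["name"]: entry["displayValue"] for entry in stats}


# Shortened stand-in for data[2]["stats"]["all"].
sample_stats = [
    {
        "displayValue": "75,663",
        "metadata": {"categoryKey": "general", "categoryName": "General",
                     "isReversed": False, "key": "Score", "name": "Score"},
        "percentile": 0.0,
        "value": 75663.0,
    },
    {"displayValue": "6", "metadata": {"name": "Wins"}},  # trimmed entry
]

lookup = stats_by_name(sample_stats)
puntos = lookup["Score"]
print(puntos)
```

With the real data you would call stats_by_name(stats) on the list obtained above and read whichever names you need.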
