Question

Asked: 2020-07-20 00:14:21 +0800 CST

Read a .json from python

I have a .json with the following structure:

[
  {
    "Country": "Spain",
    "Age": "14"
  },
  {
    "Country": "China",
    "Age": "16"
  },
]

I try to read it with the following method:

import json
from pprint import pprint

with open('json.json') as f:
    data = json.load(f)

pprint(data)

but it throws the following error:

ValueError: No JSON object could be decoded

The JSON is returned to me by the Octoparse software, so I don't think it's malformed.

How can I store the values in a JSON structure local to my script?

I would like it to have the following format:

{"14":"Spain","16":"China"}

Thanks.

1 Answer

abulafia · Answer 1 · 2020-07-20T01:36:51+08:00

Diagnosis

Although the JSON pasted into the question is correct (apart from a copying error that left an extra trailing comma), when the user runs the same code on their actual JSON file, they get a ValueError, which is not very informative.

After some conversation with the user, I obtained the JSON file he is actually working with and tried to reproduce the problem by running his code with Python 2 (the version the user has). Sure enough, although the supplied JSON looks correct, I get the error:

ValueError: No JSON object could be decoded

If I instead run it with Python 3, the diagnosis is much more precise and confirms my suspicion that hidden characters at the beginning of the file are causing the problem:

json.decoder.JSONDecodeError: Unexpected UTF-8 BOM (decode using utf-8-sig): line 1 column 1 (char 0)
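The Python 3 message can be reproduced without the original file. The following is only a minimal sketch: the payload is a made-up one-record string, prefixed with the BOM character U+FEFF to imitate what a BOM-prefixed UTF-8 file looks like once decoded as plain utf-8.

```python
import json

# Hypothetical payload: valid JSON preceded by the BOM character (U+FEFF).
text_with_bom = '\ufeff[{"Country": "Spain", "Age": "14"}]'

try:
    json.loads(text_with_bom)
    error_message = None
except json.JSONDecodeError as exc:
    # Python 3 names the BOM explicitly in the error message.
    error_message = str(exc)

print(error_message)
```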

The problem

The file begins with a sequence of bytes called a "BOM" (Byte Order Mark). These bytes are invisible when the file is displayed on screen or loaded in an editor, but not when it is read by a program.

The purpose of those bytes, if the file were in UTF-16, is to let programs that read it deduce the endianness of the architecture on which the file was generated (that is, whether it is little-endian or big-endian). In a UTF-8 file, however, these bytes serve no such purpose, because UTF-8 is immune to the endianness problem.

However, many editors and Windows programs still insert these bytes when saving to UTF-8, and this is apparently not compatible with the JSON standard.
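To check whether a file really starts with these bytes, one option is to look at its raw content. This is a rough sketch using the constants in the standard codecs module; the helper name starts_with_utf8_bom and the sample value are made up for illustration.

```python
import codecs

# The UTF-8 BOM is always the same three bytes.
print(codecs.BOM_UTF8)      # b'\xef\xbb\xbf'

# In UTF-16 the BOM differs with byte order, which is why it is useful there.
print(codecs.BOM_UTF16_LE)  # b'\xff\xfe'
print(codecs.BOM_UTF16_BE)  # b'\xfe\xff'


def starts_with_utf8_bom(raw_bytes):
    """Return True if the byte string begins with the UTF-8 BOM (hypothetical helper)."""
    return raw_bytes.startswith(codecs.BOM_UTF8)


# Encoding with utf-8-sig prepends the BOM, imitating what some editors save.
sample = "[]".encode("utf-8-sig")
print(starts_with_utf8_bom(sample))
```

For a real file, the same helper could be applied to the result of reading it in binary mode ("rb").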

Solution

With Python 3, open() accepts a parameter that specifies the encoding of the file to read (if it is omitted, the platform's default encoding is used, which is not necessarily UTF-8). In this case we would pass utf-8-sig, as Python 3 itself suggests in its error message.

However, the user is on Python 2, where the built-in open() does not accept that parameter. A straightforward alternative is to read the entire file into a byte string and then decode that string to Unicode using the codec in question. We then use json.loads() instead of json.load(), because that lets us pass the correctly decoded Unicode string rather than the file.
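For reference, the Python 3 variant might look roughly like this. It is a sketch, not the user's actual setup: a temporary file with two made-up records stands in for the real json.json so that the example is self-contained.

```python
import json
import os
import tempfile

# Create a stand-in for the real file: JSON saved as UTF-8 with a BOM.
path = os.path.join(tempfile.mkdtemp(), "json.json")
with open(path, "w", encoding="utf-8-sig") as f:
    f.write('[{"Country": "Spain", "Age": "14"}, {"Country": "China", "Age": "16"}]')

# Python 3: utf-8-sig strips the BOM if present, so json.load() sees clean text.
with open(path, encoding="utf-8-sig") as f:
    data = json.load(f)

print(data)
```

The utf-8-sig codec also reads files that have no BOM, so it is a reasonably safe choice when the origin of the file is uncertain.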

Namely:

import json

# Read the raw bytes, then decode with utf-8-sig to strip the BOM if present
with open("json.json", "rb") as f:
    raw_data = f.read()
data = json.loads(raw_data.decode("utf-8-sig"))

This solution loads the entire file into memory before parsing the JSON (json.load() does the same internally: it reads the whole file and hands the result to json.loads()). Since the file is not very large (61K), that is not a problem.
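Once the file loads, the format requested in the question can be built with a dictionary comprehension. This is a minimal sketch, assuming every record has both an "Age" and a "Country" key; the sample list stands in for the data returned by json.loads(), and the name countries_by_age is made up.

```python
# Stand-in for the list returned by json.loads() in the snippet above.
data = [
    {"Country": "Spain", "Age": "14"},
    {"Country": "China", "Age": "16"},
]

# Key each country by its age; a later record with the same age
# overwrites an earlier one.
countries_by_age = {row["Age"]: row["Country"] for row in data}

print(countries_by_age)  # {'14': 'Spain', '16': 'China'}
```

Because ages are used as keys, two records with the same age cannot both be kept. If that can happen in the real data, a different structure (for example, a list of countries per age) would be needed.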
