I get an error when opening a file in Python; the error is UnicodeEncodeError.
The program opens and processes a YAML file created on Windows, but I don't know what encoding it was created with. The issue occurs on both Ubuntu 16.10 and macOS Sierra.
The specific error is:
Traceback (most recent call last):
  File "metas.py", line 75, in <module>
    for meta in metas:
  File "/usr/local/opt/pyenv/versions/metas/lib/python2.7/site-packages/yaml/__init__.py", line 80, in load_all
    loader = Loader(stream)
  File "/usr/local/opt/pyenv/versions/metas/lib/python2.7/site-packages/yaml/loader.py", line 34, in __init__
    Reader.__init__(self, stream)
  File "/usr/local/opt/pyenv/versions/metas/lib/python2.7/site-packages/yaml/reader.py", line 79, in __init__
    self.determine_encoding()
  File "/usr/local/opt/pyenv/versions/metas/lib/python2.7/site-packages/yaml/reader.py", line 135, in determine_encoding
    self.update(1)
  File "/usr/local/opt/pyenv/versions/metas/lib/python2.7/site-packages/yaml/reader.py", line 165, in update
    exc.encoding, exc.reason)
yaml.reader.ReaderError: 'utf8' codec can't decode byte #xf3: invalid continuation byte
  in "<string>", position 273
The file metas.py is more or less like this:
# coding: utf-8
import yaml
from imp import reload

MIEMBRO = 'JMM'
VERSION = 1.0

import sys
reload(sys)
sys.setdefaultencoding("utf-8")

IMPORTS = """# -*- coding: UTF-8 -*-
from django.conf import settings
from django.db import models
from core.models import Pipol, PUESTOS
from django.contrib.contenttypes.models import ContentType
from django import forms
from metas.models import Evidencia
from metas.models import subir_archivo
from metas.forms import FormEvidenciaBase
from django.contrib.auth import get_user_model
User = get_user_model()
"""


class Generador:
    def __init__(self, meta):
        self.miembro = meta['miembro']
        self.id = meta['id']
        self.nombre = meta['nombre']
        self.repeticiones = meta['repeticiones']
        self.campos = meta['campos']

    def get_campos(self):
        return self.campos

    def get_meta(self):
        return '%s%02d' % (self.miembro.upper(), self.id)

    def get_model(self):
        clase = """class %s(Evidencia):""" % self.get_meta()
        for c in self.get_campos():
            for k, v in c.iteritems():
                blank = u'blank=True, null=True' if v[1] else ''
                clase += u"\n    %s = models.FileField('%s', \
                    upload_to=subir_archivo, %s)" % (k, v[0], blank)
        clase += u"""\n
    class Meta:
        app_label = 'metas'
        """
        return clase

    def get_form(self):
        clase = """class Formulario%s(FormEvidenciaBase):
    class Meta:
        model = %s\n\n""" % (self.get_meta(), self.get_meta())
        return clase


if __name__ == '__main__':
    file = '%s.yml' % MIEMBRO.lower()
    metas = yaml.load_all(open(file).read())
    print IMPORTS
    for meta in metas:
        m = Generador(meta)
        print(m.get_model())
        print(m.get_form())
The YAML file looks something like this:
miembro: vol
id: 1
nombre: u'3 propuestas OE'
repeticiones: 1
campos:
- correo: ['Correo Electrónico', false]
- oficio: ['Oficio de cumplimiento', false]
- propuestas: ['Propuestas', false]
---
miembro: vol
id: 2
nombre: u'Modelo operativo recepción paquetes'
repeticiones: 1
campos:
- correo: ['Correo Electrónico', false]
- oficio: ['Oficio de cumplimiento', false]
- modelo_operativo: ['modelo operativo', false]
- acuse_entrega: ['acuse', false]
- observaciones: ['observaciones', true]
The character #xf3 corresponds to ó.
Finally, the locale settings on my machine are the following:
(metas) javier@toledano:Projects/metas_sdk ‹master*›$ locale
LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL=
Edit
When I run cat archivo.yml in the console, this is what it looks like:
miembro: VRL
id: 1
nombre: Entrega CECyRD
repeticiones: 12
campos:
- acta: ['Acta', false]
- oficio_acta: ['Oficio Acta', false]
- oficio_entrega: ['Oficio Entrega', false]
- correo: ['Correo', false]
- estad▒stico: ['Estad▒stico', false]
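@ixi's comment about using with open(filename, encoding="latin-1") is the correct approach for Python 3.x. Since you are using Python 2.x, that is not possible, because the built-in function open does not accept an encoding argument there. In Python 2.x the signature is open(name[, mode[, buffering]]), whereas in Python 3.x it is open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None).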
There are several options, but I think the simplest is to open the file in binary mode and decode it using the correct encoding (ISO 8859-1, also known as Latin-1).
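The snippet that originally accompanied this step is missing, so what follows is only a minimal sketch of the idea, not the exact code from the answer. The function names and the sample file are hypothetical. The second function uses io.open, which is not mentioned above but accepts an encoding argument on Python 2.6+ as well as on Python 3.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile


def read_binary_and_decode(path, encoding="latin-1"):
    """Read the raw bytes, then decode them explicitly (Python 2 and 3)."""
    with open(path, "rb") as handle:
        return handle.read().decode(encoding)


def read_with_io_open(path, encoding="latin-1"):
    """Alternative (an assumption, not from the answer): io.open takes `encoding`."""
    with io.open(path, encoding=encoding) as handle:
        return handle.read()


# Hypothetical sample file, written as Latin-1 so the sketch runs on its own.
sample_path = os.path.join(tempfile.mkdtemp(), "archivo.yml")
with open(sample_path, "wb") as handle:
    handle.write(u"- correo: ['Correo Electr\xf3nico', false]\n".encode("latin-1"))

text = read_binary_and_decode(sample_path)
# `text` is now a unicode string, so it could be handed to PyYAML,
# for example with yaml.safe_load_all(text).
```

In the script from the question, the equivalent change would be to decode the contents of the .yml file this way before passing them to yaml.load_all, instead of passing the undecoded result of open(file).read().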
Creating a file called archivo.yml encoded in Latin-1 with the content you provide, I get output that looks correct to me, although to be honest I have never used PyYAML. The code works under Windows 10 and Kubuntu 16.10 with Python 2.7. In principle, if it works on Ubuntu there should be no problem on macOS either.
Edit: As I mention in a comment below, another solution is to make a copy of the file encoded in UTF-8. This can be done with Python or, since we are on an Ubuntu system, from the terminal, by running a conversion command in the directory that contains the file.
This creates a copy of archivo.yml (Latin-1) called archivo2.yml that uses UTF-8, which we can use directly in our Python script. We can even run this command from our own Python script using the subprocess module.
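The terminal command itself did not survive in this text, so the following is a hedged sketch of both routes rather than the original command. convert_to_utf8 does the copy in pure Python. convert_with_iconv assumes the iconv utility is installed and on the PATH, and its use here is my assumption. Both function names and the sample file are hypothetical.

```python
# -*- coding: utf-8 -*-
import io
import os
import subprocess
import tempfile


def convert_to_utf8(src, dst, src_encoding="latin-1"):
    """Copy `src` to `dst`, re-encoding the text as UTF-8 (pure Python)."""
    with io.open(src, encoding=src_encoding) as infile:
        content = infile.read()
    with io.open(dst, "w", encoding="utf-8") as outfile:
        outfile.write(content)


def convert_with_iconv(src, dst):
    """Same conversion via the iconv CLI; assumes iconv is available."""
    with open(dst, "wb") as outfile:
        subprocess.check_call(
            ["iconv", "-f", "ISO-8859-1", "-t", "UTF-8", src],
            stdout=outfile,
        )


# Hypothetical Latin-1 input so the sketch is runnable on its own.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "archivo.yml")
dst_path = os.path.join(workdir, "archivo2.yml")
with open(src_path, "wb") as handle:
    handle.write(u"- estad\xedstico: ['Estad\xedstico', false]\n".encode("latin-1"))

convert_to_utf8(src_path, dst_path)
```

After the conversion, archivo2.yml contains valid UTF-8, so PyYAML's default decoding should no longer fail on the accented characters.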