Translated from: How to fix: “UnicodeDecodeError: 'ascii' codec can't decode byte”

as3:~/ngokevin-site# nano content/blog/20140114_test-chinese.mkd
as3:~/ngokevin-site# wok
Traceback (most recent call last):
  File "/usr/local/bin/wok", line 4, in <module>
    Engine()
  File "/usr/local/lib/python2.7/site-packages/wok/engine.py", line 104, in __init__
    self.load_pages()
  File "/usr/local/lib/python2.7/site-packages/wok/engine.py", line 238, in load_pages
    p = Page.from_file(os.path.join(root, f), self.options, self, renderer)
  File "/usr/local/lib/python2.7/site-packages/wok/page.py", line 111, in from_file
    page.meta['content'] = page.renderer.render(page.original)
  File "/usr/local/lib/python2.7/site-packages/wok/renderers.py", line 46, in render
    return markdown(plain, Markdown.plugins)
  File "/usr/local/lib/python2.7/site-packages/markdown/__init__.py", line 419, in markdown
    return md.convert(text)
  File "/usr/local/lib/python2.7/site-packages/markdown/__init__.py", line 281, in convert
    source = unicode(source)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe8 in position 1: ordinal not in range(128). -- Note: Markdown only accepts unicode input!

How to fix it?

In some other Python-based static blog apps, Chinese posts can be published successfully, for example this app: http://github.com/vrypan/bucket3 . On my site http://bc3.brite.biz/ , Chinese posts are published successfully.


Reply #1

Reference: https://stackoom.com/question/1Qece/如何解决-UnicodeDecodeError-ascii-编解码器无法解码字节


Reply #2

This is the classic "unicode issue". I believe a complete explanation of what is happening is beyond the scope of a StackOverflow answer.

It is well explained here.

In very brief summary, you have passed something that is being interpreted as a string of bytes to something that needs to decode it into Unicode characters, but the default codec (ascii) is failing.

The presentation I pointed you to provides advice for avoiding this. Make your code a "unicode sandwich". In Python 2, using from __future__ import unicode_literals helps.

Update: how the code can be fixed:

OK - in your variable "source" you have some bytes. It is not clear from your question how they got in there - maybe you read them from a web form? In any case, they are not encoded with ASCII, but Python is trying to convert them to unicode assuming that they are. You need to explicitly tell it what the encoding is. This means that you need to know what the encoding is! That is not always easy, and it depends entirely on where this string came from. You could experiment with some common encodings - for example UTF-8. You pass the encoding to unicode() as the second parameter:

source = unicode(source, 'utf-8')
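
If the encoding really is unknown, the "experiment with some common encodings" step could look roughly like the sketch below. The helper name decode_with_candidates and the candidate list are assumptions made for illustration; they are not part of wok or Markdown. It uses bytes.decode rather than unicode() so that it runs on both Python 2.7 and Python 3.

```python
def decode_with_candidates(raw, encodings=("utf-8", "gbk")):
    """Return (text, encoding) for the first candidate that decodes raw cleanly.

    The candidate list is an assumption; order it from most to least likely.
    """
    for name in encodings:
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings matched: %r" % (encodings,))


# These six bytes are the UTF-8 encoding of a two-character Chinese word.
text, used = decode_with_candidates(b"\xe4\xb8\xad\xe6\x96\x87")
```

Guessing like this can silently pick a wrong encoding that happens to decode without error, so knowing the real encoding of the source is still the better fix.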

Reply #3

Finally I got it:

as3:/usr/local/lib/python2.7/site-packages# cat sitecustomize.py
# encoding=utf8
import sys
reload(sys)
sys.setdefaultencoding('utf8')

Let me check:

as3:~/ngokevin-site# python
Python 2.7.6 (default, Dec  6 2013, 14:49:02)
[GCC 4.4.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> reload(sys)
<module 'sys' (built-in)>
>>> sys.getdefaultencoding()
'utf8'
>>>

The above shows that the default encoding of Python is now utf8, and the error no longer occurs.


Reply #4

In some cases, when you check your default encoding ( print sys.getdefaultencoding() ), it reports that you are using ASCII. Changing it to UTF-8 may still not work, depending on the content of your variable. I found another way:

import sys
reload(sys)
sys.setdefaultencoding('Cp1252')

Reply #5

I find the best approach is to always convert to unicode - but this is difficult to achieve because in practice you'd have to check and convert every argument to every function and method you ever write that includes some form of string processing.

So I came up with the following approach to guarantee either unicode strings or byte strings, from either kind of input. In short, include and use the following lambdas:

# guarantee unicode string
_u = lambda t: t.decode('UTF-8', 'replace') if isinstance(t, str) else t
_uu = lambda *tt: tuple(_u(t) for t in tt)
# guarantee byte string in UTF8 encoding
_u8 = lambda t: t.encode('UTF-8', 'replace') if isinstance(t, unicode) else t
_uu8 = lambda *tt: tuple(_u8(t) for t in tt)

Examples:

text='Some string with codes > 127, like Zürich'
utext=u'Some string with codes > 127, like Zürich'
print "==> with _u, _uu"
print _u(text), type(_u(text))
print _u(utext), type(_u(utext))
print _uu(text, utext), type(_uu(text, utext))
print "==> with u8, uu8"
print _u8(text), type(_u8(text))
print _u8(utext), type(_u8(utext))
print _uu8(text, utext), type(_uu8(text, utext))
# with % formatting, always use _u() and _uu()
print "Some unknown input %s" % _u(text)
print "Multiple inputs %s, %s" % _uu(text, text)
# but with string.format be sure to always work with unicode strings
print u"Also works with formats: {}".format(_u(text))
print u"Also works with formats: {},{}".format(*_uu(text, text))
# ... or use _u8 and _uu8, because string.format expects byte strings
print "Also works with formats: {}".format(_u8(text))
print "Also works with formats: {},{}".format(*_uu8(text, text))
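
The same idea can also be written as named functions. This is only a sketch: the names to_text and to_bytes are hypothetical, and the type checks use bytes and type(u"") (instead of str and unicode) so that the code runs on both Python 2.7 and Python 3.

```python
TEXT_TYPE = type(u"")  # unicode on Python 2, str on Python 3


def to_text(value, encoding="utf-8"):
    """Guarantee a Unicode string; undecodable bytes become U+FFFD."""
    if isinstance(value, bytes):
        return value.decode(encoding, "replace")
    return value


def to_bytes(value, encoding="utf-8"):
    """Guarantee a byte string in the given encoding."""
    if isinstance(value, TEXT_TYPE):
        return value.encode(encoding, "replace")
    return value


city_text = to_text(b"Z\xc3\xbcrich")   # byte input is decoded
city_bytes = to_bytes(u"Z\xfcrich")     # Unicode input is encoded
```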

Here's some more reasoning about this.


Reply #6

tl;dr / quick fix

  • Don't decode/encode willy nilly
  • Don't assume your strings are UTF-8 encoded
  • Try to convert strings to Unicode strings as soon as possible in your code
  • Fix your locale: How to solve UnicodeDecodeError in Python 3.6?
  • Don't be tempted to use quick reload hacks

Unicode Zen in Python 2.x - The Long Version

Without seeing the source it's difficult to know the root cause, so I'll have to speak generally.

UnicodeDecodeError: 'ascii' codec can't decode byte generally happens when you try to convert a Python 2.x str that contains non-ASCII to a Unicode string without specifying the encoding of the original string.

In brief, Unicode strings are an entirely separate type of Python string that does not contain any encoding. They only hold Unicode code points and therefore can hold any code point from across the entire spectrum. Strings contain encoded text, be it UTF-8, UTF-16, ISO-8859-1, GBK, Big5 etc. Strings are decoded to Unicode and Unicodes are encoded to strings. Files and text data are always transferred in encoded strings.

The Markdown module authors probably use unicode() (where the exception is thrown) as a quality gate to the rest of the code - it will convert ASCII or re-wrap existing Unicode strings to a new Unicode string. The Markdown authors can't know the encoding of the incoming string, so they rely on you to decode strings to Unicode strings before passing them to Markdown.

Unicode strings can be declared in your code using the u prefix to strings. E.g.

>>> my_u = u'my ünicôdé strįng'
>>> type(my_u)
<type 'unicode'>

Unicode strings may also come from file, database and network modules. When this happens, you don't need to worry about the encoding.

Gotchas

Conversion from str to Unicode can happen even when you don't explicitly call unicode().

The following scenarios cause UnicodeDecodeError exceptions:

# Explicit conversion without encoding
unicode('€')

# New style format string into Unicode string
# Python will try to convert value string to Unicode first
u"The currency is: {}".format('€')

# Old style format string into Unicode string
# Python will try to convert value string to Unicode first
u'The currency is: %s' % '€'

# Append string to Unicode
# Python will try to convert string to Unicode first
u'The currency is: ' + '€'
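
In each of these cases Python 2 decodes the byte string with the default codec (ascii unless it has been changed) behind the scenes. The sketch below makes that hidden step explicit and shows the fix of decoding with the real encoding first. The variable names are illustrative, and the euro sign is assumed to arrive as UTF-8 bytes; the code runs on both Python 2.7 and Python 3.

```python
euro_bytes = b"\xe2\x82\xac"  # the euro sign encoded as UTF-8

# This is the step Python 2 attempts implicitly, and it fails.
try:
    euro_bytes.decode("ascii")
    implicit_ok = True
except UnicodeDecodeError:
    implicit_ok = False

# Decoding explicitly with the real encoding avoids the error.
message = u"The currency is: " + euro_bytes.decode("utf-8")
```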

Examples

In the following diagram, you can see how the word café has been encoded in either "UTF-8" or "Cp1252" encoding depending on the terminal type. In both examples, caf is just regular ASCII. In UTF-8, é is encoded using two bytes. In "Cp1252", é is 0xE9 (which also happens to be the Unicode code point value; that is no coincidence). The correct decode() is invoked and conversion to a Python Unicode is successful:

In this diagram, decode() is called with ascii (which is the same as calling unicode() without an encoding given). As ASCII can't contain bytes greater than 0x7F, this will throw a UnicodeDecodeError exception:
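
The café example can also be reproduced in code. This is a small sketch with illustrative variable names; the byte values follow from the two encodings described above, and it runs on both Python 2.7 and Python 3.

```python
cafe_utf8 = b"caf\xc3\xa9"   # e-acute takes two bytes in UTF-8
cafe_cp1252 = b"caf\xe9"     # e-acute is the single byte 0xE9 in Cp1252

# With the matching codec, both byte strings decode to the same text.
same_text = cafe_utf8.decode("utf-8") == cafe_cp1252.decode("cp1252")

# With ascii, the first byte above 0x7F raises UnicodeDecodeError.
try:
    cafe_cp1252.decode("ascii")
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)
```

Here error_message names the offending byte (0xe9) and its position (3), in the same format as the message in the traceback from the question.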

The Unicode Sandwich

It's good practice to form a Unicode sandwich in your code, where you decode all incoming data to Unicode strings, work with Unicodes, then encode to str on the way out. This saves you from worrying about the encoding of strings in the middle of your code.
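
As a minimal sketch of the sandwich, assuming the input and output encodings are known: the function name shout and its parameters are hypothetical and only illustrate the decode / work / encode layering. It runs on both Python 2.7 and Python 3.

```python
def shout(raw_in, in_encoding="utf-8", out_encoding="utf-8"):
    """Upper-case encoded text, touching bytes only at the edges."""
    text = raw_in.decode(in_encoding)   # input edge: bytes -> Unicode
    text = text.upper()                 # middle: work on Unicode only
    return text.encode(out_encoding)    # output edge: Unicode -> bytes


result = shout(b"z\xc3\xbcrich")
```

Because the middle step only ever sees Unicode, the same function works unchanged for input in another encoding, for example shout(b"z\xfcrich", in_encoding="cp1252").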

Input / Decode

Source code

If you need to bake non-ASCII into your source code, just create Unicode strings by prefixing the string with a u. E.g.

u'Zürich'

To allow Python to decode your source code, you will need to add an encoding header to match the actual encoding of your file. For example, if your file was encoded as 'UTF-8', you would use:

# encoding: utf-8

This is only necessary when you have non-ASCII in your source code.

Files

Usually non-ASCII data is received from a file. The io module provides a TextIOWrapper that decodes your file on the fly, using a given encoding. You must use the correct encoding for the file - it can't be easily guessed. For example, for a UTF-8 file:

import io
with io.open("my_utf8_file.txt", "r", encoding="utf-8") as my_file:
    my_unicode_string = my_file.read()

my_unicode_string would then be suitable for passing to Markdown. If you get a UnicodeDecodeError from the read() line, then you've probably used the wrong encoding value.
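
The sketch below puts both outcomes side by side. To stay self-contained it first writes a small UTF-8 file into a temporary directory; the file contents and the deliberately wrong "ascii" encoding are assumptions for illustration. It runs on both Python 2.7 and Python 3.

```python
import io
import os
import tempfile

# Create a small UTF-8 file to read back (stand-in for real input).
path = os.path.join(tempfile.mkdtemp(), "my_utf8_file.txt")
with io.open(path, "w", encoding="utf-8") as out_file:
    out_file.write(u"Z\xfcrich\n")

# Correct encoding: read() returns a Unicode string.
with io.open(path, "r", encoding="utf-8") as my_file:
    my_unicode_string = my_file.read()

# Wrong encoding: read() raises UnicodeDecodeError.
try:
    with io.open(path, "r", encoding="ascii") as my_file:
        my_file.read()
    wrong_encoding_failed = False
except UnicodeDecodeError:
    wrong_encoding_failed = True
```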

CSV Files

The Python 2.7 CSV module does not support non-ASCII characters
