UnicodeDecodeError: 'shift_jis' codec can't decode byte 0x93 in position 4: illegal multibyte sequence
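- Background
I was looking for a Japanese tokenizer today and came across MeCab. I found some sample code online, but running it produced one error after another.
Along the way I installed the following: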
pip install mecab-python-windows
pip install mecab-python3
pip install mecab
pip install whoosh
Microsoft Visual C++ Build Tools: https://visualstudio.microsoft.com/downloads/
pip install tiny_tokenizer[all]
pip install SudachiPy
pip install https://object-storage.tyo2.conoha.io/v1/nc_2520839e1f9641b08211a5c85243124a/sudachi/SudachiDict_core-20191030.tar.gz
sudachipy link -t core
pip install -U pytest
- Error message
    return self.__parse_tostr(text, **kwargs)
  File "C:\Users\lixianwei\venv\lib\site-packages\natto\mecab.py", line 318, in __parse_tostr
    return self.__bytes2str(raw).strip()
  File "C:\Users\lixianwei\venv\lib\site-packages\natto\support.py", line 26, in bytes2str
    return b.decode(py3enc)
UnicodeDecodeError: 'shift_jis' codec can't decode byte 0x93 in position 4: illegal multibyte sequence
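- Analysis
This is clearly an encoding problem, but I could not find anywhere to set the encoding. The Python file already had the usual utf-8 coding declaration at the top, so that was evidently not the value being used. Tracing through the code, I found that the encoding is taken from the system instead. Then I came across this passage: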
If natto-py for some reason cannot locate the mecab library, or if it cannot determine the correct charset used internally by mecab, then you will need to set the MECAB_PATH and MECAB_CHARSET environment variables.
Set the MECAB_PATH environment variable to the exact name/path to your mecab library.
Set the MECAB_CHARSET environment variable to the charset character encoding used by your system dictionary.
e.g., for Mac OS:
export MECAB_PATH=/usr/local/Cellar/mecab/0.996/lib/libmecab.dylib
export MECAB_CHARSET=utf8
e.g., for bash on UNIX/Linux:
export MECAB_PATH=/usr/local/lib/libmecab.so
export MECAB_CHARSET=euc-jp
e.g., on Windows:
set MECAB_PATH=C:\Program Files\MeCab\bin\libmecab.dll
set MECAB_CHARSET=shift-jis
e.g., from within a Python program:
import os
os.environ['MECAB_PATH']='/usr/local/lib/libmecab.so'
os.environ['MECAB_CHARSET']='utf-16'
Source: https://github.com/buruzaemon/natto-py
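To see why a charset mismatch produces this kind of error, here is a minimal sketch that needs neither MeCab nor natto-py. It assumes the dictionary emits UTF-8 bytes while the caller decodes them as shift_jis. The helper `decode_or_none` and the sample line are hypothetical, made up for illustration only.

```python
def decode_or_none(raw, encoding):
    """Return raw decoded with the given encoding, or None if decoding fails."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as exc:
        # Report the same kind of error that appears in the traceback above.
        print(f"{encoding}: {exc}")
        return None


# Hypothetical sample: one line of MeCab-style output, encoded as UTF-8.
raw = "日本語\t名詞".encode("utf-8")

print(decode_or_none(raw, "utf-8"))      # decodes cleanly
print(decode_or_none(raw, "shift_jis"))  # wrong codec, decoding fails
```

The bytes are identical in both calls; only the codec chosen by the caller changes, which is why declaring utf-8 in the source file has no effect here.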
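- Solution
Locate your MeCab installation, then add the following to your Python script, adjusting the path to your own installation and setting the charset to the one your dictionary uses:
import os
os.environ['MECAB_PATH']='C:\\Program Files (x86)\\MeCab\\bin\\libmecab.dll'
os.environ['MECAB_CHARSET']='utf-8'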
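Put together, the fix might look like the sketch below. It assumes natto-py and MeCab are installed and that the dictionary really is UTF-8. The function names `configure_mecab` and `tokenize` are hypothetical helpers, not part of natto-py; only `MeCab` and its `parse` method come from the library.

```python
import os


def configure_mecab(lib_path, charset):
    """Tell natto-py where the MeCab library is and which charset to decode with."""
    os.environ["MECAB_PATH"] = lib_path
    os.environ["MECAB_CHARSET"] = charset


def tokenize(text):
    """Parse text with natto-py; needs natto-py and MeCab to be installed."""
    # Imported lazily so the environment variables are set before natto is loaded.
    from natto import MeCab

    with MeCab() as nm:
        return nm.parse(text)
```

A call such as `configure_mecab('C:\\Program Files (x86)\\MeCab\\bin\\libmecab.dll', 'utf-8')` followed by `print(tokenize('日本語の文章'))` would then go through the configured library. Both arguments are assumptions about one particular machine, so substitute your own path and dictionary charset.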