I have a string ë́aúlt whose length I want to get and which I want to manipulate based on character positions and so on. The problem is that the first character, ë́, is being counted twice: I guess ë is at position 0 and ´ is at position 1.

Is there any way in Python to have a character like ë́ counted as a single character?

I'm using UTF-8 encoding for both the source code and the web page the output is written to.

Edit: Just some background on why I need to do this. I am working on a project that translates English to Seneca (a Native American language), and ë́ shows up quite a bit. Some rewrite rules for certain words require knowledge of letter position (the letter itself and the surrounding letters) and other characteristics, such as accents and other diacritic markings.

Solution

UTF-8 is a Unicode encoding that uses more than one byte for characters outside the ASCII range. If you don't want the length of the encoded string, simply decode it and use len() on the unicode object (and not on the str object!).

Here are some examples:

>>> # creates a str literal (with utf-8 encoding, if this was
>>> # specified at the beginning of the file):
>>> len('ë́aúlt')
9
>>> # creates a unicode literal (you should generally use this
>>> # version if you are dealing with special characters):
>>> len(u'ë́aúlt')
6
>>> # the same str literal (written in an encoded notation):
>>> len('\xc3\xab\xcc\x81a\xc3\xbalt')
9
>>> # you can convert any str to a unicode object by calling decode() on it:
>>> len('\xc3\xab\xcc\x81a\xc3\xbalt'.decode('utf-8'))
6

Of course, you can also access single characters in a unicode object just as you would in a str object (both inherit from basestring and therefore have the same methods):

>>> test = u'ë́aúlt'
>>> print test[0]
ë
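Note that the unicode length above is still 6 and test[0] prints ë without its acute accent, because the combining accent is a separate code point. One way to treat ë́ as a single letter is to attach each combining mark to the base character before it. The following is only a minimal sketch written in Python 3 syntax; split_graphemes is a hypothetical helper name, and it relies only on unicodedata.combining() from the standard library. It does not implement the full Unicode grapheme cluster rules, so treat it as a starting point rather than a complete solution.

```python
import unicodedata


def split_graphemes(text):
    """Group each base character with the combining marks that follow it.

    Simplified: only characters with a non-zero canonical combining class
    are attached to the preceding character.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch):
            # Combining mark: append it to the previous cluster.
            clusters[-1] += ch
        else:
            # Base character: start a new cluster.
            clusters.append(ch)
    return clusters


# The word from the question, written with escapes so the code points are explicit:
# U+00EB (e with diaeresis), U+0301 (combining acute), a, U+00FA (u with acute), l, t
word = "\u00eb\u0301a\u00falt"
letters = split_graphemes(word)

print(len(word))     # number of code points
print(len(letters))  # number of letters as a reader would count them
print(letters[0])    # the first letter, including its accent
```

With the list of clusters, position-based rewrite rules can index letters[i] and look at letters[i - 1] or letters[i + 1] instead of indexing the raw string.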

If you develop localized applications, it's generally a good idea to use only unicode objects internally by decoding every input you receive. After the work is done, you can encode the result again as UTF-8. If you keep to this principle, you will never see your server crashing because of internal UnicodeDecodeErrors you might otherwise get ;)
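The decode-early, encode-late principle can be sketched roughly as follows. This is an illustration in Python 3 syntax, not code from the original project: handle_request is a hypothetical function name, and upper-casing merely stands in for whatever text processing the application actually does.

```python
def handle_request(raw_bytes):
    """Decode at the input boundary, work on text, encode at the output boundary."""
    text = raw_bytes.decode("utf-8")   # bytes -> text, done once on the way in
    result = text.upper()              # placeholder for the real text processing
    return result.encode("utf-8")      # text -> bytes, done once on the way out


# UTF-8 bytes of the word from the question
incoming = b"\xc3\xab\xcc\x81a\xc3\xbalt"
outgoing = handle_request(incoming)
print(outgoing.decode("utf-8"))
```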

PS: Please note that the str and unicode datatypes have changed significantly in Python 3. In Python 3 there are only Unicode strings (str) and plain byte strings (bytes), which can no longer be mixed. That should help to avoid common pitfalls with Unicode handling...
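For comparison, here is a small Python 3 sketch of the same length checks, assuming a Python 3 interpreter. The variable names are illustrative only.

```python
# In Python 3, a plain string literal is already a Unicode string (str).
word = "\u00eb\u0301a\u00falt"   # the word from the question
encoded = word.encode("utf-8")   # explicit conversion to bytes

print(len(word))     # counts code points
print(len(encoded))  # counts UTF-8 bytes

# str and bytes cannot be concatenated implicitly in Python 3.
try:
    word + encoded
    mixed_ok = True
except TypeError:
    mixed_ok = False
print(mixed_ok)
```

The code point count still includes the combining accent as its own element, so Python 3 alone does not make ë́ count as one character.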

Regards,

Christoph
