Translated from: Stripping everything but alphanumeric chars from a string in Python

What is the best way to strip all non-alphanumeric characters from a string, using Python?

The solutions presented in the PHP variant of this question will probably work with some minor adjustments, but don't seem very 'pythonic' to me.

For the record, I don't just want to strip periods and commas (and other punctuation), but also quotes, brackets, etc.




Regular expressions to the rescue:

import re
re.sub(r'\W+', '', your_string)

By Python's definition, \W is equivalent to [^a-zA-Z0-9_] (for ASCII matching), so it matches every character except letters, digits, and the underscore. Underscores are therefore kept by the pattern above.
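A minimal sketch of what that definition means in practice, assuming Python 3, where \W on a str pattern is Unicode-aware unless re.ASCII is passed. The helper names and sample strings are made up for illustration and are not part of the original answer.

```python
import re

def strip_non_word(text):
    # \W+ removes runs of non-word characters; underscores survive.
    return re.sub(r'\W+', '', text)

def strip_non_alnum_ascii(text):
    # [\W_]+ also removes underscores; re.ASCII limits \w to [a-zA-Z0-9_].
    return re.sub(r'[\W_]+', '', text, flags=re.ASCII)

sample = "foo_bar (baz), 42!"
print(strip_non_word(sample))         # foo_barbaz42
print(strip_non_alnum_ascii(sample))  # foobarbaz42

# In Python 3, an accented letter counts as a word character by default,
# but not when re.ASCII is set.
print(strip_non_word("caf\u00e9!") == "caf\u00e9")   # True
print(strip_non_alnum_ascii("caf\u00e9!") == "caf")  # True
```

If underscores should go as well, the character class [\W_] is the usual adjustment.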


How about:

def ExtractAlphanumeric(InputString):
    from string import ascii_letters, digits
    return "".join([ch for ch in InputString if ch in (ascii_letters + digits)])

This works by using a list comprehension to produce a list of the characters in InputString that are present in the combined ascii_letters and digits strings. It then joins the list together into a string.
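As a hedged variant of the same idea, the allowed characters can be precomputed into a set, so each membership test is a hash lookup instead of a scan through a 62-character string. The names extract_alphanumeric and ALLOWED are illustrative and do not come from the original answer.

```python
from string import ascii_letters, digits

# Build the set of allowed characters once, at import time.
ALLOWED = frozenset(ascii_letters + digits)

def extract_alphanumeric(input_string):
    # Keep only ASCII letters and digits, preserving their order.
    return "".join(ch for ch in input_string if ch in ALLOWED)

print(extract_alphanumeric("Hello, World! (2024)"))  # HelloWorld2024
```

Unlike str.isalnum(), this keeps ASCII characters only, which may or may not be what is wanted.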


>>> import re
>>> string = "Kl13@£$%[};'\""
>>> pattern = re.compile(r'\W')
>>> string = re.sub(pattern, '', string)
>>> print(string)
Kl13



print(''.join(ch for ch in some_string if ch.isalnum()))
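One caveat worth sketching, assuming Python 3: str.isalnum() is Unicode-aware, so accented letters pass the test. If only ASCII letters and digits should remain, the check can be combined with str.isascii(), which needs Python 3.7 or later. The function names below are hypothetical.

```python
def keep_alnum(text):
    # Keeps any character that Unicode classifies as a letter or digit.
    return ''.join(ch for ch in text if ch.isalnum())

def keep_ascii_alnum(text):
    # str.isascii() requires Python 3.7+; restricts the result to ASCII.
    return ''.join(ch for ch in text if ch.isascii() and ch.isalnum())

print(keep_alnum("a-b_c 1.2"))        # abc12
print(keep_ascii_alnum("a-b_c 1.2"))  # abc12

# The two differ only on non-ASCII input.
print(keep_alnum("caf\u00e9!") == "caf\u00e9")   # True
print(keep_ascii_alnum("caf\u00e9!") == "caf")   # True
```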


I just timed some functions out of curiosity. In these tests I'm removing non-alphanumeric characters from the string string.printable (part of the built-in string module). Using a compiled '[\W_]+' pattern with pattern.sub('', str) was found to be fastest.

$ python -m timeit -s "import string" "''.join(ch for ch in string.printable if ch.isalnum())"
10000 loops, best of 3: 57.6 usec per loop
$ python -m timeit -s "import string" "filter(str.isalnum, string.printable)"
10000 loops, best of 3: 37.9 usec per loop
$ python -m timeit -s "import re, string" "re.sub('[\W_]', '', string.printable)"
10000 loops, best of 3: 27.5 usec per loop
$ python -m timeit -s "import re, string" "re.sub('[\W_]+', '', string.printable)"
100000 loops, best of 3: 15 usec per loop
$ python -m timeit -s "import re, string; pattern = re.compile('[\W_]+')" "pattern.sub('', string.printable)"
100000 loops, best of 3: 11.2 usec per loop
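The comparison can also be rerun from a script. The following is a rough sketch, assuming Python 3, where filter() returns an iterator and so has to be joined back into a string. The labels and the loop count are arbitrary choices, and absolute timings will differ from the figures above depending on machine and interpreter version.

```python
import re
import string
import timeit

pattern = re.compile(r'[\W_]+')
text = string.printable

# Each candidate is a zero-argument callable so timeit can invoke it directly.
candidates = {
    "join + isalnum": lambda: ''.join(ch for ch in text if ch.isalnum()),
    "filter + isalnum": lambda: ''.join(filter(str.isalnum, text)),
    "re.sub, single char": lambda: re.sub(r'[\W_]', '', text),
    "re.sub, runs": lambda: re.sub(r'[\W_]+', '', text),
    "compiled pattern": lambda: pattern.sub('', text),
}

# Sanity check: every approach should produce the same string.
results = {name: fn() for name, fn in candidates.items()}
assert len(set(results.values())) == 1

loops = 2000
for name, fn in candidates.items():
    seconds = timeit.timeit(fn, number=loops)
    print(f"{name}: {seconds / loops * 1e6:.2f} usec per loop")
```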

