fuzzywuzzy is a Python package for fuzzy (approximate) string matching.

from fuzzywuzzy import fuzz
from fuzzywuzzy import process
Comparing strings

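fuzz.ratio(): compares the two strings in full, so it is sensitive to length and character position

fuzz.partial_ratio(): partial matching; scores the shorter string against its best-aligned substring of the longer one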

str1 = '毛利是个小菜比'
str2 = '毛利是个小菜比,毛利是个小菜比'
print("fuzz.ratio similarity:", fuzz.ratio(str1, str2))
print("fuzz.partial_ratio similarity:", fuzz.partial_ratio(str1, str2))
fuzz.ratio similarity: 64
fuzz.partial_ratio similarity: 100
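A minimal sketch of what fuzz.ratio computes, assuming fuzzywuzzy's pure-Python fallback (without python-Levenshtein installed it relies on difflib.SequenceMatcher; the C-accelerated path can give slightly different scores). The score is 2·M/T scaled to 0 to 100, where M is the number of matched characters and T is the combined length of both strings. The helper name `ratio` is my own, not the library's.

```python
from difflib import SequenceMatcher


def ratio(a: str, b: str) -> int:
    """Similarity from 0 to 100: 2*M / T, M = matched chars, T = len(a) + len(b)."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


# First example above: M = 7, T = 7 + 15 = 22, so 2 * 7 / 22 = 0.636 -> 64
print(ratio('毛利是个小菜比', '毛利是个小菜比,毛利是个小菜比'))  # 64
```

Because str2 is about twice as long, ratio stays low even though str1 appears in it verbatim; that is exactly the gap partial_ratio is meant to close.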
str1 = '毛利说:是个小菜比'
str2 = '毛利说是个小菜比'
print("fuzz.ratio similarity:", fuzz.ratio(str1, str2))
print("fuzz.partial_ratio similarity:", fuzz.partial_ratio(str1, str2))
fuzz.ratio similarity: 94
fuzz.partial_ratio similarity: 88
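partial_ratio can be sketched the same way. The version below is modeled on my reading of fuzzywuzzy's pure-Python implementation: for each matching block between the shorter and the longer string, it lines the shorter string up against an equally long window of the longer one, scores that window with SequenceMatcher, and keeps the best score. Treat it as an approximation; edge cases (empty strings, scores just under 100, the python-Levenshtein backend) may not agree with the library. The function name is hypothetical.

```python
from difflib import SequenceMatcher


def partial_ratio(a: str, b: str) -> int:
    shorter, longer = (a, b) if len(a) <= len(b) else (b, a)
    best = 0.0
    for i, j, _size in SequenceMatcher(None, shorter, longer).get_matching_blocks():
        # Align the shorter string where this matching block would place it
        start = max(j - i, 0)
        window = longer[start:start + len(shorter)]
        best = max(best, SequenceMatcher(None, shorter, window).ratio())
    return round(100 * best)


# Second example above: every 8-character window misses one character -> 14/16
print(partial_ratio('毛利说:是个小菜比', '毛利说是个小菜比'))  # 88
```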

Order-insensitive matching (token_sort_ratio)

str1 = '毛利说:是个小菜比'
str2 = '是个小菜比:毛利说'
print("fuzz.ratio similarity:", fuzz.ratio(str1, str2))
print("fuzz.partial_ratio similarity:", fuzz.partial_ratio(str1, str2))
print("token_sort_ratio similarity:", fuzz.token_sort_ratio(str1, str2))
fuzz.ratio similarity: 56
fuzz.partial_ratio similarity: 56
token_sort_ratio similarity: 100
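token_sort_ratio normalizes both strings before comparing them: non-alphanumeric characters become spaces, the text is lowercased, the tokens are sorted and rejoined, and the results are scored with a plain ratio. That is why the two colon-separated halves above score 100 once reordered. A minimal sketch, assuming Python 3's Unicode-aware `\W` (so Chinese characters count as word characters) and leaving out fuzzywuzzy's force_ascii handling; the helper names are hypothetical.

```python
import re
from difflib import SequenceMatcher


def normalize(s: str) -> str:
    # Replace every non-word character with a space, lowercase, trim
    return re.sub(r"\W", " ", s).lower().strip()


def ratio(a: str, b: str) -> int:
    return round(100 * SequenceMatcher(None, a, b).ratio())


def token_sort_ratio(a: str, b: str) -> int:
    sorted_a = " ".join(sorted(normalize(a).split()))
    sorted_b = " ".join(sorted(normalize(b).split()))
    return ratio(sorted_a, sorted_b)


print(token_sort_ratio('毛利说:是个小菜比', '是个小菜比:毛利说'))        # 100
print(token_sort_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear"))  # 84
```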

Deduplicated subset matching (token_set_ratio)

str1 = '毛利说:是个小菜比'
str2 = '毛利说:是个小小菜比'
print("fuzz.ratio similarity:", fuzz.ratio(str1, str2))
print("fuzz.partial_ratio similarity:", fuzz.partial_ratio(str1, str2))
print("token_sort_ratio similarity:", fuzz.token_sort_ratio(str1, str2))
print("token_set_ratio similarity:", fuzz.token_set_ratio(str1, str2))
fuzz.ratio similarity: 95
fuzz.partial_ratio similarity: 89
token_sort_ratio similarity: 95
token_set_ratio similarity: 95
print(fuzz.token_sort_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear"))
print(fuzz.token_set_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear"))
84
100
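token_set_ratio works on sets of tokens, so duplicates disappear. It builds the sorted intersection of the two token sets, appends each side's leftover tokens to it, and returns the best ratio among the three pairings. When one string's tokens are a subset of the other's (as with the repeated "fuzzy"), the intersection equals one of the combined strings and the score is 100. A minimal sketch, assuming Python 3's Unicode-aware `\W` for tokenizing; the helper names are hypothetical.

```python
import re
from difflib import SequenceMatcher


def normalize(s: str) -> str:
    # Replace every non-word character with a space, lowercase, trim
    return re.sub(r"\W", " ", s).lower().strip()


def ratio(a: str, b: str) -> int:
    return round(100 * SequenceMatcher(None, a, b).ratio())


def token_set_ratio(a: str, b: str) -> int:
    tokens_a = set(normalize(a).split())
    tokens_b = set(normalize(b).split())
    sect = " ".join(sorted(tokens_a & tokens_b))
    # The shared tokens followed by each side's leftover tokens
    combined_ab = (sect + " " + " ".join(sorted(tokens_a - tokens_b))).strip()
    combined_ba = (sect + " " + " ".join(sorted(tokens_b - tokens_a))).strip()
    return max(ratio(sect, combined_ab), ratio(sect, combined_ba),
               ratio(combined_ab, combined_ba))


print(token_set_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear"))  # 100
print(token_set_ratio('毛利说:是个小菜比', '毛利说:是个小小菜比'))    # 95
```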

process

Returns the strings that fuzzily match a query, along with their similarity scores

choices = ["python爬虫教程", "python机器学习教程", "Python数据分析教程", "pythonweb开发教程"]
print(process.extract("数据分析", choices, limit=3))
print(process.extractOne("分析", choices))
[('Python数据分析教程', 90), ('python爬虫教程', 0), ('python机器学习教程', 0)]
('Python数据分析教程', 90)
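Conceptually, process.extract scores the query against every choice with a scorer function, sorts by score and keeps the top `limit` results; extractOne returns only the best one, or None when it falls below `score_cutoff` (the case study below relies on that). The sketch follows that shape but swaps fuzzywuzzy's default WRatio scorer for a plain lowercase SequenceMatcher ratio, so its scores will not match the 90 shown above. The function names are hypothetical.

```python
import heapq
from difflib import SequenceMatcher


def simple_ratio(query: str, choice: str) -> int:
    # Stand-in scorer, NOT fuzzywuzzy's WRatio
    return round(100 * SequenceMatcher(None, query.lower(), choice.lower()).ratio())


def extract(query, choices, scorer=simple_ratio, limit=5):
    """Return up to `limit` (choice, score) pairs, best first."""
    scored = ((choice, scorer(query, choice)) for choice in choices)
    return heapq.nlargest(limit, scored, key=lambda pair: pair[1])


def extract_one(query, choices, scorer=simple_ratio, score_cutoff=0):
    """Return the best (choice, score) pair, or None if it scores below score_cutoff."""
    best = extract(query, choices, scorer, limit=1)
    if best and best[0][1] >= score_cutoff:
        return best[0]
    return None


choices = ["python爬虫教程", "python机器学习教程", "Python数据分析教程", "pythonweb开发教程"]
print(extract("数据分析", choices, limit=3))
print(extract_one("分析", choices))                   # ('Python数据分析教程', 29)
print(extract_one("分析", choices, score_cutoff=80))  # None
```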
Case study

Summing

import numpy as np
import pandas as pd
from fuzzywuzzy import fuzz
from fuzzywuzzy import process


def enum_row(row):
    print(row['state'])


def find_state_code(row):
    # Print the best state match, or None if nothing scores >= 80
    if row['state'] != 0:
        print(process.extractOne(row['state'], states, score_cutoff=80))


def capital(s):
    return s.capitalize()


def correct_state(row):
    # Replace a misspelled state with its best match, e.g. 'NEW YORK' -> 'New York'
    if row['state'] != 0:
        state = process.extractOne(row['state'], states, score_cutoff=80)
        if state:
            state_name = state[0]
            return ' '.join(map(capital, state_name.split(' ')))
    return row['state']


def fill_state_code(row):
    # Look up the two-letter code of the best-matching state
    if row['state'] != 0:
        state = process.extractOne(row['state'], states, score_cutoff=80)
        if state:
            state_name = state[0]
            return state_to_code[state_name]
    return ''


if __name__ == "__main__":
    pd.set_option('display.width', 200)
    data = pd.read_excel('sales.xlsx', sheet_name='sheet1', header=0)
    print('data.head() = \n', data.head())
    print('data.tail() = \n', data.tail())
    print('data.dtypes = \n', data.dtypes)
    print('data.columns = \n', data.columns)
    for c in data.columns:
        print(c, end=' ')
    print()
    data['total'] = data['Jan'] + data['Feb'] + data['Mar']
    print(data.head())
    print(data['Jan'].sum())
    print(data['Jan'].min())
    print(data['Jan'].max())
    print(data['Jan'].mean())
    print('=============')
    # Add a totals row
    s1 = data[['Jan', 'Feb', 'Mar', 'total']].sum()
    print(s1)
    s2 = pd.DataFrame(data=s1)
    print(s2)
    print(s2.T)
    print(s2.T.reindex(columns=data.columns))
    # i.e., in one step:
    s = pd.DataFrame(data=data[['Jan', 'Feb', 'Mar', 'total']].sum()).T
    s = s.reindex(columns=data.columns, fill_value=0)
    print(s)
    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
    data = pd.concat([data, s], ignore_index=True)
    data = data.rename(index={len(data) - 1: 'Total'})  # the original hard-coded 15
    print(data.tail())
    # Using apply
    print('==============using apply==========')
    data.apply(enum_row, axis=1)
    state_to_code = {"VERMONT": "VT", "GEORGIA": "GA", "IOWA": "IA", "Armed Forces Pacific": "AP", "GUAM": "GU",
                     "KANSAS": "KS", "FLORIDA": "FL", "AMERICAN SAMOA": "AS", "NORTH CAROLINA": "NC", "HAWAII": "HI",
                     "NEW YORK": "NY", "CALIFORNIA": "CA", "ALABAMA": "AL", "IDAHO": "ID",
                     "FEDERATED STATES OF MICRONESIA": "FM",
                     "Armed Forces Americas": "AA", "DELAWARE": "DE", "ALASKA": "AK", "ILLINOIS": "IL",
                     "Armed Forces Africa": "AE", "SOUTH DAKOTA": "SD", "CONNECTICUT": "CT", "MONTANA": "MT",
                     "MASSACHUSETTS": "MA",
                     "PUERTO RICO": "PR", "Armed Forces Canada": "AE", "NEW HAMPSHIRE": "NH", "MARYLAND": "MD",
                     "NEW MEXICO": "NM",
                     "MISSISSIPPI": "MS", "TENNESSEE": "TN", "PALAU": "PW", "COLORADO": "CO",
                     "Armed Forces Middle East": "AE",
                     "NEW JERSEY": "NJ", "UTAH": "UT", "MICHIGAN": "MI", "WEST VIRGINIA": "WV", "WASHINGTON": "WA",
                     "MINNESOTA": "MN", "OREGON": "OR", "VIRGINIA": "VA", "VIRGIN ISLANDS": "VI",
                     "MARSHALL ISLANDS": "MH",
                     "WYOMING": "WY", "OHIO": "OH", "SOUTH CAROLINA": "SC", "INDIANA": "IN", "NEVADA": "NV",
                     "LOUISIANA": "LA",
                     "NORTHERN MARIANA ISLANDS": "MP", "NEBRASKA": "NE", "ARIZONA": "AZ", "WISCONSIN": "WI",
                     "NORTH DAKOTA": "ND",
                     "Armed Forces Europe": "AE", "PENNSYLVANIA": "PA", "OKLAHOMA": "OK", "KENTUCKY": "KY",
                     "RHODE ISLAND": "RI",
                     "DISTRICT OF COLUMBIA": "DC", "ARKANSAS": "AR", "MISSOURI": "MO", "TEXAS": "TX", "MAINE": "ME"}
    states = list(state_to_code.keys())
    print(fuzz.ratio('Python Package', 'PythonPackage'))
    print(process.extract('Mississippi', states))
    print(process.extract('Mississipi', states, limit=1))
    print(process.extractOne('Mississipi', states))
    data.apply(find_state_code, axis=1)
    print('Before Correct State:\n', data['state'])
    data['state'] = data.apply(correct_state, axis=1)
    print('After Correct State:\n', data['state'])
    data.insert(5, 'State Code', np.nan)
    data['State Code'] = data.apply(fill_state_code, axis=1)
    print(data)
    # group by
    print('==============group by================')
    print(data.groupby('State Code'))
    print('All Columns:\n')
    # numeric_only=True keeps the old behavior of silently dropping text columns
    print(data.groupby('State Code').sum(numeric_only=True))
    print('Short Columns:\n')
    print(data[['State Code', 'Jan', 'Feb', 'Mar', 'total']].groupby('State Code').sum())
    # Write the result to a file
    data.to_excel('sales_result.xlsx', sheet_name='Sheet1', index=False)

This approach is rather involved; it looks like I will have to write up some notes on working with Office files later.
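Stripped to its core, the fuzzy part of the case study is: match each raw state name against a known list, accept the match only above a cutoff, then map it to a code and aggregate. The sketch below shows that flow with the standard library only, as a stand-in under several assumptions: it swaps process.extractOne(..., score_cutoff=80) for difflib.get_close_matches(..., cutoff=0.8), which uses a different 0-to-1 score, so the two cutoffs are not equivalent; it uses a plain dict instead of groupby; and the state subset, the sales rows and the names `fix_state` and `totals_by_code` are hypothetical.

```python
from collections import defaultdict
from difflib import get_close_matches

# Hypothetical subset of the state_to_code mapping from the case study
STATE_TO_CODE = {"MISSISSIPPI": "MS", "MISSOURI": "MO", "TEXAS": "TX", "NEW YORK": "NY"}


def fix_state(raw: str, cutoff: float = 0.8):
    """Return (corrected name, code), or (raw, '') when nothing scores >= cutoff."""
    matches = get_close_matches(raw.upper(), list(STATE_TO_CODE), n=1, cutoff=cutoff)
    if not matches:
        return raw, ""
    name = matches[0]
    # 'NEW YORK' -> 'New York', like the capitalize-each-word helper in the case study
    return name.title(), STATE_TO_CODE[name]


def totals_by_code(rows):
    """Sum amounts per state code; unmatched rows are grouped under ''."""
    totals = defaultdict(int)
    for raw_state, amount in rows:
        _, code = fix_state(raw_state)
        totals[code] += amount
    return dict(totals)


# Hypothetical rows: (state as typed, Jan sales)
rows = [("Mississipi", 100), ("Texas", 50), ("texas", 25), ("Atlantis", 10)]
print(fix_state("Mississipi"))  # ('Mississippi', 'MS')
print(totals_by_code(rows))     # {'MS': 100, 'TX': 75, '': 10}
```

The cutoff matters for the same reason it does in the original script: without it, even a nonsense entry like "Atlantis" would be assigned to whichever state happens to score highest.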
