CRF training code written in Python

Original source: https://gist.github.com/neubig/7352832

This is a script to train conditional random fields. It is written to minimize the number of lines of code, with no regard for efficiency.
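The script reads one sentence per line from stdin, with each token written as WORD_POS, and pads every sentence with a `<S>` boundary token (tag id 0) on both sides. The following is a minimal sketch of that parsing step, not the script's own code: the helper name `parse_line`, the plain-dict tag table, the use of `rsplit` (so that a word containing an underscore still splits on the last one), and the sample sentence are all assumptions made for illustration.

```python
def parse_line(line, tag_ids):
    """Split 'WORD_TAG WORD_TAG ...' into padded word and tag-id lists."""
    words, tags = ["<S>"], [0]          # sentence-initial boundary
    for token in line.split():
        # Split on the last underscore only (an assumption of this sketch)
        word, tag = token.rsplit("_", 1)
        words.append(word)
        # Assign the next free integer id to a tag seen for the first time
        tags.append(tag_ids.setdefault(tag, len(tag_ids)))
    words.append("<S>")                 # sentence-final boundary
    tags.append(0)
    return words, tags

tag_ids = {"<S>": 0}                    # id 0 is reserved for the boundary
words, tags = parse_line("the_DT dog_NN barks_VBZ", tag_ids)
print(words)  # ['<S>', 'the', 'dog', 'barks', '<S>']
print(tags)   # [0, 1, 2, 3, 0]
```

Reserving id 0 for the boundary is what lets the training code restrict the first and last positions to a single state.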

#!/usr/bin/python
# crf.py (by Graham Neubig)
# This script trains conditional random fields (CRFs)
# stdin: A corpus of WORD_POS WORD_POS WORD_POS sentences
# stdout: Feature vectors for emission and transition properties
from collections import defaultdict
from math import log, exp
import sys
import operator
# The L2 regularization coefficient and learning rate for SGD
l2_coeff = 1
rate = 10
# A dictionary to map tags to integers
tagids = defaultdict(lambda: len(tagids))
tagids["<S>"] = 0
############# Utility functions ###################
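def dot(A, B):
    # Sparse dot product over the keys the two dictionaries share
    return sum(A[k]*B[k] for k in A if k in B)
def add(A, B):
    # Copy A into a new defaultdict, then accumulate the values of B
    C = defaultdict(lambda: 0, A)
    for k, v in B.items(): C[k] += v
    return C
def logsumexp(A):
    # Subtract the maximum before exponentiating to avoid overflow
    k = max(A)
    return log(sum( exp(i-k) for i in A ))+k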
############# Functions for memoized probability
def calc_feat(x, i, l, r):
    # Features for moving from tag l to tag r, with r emitting word x[i]
    return { ("T", l, r): 1, ("E", r, x[i]): 1 }
def calc_e(x, i, l, r, w, e_prob):
    # Edge score: weighted features of the step l -> r into position i
    if (i, l, r) not in e_prob:
        e_prob[i,l,r] = dot(calc_feat(x, i, l, r), w)
    return e_prob[i,l,r]
def calc_f(x, i, l, w, e, f):
    # Forward log score of all paths that end with tag l at position i
    if (i, l) not in f:
        if i == 0:
            f[i,0] = 0
        else:
            prev_states = (range(1, len(tagids)) if i != 1 else [0])
            f[i,l] = logsumexp([
                calc_f(x, i-1, k, w, e, f) + calc_e(x, i, k, l, w, e)
                for k in prev_states])
    return f[i,l]
def calc_b(x, i, r, w, e, b):
    # Backward log score of all paths that start with tag r at position i
    if (i, r) not in b:
        if i == len(x)-1:
            b[i,0] = 0
        else:
            next_states = (range(1, len(tagids)) if i != len(x)-2 else [0])
            # The step r -> k enters position i+1, so it is scored there
            b[i,r] = logsumexp([
                calc_b(x, i+1, k, w, e, b) + calc_e(x, i+1, r, k, w, e)
                for k in next_states])
    return b[i,r]
############# Function to calculate gradient ######
def calc_gradient(x, y, w):
    f_prob = {(0,0): 0}
    b_prob = {(len(x)-1,0): 0}
    e_prob = {}
    grad = defaultdict(lambda: 0)
    # Add the features for the numerator
    for i in range(1, len(x)):
        for k, v in calc_feat(x, i, y[i-1], y[i]).items(): grad[k] += v
    # Calculate the likelihood and normalizing constant
    norm = calc_b(x, 0, 0, w, e_prob, b_prob)
    lik = dot(grad, w) - norm
    # Subtract the features for the denominator
    for i in range(1, len(x)):
        for l in (range(1, len(tagids)) if i != 1 else [0]):
            for r in (range(1, len(tagids)) if i != len(x)-1 else [0]):
                # Find the probability of using this path
                p = exp(calc_e(x, i, l, r, w, e_prob)
                        + calc_b(x, i, r, w, e_prob, b_prob)
                        + calc_f(x, i-1, l, w, e_prob, f_prob)
                        - norm)
                # Subtract the expectation of the features
                for k, v in calc_feat(x, i, l, r).items(): grad[k] -= v * p
    # print grad
    # Return the gradient and likelihood
    return (grad, lik)
############### Main training loop
if __name__ == '__main__':
    # load in the corpus
    corpus = []
    for line in sys.stdin:
        words = [ "<S>" ]
        tags = [ 0 ]
        line = line.strip()
        for w_t in line.split(" "):
            w, t = w_t.split("_")
            words.append(w)
            tags.append(tagids[t])
        words.append("<S>")
        tags.append(0)
        corpus.append( (words, tags) )
    # for 50 iterations
    w = defaultdict(lambda: 0)
    for iternum in range(1, 50+1):
        grad = defaultdict(lambda: 0)
        # Perform regularization
        reg_lik = 0
        for k, v in w.items():
            grad[k] -= 2*v*l2_coeff
            reg_lik -= v*v*l2_coeff
        # Get the gradients and likelihoods
        lik = 0
        for x, y in corpus:
            my_grad, my_lik = calc_gradient(x, y, w)
            for k, v in my_grad.items(): grad[k] += v
            lik += my_lik
        l1 = sum( [abs(k) for k in grad.values()] )
        sys.stderr.write("Iter %r likelihood: lik=%r, reg=%r, reg+lik=%r gradL1=%r\n" % (iternum, lik, reg_lik, lik+reg_lik, l1))
        # Here we are updating the weights with SGD, but a better optimization
        # algorithm is necessary if you want to use this in practice.
        for k, v in grad.items(): w[k] += v/l1*rate
    # Reverse the tag strings
    strs = list(range(0, len(tagids)))
    for k, v in tagids.items(): strs[v] = k
    # Print the features
    for k, v in sorted(w.items(), key=operator.itemgetter(1)):
        if k[0] == "E": print("%s %s %s\t%r" % (k[0], strs[k[1]], k[2], v))
        else: print("%s %s %s\t%r" % (k[0], strs[k[1]], strs[k[2]], v))
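
The normalizing constant in the script comes from a dynamic-programming recursion over tag positions. One way to build confidence in such a recursion is to compare it against brute-force enumeration of every tag sequence on a toy sentence. The sketch below does this with an iterative forward pass rather than the script's memoized recursion. The function names (`edge_score`, `log_norm_forward`, `log_norm_brute_force`) and the toy weights are hypothetical, chosen only for illustration; the feature keys follow the `("T", prev, cur)` and `("E", tag, word)` layout used in the script.

```python
from itertools import product
from math import exp, log

def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + log(sum(exp(v - m) for v in values))

def edge_score(words, i, prev_tag, tag, weights):
    """Score of the step prev_tag -> tag, where tag emits words[i]."""
    return (weights.get(("T", prev_tag, tag), 0.0)
            + weights.get(("E", tag, words[i]), 0.0))

def log_norm_forward(words, num_tags, weights):
    """Log normalizer via the forward recursion; tag 0 is the <S> boundary."""
    last = len(words) - 1
    alpha = {0: 0.0}  # position 0 holds only the boundary tag
    for i in range(1, len(words)):
        tags = [0] if i == last else range(1, num_tags)
        # The right-hand side still reads the previous position's alpha
        alpha = {t: logsumexp([a + edge_score(words, i, p, t, weights)
                               for p, a in alpha.items()])
                 for t in tags}
    return alpha[0]

def log_norm_brute_force(words, num_tags, weights):
    """Log normalizer by enumerating every tag sequence (toy sizes only)."""
    inner = len(words) - 2  # positions between the two boundaries
    totals = []
    for middle in product(range(1, num_tags), repeat=inner):
        path = (0,) + middle + (0,)
        totals.append(sum(edge_score(words, i, path[i - 1], path[i], weights)
                          for i in range(1, len(words))))
    return logsumexp(totals)

# Hypothetical toy weights and sentence, for illustration only.
toy_weights = {("T", 0, 1): 0.5, ("T", 1, 2): 1.0, ("T", 2, 0): 0.3,
               ("E", 1, "the"): 2.0, ("E", 2, "dog"): 1.5}
toy_words = ["<S>", "the", "dog", "<S>"]
fwd = log_norm_forward(toy_words, 3, toy_weights)
brute = log_norm_brute_force(toy_words, 3, toy_weights)
print(abs(fwd - brute) < 1e-9)  # True
```

The brute-force version grows exponentially with sentence length, so it is only useful as a test oracle. The same comparison can be run against a backward pass, which should produce the same normalizer.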
