Content index: https://blog.csdn.net/weixin_43093481/article/details/114989382?spm=1001.2014.3001.5501
Course notes: 1.2 Sentiment Analysis with Naïve Bayes
Code: https://github.com/Ogmx/Natural-Language-Processing-Specialization
————————————————————————————————————

Assignment 2: Naive Bayes

Learning objectives:
 Learn how naive Bayes works and apply it to sentiment analysis on tweets: given a tweet, decide whether it carries positive or negative sentiment.

Specifically, you will learn how to:

  • Train a naive Bayes model for sentiment analysis
  • Test the model
  • Compute ratios of positive to negative word counts
  • Do some error analysis
  • Predict on your own data

You may already be familiar with naive Bayes and how it builds on conditional probability and independence.

  • In this assignment, you will use the ratio of the probabilities of positive and negative sentiment.
  • This approach gives a simple and fast solution for binary classification problems.

Import Python libraries

from utils import process_tweet, lookup
import pdb
from nltk.corpus import stopwords, twitter_samples
import numpy as np
import pandas as pd
import nltk
import string
from nltk.tokenize import TweetTokenizer
from os import getcwd

Download the data

nltk.download('stopwords')
nltk.download('twitter_samples')

Split the dataset

# get the sets of positive and negative tweets
all_positive_tweets = twitter_samples.strings('positive_tweets.json')
all_negative_tweets = twitter_samples.strings('negative_tweets.json')

# split the data into two pieces, one for training and one for testing (validation set)
test_pos = all_positive_tweets[4000:]
train_pos = all_positive_tweets[:4000]
test_neg = all_negative_tweets[4000:]
train_neg = all_negative_tweets[:4000]

train_x = train_pos + train_neg
test_x = test_pos + test_neg

# avoid assumptions about the length of all_positive_tweets
train_y = np.append(np.ones(len(train_pos)), np.zeros(len(train_neg)))
test_y = np.append(np.ones(len(test_pos)), np.zeros(len(test_neg)))

Part 1: Data Preprocessing

For any machine learning project, once you have gathered the data, the first step is to process it into the form the model expects:

  • Remove noise: drop words that carry no sentiment information, such as common stopwords like 'I, you, are, is, etc.'.
  • Also remove Twitter-specific markup such as retweet marks, hyperlinks, and hashtags, since they carry no sentiment information either.
  • Punctuation does carry some sentiment information, but for simplicity it is removed as well.
  • Finally, stem each word: for example, "motivation", "motivated", and "motivate" are all reduced to the common stem "motiv-".

Use the provided process_tweet() function to preprocess the data (a sketch of what it does follows the example below).

custom_tweet = "RT @Twitter @chapagain Hello There! Have a great day. :) #good #morning http://chapagain.com.np"

# print cleaned tweet
print(process_tweet(custom_tweet))

['hello', 'great', 'day', ':)', 'good', 'morn']
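
process_tweet() lives in the course's utils.py, whose source is not shown in this post. As a rough guide, here is a minimal sketch of the preprocessing it performs, assuming NLTK's TweetTokenizer, English stopword list, and PorterStemmer; the name process_tweet_sketch and the exact regexes are illustrative, not necessarily the course's exact code:

import re
import string
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import TweetTokenizer

def process_tweet_sketch(tweet):
    """Clean, tokenize, and stem a tweet (an approximation of utils.process_tweet)."""
    # remove old-style retweet marks ("RT"), hyperlinks, and the '#' of hashtags
    tweet = re.sub(r'^RT[\s]+', '', tweet)
    tweet = re.sub(r'https?://[^\s]+', '', tweet)
    tweet = re.sub(r'#', '', tweet)
    # tokenize, lowercasing everything and stripping @handles
    tokenizer = TweetTokenizer(preserve_case=False, strip_handles=True, reduce_len=True)
    tokens = tokenizer.tokenize(tweet)
    stopwords_english = stopwords.words('english')
    stemmer = PorterStemmer()
    # drop stopwords and punctuation, then stem what remains
    return [stemmer.stem(tok) for tok in tokens
            if tok not in stopwords_english and tok not in string.punctuation]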

Part 1.1: Implement helper functions

To train a naive Bayes model, you first need to build a frequency dictionary whose keys are (word, label) pairs and whose values are the corresponding counts. Here label is 1 or 0, marking positive or negative sentiment.

Implement the lookup() helper function. It takes the freqs dictionary, a word, and a label (1 or 0), and returns the number of times that (word, label) pair appears in the corpus.

For example, given the two tweets ["i am rather excited", "you are rather happy"], both with label 1, the frequency dictionary is:

{
    ("rather", 1): 2,
    ("happi", 1): 1,
    ("excit", 1): 1
}

  • Every word in the corpus is assigned the same label, 1.
  • Words like "i" and "am" do not appear, because they are stopwords and were removed during preprocessing.
  • "rather" appears once in each of the two tweets, so its count is 2. (A sketch of lookup() follows below.)
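
lookup() is imported from utils.py at the top of this notebook. A minimal sketch consistent with the freqs structure above (the name lookup_sketch is chosen purely for illustration):

def lookup_sketch(freqs, word, label):
    """Return how often the (word, label) pair occurs in freqs, or 0 if it is absent."""
    return freqs.get((word, label), 0)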

Implement count_tweets()

Implement the count_tweets() function. It takes a list of tweets, processes them, and returns the frequency dictionary.

  • Keys are (stem, label) pairs, e.g. ("happi", 1).
  • Values are the number of times each pair appears in the corpus (an integer).
# UNQ_C1 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
def count_tweets(result, tweets, ys):
    '''
    Input:
        result: a dictionary that will be used to map each pair to its frequency
        tweets: a list of tweets
        ys: a list corresponding to the sentiment of each tweet (either 0 or 1)
    Output:
        result: a dictionary mapping each pair to its frequency
    '''
    ### START CODE HERE (REPLACE INSTANCES OF 'None' with your code) ###
    for y, tweet in zip(ys, tweets):
        for word in process_tweet(tweet):
            # define the key, which is the word and label tuple
            pair = (word, y)
            # if the key exists in the dictionary, increment the count
            if pair in result:
                result[pair] += 1
            # else, if the key is new, add it to the dictionary and set the count to 1
            else:
                result[pair] = 1
    ### END CODE HERE ###
    return result
# Testing your function
result = {}
tweets = ['i am happy', 'i am tricked', 'i am sad', 'i am tired', 'i am tired']
ys = [1, 0, 0, 0, 0]
count_tweets(result, tweets, ys)

{('happi', 1): 1, ('trick', 0): 1, ('sad', 0): 1, ('tire', 0): 2}

Part 2: Train the Naive Bayes Model

Naive Bayes is an algorithm that can be used for sentiment analysis, and it trains and predicts in a relatively short time.

How do you train a naive Bayes classifier?

  • The first step in training a naive Bayes classifier is to identify the classes.
  • Then create a probability for each class:
    $P(D_{pos})$ is the probability that a document is positive.
    $P(D_{neg})$ is the probability that a document is negative.
    They can be computed with the following formulas:

$$P(D_{pos}) = \frac{D_{pos}}{D} \tag{1}$$

$$P(D_{neg}) = \frac{D_{neg}}{D} \tag{2}$$

where $D$ is the total number of documents (i.e. the total number of tweets), $D_{pos}$ is the number of positive tweets, and $D_{neg}$ is the number of negative tweets.

Prior and Logprior

The prior probability captures the underlying chance that a tweet in the dataset is positive or negative. In other words, with no further information, if we draw a tweet at random from the dataset, what is the probability that it is positive, and what is the probability that it is negative? That is the prior.

The prior is the ratio of the two probabilities, $\frac{P(D_{pos})}{P(D_{neg})}$.
Taking its logarithm rescales it, which gives the logprior:

$$\text{logprior} = \log\left(\frac{P(D_{pos})}{P(D_{neg})}\right) = \log\left(\frac{D_{pos}}{D_{neg}}\right)$$

Note that $\log\left(\frac{A}{B}\right)$ is equivalent to $\log(A) - \log(B)$, so the logprior can also be written as the difference of two logs:

$$\text{logprior} = \log(P(D_{pos})) - \log(P(D_{neg})) = \log(D_{pos}) - \log(D_{neg}) \tag{3}$$
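
For example, with 3,000 positive and 1,000 negative tweets, the logprior would be $\log(3000) - \log(1000) = \log(3) \approx 1.10$; the balanced 4,000/4,000 training split used here gives exactly 0.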

Positive and Negative Probability of a Word

To compute a word's positive and negative probability, use the following inputs:

  • $freq_{pos}$ and $freq_{neg}$ are the frequencies of a word in the positive and the negative class; for example, a word's positive frequency is the number of times it appears with the label 1.
  • $N_{pos}$ and $N_{neg}$ are the total numbers of positive and negative words in the dataset (all tweets).
  • $V$ is the number of unique words in the dataset, counting each word only once.

Compute a word's positive and negative probability with:

$$P(W_{pos}) = \frac{freq_{pos} + 1}{N_{pos} + V} \tag{4}$$

$$P(W_{neg}) = \frac{freq_{neg} + 1}{N_{neg} + V} \tag{5}$$

Note the "+1" in the numerators: this is additive (Laplace) smoothing. See the Wikipedia article on additive smoothing for a detailed explanation.
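
For instance, a word that never occurs in the positive class ($freq_{pos} = 0$) still gets $P(W_{pos}) = \frac{1}{N_{pos} + V} > 0$, which keeps the log ratio in equation (6) below finite.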

Log likelihood

To compute a word's loglikelihood, use:

$$\text{loglikelihood} = \log\left(\frac{P(W_{pos})}{P(W_{neg})}\right) \tag{6}$$
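
As a quick example, if $P(W_{pos}) = 0.05$ and $P(W_{neg}) = 0.01$, the loglikelihood is $\log(5) \approx 1.61$: positive values mark words that lean positive, negative values mark words that lean negative.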

Build the freqs dictionary
  • Given your count_tweets() function, build the freqs dictionary containing all the counts.
  • In the freqs dictionary, the keys are (word, label) pairs.
  • The values are the number of times each pair appears.

This dictionary will be used several times.

# Build the freqs dictionary for later uses
freqs = count_tweets({}, train_x, train_y)

Train the model

Given the frequency dictionary, train_x (the tweets), and train_y (their labels), implement a naive Bayes classifier:

Compute $V$
  • Count the number of unique words in the freqs dictionary to get $V$ (the set function is useful here).
Compute $freq_{pos}$ and $freq_{neg}$
  • Using the freqs dictionary, compute each word's positive frequency $freq_{pos}$ and negative frequency $freq_{neg}$.
Compute $N_{pos}$ and $N_{neg}$
  • Using the freqs dictionary, compute the total number of positive words $N_{pos}$ and the total number of negative words $N_{neg}$.
Compute $D$, $D_{pos}$, $D_{neg}$
  • Using train_y, compute the total number of tweets $D$, the number of positive tweets $D_{pos}$, and the number of negative tweets $D_{neg}$.
  • Compute the probability that a tweet is positive, $P(D_{pos})$, and the probability that it is negative, $P(D_{neg})$.
Compute the logprior
  • The logprior is $\log(D_{pos}) - \log(D_{neg})$.
Compute the loglikelihood
  • Finally, iterate over every word in the vocabulary, using the lookup function to get each word's positive frequency $freq_{pos}$ and negative frequency $freq_{neg}$.
  • Compute each word's positive probability $P(W_{pos})$ and negative probability $P(W_{neg})$ with:

$$P(W_{pos}) = \frac{freq_{pos} + 1}{N_{pos} + V} \tag{4}$$

$$P(W_{neg}) = \frac{freq_{neg} + 1}{N_{neg} + V} \tag{5}$$

Note: store each word's loglikelihood in a dictionary whose keys are the words and whose values are the words' loglikelihoods.

  • Then compute the loglikelihood: $\log\left(\frac{P(W_{pos})}{P(W_{neg})}\right)$.
# UNQ_C2 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
def train_naive_bayes(freqs, train_x, train_y):
    '''
    Input:
        freqs: dictionary from (word, label) to how often the word appears
        train_x: a list of tweets
        train_y: a list of labels corresponding to the tweets (0,1)
    Output:
        logprior: the log prior. (equation 3 above)
        loglikelihood: the log likelihood of your naive Bayes equation. (equation 6 above)
    '''
    loglikelihood = {}
    logprior = 0

    ### START CODE HERE (REPLACE INSTANCES OF 'None' with your code) ###
    # calculate V, the number of unique words in the vocabulary
    vocab = set([pair[0] for pair in freqs.keys()])
    V = len(vocab)

    # calculate N_pos and N_neg
    N_pos = N_neg = 0
    for pair in freqs.keys():
        # if the label is positive (greater than zero)
        if pair[1] > 0:
            # Increment the number of positive words by the count for this (word, label) pair
            N_pos += freqs[pair]
        # else, the label is negative
        else:
            # increment the number of negative words by the count for this (word, label) pair
            N_neg += freqs[pair]

    # Calculate D, the number of documents
    D = len(train_y)

    # Calculate D_pos, the number of positive documents (*hint: use sum(<np_array>))
    D_pos = sum(train_y == 1)

    # Calculate D_neg, the number of negative documents (*hint: compute using D and D_pos)
    D_neg = D - D_pos

    # Calculate logprior
    logprior = np.log(D_pos) - np.log(D_neg)

    # For each word in the vocabulary...
    for word in vocab:
        # get the positive and negative frequency of the word
        freq_pos = freqs.get((word, 1), 0)
        freq_neg = freqs.get((word, 0), 0)

        # calculate the probability that each word is positive, and negative
        p_w_pos = (freq_pos + 1) / (N_pos + V)
        p_w_neg = (freq_neg + 1) / (N_neg + V)

        # calculate the log likelihood of the word
        loglikelihood[word] = np.log(p_w_pos / p_w_neg)
    ### END CODE HERE ###

    return logprior, loglikelihood
# UNQ_C3 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
# You do not have to input any code in this cell, but it is relevant to grading, so please do not change anything
logprior, loglikelihood = train_naive_bayes(freqs, train_x, train_y)
print(logprior)
print(len(loglikelihood))

0.0
9089

Part 3: Test the Model

Now that we have the logprior and loglikelihood, we can validate the model by making predictions on some tweets.

Implement naive_bayes_predict

Instructions
Implement the naive_bayes_predict function to make predictions on tweets.

  • The function takes a tweet, the logprior, and the loglikelihood dictionary.
  • It returns a score indicating whether the tweet is more likely positive or negative.
  • For each tweet, sum up the loglikelihoods of its words.
  • Then add the logprior to this sum to predict the sentiment of the tweet:

$$p = \text{logprior} + \sum_{i}^{N} \text{loglikelihood}_i$$

Note

The prior is computed from the training data, which is a balanced dataset (4,000 positive and 4,000 negative tweets). The ratio of positive to negative documents is therefore 1, and the logprior is 0.

In this assignment the logprior happens to be 0, but for an unbalanced dataset it is nonzero, so do not forget to add the logprior.

# UNQ_C4 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
def naive_bayes_predict(tweet, logprior, loglikelihood):
    '''
    Input:
        tweet: a string
        logprior: a number
        loglikelihood: a dictionary of words mapping to numbers
    Output:
        p: the sum of all the loglikelihoods of each word in the tweet (if found in the dictionary) + logprior (a number)
    '''
    ### START CODE HERE (REPLACE INSTANCES OF 'None' with your code) ###
    # process the tweet to get a list of words
    word_l = process_tweet(tweet)

    # initialize probability to zero
    p = 0

    # add the logprior
    p += logprior

    for word in word_l:
        # check if the word exists in the loglikelihood dictionary
        if word in loglikelihood:
            # add the log likelihood of that word to the probability
            p += loglikelihood[word]
    ### END CODE HERE ###

    return p
# UNQ_C5 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
# You do not have to input any code in this cell, but it is relevant to grading, so please do not change anything

# Experiment with your own tweet.
my_tweet = 'She smiled.'
p = naive_bayes_predict(my_tweet, logprior, loglikelihood)
print('The expected output is', p)

The expected output is 1.5740278623499175
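
Since the score 1.57 is greater than 0, the model classifies 'She smiled.' as positive.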

Implement test_naive_bayes

Instructions

  • Implement test_naive_bayes to check the accuracy of your predictions.
  • The function takes test_x, test_y, logprior, and loglikelihood.
  • It returns the accuracy of the model.
  • Use the naive_bayes_predict function to make a prediction for every tweet in test_x.
# UNQ_C6 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
def test_naive_bayes(test_x, test_y, logprior, loglikelihood):
    """
    Input:
        test_x: A list of tweets
        test_y: the corresponding labels for the list of tweets
        logprior: the logprior
        loglikelihood: a dictionary with the loglikelihoods for each word
    Output:
        accuracy: (# of tweets classified correctly)/(total # of tweets)
    """
    accuracy = 0  # return this properly

    ### START CODE HERE (REPLACE INSTANCES OF 'None' with your code) ###
    y_hats = []
    for tweet in test_x:
        # if the prediction is > 0
        if naive_bayes_predict(tweet, logprior, loglikelihood) > 0:
            # the predicted class is 1
            y_hat_i = 1
        else:
            # otherwise the predicted class is 0
            y_hat_i = 0

        # append the predicted class to the list y_hats
        y_hats.append(y_hat_i)

    # error is the average of the absolute values of the differences between y_hats and test_y
    error = sum(y_hats != test_y) / len(test_y)

    # Accuracy is 1 minus the error
    accuracy = 1 - error
    ### END CODE HERE ###

    return accuracy
print("Naive Bayes accuracy = %0.4f" %(test_naive_bayes(test_x, test_y, logprior, loglikelihood)))

Naive Bayes accuracy = 0.9940
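
An accuracy of 0.9940 on the 2,000-tweet test set (1,000 positive and 1,000 negative) corresponds to 1,988 tweets classified correctly.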

# UNQ_C7 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
# You do not have to input any code in this cell, but it is relevant to grading, so please do not change anything

# Run this cell to test your function
for tweet in ['I am happy', 'I am bad', 'this movie should have been great.', 'great', 'great great', 'great great great', 'great great great great']:
    # print('%s -> %f' % (tweet, naive_bayes_predict(tweet, logprior, loglikelihood)))
    p = naive_bayes_predict(tweet, logprior, loglikelihood)
    # print(f'{tweet} -> {p:.2f} ({p_category})')
    print(f'{tweet} -> {p:.2f}')

I am happy -> 2.15
I am bad -> -1.29
this movie should have been great. -> 2.14
great -> 2.14
great great -> 4.28
great great great -> 6.41
great great great great -> 8.55
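
Note how the score grows roughly linearly with repetition: each additional 'great' adds its loglikelihood of about 2.14. This additivity is a direct consequence of the naive independence assumption.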

# Feel free to check the sentiment of your own tweet below
my_tweet = 'you are bad :('
naive_bayes_predict(my_tweet, logprior, loglikelihood)

-8.801622640492191

Part 4: Filter Words by the Ratio of Positive to Negative Counts

  • Some words have more positive counts than negative ones and can be considered more "positive"; likewise, some words can be considered more "negative".
  • One way to define a word's level of positiveness or negativeness, without computing the loglikelihood, is to compare its positive frequency to its negative frequency.
    • You could, of course, also use the loglikelihood for this comparison.
  • You can compute the ratio of a word's positive count to its negative count.
  • Once you have this ratio, you can filter words by how high or low it is.

Implement get_ratio()

  • Given the freqs dictionary and a word, use lookup(freqs, word, 1) to get the word's positive count.
  • Similarly, use the lookup() function to get the word's negative count.
  • Compute the ratio of positive to negative counts:

$$ratio = \frac{\text{pos\_words} + 1}{\text{neg\_words} + 1}$$

where pos_words and neg_words are the word's counts in the respective classes.

Word    Positive word count    Negative word count
glad              41                       2
arriv             57                       4
:(                 1                    3663
:-(                0                     378
# UNQ_C8 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
def get_ratio(freqs, word):
    '''
    Input:
        freqs: dictionary containing the words
        word: string to lookup
    Output: a dictionary with keys 'positive', 'negative', and 'ratio'.
        Example: {'positive': 10, 'negative': 20, 'ratio': 0.5}
    '''
    pos_neg_ratio = {'positive': 0, 'negative': 0, 'ratio': 0.0}

    ### START CODE HERE (REPLACE INSTANCES OF 'None' with your code) ###
    # use lookup() to find positive counts for the word (denoted by the integer 1)
    pos_neg_ratio['positive'] = lookup(freqs, word, 1)

    # use lookup() to find negative counts for the word (denoted by integer 0)
    pos_neg_ratio['negative'] = lookup(freqs, word, 0)

    # calculate the ratio of positive to negative counts for the word
    pos_neg_ratio['ratio'] = (pos_neg_ratio['positive'] + 1) / (pos_neg_ratio['negative'] + 1)
    ### END CODE HERE ###

    return pos_neg_ratio
get_ratio(freqs, 'happi')['ratio']

8.526315789473685
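
A ratio well above 1, such as 8.53 here for 'happi', marks a strongly positive word; at the other extreme, the table above gives ':(' a ratio of (1 + 1)/(3663 + 1) ≈ 0.0005, marking it as strongly negative.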

Implement get_words_by_threshold(freqs, label, threshold)

  • If label is 1, keep words whose positive/negative ratio is greater than or equal to the threshold.
  • If label is 0, keep words whose positive/negative ratio is less than or equal to the threshold.
  • Use the get_ratio() function to get a dictionary containing the positive count, the negative count, and the ratio of positive to negative counts.
  • Collect the results in a dictionary keyed by word, where each value is the pos_neg_ratio dictionary returned by get_ratio().
    For example, the structure looks like this:
{'happi': {'positive': 10, 'negative': 20, 'ratio': 0.5}}
for key in freqs.keys():
    word, _ = key
    print(freqs[(word, _)])

23
30
7
14
27
72
2847
60
7
2
5
80

# UNQ_C9 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
def get_words_by_threshold(freqs, label, threshold):
    '''
    Input:
        freqs: dictionary of words
        label: 1 for positive, 0 for negative
        threshold: ratio that will be used as the cutoff for including a word in the returned dictionary
    Output:
        word_list: dictionary containing the word and information on its positive count, negative count,
            and ratio of positive to negative counts.
        example of a key value pair:
        {'happi': {'positive': 10, 'negative': 20, 'ratio': 0.5}}
    '''
    word_list = {}

    ### START CODE HERE (REPLACE INSTANCES OF 'None' with your code) ###
    for key in freqs.keys():
        word, _ = key

        # get the positive/negative ratio for a word
        pos_neg_ratio = get_ratio(freqs, word)

        # if the label is 1 and the ratio is greater than or equal to the threshold...
        if label == 1 and pos_neg_ratio['ratio'] >= threshold:
            # Add the pos_neg_ratio to the dictionary
            word_list[word] = pos_neg_ratio

        # If the label is 0 and the pos_neg_ratio is less than or equal to the threshold...
        elif label == 0 and pos_neg_ratio['ratio'] <= threshold:
            # Add the pos_neg_ratio to the dictionary
            word_list[word] = pos_neg_ratio

        # otherwise, do not include this word in the list (do nothing)
    ### END CODE HERE ###

    return word_list
# Test your function: find negative words at or below a threshold
get_words_by_threshold(freqs, label=0, threshold=0.05)

{'
