Chapter 4 Naive Bayes and Sentiment Classification

Reading notes on Speech and Language Processing, 3rd edition.

Text categorization is the task of assigning a label or category to an entire text or document.

We focus on one common text categorization task, sentiment analysis: the extraction of sentiment, the positive or negative orientation that a writer expresses toward some object.

Spam detection is another important commercial application, the binary classification task of assigning an email to one of the two classes spam or not-spam.

Another thing we might want to know about a text is the language it’s written in. The task of language id is thus the first step in most language processing pipelines. Related tasks like determining a text’s author (authorship attribution), or author characteristics like gender, age, and native language are text classification tasks that are also relevant to the digital humanities, social sciences, and forensic linguistics.

One of the oldest tasks in text classification is assigning a library subject category or topic label to a text. In fact, as we will see, subject category classification is the task for which the naive Bayes algorithm was invented in 1961.

Generative classifiers like naive Bayes build a model of how a class could generate some input data. Given an observation, they return the class most likely to have generated the observation. Discriminative classifiers like logistic regression instead learn what features from the input are most useful to discriminate between the different possible classes.

4.1 Naive Bayes Classifiers

A bag-of-words is an unordered set of words with their position ignored, keeping only their frequency in the document.

Naive Bayes is a probabilistic classifier, meaning that for a document $d$, out of all classes $c \in C$ the classifier returns the class $\hat c$ which has the maximum posterior probability given the document:
$$\hat c=\mathop{\textrm{argmax}}_{c\in C}P(c|d)$$
The intuition of Bayesian classification is to use Bayes’ rule to transform the equation above into other probabilities that have some useful properties. Bayes’ rule, shown below, gives us a way to break down any conditional probability $P(x|y)$ into three other probabilities:
$$P(x|y)=\frac{P(y|x)P(x)}{P(y)}$$

$$\hat c=\mathop{\textrm{argmax}}_{c\in C}P(c|d)=\mathop{\textrm{argmax}}_{c\in C}\frac{P(d|c)P(c)}{P(d)}$$

Since $P(d)$ is identical for all classes $c$, we can drop the denominator:
$$\hat c=\mathop{\textrm{argmax}}_{c\in C}P(c|d)=\mathop{\textrm{argmax}}_{c\in C}P(d|c)P(c)$$
We thus compute the most probable class $\hat c$ given some document $d$ by choosing the class which has the highest product of two probabilities: the prior probability of the class $P(c)$ and the likelihood of the document $P(d|c)$:
$$\hat c=\mathop{\textrm{argmax}}_{c\in C}\overbrace{P(d|c)}^{\textrm{likelihood}}\ \overbrace{P(c)}^{\textrm{prior}}$$
Without loss of generality, we can represent a document $d$ as a set of features $f_1,f_2,\ldots, f_n$:
$$\hat c=\mathop{\textrm{argmax}}_{c\in C}\overbrace{P(f_1,f_2,\ldots, f_n|c)}^{\textrm{likelihood}}\ \overbrace{P(c)}^{\textrm{prior}}$$
Naive Bayes classifiers make two simplifying assumptions.

The first is the bag of words assumption discussed intuitively above: we assume position doesn’t matter, and that the word “love” has the same effect on classification whether it occurs as the 1st, 20th, or last word in the document. Thus we assume that the features $f_1,f_2,\ldots, f_n$ only encode word identity and not position.

The second is commonly called the naive Bayes assumption: this is the conditional independence assumption that the probabilities $P(f_i|c)$ are independent given the class $c$ and hence can be ‘naively’ multiplied as follows:
$$P(f_1,f_2,\ldots, f_n|c)=P(f_1|c)\cdot P(f_2|c)\cdot \ldots \cdot P(f_n|c)$$
The final equation for the class chosen by a naive Bayes classifier is thus:
$$c_{NB}=\mathop{\textrm{argmax}}_{c\in C}P(c)\prod_{f\in F} P(f|c)$$
To apply the naive Bayes classifier to text, we need to consider word positions, by simply walking an index through every word position in the document:
$$\textrm{positions} \leftarrow \textrm{all word positions in test document}$$

$$c_{NB}=\mathop{\textrm{argmax}}_{c\in C}P(c)\prod_{i\in positions} P(w_i|c)$$
To avoid underflow and increase speed, we do the computation in log space:
$$c_{NB}=\mathop{\textrm{argmax}}_{c\in C}\log P(c)+\sum_{i\in positions} \log P(w_i|c)$$
Classifiers that use a linear combination of the inputs to make a classification decision —like naive Bayes and also logistic regression— are called linear classifiers.

4.2 Training the Naive Bayes Classifier

Let $N_c$ be the number of documents in our training data with class $c$ and $N_{doc}$ be the total number of documents. Then:
$$\hat P(c)=\frac{N_c}{N_{doc}}$$
To learn the probability $P(f_i|c)$, we’ll assume a feature is just the existence of a word in the document’s bag of words, and so we’ll want $P(w_i|c)$, which we compute as the fraction of times the word $w_i$ appears among all words in all documents of topic $c$. We first concatenate all documents with category $c$ into one big “category $c$” text. Then we use the frequency of $w_i$ in this concatenated document to give a maximum likelihood estimate of the probability:
$$\hat P(w_i|c)= \frac{\textrm{count}(w_i,c)}{\sum_{w\in V}\textrm{count}(w,c)}$$
Here the vocabulary $V$ consists of the union of all the word types in all classes, not just the words in one class $c$.

To solve the zero count problem, use add-one (Laplace) smoothing:
$$\hat P(w_i|c)= \frac{\textrm{count}(w_i,c)+1}{\sum_{w\in V}\bigl(\textrm{count}(w,c)+1\bigr)}=\frac{\textrm{count}(w_i,c)+1}{\left(\sum_{w\in V}\textrm{count}(w,c)\right)+|V|}$$
Note once again that it is crucial that the vocabulary $V$ consists of the union of all the word types in all classes, not just the words in one class $c$. The reason for this lies in the fact that, as shown in the per-position equation above, $w_i$ with $i\in positions$ ranges over all words from all the training documents, not just those of one class $c$.

The solution for unknown words is to ignore them—remove them from the test document and not include any probability for them at all.

Finally, some systems choose to completely ignore another class of words: stop words, very frequent words like the and a. This can be done by sorting the vocabulary by frequency in the training set and defining the top 10–100 vocabulary entries as stop words, or alternatively by using one of the many pre-defined stop word lists available online.
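As a rough sketch of the frequency-based variant (assuming the training documents are already tokenized into lists of words; the function and parameter names are made up for illustration):

```python
from collections import Counter

def build_stopwords(tokenized_docs, k=50):
    # Count word frequencies over the whole training set and return
    # the k most frequent word types as the stop word list.
    freq = Counter(word for doc in tokenized_docs for word in doc)
    return {word for word, _ in freq.most_common(k)}
```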

training_data={"just plain boring":"-",
"entirely predictable and lacks energy":"-",
"no surprises and very few laughs":"-",
"very powerful":"+",
"the most fun film of the summer":"+"}import numpy as np
def naive_bayes_classifier(training_data):n_docs=len(training_data)classes=set(training_data.values())n_c={}logprior={}vocabulary=set()bigdoc={}  for key,value in training_data.items():vocabulary.update(key.split())if value not in bigdoc.keys():bigdoc[value]=key.split()n_c[value]=1else:bigdoc[value]+=key.split()n_c[value]+=1count={}loglikelihood={}for c in classes:count[c]={}loglikelihood[c]={}logprior[c]=np.log(1.0*n_c[c]/n_docs)for word in vocabulary:count[c][word]=bigdoc[c].count(word)loglikelihood[c][word]=np.log(1.0*(count[c][word]+1)/(len(bigdoc[c])+len(vocabulary)))        return vocabulary,classes,logprior,loglikelihoodvocabulary,classes,logprior,loglikelihood= naive_bayes_classifier(training_data)
def test_naive_bayes_classifier(test_data):sum={}for c in classes:sum[c]=logprior[c]for word in test_data.split():if word in vocabulary:sum[c]+=loglikelihood[c][word]return max(sum,key=sum.get)

4.3 Worked example
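
Using the toy training set from the code above, consider classifying the test sentence “predictable with no fun” (“with” never occurs in the training data, so it is ignored). The priors are $P(-)=\frac{3}{5}$ and $P(+)=\frac{2}{5}$. The concatenated negative and positive class texts contain 14 and 9 tokens respectively, and $|V|=20$, so with add-one smoothing:

$$P(\textrm{predictable}|-)=\frac{1+1}{14+20} \quad P(\textrm{no}|-)=\frac{1+1}{14+20} \quad P(\textrm{fun}|-)=\frac{0+1}{14+20}$$

$$P(\textrm{predictable}|+)=\frac{0+1}{9+20} \quad P(\textrm{no}|+)=\frac{0+1}{9+20} \quad P(\textrm{fun}|+)=\frac{1+1}{9+20}$$

Multiplying by the priors gives $P(-)\,P(s|-)=\frac{3}{5}\times\frac{2\times 2\times 1}{34^3}\approx 6.1\times 10^{-5}$ and $P(+)\,P(s|+)=\frac{2}{5}\times\frac{1\times 1\times 2}{29^3}\approx 3.3\times 10^{-5}$, so the model chooses the negative class. The toy classifier above makes the same prediction:

```python
# "with" is an unknown word and is ignored; the remaining words favor the negative class.
print(test_naive_bayes_classifier("predictable with no fun"))  # prints "-"
```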

4.4 Optimization for Sentiment Analysis

Some small changes are generally employed that improve performance.

First, for sentiment classification and a number of other text classification tasks, whether a word occurs or not seems to matter more than its frequency. Thus it often improves performance to clip the word counts in each document at 1. This variant is called binary multinomial naive Bayes or binary NB.
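A minimal sketch of the binarization step, assuming each document arrives as a list of tokens:

```python
def binarize(doc_tokens):
    # Clip each word's count within a single document at 1 by dropping duplicates;
    # order is irrelevant anyway under the bag-of-words assumption.
    return list(set(doc_tokens))

# binarize("great great great plot".split()) keeps a single "great"
```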

A second important addition commonly made when doing text classification for sentiment is to deal with negation. A very simple baseline that is commonly used in sentiment analysis to deal with negation is, during text normalization, to prepend the prefix NOT_ to every word after a token of logical negation (n’t, not, no, never) until the next punctuation mark. Newly formed ‘words’ like NOT_like, NOT_recommend will thus occur more often in negative documents and act as cues for negative sentiment, while words like NOT_bored, NOT_dismiss will acquire positive associations.
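A rough sketch of this baseline; the token-level input, the negation word set, and the punctuation test are simplifying assumptions of the sketch:

```python
import re

NEGATION = {"not", "no", "never", "n't"}

def mark_negation(tokens):
    # Prepend NOT_ to every token after a negation word, until the next punctuation mark.
    out, negating = [], False
    for tok in tokens:
        if re.fullmatch(r"[.,!?;:]", tok):
            negating = False
            out.append(tok)
        elif tok.lower() in NEGATION or tok.lower().endswith("n't"):
            negating = True
            out.append(tok)
        else:
            out.append("NOT_" + tok if negating else tok)
    return out

# mark_negation("didn't like this movie , but I".split())
# -> ["didn't", 'NOT_like', 'NOT_this', 'NOT_movie', ',', 'but', 'I']
```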

Finally, in some situations we might have insufficient labeled training data to train accurate naive Bayes classifiers using all words in the training set to estimate positive and negative sentiment. In such cases we can instead derive the positive and negative word features from sentiment lexicons, lists of words that are pre-annotated with positive or negative sentiment. Four popular lexicons are the General Inquirer (Stone et al., 1966), LIWC (Pennebaker et al., 2007), the opinion lexicon of Hu and Liu (2004a), and the MPQA Subjectivity Lexicon (Wilson et al., 2005).

A common way to use lexicons in a naive Bayes classifier is to add a feature that is counted whenever a word from that lexicon occurs. Thus we might add a feature called ‘this word occurs in the positive lexicon’, and treat all instances of words in the lexicon as counts for that one feature, instead of counting each word separately. Similarly, we might add a second feature ‘this word occurs in the negative lexicon’ for words in the negative lexicon. If we have lots of training data, and if the test data matches the training data, using just two features won’t work as well as using all the words. But when training data is sparse or not representative of the test set, using dense lexicon features instead of sparse individual-word features may generalize better.
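A sketch of such dense lexicon features; the two tiny lexicons below are placeholders, not entries from the actual General Inquirer or MPQA lexicons:

```python
# Placeholder lexicons; a real system would load General Inquirer, MPQA, etc.
POS_LEXICON = {"good", "great", "powerful", "fun"}
NEG_LEXICON = {"boring", "predictable", "lacks", "bad"}

def lexicon_features(tokens):
    # Two dense features: how many tokens fall in the positive lexicon
    # and how many fall in the negative lexicon.
    return {"pos_lexicon_count": sum(t in POS_LEXICON for t in tokens),
            "neg_lexicon_count": sum(t in NEG_LEXICON for t in tokens)}
```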

4.5 Naive Bayes for other text classification tasks

Consider the task of spam detection. A common solution here, rather than using all the words as individual features, is to predefine likely sets of words or phrases as features, and to combine these with features that are not purely linguistic.

For other tasks, like language ID—determining what language a given piece of text is written in—the most effective naive Bayes features are not words at all, but byte n-grams: 2-grams (‘zw’), 3-grams (‘nya’, ‘ Vo’), or 4-grams (‘ie z’, ‘thei’). Because spaces count as a byte, byte n-grams can model statistics about the beginning or ending of words. A widely used naive Bayes system, langid.py (Lui and Baldwin, 2012), begins with all possible n-grams of lengths 1–4, using feature selection to winnow down to the most informative 7000 final features.
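A minimal sketch of byte n-gram extraction (the feature selection step used by langid.py is omitted):

```python
def byte_ngrams(text, n_values=(1, 2, 3, 4)):
    # Encode the text as bytes (spaces included, so word boundaries are captured)
    # and collect every n-gram of the requested lengths.
    data = text.encode("utf-8")
    return [data[i:i + n]
            for n in n_values
            for i in range(len(data) - n + 1)]

# byte_ngrams("the zwölf", n_values=(2,))[:3] -> [b'th', b'he', b'e ']
```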

4.6 Naive Bayes as a Language Model

If we use only individual word features, and we use all of the words in the text (not a subset), then naive Bayes has an important similarity to language modeling. Specifically, a naive Bayes model can be viewed as a set of class-specific unigram language models, in which the model for each class instantiates a unigram language model.

Since the likelihood features from the naive Bayes model assign a probability to each word $P(\textrm{word}|c)$, the model also assigns a probability to each sentence:
$$P(s|c)=\prod_{i\in positions}P(w_i|c)$$
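
A small sketch that scores a sentence under a class-specific unigram model, reusing the loglikelihood table and vocabulary trained in the toy code of Section 4.2 (unknown words are skipped, as before):

```python
def sentence_logprob(sentence, c):
    # Log probability of the sentence under the unigram language model of class c;
    # words outside the training vocabulary are skipped.
    return sum(loglikelihood[c][w] for w in sentence.split() if w in vocabulary)

# np.exp(sentence_logprob("the most fun", "+")) gives P(s | +) under the positive model
```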

4.7 Evaluation: Precision, Recall, F-measure

We will refer to human labels as the gold labels.
$$\textrm{Precision}=\frac{\textrm{true positives}}{\textrm{true positives + false positives}}$$

$$\textrm{Recall}=\frac{\textrm{true positives}}{\textrm{true positives + false negatives}}$$

$$F_\beta =\frac{(\beta^2 +1)PR}{\beta^2P+R}$$

$$F_1=\frac{2PR}{P+R}$$

$$\textrm{accuracy}=\frac{\textrm{tp}+\textrm{tn}}{\textrm{tp}+\textrm{fp}+\textrm{tn}+\textrm{fn}}$$

Although accuracy might seem a natural metric, we generally don’t use it. That’s because accuracy doesn’t work well when the classes are unbalanced. Accuracy is not a good metric when the goal is to discover something that is rare, or at least not completely balanced in frequency, which is a very common situation in the world.

F-measure comes from a weighted harmonic mean of precision and recall. The harmonic mean of a set of numbers is the reciprocal of the arithmetic mean of reciprocals:
$$\textrm{HarmonicMean}(a_1,a_2,a_3,\ldots,a_n) =\frac{n}{\frac{1}{a_1}+\frac{1}{a_2}+\frac{1}{a_3}+\ldots+\frac{1}{a_n}}$$
and hence F-measure is
$$F=\frac{1}{\alpha\frac{1}{P}+(1-\alpha)\frac{1}{R}}\quad\textrm{or, with }\beta^2=\frac{1-\alpha}{\alpha},\quad F =\frac{(\beta^2 +1)PR}{\beta^2P+R}$$
Harmonic mean is used because it is a conservative metric; the harmonic mean of two values is closer to the minimum of the two values than the arithmetic mean is. Thus it weighs the lower of the two numbers more heavily.
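
A minimal sketch that computes these metrics from raw confusion counts:

```python
def precision_recall_f(tp, fp, fn, beta=1.0):
    # Precision, recall, and F-beta from true positive, false positive, and false negative counts.
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f = (beta ** 2 + 1) * precision * recall / (beta ** 2 * precision + recall)
    return precision, recall, f

# precision_recall_f(tp=70, fp=30, fn=10) -> (0.7, 0.875, ~0.778) with beta = 1
```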

4.7.1 More than two classes

There are two kinds of multi-class classification tasks. In any-of or multi-label classification, each document or item can be assigned more than one label. We can solve any-of classification by building separate binary classifiers for each class c, trained on positive examples labeled c and negative examples not labeled c. Given a test document or item d, each classifier makes its decision independently, and we may assign multiple labels to d.

More common in language processing is one-of or multinomial classification, in which the classes are mutually exclusive and each document or item appears in exactly one class. Here we again build a separate binary classifier for each class c, trained on positive examples from c and negative examples from all other classes. Now given a test document or item d, we run all the classifiers and choose the label from the classifier with the highest score.

In order to derive a single metric that tells us how well the system is doing, we can combine these values in two ways. In macroaveraging, we compute the performance for each class, and then average over classes. In microaveraging, we collect the decisions for all classes into a single contingency table, and then compute precision and recall from that table.

As the figure shows, a microaverage is dominated by the more frequent class (in this case spam), since the counts are pooled. The macroaverage better reflects the statistics of the smaller classes, and so is more appropriate when performance on all the classes is equally important.
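
A sketch of the two averages given per-class true positive and false positive counts; the three-class counts below are only illustrative:

```python
def macro_micro_precision(per_class):
    # per_class maps each class to its true positive ("tp") and false positive ("fp") counts.
    # Macroaverage: mean of the per-class precisions.
    # Microaverage: precision computed from the pooled counts.
    macro = sum(c["tp"] / (c["tp"] + c["fp"]) for c in per_class.values()) / len(per_class)
    tp = sum(c["tp"] for c in per_class.values())
    fp = sum(c["fp"] for c in per_class.values())
    return macro, tp / (tp + fp)

counts = {"urgent": {"tp": 8, "fp": 11},
          "normal": {"tp": 60, "fp": 55},
          "spam":   {"tp": 200, "fp": 33}}
# macro_micro_precision(counts) -> (~0.60, ~0.73); the microaverage is pulled toward spam.
```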

4.8 Test sets and Cross-validation

In cross-validation, we randomly choose a training and test set division of our data, train our classifier, and then compute the error rate on the test set. Then we repeat with a different randomly selected training set and test set. We do this sampling process 10 times and average these 10 runs to get an average error rate. This is called 10-fold cross-validation.

It is common to create a fixed training set and test set, then do 10-fold cross-validation inside the training set, but compute error rate the normal way in the test set.
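
A sketch of the k-fold bookkeeping inside the training set; the data is assumed to be shuffled already, and train_and_evaluate is a placeholder callback that trains on one split and returns the error rate on the held-out fold:

```python
def cross_validation_error(docs, labels, train_and_evaluate, k=10):
    # Split the (already shuffled) training data into k folds; each fold is held out once
    # while the classifier is trained on the other k-1 folds. Return the mean error rate.
    fold_size = len(docs) // k
    errors = []
    for i in range(k):
        lo, hi = i * fold_size, (i + 1) * fold_size
        held_out = (docs[lo:hi], labels[lo:hi])
        train = (docs[:lo] + docs[hi:], labels[:lo] + labels[hi:])
        errors.append(train_and_evaluate(train, held_out))  # placeholder callback
    return sum(errors) / k
```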

4.9 Statistical Significance Testing

Let’s say we have a test set $x$ of $n$ observations $x = x_1, x_2, \ldots, x_n$ on which A’s performance is better than B’s by $\delta(x)$. How can we know if A is really better than B? To do so we’d need to reject the null hypothesis that A isn’t really better than B and that this difference $\delta(x)$ occurred purely by chance. If the null hypothesis were correct, we would expect that, if we had many test sets of size $n$ and we measured A and B’s performance on all of them, on average A might accidentally still be better than B by this amount $\delta(x)$ just by chance.

More formally, if we had a random variable $X$ ranging over test sets, the null hypothesis $H_0$ expects $P(\delta(X) > \delta(x)\,|\,H_0)$, the probability that we’ll see similarly big differences just by chance, to be high.

If we had all these test sets we could just measure $\delta(x')$ for all the test sets $x'$. If we found that those deltas didn’t tend to be bigger than $\delta(x)$, that is, that p-value($x$) was sufficiently small (less than the standard thresholds of 0.05 or 0.01), then we might reject the null hypothesis and agree that $\delta(x)$ was a sufficiently surprising difference and that A is really a better algorithm than B. Following Berg-Kirkpatrick et al. (2012), we refer to $P(\delta(X) > \delta(x)\,|\,H_0)$ as p-value($x$).

In language processing we don’t generally use traditional statistical approaches like paired t-tests to compare system outputs, because most metrics are not normally distributed, violating the assumptions of the tests. The standard approach to computing p-value($x$) in natural language processing is to use non-parametric tests like the bootstrap test (Efron and Tibshirani, 1993) or a similar test, approximate randomization (Noreen, 1989). The advantage of these tests is that they can apply to any metric, from precision, recall, or F1 to the BLEU metric used in machine translation.

The word bootstrapping refers to repeatedly drawing large numbers of smaller samples with replacement (called bootstrap samples) from an original larger sample. The intuition of the bootstrap test is that we can create many virtual test sets from an observed test set by repeatedly sampling from it. The method only makes the assumption that the sample is representative of the population.

```python
import numpy as np
import numpy.random as rnd

# Paired per-example scores on the test set: x[0][i] = 1 if system A is correct
# on example i, x[1][i] = 1 if system B is correct on example i.
x = [[1, 1, 1, 0, 1, 0, 1, 1, 0, 1],
     [1, 0, 1, 1, 0, 1, 0, 1, 0, 0]]

def delta(x):
    # Difference in accuracy between system A and system B.
    return (np.count_nonzero(x[0]) - np.count_nonzero(x[1])) / len(x[0])

def bootstrap(x, batches):
    s = 0
    n = len(x[0])
    for i in range(batches):
        # Draw a bootstrap sample of the same size by sampling examples with replacement.
        x_star = [[None] * n, [None] * n]
        for j in range(n):
            rnd_index = rnd.randint(0, n)
            x_star[0][j] = x[0][rnd_index]
            x_star[1][j] = x[1][rnd_index]
        # Count samples on which A beats B by at least twice the observed delta,
        # i.e. delta(x*) - delta(x) > delta(x).
        if delta(x_star) > 2 * delta(x):
            s += 1
    p_value = 1.0 * s / batches
    return p_value
```
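For example, a toy run on the paired score vectors above (the number of bootstrap samples, 10,000 here, is an arbitrary choice):

```python
# delta(x) is the observed accuracy gain of system A over B on the toy vectors (here 0.2);
# bootstrap() estimates how often a gain at least that large would arise by chance.
print(delta(x))               # 0.2
print(bootstrap(x, 10000))    # estimated p-value; reject H0 if it falls below 0.05
```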

4.10 Advanced: Feature Selection

Feature selection is a method of removing features that are unlikely to generalize well. The basis of feature selection is to assign some metric of goodness to each feature, rank the features, and keep the best ones. The number of features to keep is a meta-parameter that can be optimized on a dev set.

Features are generally ranked by how informative they are about the classification decision. A very common metric is information gain. Information gain tells us how many bits of information the presence of the word gives us for guessing the class, and can be computed as follows (where $c_i$ is the $i$th class and $\bar{w}$ means that a document does not contain the word $w$):
$$G(w)=-\sum_{i=1}^C P(c_i)\log P(c_i)+P(w)\sum_{i=1}^C P(c_i|w)\log P(c_i|w)+P(\bar w)\sum_{i=1}^C P(c_i|\bar w)\log P(c_i|\bar w)$$
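
A sketch of this computation from document counts; the argument names n_c and n_cw are assumptions of the sketch:

```python
import numpy as np

def information_gain(n_c, n_cw):
    # n_c[i]:  number of training documents in class i
    # n_cw[i]: number of those documents that contain the word w
    n_c = np.asarray(n_c, dtype=float)
    n_cw = np.asarray(n_cw, dtype=float)

    def sum_p_log_p(counts):
        p = counts / counts.sum()
        p = p[p > 0]                      # treat 0 log 0 as 0
        return np.sum(p * np.log2(p))

    p_w = n_cw.sum() / n_c.sum()          # P(w): fraction of documents containing w
    return (-sum_p_log_p(n_c)                       # entropy of the class distribution
            + p_w * sum_p_log_p(n_cw)               # minus entropy of classes given w present
            + (1 - p_w) * sum_p_log_p(n_c - n_cw))  # minus entropy of classes given w absent

# information_gain([50, 50], [40, 5]) -> about 0.4 bits
```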

4.11 Summary

This chapter introduced the naive Bayes model for classification and applied it to the text categorization task of sentiment analysis.

  • Many language processing tasks can be viewed as tasks of classification.
  • Text categorization, in which an entire text is assigned a class from a finite set, includes such tasks as sentiment analysis, spam detection, language identification, and authorship attribution.
  • Sentiment analysis classifies a text as reflecting the positive or negative orientation (sentiment) that a writer expresses toward some object.
  • Naive Bayes is a generative model that makes the bag of words assumption (position doesn’t matter) and the conditional independence assumption (words are conditionally independent of each other given the class).
  • Naive Bayes with binarized features seems to work better for many text classification tasks.
  • Feature selection can be used to automatically remove features that aren’t helpful.
  • Classifiers are evaluated based on precision and recall.
  • Classifiers are trained using distinct training, dev, and test sets, including the use of cross-validation in the training set.
