Accuracy, Precision, and Recall

Hello folks, greetings. So maybe you are thinking: what's so hard about precision and recall? Why yet another article on this topic?

I recommend reading this article patiently, with a notebook and pencil in hand. Also, concentrate… and reread the same lines if needed.

I have a hard time remembering things. I tend to forget what I haven't used for a while, and I tend to forget the FORMULAS of precision and recall over time.

BUT, I do have a tendency to reconstruct things in my mind. In high school, I had a hard time cramming; I couldn't remember formulas for long. So what I did was understand them in natural language (for example, English), and then, during my exams, I would simply recreate the formula from my understanding. At times, this ability even let me invent new formulas. It wasn't really invention, it was specialization. But then, I was a kid at that time, right!! So, let's keep calling it "invention" ;)

Now, you might be thinking, "I am not here to hear your story." But I am here to make you hear my story XD. Just kidding! Let's start.

So, let's understand precision and recall in an intuitive manner. Then you won't need to Google what they mean and how they are formulated every single time.

You are probably already aware of the terms TP, FP, TN and FN, but I have a habit of explaining thoroughly. Feel free to skip that section if you already know them.

— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — -

TP, FP, TN and FN

Assume that you are performing a classification task. Let us keep it very simple: suppose you are performing single-label image classification. This means that each image belongs to one and only one of the given classes. Let's make it even simpler and consider that there is only one class of interest.

Now, if you don't know the difference between single-label and multi-label classification, just Google it a bit.

So, you are now performing binary image classification. For example, deciding whether an image contains a dog or not belongs to this category.

So, there are two target labels, depending on whether the value is 1 or 0: dog and not dog. Consider being a dog as "positive" (1) and not being a dog as "negative" (0). In short, define positive as one of the two classes and negative as the other (leftover) class.

Now, you input an image to the model and the model predicts that the image is of a dog. This means that the model is "positive" that there is a dog. But suppose the image isn't actually of a dog; it is of a person. Hence, the output of the model is wrong. Wrong means "false". This is an example of a false positive.

Suppose instead that the image actually contained a dog. Then the model was correct. Correct means "true". This becomes an example of a true positive.

So, true positive means that the model says positive and is correct. And false positive means that the model says positive but is wrong/incorrect.

The same goes for true negative and false negative. If the model predicts that there is no dog (i.e. negative) but there actually is a dog, then the model is wrong. This is a case of false negative. Similarly, if the model predicts that there is no dog and the image actually doesn't contain a dog, then the model is correct. This is a case of true negative.

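To make these four terms concrete, here is a minimal Python sketch (my own illustration, not from the article or any library) that maps one prediction and its ground truth to the right term:

```python
def outcome(predicted_dog: bool, actually_dog: bool) -> str:
    """Return TP/FP/TN/FN for a single prediction versus the ground truth."""
    if predicted_dog and actually_dog:
        return "TP"  # model said "dog" and was right
    if predicted_dog and not actually_dog:
        return "FP"  # model said "dog" but was wrong
    if not predicted_dog and actually_dog:
        return "FN"  # model said "not a dog" but was wrong
    return "TN"      # model said "not a dog" and was right


print(outcome(predicted_dog=True, actually_dog=False))  # FP (the person photo above)
```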

So, now you have an idea of these terms. Let's extend this to the whole training set instead of a single image. Suppose you are classifying 100 images. The model classified 70 images correctly and 30 images incorrectly. Kudos! You now have a 70% accurate model.

Now, let's focus on the correct images, i.e. TRUE classifications. Suppose 20 of the 70 correctly classified images were not of dogs, i.e. they were NEGATIVES. In this case, the value of TRUE NEGATIVES is 20, and hence the value of TRUE POSITIVES is 50.

Now, consider the incorrectly classified images, i.e. FALSE classifications. Suppose 10 of the 30 incorrectly classified images were predicted as dogs, i.e. POSITIVE (even though they are not). Then the value of FALSE POSITIVES becomes 10. Similarly, the value of FALSE NEGATIVES becomes 20.

Now, let's add up. TP + FP + TN + FN = 50 + 10 + 20 + 20 = 100 = size of the training data.

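As a sanity check, here is a small Python sketch with made-up label and prediction arrays chosen to reproduce the counts above (1 = dog, 0 = not dog):

```python
from collections import Counter

# 50 TP, 10 FP, 20 TN and 20 FN over 100 images.
actual    = [1] * 50 + [0] * 10 + [0] * 20 + [1] * 20
predicted = [1] * 50 + [1] * 10 + [0] * 20 + [0] * 20

counts = Counter(
    ("T" if p == a else "F") + ("P" if p == 1 else "N")
    for p, a in zip(predicted, actual)
)
print(counts)  # Counter({'TP': 50, 'TN': 20, 'FN': 20, 'FP': 10})

total = sum(counts.values())
accuracy = (counts["TP"] + counts["TN"]) / total
print(total, accuracy)  # 100 0.7 -> the 70% accurate model from above
```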

Remember: Positive/Negative refers to the prediction made by the model, and True/False refers to the evaluation of that prediction, i.e. whether the prediction is correct (true) or incorrect (false).

— — — — — — — — — — — — — — — — — — — — — — — — — — — — — —-

So, now that you have understood these terms, let's move on to precision and recall. If you have read other articles in the past, you might be thinking: what about the confusion matrix? Am I going to skip it? Maybe yes?! Maybe not! See, confusion matrices are, well, too confusing. The only reason they are needed, or the only reason they are included in precision-recall articles, is that they help with the formulation of precision and recall.

And as I said earlier, I am too bad at remembering formulas. So, let’s just invent (create) them.

— — — — — — — — — — — — — — — — — — — — — — — — — — — — — —-

Introduction

What does precision mean to you? Actually, the term precision is context dependent; it depends on the task you are performing. Whether you are solving a math problem, performing image classification, or performing object detection, the term precision has a different meaning in each context. The current object detection metrics are somewhat convoluted: they still use the same formula and then apply additional calculations on top of precision and recall. No comments on that part. I am not a researcher, and hence I can't comment on how they calculate their metrics.

For those who don't know what I mean by metrics, go Google it.

So, for now, let's understand the meaning of precision in its most commonly used formulation. Precision simply measures how precise your model is. Above, I mentioned that your model is 70% accurate. But can you answer how precise it was? No.

Accuracy, here, means the percentage of images correctly classified by the model. So, what does precision mean?

The thing is, as I said, the concept of precision is context dependent. But you are lucky: for evaluating ML models, the concept remains the same throughout. Still, to understand precision in an intuitive manner, you first need to understand why you need precision at all.

Actually, you might have read several articles online about what precision and recall are. But none of them clearly mentions why you need them. Yes, there are separate articles covering this topic; I will share a reference at the end. But let me try to explain here why you need precision and recall. I believe in completeness ;)

Actually, the reference that I am going to share is quite good, and it lists the formulas of precision and recall. But does it make you understand the formulas? No, it just states the formula and leaves you to cram it. So, stay focused here XD

Why do you need recall?

"Are you serious? What about precision? You just skipped everything related to precision and jumped straight to recall…" Yeah, I hear you. But just wait and watch -_-.

The real question is, "I have accuracy. My model is 99% accurate. Why do I still need recall?" Now, this depends on the task you perform. If you are classifying whether an image is of a dog (positive class) or not a dog (negative class), then accuracy is all you need. But if you are classifying whether a person is infected with COVID-19 or not, then you will need something more than accuracy. Let's understand this with an example.

Suppose you have 100 images to classify, and the task is to predict whether each one is a dog or not. Now, the model classifies 9 images as positive and 91 images as negative.

Suppose the values of TP, FP, TN and FN are 9, 0, 90 and 1 respectively.

Note that TP + FP = predicted positives = 9 and TN + FN = predicted negatives = 91.

That means the model correctly classified 99 images out of 100. Note that correct implies true, and the number of true predictions = TP + TN = 9 + 90 = 99. That is, 99% accuracy.

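A quick sketch of that arithmetic, using the counts assumed above:

```python
tp, fp, tn, fn = 9, 0, 90, 1
accuracy = (tp + tn) / (tp + fp + tn + fn)
print(accuracy)  # 0.99 -> 99% accurate, yet one positive slipped through (fn = 1)
```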

Here, the model misclassified 1 image. Maybe it didn't learn the features properly, or maybe there's another reason, like an unbalanced dataset. But the thing to note is that the model did misclassify 1 image.

If you don't know what an unbalanced dataset means, and how an unbalanced dataset can cause such issues, Google it. Also, refer to the reference I share at the end.

You can do 99 things for someone and all they’ll remember is the one thing you didn’t do.

Remember the quote? Yes, and we are going to do the same with our model. We are going to look at that 1 misclassified image. Consider the task now. If we misclassify an image as not a dog, how will it impact the users? It won't, right? Or maybe just a little. Now, suppose the task was to classify whether an image captured by CCTV in a small town contains a lion or not, and if there is a lion, to alert all the citizens of the town so they can hide. Now, if the model misclassifies an image of a lion, it has a huge impact on the citizens.

Consider an even more serious task: classifying whether a person is infected with COVID-19 or not. If he/she is infected, alert the emergency staff and quarantine him/her. What if an infected person is not quarantined? The virus would spread, right? The impact of a wrong/false classification here is huge. Hence, even if the model is 99% accurate and it only misclassifies 1% of the data, we will still tell the model that it made a mistake and ask it to improve.

Hence, we need something more than accuracy, and that metric is called recall. Now, in order to know how recall helps here, we need to understand what recall is.

Remember.. You haven’t yet understood Precision. I skipped that part :(

Recall

What does recall mean in simple terms? Forget about AI/ML for a moment. What do you mean by "I am trying to recall but I can't"? Or "let me try to recall what happened"? Does "recall" equal "think"? No, it's "remember". Actually, recall and remember differ slightly in meaning, but they are mostly the same. In both of the sentences above, you can replace recall with remember and it works fine.

So, recall = remember.

The thing here is, our model needs to recall whether a person's features indicate that he/she is COVID-19 positive. Our model needs to remember the features of the COVID-19 positive class so that it does not misclassify a COVID-19 positive case as negative.

Recall can then be defined as the number of positive samples correctly classified (remembered/recalled) by the model divided by the total number of positive samples. Suppose there are 50 positive samples in the dataset. Now, on running predictions on this dataset, the model correctly predicts only 20 of them. This means the model is only able to correctly remember 20 positive samples out of 50, and hence the recall is 40% (20/50 = 0.4).

Such a model for predicting COVID-19 positive cases won't work, because it marks 60% of the COVID-19 positive cases as negative. And that number (60%) is far too high to ignore.

So, recall = number of positive samples correctly predicted by the model / total number of positive samples.

The number of samples correctly (true) classified as positive equals TP. The total number of positive samples in the dataset equals TP + FN, because FN means that the model said "negative" and the model is "wrong"; hence the sample was actually "positive".

That means, the invented formula is: recall = TP / (TP + FN)

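Here is that formula as a tiny Python sketch, plugged into the 50-positives example above (TP = 20, so FN = 30):

```python
def recall(tp: int, fn: int) -> float:
    """Fraction of the actual positives that the model 'remembered'."""
    return tp / (tp + fn)


print(recall(tp=20, fn=30))  # 0.4 -> the 40% recall from the example above
```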

Hence, "How good is the recall of the model?" simply answers the question "How many of the total positive datapoints (images) are correctly remembered by the model?"

Total positive datapoints = TP + FN

Because TP = the model predicts that the datapoint is positive and the model is correct, i.e. the datapoint is indeed positive.

And FN = the model predicts that the datapoint is negative and the model is wrong, i.e. the datapoint is actually positive.

Also, datapoints correctly remembered by the model = TP + TN. That is, positive datapoints correctly remembered by the model = TP.

Finally, recall = positive datapoints correctly remembered / total positive datapoints = TP / (TP + FN)

So, remember that recall answers the question: how many of the total positive datapoints did the model correctly remember? Or: how well does the model recall positive datapoints?

Wait… what about TN and FP? Also, I have written about "correctly predicting positive samples" all this time. So what about the other cases? For example, incorrectly predicting a negative sample as positive, i.e. classifying a person who is not infected with COVID-19 as positive. This is an example of FP: the model says the person is infected, but he/she isn't. Now, does that matter? How much does it hurt to quarantine a person who is not infected? A little, yes? So, we can ignore it. Also, TN can be ignored because the prediction is true (correct).

Why do you need precision?

I said that if a person who isn't infected with COVID-19 is predicted as infected (positive), then it does not matter. And you blindly believed me!

But but but… what if you are living in North Korea? You will be shot dead if you are detected positive. "What the hell… That's a high impact. You can't just ignore this. I want to live, man!!" Yeah, I hear those words too. So, that's one reason you need precision.

There's another reason too. What if I simply ask the model to classify all the images as positive? In this case, TP = x, FP = 100 - x (if the size of the dataset is 100 and x is the number of actual positives), TN = 0 and FN = 0. Recall in this case would be TP / (TP + FN) = x / (x + 0) = 1, i.e. 100%.

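A small sketch of this degenerate "everything is positive" model (x, the number of actual positives, is made up here; any value gives the same recall):

```python
x = 30                             # hypothetical number of actual positives out of 100
tp, fp, tn, fn = x, 100 - x, 0, 0  # the model calls every image positive

recall = tp / (tp + fn)            # = x / (x + 0) = 1.0, regardless of x
print(recall, fp)                  # 1.0 70 -> perfect recall, but 70 false positives
```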

What the heck!!! This means we would shoot every human in North Korea, because the model classifies all the citizens of North Korea as COVID-19 positive, and we trust the model because the recall is 100%. Like, seriously!!!

That is one other reason why we need precision.

The things went in this order:
1. Only accuracy won't work in certain tasks.
2. We need recall.
3. Only recall won't work.
4. We need precision along with recall.

Precision

Ahh… now you know why I skipped precision. But remember, I also skipped the confusion matrix because it was too confusing.

At this stage, you should already know that precision will have something to do with FP. If you haven’t guessed this, go re-read the above two sections.

Consider the last example, where the model simply classifies all the citizens as COVID-19 positive. In this case, although the recall of the model is high (100%), the precision of the model is very low. Hence, as with other topics in machine learning, there is a trade-off here too. Just like the bias-variance trade-off, there is a precision-recall trade-off.

After reading this article, I want you to prove mathematically why there's a trade-off between precision and recall. And yeah, Google a bit too. If you succeed, leave a comment here describing the method you used to prove it.

So, we need the model to also take care of not misclassifying negative samples, i.e. not marking an uninfected (negative) person as infected (positive).

We can do this by defining precision as the number of correct positive predictions divided by the total number of positive predictions. For example, suppose the number of positive cases in the dataset is 50 and the model predicts 80 cases as positive. Out of these 80 predictions, only 20 are correct and the other 60 are incorrect. That means 20 cases are predicted positive and are correct, i.e. TP = 20, and 60 cases are predicted positive but are incorrect, i.e. FP = 60.

As you can see, the model is not at all precise. The model says that 80 cases are positive, of which only 20 are actually positive. Here, precision = 20/80 = 25%.

We simply formulated precision above. Precision = TP / (TP + FP)

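And the matching sketch for precision, using the numbers from the example above (TP = 20, FP = 60):

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of the predicted positives that are actually positive."""
    return tp / (tp + fp)


print(precision(tp=20, fp=60))  # 0.25 -> the 25% from the example above
```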

Understanding this intuitively, "How precise is your model?" answers the question "How many datapoints are actually positive out of the total number of predicted positive datapoints?"

So, remember that precision answers the question: how many of the claimed (predicted) positive datapoints are actually positive? Or: how precise is the model in predicting positive datapoints?

Conclusion

Both the definitions of precision and recall match their meanings in plain English.

Like, how many positive datapoints (out of the total number of positive datapoints) does the model remember? — Recall

And, how many (of the total predicted positive datapoints) are actually positive? — Precision

If you just understand what these two questions mean, you can rebuild the formulas whenever you need them. If you don't understand these questions clearly, try translating them into your local language (mine is Gujarati) and you will be able to understand them.

Wait wait.. is it going to end? What about the confusion matrix?

The confusion matrix is just used to visualize all of these things and to help you cram the formulas. I won't cover it in depth! But yes, I will help you cram the formulas using the confusion matrix here.

Here is an image that will help you cram the denominators of the precision and recall formulas. The numerator is the same for both: TP.

[Image: a confusion matrix, with cancer as the positive class instead of COVID-19]
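
If you'd rather have code print that 2x2 layout for you, here is a short sketch using scikit-learn (assuming it is installed; the labels and predictions are made up):

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Made-up ground truth and predictions: 1 = positive (cancer), 0 = negative.
y_true = [1, 1, 1, 0, 0, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1, 0, 1]

# Rows are actual classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
print(confusion_matrix(y_true, y_pred))
print(recall_score(y_true, y_pred))     # TP / (TP + FN)
print(precision_score(y_true, y_pred))  # TP / (TP + FP)
```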

What more? Nothing… Maybe what I have written is too confusing. Maybe it isn't. I don't know. Just leave your comments, bad or good, so that I can know.

But yeah, here is something more to do by yourself. Go read about the F1 score and why you need it. Short answer: because of the trade-off between precision and recall. How would you select a model? Based on precision? Or based on recall? The answer is the F1 score. Go read about it.

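For reference, the F1 score is the harmonic mean of precision and recall; here is a minimal sketch, reusing the hypothetical "everything is positive" model from earlier (precision 0.3, recall 1.0):

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


print(f1(precision=0.3, recall=1.0))  # ~0.46 -> the all-positive model isn't so great after all
```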

The reference I promised is here: https://towardsdatascience.com/beyond-accuracy-precision-and-recall-3da06bea9f6c

What more? Read about the ROC curve, mAP and AR. Or wait for me to post about them. Bye!
