http://blog.csdn.net/pipisorry/article/details/44119187

Machine Learning - Andrew Ng course study notes

Machine Learning System Design

Prioritizing What to Work On

The first decision we must make is how to represent x, that is, the features of the email.


Note: choosing the features

1. Manually choose a hundred words to use for this representation.

2. In practice, look through a training set, pick the n most frequently occurring words in it, where n is usually between ten thousand and fifty thousand, and use those as your features.
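The second approach can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the names `build_vocabulary` and `email_to_features` and the three toy emails are hypothetical, whitespace splitting stands in for real tokenization, and n is tiny here instead of ten to fifty thousand.

```python
from collections import Counter

def build_vocabulary(emails, n):
    """Return the n most frequent words across the training emails."""
    counts = Counter()
    for text in emails:
        counts.update(text.lower().split())
    return [word for word, _ in counts.most_common(n)]

def email_to_features(text, vocabulary):
    """Binary feature vector: x[j] = 1 if vocabulary[j] appears in the email."""
    words = set(text.lower().split())
    return [1 if word in words else 0 for word in vocabulary]

# Toy training set (hypothetical data, for illustration only).
training_emails = [
    "buy cheap watches now",
    "deal on watches buy now",
    "meeting notes for andrew",
]

vocabulary = build_vocabulary(training_emails, n=3)
x = email_to_features("buy watches today", vocabulary)
print(vocabulary, x)
```

Each email becomes a vector with one entry per vocabulary word, so the length of x equals n regardless of how long the email is.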

Reducing the error rate with data preprocessing

Note:

1. Getting lots of data will often help, but not all the time.

2. When spammers send email, they very often try to obscure the origins of the email, and may use fake email headers, or send the email through very unusual sets of computer servers and very unusual routes in order to get the spam to you.
3. The spam classifier might not equate "w4tches" with "watches", so it may have a harder time realizing that something is spam when it contains these deliberate misspellings. This is why spammers do it.
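One possible preprocessing step for deliberate misspellings is to undo common character substitutions before looking words up in the vocabulary. The sketch below is an assumption, not something from the course: the `SUBSTITUTIONS` table and the `normalize` function are hypothetical, and a real system would need a much richer mapping (this one would also rewrite digits inside genuine numbers).

```python
# Hypothetical substitution table mapping look-alike characters to letters.
SUBSTITUTIONS = str.maketrans({"4": "a", "0": "o", "3": "e", "$": "s", "@": "a"})

def normalize(token):
    """Lowercase a token and undo simple look-alike character substitutions."""
    return token.lower().translate(SUBSTITUTIONS)

print(normalize("w4tches"))   # watches
print(normalize("M0rtgage"))  # mortgage
```

Whether such a step actually helps is an empirical question, which is what the error analysis in the next section is for.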

皮皮blog

Error Analysis

{Error analysis gives you a more systematic way to decide among different ideas for improving the algorithm. It is a quick way to identify errors and find the hard examples, so that you can focus your efforts on those.}

Recommended steps for designing a machine learning system

Note: Error analysis on the emails can inspire you to design new features, or reveal the current shortcomings of the system and give you the inspiration you need to come up with improvements to it.

An example of error analysis

Note:

1. Compute accuracy: Accuracy = (true positives + true negatives) / (total examples).

2. By counting up the number of emails in these different categories, you might discover, for example, that the algorithm is doing particularly poorly on emails trying to steal passwords. That may suggest it is worth your effort to look more carefully at that type of email and see if you can come up with better features to categorize them correctly.
3. If a cue such as unusual punctuation shows up in many of the misclassified emails, that is a strong sign that it might actually be worth your while to spend the time developing more sophisticated features based on the punctuation.
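The counting described above can be done with a simple tally. In this sketch the `errors` list is hypothetical hand-labelled data (each misclassified cross-validation email gets a category and a set of cues noticed while reading it); the labels and counts are made up for illustration.

```python
from collections import Counter

# Hypothetical hand-labelled misclassified emails: (category, cues observed).
errors = [
    ("steal passwords", {"unusual punctuation"}),
    ("pharma", {"deliberate misspelling"}),
    ("steal passwords", {"unusual punctuation", "unusual routing"}),
    ("replica", set()),
    ("steal passwords", {"unusual punctuation"}),
    ("other", {"deliberate misspelling"}),
]

# How many errors fall into each email category.
by_category = Counter(category for category, _ in errors)
# How often each cue appears among the errors.
by_cue = Counter(cue for _, cues in errors for cue in cues)

print(by_category.most_common())
print(by_cue.most_common())
```

The largest counts point to the category and the cue where new features are most likely to pay off.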

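Numerical evaluation of your learning algorithm

Note:

1. Using stemming software can help, but it can also hurt. The quickest way to find out is to try it and compare a single numerical metric, such as the cross-validation error with and without stemming.
2. We'll see later examples where coming up with this sort of single real-number evaluation metric may need a little more work. Once you have one, it lets you make these decisions much more quickly.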
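A single-number comparison of two variants can be as small as the sketch below. The prediction lists are hypothetical stand-ins for the cross-validation output of a classifier trained without and with stemming; only the comparison itself is the point.

```python
def classification_error(predictions, labels):
    """Fraction of examples whose predicted label differs from the true label."""
    wrong = sum(p != y for p, y in zip(predictions, labels))
    return wrong / len(labels)

# Hypothetical cross-validation labels and predictions for two variants.
labels           = [1, 0, 0, 1, 0, 1, 0, 0, 0, 1]
without_stemming = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
with_stemming    = [1, 0, 0, 0, 0, 1, 0, 0, 0, 1]

error_without = classification_error(without_stemming, labels)
error_with = classification_error(with_stemming, labels)
print(f"without stemming: {error_without:.2f}, with stemming: {error_with:.2f}")
```

With one number per variant, the decision to keep or drop a change takes seconds instead of a manual inspection of examples.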


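Error Metrics for Skewed Classes (precision/recall)

Skewed classes: in this case, the number of positive examples is much, much smaller than the number of negative examples. In other words, the two classes are imbalanced (here the positive class has far fewer examples than the negative class), and accuracy is then of little use as a metric.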

Note:

1. So a non-learning algorithm that just predicts y = 0 all the time can achieve an even lower error than the 1% error of the learned classifier.

2. Suppose a change takes the algorithm from 99.2% accuracy to 99.5% accuracy. Is that a good change to the algorithm or not? It becomes much harder to rely on classification accuracy alone, because you can get very high classification accuracy (very low error) without it being clear that you are really improving the quality of your classifier: predicting y = 0 all the time doesn't seem like a particularly good classifier.

Faced with such skewed classes, we therefore use a different error metric called precision/recall.
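The problem with accuracy on skewed classes is easy to reproduce. The sketch below uses a hypothetical data set of 1000 examples with only 5 positives (the numbers are chosen for illustration) and a "classifier" that ignores its input.

```python
# Hypothetical skewed data set: 5 positive examples out of 1000.
labels = [1] * 5 + [0] * 995

# A non-learning "classifier" that predicts y = 0 for every example.
always_zero = [0] * len(labels)

correct = sum(p == y for p, y in zip(always_zero, labels))
accuracy = correct / len(labels)

# It never finds a single positive example.
positives_found = sum(p == 1 and y == 1 for p, y in zip(always_zero, labels))

print(f"accuracy = {accuracy:.3f}, positives found = {positives_found}")
```

The accuracy looks excellent even though the classifier detects none of the cases it was built to detect.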

Precision/Recall

Note:

1. A learning algorithm that predicts y = 0 all the time has recall equal to zero, so we can recognize that it isn't a very good classifier.
2. By convention, y = 1 rather than y = 0 is defined to indicate the presence of the rare class that we are trying to detect. Precision and recall come out differently depending on which class is labeled 1 and which 0; usually the class with fewer examples is chosen as 1.
Summary: when the classes are very skewed, precision/recall is often a much better way to evaluate our learning algorithms than classification error or classification accuracy.

[1.6 Types of errors: common error metrics]
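As a concrete reference, precision is true positives divided by all predicted positives, and recall is true positives divided by all actual positives. The function below is a minimal sketch of those definitions; the name `precision_recall`, the toy labels, and the choice to return 0.0 when a denominator is zero are assumptions made for this example (precision is strictly undefined when nothing is predicted positive).

```python
def precision_recall(predictions, labels):
    """Return (precision, recall), treating y = 1 as the rare positive class."""
    tp = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(predictions, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(predictions, labels))
    # Assumed convention: report 0.0 when the denominator is zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy example: 3 actual positives, 3 predicted positives, 2 of them correct.
labels      = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
predictions = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

print(precision_recall(predictions, labels))
print(precision_recall([0] * len(labels), labels))  # always predicting y = 0
```

The second call shows why the always-zero classifier cannot hide behind high accuracy: its recall is zero.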


Trading Off Precision and Recall: the F1 score

Note:

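1. Suppose we want to tell someone that we think they have cancer only if we are very confident. Then, instead of setting the threshold at 0.5, we can set it higher.
2. The precision-recall curve can take many different shapes, depending on the details of the classifier.

3. Judging how a change in threshold affects P and R: lowering the threshold means more y = 1 predictions, while the denominator of recall (the number of actual positives) stays the same! So first work out whether recall goes up or down, then work out how precision changes.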
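The effect of the threshold can be checked numerically. In this sketch the predicted probabilities and labels are hypothetical, and `precision_recall` is a helper written for the example; the loop lowers the threshold step by step and records precision and recall at each setting.

```python
def precision_recall(predictions, labels):
    """Return (precision, recall) with y = 1 as the positive class."""
    tp = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(predictions, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(predictions, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical classifier outputs h(x) and true labels.
probabilities = [0.95, 0.85, 0.75, 0.60, 0.40, 0.35, 0.20, 0.10]
labels        = [1,    1,    0,    1,    1,    0,    0,    0]

results = {}
for threshold in (0.9, 0.7, 0.5, 0.3):
    # Predict y = 1 whenever h(x) >= threshold.
    predictions = [1 if p >= threshold else 0 for p in probabilities]
    results[threshold] = precision_recall(predictions, labels)
    precision, recall = results[threshold]
    print(f"threshold={threshold:.1f}  precision={precision:.2f}  recall={recall:.2f}")
```

Because the number of actual positives is fixed, recall can only stay the same or rise as the threshold is lowered. Precision usually falls, but on a small sample like this one it need not move monotonically.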

Is there a way to choose this threshold automatically? How do we decide which of these algorithms is best?

One way of combining precision and recall into a single number is the F score.
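The F1 score is defined as 2PR / (P + R), which is high only when both precision and recall are reasonably high. The sketch below compares it with a plain average; the three (precision, recall) pairs are illustrative values, and returning 0.0 when P + R = 0 is an assumed convention.

```python
def f1_score(precision, recall):
    """F1 = 2PR / (P + R); returns 0.0 when both are zero (assumed convention)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Illustrative (precision, recall) pairs for three candidate algorithms.
candidates = {
    "Algorithm 1": (0.5, 0.4),
    "Algorithm 2": (0.7, 0.1),
    "Algorithm 3": (0.02, 1.0),  # predicts y = 1 almost all the time
}

best_by_f1 = max(candidates, key=lambda name: f1_score(*candidates[name]))
best_by_average = max(candidates, key=lambda name: sum(candidates[name]) / 2)

for name, (p, r) in candidates.items():
    print(f"{name}: F1 = {f1_score(p, r):.3f}, average = {(p + r) / 2:.3f}")
print("best by F1:", best_by_f1, "| best by average:", best_by_average)
```

The plain average rewards the degenerate classifier with near-zero precision, while F1 does not. The same comparison, run on the cross-validation set for several thresholds, is one reasonable way to pick the threshold automatically.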


Data For Machine Learning: how data affects the performance of learning algorithms

{The issue of how much data to train on}

Note:

1. Rather than including high-order polynomial features of x.

2. Even though we have a lot of parameters, if the training set is much larger than the number of parameters, then hopefully these algorithms will be unlikely to overfit.
3. Finally, putting these two together: if the training set error is small and the test set error is close to the training error, then together they imply that the test set error will hopefully also be small.

4. With a sufficiently large training set, the algorithm is unlikely to overfit.

Summary: if you have a lot of data and you train a learning algorithm with a lot of parameters, that may be a good way to get a high-performance learning algorithm.


Review:


from:http://blog.csdn.net/pipisorry/article/details/44245513

ref: [Evaluation metrics and methods for machine learning models]
