Practical Applications of Machine Learning

Some of my previous introductory posts on machine learning and data science were a bit technical. The purpose of this post, however, is to explain some practical use-cases of ML from the perspective of a non-technical layperson with no previous exposure to it. To satisfy your curiosity, I will also mention the specific ML algorithms that are generally applicable to each use-case, in case you want to learn more about them.

What types of problems does ML help us with? Irrespective of the specific domain, what answers or actionable insights does it offer? Rather than the ‘how’, our focus here will be on the ‘what’ and the ‘why’.

What Is This? A or B?

This family of ML algorithms predicts which of exactly two possible categories an observation belongs to; there is no third option. Suppose management wants to predict which of your existing customers will churn. For any specific customer, the answer can only be that they will churn or that they will not. Other practical examples include:

  • Is this email spam or not?
  • Will this customer default or not?
  • Do these symptoms indicate a specific disease or not?
  • Will this customer continue with a purchase or not?
  • Is this an image of a boy or a girl?

Formally, this is known as binary classification, and the relevant algorithms include:

  • Logistic Regression
  • Support Vector Machine
  • k-Nearest Neighbors
  • Classification Decision Tree
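To make the idea concrete, here is a minimal logistic regression sketch in plain Python (no ML library), trained by batch gradient descent. Everything in it is an illustrative assumption: the function names, the learning rate and epoch count, and the tiny churn dataset, whose two features (months since last purchase, support tickets filed) are invented. The model outputs a churn probability, and a 0.5 threshold turns that probability into one of the two possible answers.

```python
import math

def sigmoid(z):
    """Map any real number to a probability in (0, 1) without overflowing."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic_regression(xs, ys, lr=0.1, epochs=2000):
    """Fit weights and a bias by batch gradient descent on the log-loss."""
    w = [0.0] * len(xs[0])
    b = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in zip(xs, ys):
            # Difference between the predicted probability and the 0/1 label.
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x, threshold=0.5):
    """Return 1 ('will churn') if the churn probability reaches the threshold, else 0."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return 1 if p >= threshold else 0

# Hypothetical toy data, invented for illustration:
# features = [months since last purchase, support tickets filed], label 1 = churned.
X = [[1, 0], [2, 1], [1, 1], [8, 4], [9, 3], [7, 5]]
y = [0, 0, 0, 1, 1, 1]

w, b = train_logistic_regression(X, y)
print(predict(w, b, [1, 0]), predict(w, b, [9, 4]))
```

In practice you would more likely use a library implementation such as scikit-learn's `LogisticRegression`; the hand-written loop only shows that the answer is always one of two classes.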

What Is This? A or B or C or D (or Something Else)?

As an extension of binary classification, here the number of potential categories can be more than two. Suppose you are working on a face recognition model: the person in a specific picture can be any of the individuals in your database. The number of possible answers is limited only by the number of distinct individuals (categories) represented in the data used during model development. Other practical examples include:

  • Optical Character Recognition: which character is this?
  • Which animal is in this image?
  • Which genre does this movie belong to?
  • Sentiment Analysis: what feeling does this tweet express?
  • Whose voice is in this audio recording?

Formally, this is known as multi-class classification, and the relevant algorithms include:

  • Random Forests
  • Classification Decision Tree
  • XGBoost
  • k-Nearest Neighbors
  • Artificial Neural Networks
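k-Nearest Neighbors handles more than two classes without any change: it takes a majority vote among the closest known examples. The sketch below is written in plain Python and uses hypothetical, hand-picked features (body weight and shoulder height) instead of real image pixels. The name `knn_predict` and all the numbers are chosen only for illustration.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of x by majority vote among its k nearest neighbours."""
    # Pair every training point's Euclidean distance to x with its label, closest first.
    neighbours = sorted(
        (math.dist(xi, x), label) for xi, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]

# Hypothetical features [body weight in kg, shoulder height in cm]; values are made up.
train_X = [[4.0, 25], [5.0, 23], [3.5, 24],
           [25.0, 55], [30.0, 60], [22.0, 50],
           [500.0, 160], [450.0, 150], [550.0, 165]]
train_y = ["cat", "cat", "cat", "dog", "dog", "dog", "horse", "horse", "horse"]

for query in ([4.2, 24], [28.0, 58], [480.0, 155]):
    print(query, "->", knn_predict(train_X, train_y, query))
```

Because distances drive the vote, features on very different scales are usually standardized first in real use. The invented data here is separated enough that this step can be skipped.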

How Much or How Many of Something to Expect?

This family of ML algorithms predicts the quantity of something as a continuous output (i.e., the prediction can be any one of an unlimited number of possible values). There are no fixed categories to choose from. For example, when predicting sales volume for the next quarter, the prediction could be 1,000 units, 1,200 units, 10,000 units, or any other positive number.

The output of these algorithms can be any real number (positive, negative, zero, or fractional); however, your specific use-case determines whether negative or fractional values are expected and acceptable. For example, a sales forecast cannot be negative.

Other practical use-cases of this class of algorithms include:

  • What will tomorrow’s temperature be?
  • How many prospects can we sign up as customers in the next quarter?
  • What will our energy consumption be next month?
  • How long will it take for an event to occur?

Formally, this is known as regression, and the relevant algorithms include:

  • Linear Regression
  • Regression Decision Tree
  • XGBoost
  • Artificial Neural Networks
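The simplest member of this family, linear regression with a single input, can be fitted in closed form with ordinary least squares. The sketch below assumes a hypothetical history of quarterly marketing spend versus units sold, and the function names are illustrative. It also clips forecasts at zero, following the earlier point that a sales forecast cannot be negative even though the raw model output can be.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

def forecast(slope, intercept, x):
    """Predict a quantity, clipped at zero because units sold cannot be negative."""
    return max(0.0, slope * x + intercept)

# Hypothetical history: marketing spend (thousands) vs. units sold in that quarter.
spend = [10, 20, 30, 40]
units = [1100, 1900, 3100, 3900]

slope, intercept = fit_simple_linear_regression(spend, units)
print(slope, intercept, forecast(slope, intercept, 50))  # prints: 96.0 100.0 4900.0
```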

Is This Data Normal or Abnormal?

Often, we are more interested in whether a specific observation is atypical, abnormal, or an anomaly, or whether it is merely a normal, usual observation. We may have historical observations already labeled as abnormal or not. Alternatively, no such historical labels may exist, in which case an ML algorithm is used to detect outliers without them.

Typical use-cases include:

  • Is this purchase materially different from the customer’s past purchases?
  • Is this traffic pattern on a computer network typical?
  • Are these outputs from a piece of industrial equipment atypical?

Formally, this is known as outlier or anomaly detection, and the relevant algorithms include:

  • Isolation Forest
  • Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
  • Z-Scores (technically not an ML algorithm but a statistical measure used to identify outliers)
  • One-Class Support Vector Machine
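Z-scores are the easiest of these to show. The sketch below measures how many standard deviations a new purchase lies from the mean of the customer's past purchases and flags it when that distance exceeds 3, which is a common rule of thumb and not a fixed standard. The purchase amounts and the name `z_score_check` are hypothetical. The statistics are computed from the history only, without the new value. With few observations, an extreme value included in the calculation inflates the standard deviation it is compared against and can hide itself.

```python
import statistics

def z_score_check(history, new_value, threshold=3.0):
    """Flag new_value if it lies more than `threshold` standard deviations
    from the mean of past observations. Returns (is_anomaly, z)."""
    mean = statistics.mean(history)
    std = statistics.stdev(history)  # sample standard deviation; needs >= 2 values
    if std == 0:
        raise ValueError("history has no variation; z-scores are undefined")
    z = (new_value - mean) / std
    return abs(z) > threshold, z

# Hypothetical past purchase amounts for one customer (made-up numbers).
past_purchases = [52.0, 48.0, 50.0, 55.0, 47.0, 51.0, 49.0]

print(z_score_check(past_purchases, 53.0))   # an ordinary purchase
print(z_score_check(past_purchases, 500.0))  # a purchase far outside the usual range
```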

How Can We Organize This Data?

Are there any underlying, identifiable characteristics that can be used to organize data into specific groups (also known as clusters or segments)? These characteristics are not known to us in advance, and often even the number of potential clusters is unknown. Clustering your data can support further analysis or the development of cluster-specific strategies.

For example, we may segment our customers into distinct groups based on their age, gender, purchase history, etc., to devise segment-specific sales, marketing, or promotion strategies.

Other practical use-cases of this class of algorithms include:

  • Which of our subscribers like similar movies or songs?
  • How can we categorize a collection of text documents or audio recordings?
  • How can we better segment our products or services?
  • Which model of a specific machine is more prone to breakdowns?

Formally, this is known as clustering, and the relevant algorithms include:

  • k-Means Clustering
  • Mean-Shift Clustering
  • Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
  • Agglomerative Hierarchical Clustering
  • Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH)
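A minimal k-means sketch (Lloyd's algorithm) in plain Python follows. Each point is assigned to its nearest centroid, then each centroid moves to the mean of its assigned points, and the two steps repeat until nothing changes. The customer data (age, average monthly spend) is invented, and `k_means` is an illustrative name, not a library API.

```python
import math
import random

def k_means(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm. Returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise with k randomly chosen data points
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # empty cluster: keep its old centroid
        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical customers: (age, average monthly spend); numbers are invented.
customers = [(22, 15), (25, 18), (23, 12), (55, 80), (60, 85), (58, 78)]
labels, centroids = k_means(customers, k=2)
print(labels)
print(centroids)
```

k-means needs the number of clusters `k` up front, and as noted above that number is often unknown. Analysts commonly compare several values of `k`, and algorithms such as DBSCAN and mean-shift do not require it at all.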

What to Do Next?

This is where ML gets really interesting: the algorithm not only makes a prediction but also tells us what to do given that prediction. This family of ML algorithms may not yet be mature enough for every use-case; however, substantial progress has been made recently thanks to advanced deep learning algorithms and the greater processing power now available to us.

These algorithms learn through trial and error and repeated feedback loops, and they depend less heavily on pre-collected data than the other algorithms described above. They are mostly applied in automated systems, where the machine itself usually takes the recommended action.

Formally, this is known as reinforcement learning, and it is usually implemented with deep neural networks.

Some practical applications of reinforcement learning include:

  • What should an industrial robot do next, given its current situation?
  • Should we adjust the temperature or leave it untouched?
  • How should a self-driving car react (accelerate, decelerate, brake, etc.) to a hazard ahead?
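The thermostat question can be sketched with tabular Q-learning. This is a simpler technique than the deep neural networks mentioned earlier, because a lookup table is enough when there are only a handful of states. The room model is entirely hypothetical: temperatures from 16 to 24 degrees, three actions (cool, leave untouched, heat), and a reward that penalizes distance from a 20-degree target. The agent is never told the right action; it discovers it through trial and error and repeated feedback.

```python
import random

ACTIONS = (-1, 0, 1)            # cool by 1 degree, leave untouched, heat by 1 degree
MIN_T, MAX_T, TARGET = 16, 24, 20

def step(temp, action):
    """Hypothetical room model: each action shifts the temperature by one degree."""
    next_temp = max(MIN_T, min(MAX_T, temp + action))
    reward = -abs(next_temp - TARGET)  # the further from the target, the larger the penalty
    return next_temp, reward

def train_q_learning(episodes=500, steps=20, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning: estimate the value of each (temperature, action) pair."""
    rng = random.Random(seed)
    q = {(t, a): 0.0 for t in range(MIN_T, MAX_T + 1) for a in ACTIONS}
    for _ in range(episodes):
        temp = rng.randint(MIN_T, MAX_T)  # start each episode at a random temperature
        for _ in range(steps):
            # Epsilon-greedy: usually take the best-known action, sometimes explore at random.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(temp, a)])
            next_temp, reward = step(temp, action)
            # Feedback loop: move the estimate toward reward + discounted best future value.
            best_next = max(q[(next_temp, a)] for a in ACTIONS)
            q[(temp, action)] += alpha * (reward + gamma * best_next - q[(temp, action)])
            temp = next_temp
    return q

def best_action(q, temp):
    """The learned policy: the action with the highest estimated value."""
    return max(ACTIONS, key=lambda a: q[(temp, a)])

q = train_q_learning()
for t in range(MIN_T, MAX_T + 1):
    print(t, best_action(q, t))
```

After training, the learned policy should heat below the target, cool above it, and leave the temperature untouched at 20. The fixed random seed only makes the run reproducible.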

Conclusion

There you have it: a practical, no-nonsense, plain-language introduction to the scenarios where ML can help us.

Feel free to comment or reach out to me if you would like to discuss anything further related to machine learning, data analytics, risk scoring, or financial analysis.

Till next time, code on!

Source: https://towardsdatascience.com/what-are-the-practical-benefits-of-machine-learning-c9820dbdd67c
