Supervised Learning with Azure

Machine learning sounds cool, doesn’t it? I’m a biology student who had no idea about this branch of computer science until the lockdown gave me the time and energy to explore it. For anyone who needs a layman’s introduction to machine learning, here is an example. One day my dad asked me what I keep studying. I didn’t know how to explain it to him; the words running through my mind were normalization, overfitting, models, Azure, and so on. A minute later, he was dictating a text to a friend using Google speech recognition on his phone. “That’s what I am studying, Dad!” I said. The science behind that process is called machine learning. It is a subset of artificial intelligence that focuses on creating programs capable of learning without explicit instruction.

This article covers one of the basic concepts of machine learning: supervised learning. I hope you enjoy it!

1. Supervised Learning: Classification

The first type of supervised learning that we’ll look at is classification. Recall that the main distinguishing characteristic of classification is the type of output it produces:

In a classification problem, the outputs are categorical or discrete. Within this broad definition, there are several main approaches, which differ in how many classes or categories are used and in whether each output can belong to only one class or to multiple classes. Let’s have a look.

Some of the most common types of classification problems include:

· Classification on tabular data: The data is available in the form of rows and columns, potentially originating from a wide variety of data sources.

· Classification on image or sound data: The training data consists of images or sounds whose categories are already known.

· Classification on text data: The training data consists of texts whose categories are already known.

As we know, machine learning requires numerical data. This means that with images, sound, and text, several steps need to be performed during the preparation phase to transform the data into numerical vectors that can be accepted by the classification algorithms.
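
One simple way to turn text into numerical vectors is a bag-of-words count. The sketch below is only an illustration: the helper names `build_vocabulary` and `vectorize` and the three sample texts are made up for this example and do not come from the course or from Azure.

```python
from collections import Counter


def build_vocabulary(texts):
    """Map every distinct lowercase word in the corpus to a column index."""
    words = sorted({word for text in texts for word in text.lower().split()})
    return {word: index for index, word in enumerate(words)}


def vectorize(text, vocabulary):
    """Turn one text into a vector of word counts (bag of words)."""
    counts = Counter(text.lower().split())
    return [counts.get(word, 0) for word in vocabulary]


# Made-up sample texts, used only to show the shape of the output.
texts = ["good movie", "bad movie", "good good plot"]
vocabulary = build_vocabulary(texts)
vectors = [vectorize(text, vocabulary) for text in texts]
```

Each text becomes a list with one number per vocabulary word, which is the kind of input a classification algorithm can accept. Images and sounds need their own preparation steps, which are not shown here.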

The following images are just an introduction to the various algorithms with their major characteristics. No need to get overwhelmed! Learning about algorithms is a slow and steady process.

*One-vs-all method: A binary model is created for each of the output classes. Each binary model treats its own class as the positive case and its complement (all other classes in the model) as the negative case, as though it were a binary classification problem. Prediction is then performed by running all of these binary classifiers and choosing the prediction with the highest confidence score.

In essence, an ensemble of individual models is created and the results are then merged, to create a single model that predicts all classes. Thus, any binary classifier can be used as the basis for a one-vs-all model.
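
The one-vs-all idea can be sketched in a few lines. This is a minimal illustration, not how Azure implements it: the binary scorer here is a toy nearest-centroid model picked for brevity, and the function names and sample points are assumptions made for the example.

```python
import math


def centroid(points):
    """Average of a list of equal-length numeric vectors."""
    return [sum(column) / len(points) for column in zip(*points)]


def train_binary(points, labels, positive):
    """Train one 'positive class vs. everything else' scorer.

    The scorer is a toy nearest-centroid model chosen for brevity; any
    binary classifier that returns a confidence score could take its place.
    """
    pos = centroid([p for p, label in zip(points, labels) if label == positive])
    neg = centroid([p for p, label in zip(points, labels) if label != positive])

    def confidence(x):
        # Higher when x is closer to the positive centroid than to the rest.
        return math.dist(x, neg) - math.dist(x, pos)

    return confidence


def train_one_vs_all(points, labels):
    """Build one binary scorer per class."""
    return {c: train_binary(points, labels, c) for c in sorted(set(labels))}


def predict(models, x):
    """Run every binary scorer and keep the most confident class."""
    return max(models, key=lambda c: models[c](x))


# Made-up 2D points in three well-separated groups.
points = [(0, 0), (1, 0), (5, 5), (6, 5), (0, 5), (0, 6)]
labels = ["a", "a", "b", "b", "c", "c"]
models = train_one_vs_all(points, labels)
prediction = predict(models, (0.5, 0.2))
```

Swapping `train_binary` for another binary learner leaves `train_one_vs_all` and `predict` unchanged, which is why any binary classifier can serve as the base model.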

*SMOTE (synthetic minority oversampling technique) is one of the most commonly used oversampling methods for solving the class imbalance problem. It aims to balance the class distribution by increasing the number of minority class examples. Rather than simply replicating existing examples, SMOTE synthesizes new minority instances between existing minority instances.
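
The core of SMOTE, creating new points between existing minority instances, might look roughly like this. It is a simplified sketch under stated assumptions: the function name `smote`, its parameters, and the sample points are made up here, and real implementations handle more cases (for example, deciding how many samples each point should generate).

```python
import math
import random


def smote(minority, n_synthetic, k=2, seed=0):
    """Create n_synthetic points by interpolating between minority neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # The k nearest minority neighbors of the chosen point (excluding itself).
        others = [p for p in minority if p is not base]
        neighbors = sorted(others, key=lambda p: math.dist(p, base))[:k]
        neighbor = rng.choice(neighbors)
        # Pick a random position on the segment between the two points.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbor))
        )
    return synthetic


# Made-up minority class examples with two features each.
minority = [(1.0, 1.0), (2.0, 1.5), (1.5, 2.0), (2.5, 2.5)]
new_points = smote(minority, n_synthetic=5)
```

Because every new point lies on a line segment between two real minority examples, the synthetic data stays inside the region the minority class already occupies.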

2. Multi-Class Algorithms

a) Multi-class Logistic Regression

*Logistic Regression is a classification method used to predict the value of a categorical dependent variable from its relationship to one or more independent variables assumed to have a logistic distribution. If the dependent variable has only two possible values (success/failure), then the logistic regression is binary. If the dependent variable has more than two possible values (blood type given diagnostic test results), then the logistic regression is multinomial.
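
To make the binary versus multinomial distinction concrete: logistic regression models commonly turn a linear score into a probability with the sigmoid function in the binary case and with the softmax function in the multinomial case. The sketch below shows only that last step; the scores are invented for illustration rather than learned from data.

```python
import math


def sigmoid(score):
    """Binary case: squash one score into the probability of 'success'."""
    return 1.0 / (1.0 + math.exp(-score))


def softmax(scores):
    """Multinomial case: turn one score per class into probabilities."""
    largest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical linear scores for four blood types, made up for illustration.
classes = ["A", "B", "AB", "O"]
probabilities = softmax([2.0, 0.5, -1.0, 1.0])
predicted = classes[probabilities.index(max(probabilities))]
```

The probabilities sum to one, and the predicted class is simply the one with the largest probability.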

Two key parameters for configuring this algorithm are:

-Optimization tolerance: Controls when to stop the iterations. If the improvement between iterations is less than the specified threshold, the algorithm stops and returns the current model.

-Regularization weight: Regularization is a method to prevent overfitting by penalizing the models with extreme coefficient values. This factor determines how much to penalize the models at each iteration.
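
Here is a rough sketch of how these two parameters could act inside a training loop. It assumes plain gradient descent on a one-feature binary logistic regression with an L2 penalty; the function names, default values, and data are invented for the example, and Azure's actual optimizer may work differently.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss(w, xs, ys, reg_weight):
    """Mean log-loss plus an L2 penalty on the single coefficient w."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(xs) + 0.5 * reg_weight * w * w


def fit(xs, ys, reg_weight, tolerance=1e-6, learning_rate=0.5, max_iter=10000):
    """Gradient descent that stops once the loss improves by less than tolerance."""
    w = 0.0
    previous = loss(w, xs, ys, reg_weight)
    for iteration in range(1, max_iter + 1):
        gradient = sum((sigmoid(w * x) - y) * x for x, y in zip(xs, ys)) / len(xs)
        # The regularization weight adds a pull toward zero at every step.
        w -= learning_rate * (gradient + reg_weight * w)
        current = loss(w, xs, ys, reg_weight)
        if previous - current < tolerance:
            break  # optimization tolerance reached: return the current model
        previous = current
    return w, iteration


# Made-up, perfectly separable data: negative x is class 0, positive x is class 1.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]
w_weak, steps_weak = fit(xs, ys, reg_weight=0.01)
w_strong, steps_strong = fit(xs, ys, reg_weight=1.0)
```

With the larger regularization weight the learned coefficient ends up closer to zero, and in both runs the tolerance check ends training well before `max_iter`.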

b) Multi-class Neural Network

A multi-class neural network includes an input layer, a hidden layer, and an output layer. The relationship between input and output is learned by training the neural network on input data. Three key parameters are:

-The number of hidden nodes: Lets you customize the number of hidden nodes in the neural network.

-Learning rate: Controls the size of the step taken at each iteration before correction.

-The number of learning iterations: The maximum number of times the algorithm should process the training cases.

c) Multi-class Decision Forest

A multi-class decision forest is an ensemble of decision trees. It works by building multiple decision trees and then voting on the most popular output class. Five key parameters are:

-Resampling method: Controls the method used to create the individual trees.

-The number of decision trees: Specifies the maximum number of decision trees that can be created in the ensemble.

-Maximum depth of the decision trees: Limits the maximum depth of any decision tree.

-The number of random splits per node: The number of splits to use when building each node of the tree.

-The minimum number of samples per leaf node: Controls the minimum number of cases that are required to create any terminal node in a tree.
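
The "build many trees, then vote" idea behind a decision forest can be sketched as follows. This is a toy version under several assumptions: each tree is a depth-1 stump on a single numeric feature, resampling is plain bagging (bootstrap sampling), and all names and data are made up for the example.

```python
import random
from collections import Counter


def majority(labels):
    """Most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def train_stump(sample):
    """Fit a depth-1 tree on (value, label) pairs with one numeric feature."""
    values = sorted({value for value, _ in sample})
    fallback = majority([label for _, label in sample])
    best = None
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        left = majority([label for v, label in sample if v <= threshold])
        right = majority([label for v, label in sample if v > threshold])
        errors = sum(
            (left if v <= threshold else right) != label for v, label in sample
        )
        if best is None or errors < best[0]:
            best = (errors, threshold, left, right)
    if best is None:  # every value identical: always predict the majority
        return lambda value: fallback
    _, threshold, left, right = best
    return lambda value: left if value <= threshold else right


def train_forest(data, n_trees=25, seed=0):
    """Bagging: each tree sees a bootstrap resample of the training data."""
    rng = random.Random(seed)
    return [
        train_stump([rng.choice(data) for _ in data]) for _ in range(n_trees)
    ]


def predict(forest, value):
    """Every tree votes; the most popular class wins."""
    return majority([tree(value) for tree in forest])


# Made-up training data: small values are "low", large values are "high".
data = [(1, "low"), (2, "low"), (3, "low"), (8, "high"), (9, "high"), (10, "high")]
forest = train_forest(data)
```

Here `n_trees` plays the role of the number of decision trees, and fixing every tree at depth 1 is an extreme case of limiting the maximum depth. A single tree trained on an unlucky resample can be wrong, but the vote across many trees smooths that out.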

3. Supervised Learning: Regression

In a regression problem, the output is numerical or continuous.

3.1 Introduction to Regression

Common types of regression problems include:

· Regression on tabular data: The data is available in the form of rows and columns, potentially originating from a wide variety of data sources.

· Regression on image or sound data: Training data consists of images/sounds whose numerical scores are already known. Several steps need to be performed during the preparation phase to transform images/sounds into numerical vectors accepted by the algorithms.

· Regression on text data: Training data consists of texts whose numerical scores are already known. Several steps need to be performed during the preparation phase to transform the text into numerical vectors accepted by the algorithms.

Examples: Housing prices, Customer churn, Customer Lifetime Value, Forecasting (time series), and Anomaly Detection.

3.2 Categories of Algorithms

Common machine learning algorithms for regression problems include:

· Linear Regression: Fast training, linear model

· Decision Forest Regression: Accurate, fast training times

· Neural Net Regression: Accurate, long training times

Numerical Outcome: Dependent variable

*Ordinary least squares method: Calculates the error as the sum of the squared distances from the actual values to the predicted line, and fits the model by minimizing this squared error. This method assumes a strong linear relationship between the inputs and the dependent variable.

*Gradient Descent: Minimizes the amount of error at each step of the model training process.
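
The two fitting approaches can be compared on a tiny example. The sketch below assumes a single input feature and uses made-up data that lies exactly on the line y = 2x + 1; the function names, learning rate, and step count are choices made for this illustration.

```python
def ordinary_least_squares(xs, ys):
    """Closed-form slope and intercept that minimize the squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


def gradient_descent(xs, ys, learning_rate=0.05, steps=5000):
    """Reach the same line by repeatedly stepping against the error gradient."""
    slope, intercept = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        residuals = [slope * x + intercept - y for x, y in zip(xs, ys)]
        slope -= learning_rate * 2 * sum(r * x for r, x in zip(residuals, xs)) / n
        intercept -= learning_rate * 2 * sum(residuals) / n
    return slope, intercept


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # exactly y = 2x + 1
ols_slope, ols_intercept = ordinary_least_squares(xs, ys)
gd_slope, gd_intercept = gradient_descent(xs, ys)
```

Ordinary least squares gets the answer in one calculation, while gradient descent approaches the same slope and intercept gradually, one small correction at a time.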

Decision Forest Regression supports some of the same hyperparameters discussed for the multi-class decision forest algorithm, such as the number of trees and the maximum depth.

Neural Net Regression is a supervised learning method, so it requires a tagged dataset that includes a label column, which must be a numerical data type. The algorithm also supports the same hyperparameters as the multi-class neural network algorithm: the number of hidden nodes, the learning rate, and the number of iterations.

*Regularization is a technique, controlled by a hyperparameter, that constrains or shrinks the coefficient estimates towards zero. It reduces the risk of overfitting by discouraging the learning of a more complex or flexible model.
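
The shrinking effect of regularization on a coefficient is easy to see in the simplest possible case. The sketch assumes a straight line through the origin fitted with an L2 (ridge) penalty, which is just one common form of regularization; the function name and the data are made up for the example.

```python
def ridge_slope(xs, ys, penalty):
    """Slope of a no-intercept line fitted with an L2 (ridge) penalty.

    Minimizes sum((y - w * x) ** 2) + penalty * w ** 2, whose solution is
    w = sum(x * y) / (sum(x * x) + penalty).
    """
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + penalty)


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # exactly y = 2x
# The same data fitted with no penalty, a moderate one, and a heavy one.
slopes = [ridge_slope(xs, ys, penalty) for penalty in (0.0, 14.0, 126.0)]
```

As the penalty grows, the fitted slope moves from 2.0 towards zero, which is the "simpler model" that regularization nudges the training process towards.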

4. Automate the Training of Regressors

Key challenges in successfully training a machine learning model include:

-selecting features from the ones available in the datasets

-choosing the right algorithm for the task

-tuning the hyperparameters of the selected algorithm

-selecting the right evaluation metrics to measure the performance of the trained model

-repeating the entire process, which is highly iterative

The idea behind Automated ML is to enable the automated exploration of the combinations needed to successfully produce a trained model. It intelligently tests multiple algorithms and hyperparameters in parallel and returns the best one. The next steps include deploying the model into production and, if needed, further customizing or refining it to improve performance.
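
The exploration described here can be imitated with a plain loop over algorithm and hyperparameter combinations. The sketch below is not the Azure Automated ML API: the two toy "algorithms", the `search` function, and the data are invented for illustration, and the combinations are tried one after another rather than in parallel.

```python
from itertools import product


def fit_mean(train, _penalty):
    """Baseline 'algorithm': always predict the average training target."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y


def fit_line(train, penalty):
    """No-intercept line with an L2 penalty (a stand-in for a real learner)."""
    slope = sum(x * y for x, y in train) / (sum(x * x for x, _ in train) + penalty)
    return lambda x: slope * x


def mean_squared_error(model, rows):
    """Evaluation metric: average squared gap between prediction and target."""
    return sum((model(x) - y) ** 2 for x, y in rows) / len(rows)


def search(train, validation, algorithms, penalties):
    """Try every algorithm/hyperparameter pair and keep the lowest error."""
    results = []
    for (name, fit), penalty in product(algorithms.items(), penalties):
        model = fit(train, penalty)
        results.append((mean_squared_error(model, validation), name, penalty))
    return min(results)


# Made-up data that follows y = 3x, split into training and validation rows.
train = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]
validation = [(4.0, 12.0), (5.0, 15.0)]
best_error, best_name, best_penalty = search(
    train, validation, {"mean": fit_mean, "line": fit_line}, [0.0, 1.0, 10.0]
)
```

Scoring every candidate on held-out validation rows is what lets the search compare very different algorithms with a single evaluation metric.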

Material Reference: Udacity Fundamental Course in Machine Learning for Microsoft Azure

https://docs.microsoft.com/en-us/azure/?product=featured

https://docs.microsoft.com/en-us/

Happy learning :)

Translated from: https://medium.com/ml-course-microsoft-udacity/supervised-learning-with-azure-23204eae32d6
