Machine Learning with R and SQL Server 2017

Traditional Business Intelligence (BI) methodology focuses primarily on sourcing data from disparate source systems and consolidating it in a data lake or data warehouse. This repository then serves as the primary source for purposes such as reporting, data marts, and data mining. All of these forms of data analysis require the end user to apply analytical thinking to interpret the results.

Machine learning is an advanced form of analysis in which a model learns from the data it is fed and derives intelligence that is used for predictive analysis. The analysis depends heavily on the machine learning model that drives the process. It combines data transformation and modeling, model training, model improvement, model testing, and data analysis.

Database professionals often assume that their database experience covers the exploratory skills of data analysis. They are indeed fluent in data analysis, but of a kind that is closer to query logic and database model assessment. The exploratory data analysis involved in machine learning systems is statistical in nature and is usually called data science.

ML has deep roots in statistics, which provides the foundation that data science needs for exploratory data analysis. Statistics can be divided into two broad categories, inferential and descriptive, and both are widely used in machine learning model development.

Data hosted in SQL Server comes with the benefits of a predefined schema and T-SQL constructs. SSIS and other ETL tools add data transformation at a broader scale and faster pace. Assuming the data is well structured and has been treated for errors during data capture and data quality checks, exploratory data analysis, the fundamental step in machine learning model development, can be applied to it. Model development, model training, and model testing follow this analysis.

What Is Machine Learning, and Why Learn It?

When we train a machine to learn from a given dataset, we can use what it has learned for distinct purposes such as prediction and classification; this concept is called machine learning. Note that "machine" here does not only mean a physical device. For ease of understanding, it can be thought of as a program or a data model.

Some key points and definitions related to machine learning are given below:

Machine learning is concerned with computer programs that automatically improve their performance through experience.

Machine learning is a type of AI that gives computers the ability to learn without being explicitly programmed.

ML primarily focuses on developing computer programs that can change when exposed to new data.

The process of ML is comparable to data mining. Both search through data to look for patterns. However, rather than extracting information for human comprehension, as data mining does, ML uses that information to detect patterns in the data and adjust program actions accordingly.

The applications listed below are the best answer to the question of why to learn ML.

Machine Learning Applications

Web search: ranking pages based on what a user is most likely to click
Finance: deciding which users to target with new credit card offers
E-commerce: predicting which transactions are fraudulent
Space exploration: radio astronomy and space probes
Robotics: handling uncertainty in environments, as in self-driving cars
Software: computed suggestions for application bugs based on cognitive processing

ML deals with predictive and advanced analytics, which makes it a natural extension for data professionals who are seeking to enhance their skills.

Machine Learning Types

Different reference materials classify the types of ML differently. Usually, ML is classified into three categories: supervised, unsupervised, and reinforcement learning.

Supervised ML: This form of ML learns from labeled data and then takes actions. For instance, consider a dataset containing the attributes of all the houses in a given country, state, or city, together with their known prices. The intent is to predict the price of a given house based on its attributes, not to determine which house the attributes belong to.

Unsupervised ML: This form of ML learns from unlabeled data and then takes actions. For example, consider a dataset with the attributes of all houses in a particular country, state, or city, but with no target value such as price attached; the model can only look for structure in the attributes, such as groups of similar houses.

Reinforcement Learning: In this form of ML, the model learns from rewards that the system gives in response to the actions the model performs. It is an advanced form of machine learning that is applied in AI-based systems such as robotics and recommendation engines, often in combination with neural networks.
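The difference between the supervised and unsupervised cases can be sketched in a few lines. The example below is only an illustration under stated assumptions: it is written in Python rather than the R and T-SQL this tutorial targets, the `houses` data is invented, and `fit_line` and `two_groups` are hypothetical helper names. The supervised part fits a least-squares line to labeled (area, price) pairs; the unsupervised part groups areas with a simple one-dimensional two-means procedure and never sees a price.

```python
# Hypothetical data: (area in square metres, price in thousands). Illustrative only.
houses = [(50, 150), (70, 210), (90, 270), (110, 330), (200, 600), (220, 660)]


def fit_line(points):
    """Supervised: least-squares fit of price = intercept + slope * area."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def two_groups(values, rounds=10):
    """Unsupervised: 1-D k-means with k=2; assumes at least two distinct values."""
    low, high = min(values), max(values)
    for _ in range(rounds):
        # Assign each value to the nearer of the two current centres.
        small = [v for v in values if abs(v - low) <= abs(v - high)]
        large = [v for v in values if abs(v - low) > abs(v - high)]
        # Move each centre to the mean of its group.
        low, high = sum(small) / len(small), sum(large) / len(large)
    return small, large


intercept, slope = fit_line(houses)            # learned from labeled prices
predicted = intercept + slope * 100            # price estimate for a 100 m2 house
small, large = two_groups([area for area, _ in houses])  # no labels used
print(round(predicted, 2), small, large)
```

The same attributes feed both functions; what changes is whether a known target (price) is available to learn from.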

Machine Learning Support in the Microsoft Technology Stack

Microsoft acquired Revolution Analytics, a commercial provider of software and services for the open-source R language, in 2015, in support of its vision for Microsoft data platforms on-premises, in hybrid environments, and on Microsoft Azure. After the acquisition, Microsoft integrated R with SQL Server, Azure, Power BI, and Cortana Analytics. Additionally, Revolution R Open was renamed Microsoft R Open, and Revolution R Enterprise became SQL Server R Services and Microsoft R Server.

SQL Server R Services (called SQL Server Machine Learning Services in SQL Server 2017) installs an open-source R distribution as well as packages provided by Microsoft that support distributed and parallel processing. The architecture is designed so that external scripts written in R run in a separate process from SQL Server. R Services integrates the R language with SQL Server, which makes it possible to perform analytics close to the data and avoids the security risks and costs associated with data movement.

Traditional data analytics relies on transforming and transporting data from OLTP databases to data warehouses to data marts, using PowerShell for administration, SSAS for in-memory and multidimensional analytics, and SSRS for reporting. T-SQL, with its set-based operations and relational algebra, has been a good fit for manipulating data stored in OLTP databases. Combining T-SQL with R extends data science, machine learning, statistical computing, and other advanced predictive analysis capabilities to OLTP systems.

In this tutorial, we will work through hands-on exercises using R and T-SQL for exploratory data analysis and machine learning. It is assumed that you have already installed SQL Server 2017, Machine Learning Services, and R. If you have not, you will need to install them first.

How Statistics Is Used in Machine Learning

ML has deep roots in statistics and mathematics. The distinct phases of ML model development are listed below, in order.

Data exploration: structural data analysis including probability, central tendency, variance, etc.
Data standardization: normalization, feature extraction, noise filtering, etc.
Model development and training
Model testing
Model improvement

In ML model development, the initial step is data exploration. Exploration here does not mean querying data from distinct sources using complex functions, queries, or joins.

The intent of exploration is to assess, from a statistical standpoint, how balanced the data is for developing an ML model. If the data is not well balanced, it requires both transformation and standardization.
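As a rough illustration of what standardization can look like, the sketch below shows two common rescaling techniques, z-score standardization and min-max normalization. It is written in Python for brevity rather than in R or T-SQL, uses only the standard library, and the function names and sample values are invented for the example.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Rescale so the result has mean 0 and (population) standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)  # assumes the values are not all identical
    return [(v - mu) / sigma for v in values]


def min_max(values):
    """Rescale linearly so the smallest value maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)  # assumes hi > lo
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical attribute values, chosen so the mean is 5 and the deviation is 2.
raw = [2, 4, 4, 4, 5, 5, 7, 9]
standardized = z_scores(raw)
normalized = min_max(raw)
print(standardized)
print(normalized)
```

Which rescaling is appropriate depends on the algorithm and on how the attribute is distributed; neither is presented here as the method the tutorial prescribes.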

Once the input attributes have been identified, an ML model is developed and trained on a large portion of the data. The remaining data is used to test the accuracy of the model's predictions. Improving the prediction accuracy of a model is an iterative process that continues until the accuracy reaches a satisfactory level of confidence.
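The train-then-test split described above can be sketched as follows. This is a minimal Python illustration, not the tutorial's R or T-SQL workflow: the 80/20 ratio, the seed, the synthetic data, and the trivial "always predict the training mean" baseline are all assumptions made for the example.

```python
import random


def train_test_split(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of the rows and cut it into a training and a test part."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical (input, target) rows.
data = [(x, 2 * x + 1) for x in range(20)]
train, test = train_test_split(data)

# Baseline "model": always predict the mean target seen during training.
baseline = sum(y for _, y in train) / len(train)
error = mean_absolute_error([y for _, y in test], [baseline] * len(test))
print(len(train), len(test), round(error, 2))
```

In an iterative improvement loop, a real model would replace the baseline, and the held-out error would be the number to drive down between iterations.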

Branches of Statistics

At the highest level, statistics is generally divided into two branches: descriptive and inferential.

Descriptive statistics describes and organizes a dataset and summarizes it, typically from a representative sample. Its main parts are measures of central tendency, measures of variability, and correlation. This branch is built on quantitative analysis.
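To make the three parts concrete, here is a small Python sketch (Python is used only for illustration; the same summaries can be produced in R or T-SQL). The `bedrooms` and `prices` values are invented, and `pearson` is a hypothetical helper that computes the Pearson correlation coefficient from its textbook definition.

```python
from math import sqrt
from statistics import mean, median, mode, pstdev


def pearson(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(sxx * syy)


# Hypothetical sample: number of bedrooms and price (in thousands) per house.
bedrooms = [2, 3, 3, 4, 5]
prices = [200, 290, 310, 400, 500]

# Central tendency
print(mean(bedrooms), median(bedrooms), mode(bedrooms))
# Variability
print(max(bedrooms) - min(bedrooms), pstdev(bedrooms))
# Correlation between the two variables
r = pearson(bedrooms, prices)
print(round(r, 3))
```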

Inferential statistics interprets data and determines statistical significance, and so draws conclusions about a larger, unknown dataset from a sample of it. Its foundation lies in hypothesis testing and the Central Limit Theorem.
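The Central Limit Theorem can be observed with a short simulation. The sketch below is an illustration in Python under assumed settings (uniform random values, samples of size 30, 2000 samples, a fixed seed); `sample_means` is a hypothetical helper. It checks that the means of many samples cluster around the population mean, with a spread close to the population standard deviation divided by the square root of the sample size.

```python
import random
from math import sqrt
from statistics import mean, pstdev


def sample_means(sample_size, n_samples, seed=0):
    """Draw n_samples samples of uniform(0, 1) values and return each sample's mean."""
    rng = random.Random(seed)
    return [
        mean([rng.random() for _ in range(sample_size)])
        for _ in range(n_samples)
    ]


means = sample_means(sample_size=30, n_samples=2000)
center = mean(means)    # should be near the population mean, 0.5
spread = pstdev(means)  # should be near sigma / sqrt(n)

# For uniform(0, 1) the population standard deviation is sqrt(1/12).
expected_spread = sqrt(1 / 12) / sqrt(30)
print(round(center, 3), round(spread, 4), round(expected_spread, 4))
```

This is what lets a sample stand in for a larger dataset: the sample mean is an estimate whose uncertainty shrinks predictably as the sample grows.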

A number of algorithms built on inferential statistics deal with particular types of predictive analysis problems. ML models use these algorithms, which means you need a detailed understanding of an algorithm before applying it.

Studying Statistics for ML

The explanation of any ML algorithm starts with statistics. The statistics is usually presented at a high level, because describing it from the lowest level would require a separate book for each algorithm, yet many readers do not have the statistical background needed to follow these concepts.

Without a proper foundation in statistics, any tutorial on ML looks like a mathematics class. The question, therefore, is how to learn statistics without reaching the breaking point where you give up on ML or lose interest because you are struggling more and more with the statistics.

The right learning approach differs from person to person. One option is a top-down approach to identifying the best starting point: pick any statistics topic and trace back what it depends on, as in the following chain.

It may be difficult to understand the characteristics of the normal distribution if you are unfamiliar with standard deviation.
It may be difficult to understand standard deviation, its calculation, and its significance if you do not know variance.
To understand variance, you need to know the mean and the formula for calculating variance.
The mean is independent of any other statistical derivation and is part of elementary mathematics.

In this way, you can work down to the point where you already have the background to understand the most fundamental topic, and then slowly build up until you reach the statistical terms that are used in ML algorithms.
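The chain from mean to variance to standard deviation to the normal distribution can be followed bottom-up in code. The sketch below is a Python illustration with invented sample values; the three functions are written out by hand (using the population formulas) so that each one visibly depends on the previous one, and `NormalDist` from the standard library stands in for the normal distribution.

```python
from math import sqrt
from statistics import NormalDist


def mean(values):
    """Elementary arithmetic: the sum divided by the count."""
    return sum(values) / len(values)


def variance(values):
    """Population variance: the mean of squared distances from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def std_dev(values):
    """Standard deviation: the square root of the variance."""
    return sqrt(variance(values))


# Hypothetical sample values.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
mu, sigma = mean(sample), std_dev(sample)

# A normal distribution is fully described by its mean and standard deviation.
# Share of a normal population that lies within one standard deviation of the mean:
dist = NormalDist(mu, sigma)
within_one_sigma = dist.cdf(mu + sigma) - dist.cdf(mu - sigma)
print(mu, variance(sample), sigma, round(within_one_sigma, 4))
```

Each function calls only the one below it, which mirrors the learning order suggested above.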

Some inferences are faster and easier to make with graphical analysis than by looking at individual numbers. There are many kinds of statistical visualizations, depending on the type of analysis and the categories of the variables. Some are fundamental and serve as a starting point in almost every kind of analysis. The most commonly used visualizations for graphical exploratory analysis are:

Density plot
Histogram
Box plot
Scatterplot
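Of these, the histogram is the simplest to reason about: it counts how many values fall into each of a set of equal-width bins. The sketch below shows only that binning step, in Python and with text output; a real chart would normally come from a plotting library in R or Python. The function name, the bin count, and the sample values are assumptions made for the example.

```python
def histogram(values, bins=4):
    """Count values per equal-width bin between the minimum and the maximum."""
    lo, hi = min(values), max(values)  # assumes hi > lo
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum value would land one past the end, so clamp it into the last bin.
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    return counts


# Hypothetical sample with one value far from the rest.
sample = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]
counts = histogram(sample)

# Render each bin as a row of '#' characters.
for i, count in enumerate(counts):
    print(f"bin {i}: " + "#" * count)
```

Even in this crude form, the shape of the data and the isolated high value are easier to spot than in the raw list of numbers.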

Conclusion

Assuming that you are entirely new to the ML discipline, we started by discussing some basic terms, concepts, and ML theory. We then looked at the components of SQL Server 2017 that support machine learning, which has deep roots in statistics and mathematics, and covered some basic statistical terms and fundamentals along with an approach to learning statistics for ML.

With a strong foundation in statistics, theoretical ML knowledge, and working knowledge of R, you can examine the spread and shape of data using statistics extracted with T-SQL and R. We also saw how to do this graphically by using different statistical visualizations.

Translated from: https://www.experts-exchange.com/articles/32480/Machine-Learning-with-R-and-SQL-Server-2017.html
