Batch LR: Alternative Approaches to Modeling the Relationship

Linear regression is one of the best-known and simplest tools in statistics and machine learning.

In this article, you will explore the linear regression algorithm, how it operates, and how you can use it more effectively.

Linear regression (LR) is a simple yet powerful supervised learning technique that is applied in a wide range of situations.

LR determines how the input variables, called explanatory variables, affect the output variable, called the response variable. It fits the straight line with the smallest sum of squared residuals, known as the regression line or the least-squares line. A model with only one explanatory variable is called simple linear regression, while multiple linear regression has more than one explanatory variable.
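As a minimal sketch of the least-squares line described above, the slope and intercept of a simple linear regression can be computed in closed form. The data values and the function name `fit_simple_lr` are made up for illustration and do not come from the original article.

```python
def fit_simple_lr(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations of x
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: one explanatory variable x and one response variable y
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

intercept, slope = fit_simple_lr(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
sum_sq_residuals = sum(e * e for e in residuals)

print(f"y = {intercept:.2f} + {slope:.2f} * x")
print(f"sum of squared residuals = {sum_sq_residuals:.4f}")
```

Any other line through these points would give a larger sum of squared residuals, which is what "best fit" means here.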

LR is used to study continuous variables. It helps companies make forecasts, such as future market trends or the relationship between salary and experience. LR is used in forecasting, time series analysis, and the study of cause-and-effect relationships, for example the association between reckless driving and road injuries.

The relationship that LR describes can be either positive or negative. A positive relationship between two variables means that an increase in the value of one variable is associated with an increase in the value of the other. A negative relationship means that an increase in the value of one variable is associated with a decrease in the value of the other.

Assumptions of linear regression

· The relationship between the dependent variable y and the independent variable x is linear. The model must be linear in its coefficients, and the coefficients must be unrelated: one coefficient cannot be a function of another.

· In non-financial applications, the independent variables must also be non-random. In financial settings, treating an independent variable as random is still acceptable as long as the error term and the independent variable are not correlated.

· Multicollinearity occurs when the independent variables are correlated with one another. It can be checked with the correlation matrix, in which the correlation coefficient between every pair of variables should be less than 1 in absolute value. Tolerance is another measure of multicollinearity. It is defined as T = 1 - R², where R² comes from regressing one independent variable on the others; T < 0.1 suggests possible multicollinearity and T < 0.01 indicates it. For the variance inflation factor (VIF = 1/T), VIF > 10 indicates multicollinearity among the variables.

· The error term is normally distributed. This can be checked with a histogram or a Q-Q plot of the residuals. The histogram should be symmetrical and bell-shaped, and the points of the Q-Q plot should lie along the 45-degree line.

· The variance of the error term is constant. This is called the homoscedasticity assumption, or constant error variance. It can be evaluated with a scatter plot of the residuals, and the Breusch-Pagan test can be used to test it formally. The test runs an auxiliary regression of the squared residuals on the independent variables.

· Autocorrelation occurs when the residuals are not independent of each other. The Durbin-Watson (DW) test checks the null hypothesis that the residuals are not autocorrelated. The DW statistic ranges from 0 to 4; a value near 2 indicates no autocorrelation, while a value well below 2 signals that neighboring residuals are positively correlated with one another.

· LR tends to make more reliable predictions when the input and output variables follow a Gaussian distribution. Under multivariate normality, all variables are expected to be jointly normal. This can be checked with a histogram or a Q-Q plot, and the Kolmogorov-Smirnov goodness-of-fit test can be used to verify normality. When the data are not normally distributed, a log transformation can be applied.
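Two of the checks in the list above reduce to short formulas and can be sketched directly. The sketch below assumes the auxiliary R² and the residuals are already available; the input numbers and the function names `tolerance_and_vif` and `durbin_watson` are hypothetical and chosen only for illustration.

```python
def tolerance_and_vif(r_squared):
    """Return (tolerance, VIF) for one independent variable.

    r_squared is the R^2 obtained by regressing that variable on the
    other independent variables.
    """
    tolerance = 1.0 - r_squared
    return tolerance, 1.0 / tolerance


def durbin_watson(residuals):
    """Sum of squared successive differences divided by sum of squared residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical auxiliary R^2 of 0.95: tolerance below 0.1, VIF above 10
tolerance, vif = tolerance_and_vif(0.95)
print(f"tolerance = {tolerance:.3f}, VIF = {vif:.1f}")

# Hypothetical residuals that drift slowly (neighbors look alike)
drifting = [1.0, 0.8, 0.6, -0.5, -0.9, -1.0]
# Hypothetical residuals that flip sign at every step
alternating = [1.0, -1.0, 1.0, -1.0]

dw_drifting = durbin_watson(drifting)
dw_alternating = durbin_watson(alternating)
print(f"DW (drifting) = {dw_drifting:.2f}")
print(f"DW (alternating) = {dw_alternating:.2f}")
```

The drifting series gives a DW statistic well below 2 (positive autocorrelation), while the alternating series gives a value above 2 (negative autocorrelation).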

Level of accuracy of the prediction

· The size of the residuals gives a clear indication of how well the regression line estimates Y values from X values. This measure is called the standard error of the estimate; it is essentially the standard deviation of the residuals. The smaller it is, the more precise the forecasts tend to be.

· The fit of the model is measured with R², which in simple linear regression is the square of the correlation between x and y. The higher the R², the better the fit. It always lies between 0 and 1, and the stronger the linear relationship, the closer R² is to 1.

· Adjusted R² is an additional measure that adjusts R² for the number of explanatory variables in the equation. It is used to monitor whether extra explanatory variables belong in the equation. Adjusted R² is the less biased estimate of the strength of the relationship. It can be negative, although that is rarely the case.

In an over-fitted model, a high R² is achieved at the cost of decreased predictive ability. Adjusted R² is less prone to this. Each variable added to the model increases R², or at least never decreases it, whereas adjusted R² rises only if the new predictor improves the LR model more than would be expected by chance.
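The three accuracy measures discussed above can be sketched from observed and fitted values. The data are hypothetical, the fitted values are assumed to come from a least-squares line with an intercept, and `fit_metrics` is an illustrative name rather than a library function.

```python
import math


def fit_metrics(y, y_hat, k):
    """Return (R^2, adjusted R^2, standard error of the estimate).

    k is the number of explanatory variables in the model.
    """
    n = len(y)
    mean_y = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation
    sst = sum((yi - mean_y) ** 2 for yi in y)              # total variation
    r2 = 1.0 - sse / sst
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    std_err = math.sqrt(sse / (n - k - 1))
    return r2, adj_r2, std_err


# Hypothetical observations and fitted values from y = 2.2 + 0.6 * x, x = 1..5
y = [2, 4, 5, 4, 5]
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]

r2, adj_r2, std_err = fit_metrics(y, y_hat, k=1)
print(f"R^2 = {r2:.3f}, adjusted R^2 = {adj_r2:.3f}, standard error = {std_err:.3f}")
```

Because the adjustment divides by n - k - 1, adding a variable that barely lowers the unexplained variation makes adjusted R² fall even though R² itself does not.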

Alternative approaches to modeling the relationship

· Many explanatory factors are categorical and cannot be measured on a quantitative scale. The trick is to use dummy variables. A dummy variable is a variable that takes only the values 0 and 1. Examples include gender and quarter.

· You may include an interaction variable, which is the product of two explanatory variables. Include an interaction variable in a regression equation if you believe that the influence of one explanatory variable on y depends on the value of another explanatory variable.

· Nonlinear transformations of variables are used when scatterplots show curvature. You can transform the dependent variable y, any of the explanatory variables x, or both. Common transformations are the natural logarithm, the square root, the reciprocal, and the square.
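The first two ideas in the list above can be sketched in a few lines: turning a categorical column into 0/1 dummy columns and building an interaction column as a product. The column names and values are hypothetical, and `dummy_encode` is an illustrative helper, not a library function. One level is left out as the baseline, a common convention that keeps the dummies from being perfectly collinear with the intercept.

```python
def dummy_encode(values, baseline):
    """Turn a categorical column into 0/1 dummy columns, omitting the baseline level."""
    levels = sorted(set(values) - {baseline})
    return {level: [1 if v == level else 0 for v in values] for level in levels}


# Hypothetical categorical explanatory variable
quarter = ["Q1", "Q2", "Q3", "Q4", "Q2"]
dummies = dummy_encode(quarter, baseline="Q1")
for level, column in dummies.items():
    print(level, column)

# Hypothetical numeric explanatory variables and their interaction (product)
price = [10.0, 12.0, 9.0, 11.0, 10.5]
advertising = [1.0, 0.0, 2.0, 1.5, 0.5]
interaction = [p * a for p, a in zip(price, advertising)]
print("interaction", interaction)
```

The dummy columns and the interaction column would then enter the regression like any other explanatory variable.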

Why log your variables in a regression?

• The variable is right-skewed, and taking the log will make the distribution of the transformed variable more symmetrical. But this alone is not enough reason to log the variable: no regression rule requires the independent or dependent variables to be normal. If you have outliers in your dependent or independent variables, however, a log transformation can reduce their effect.

• The variance of your regression residuals increases with your regression predictions. Taking the log of your dependent or independent variables may reduce the heteroscedasticity.

• Your regression residuals are not normal. This may or may not be a problem for you. If the residuals are not normal, you can log the dependent or independent variables and verify whether the residuals are normal after the log transformation.

• The dependent and independent variables do not have a linear relationship. Take the relationship between income and food consumption, for example. A proportional rise in income raises consumption up to a certain point, after which food consumption either flattens or even decreases.
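The effect of a log transformation on a right-skewed variable can be illustrated with a small sketch. The values are deliberately artificial (exact powers of ten) so that the effect is easy to see, and `skewness` is an illustrative helper that computes the simple moment-based skewness.

```python
import math


def skewness(values):
    """Moment-based skewness: third central moment over the 1.5 power of the second."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed variable: one very large value dominates
raw = [1, 10, 100, 1000, 10000]
logged = [math.log10(v) for v in raw]

skew_raw = skewness(raw)
skew_logged = skewness(logged)
print(f"skewness before log: {skew_raw:.3f}")
print(f"skewness after log:  {skew_logged:.3f}")
```

The raw values have a strongly positive skewness, while the logged values are evenly spaced and essentially symmetric. Real data will rarely behave this cleanly.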

The relevance of the independent variable

The underlying idea is parsimony: explain the most with the least. It favors a model with fewer explanatory variables. The techniques below can be used to assess the significance of explanatory variables in the linear regression equation.

The coefficient of correlation describes the strength and direction of the linear relationship between x and y. A hypothesis test helps determine whether the population correlation coefficient is close to zero or different from zero.

When the test determines that the correlation coefficient is different from zero, the correlation coefficient is significant. If the test shows that the correlation coefficient is close to zero, we conclude that it is not significant. There are two ways to test significance: using the p-value and using the t statistic.
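A minimal sketch of the t statistic for a correlation coefficient follows, using the standard formula t = r * sqrt(n - 2) / sqrt(1 - r²). The data and function names are hypothetical. The p-value is not computed here, because it requires the t distribution with n - 2 degrees of freedom, which the Python standard library does not provide.

```python
import math


def correlation(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_yy = sum((yi - mean_y) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def correlation_t_statistic(r, n):
    """t statistic for testing whether the population correlation is zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


# Hypothetical sample of five paired observations
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = correlation(x, y)
t_stat = correlation_t_statistic(r, len(x))
print(f"r = {r:.4f}, t = {t_stat:.4f}")
```

With so few observations, even a fairly large r yields only a modest t statistic, which is why sample size matters when judging significance.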

The t-values of the regression coefficients are used to decide whether to include or exclude explanatory variables in the regression equation. A variable is considered significant, and is kept in the regression equation, if its p-value is below 0.05 (the 95% confidence level) and its t statistic is greater than 2 in absolute value. If the absolute t statistic is less than 1, it is a mathematical fact that the standard error of the estimate will decrease and adjusted R² will increase if this variable is excluded from the regression equation.

The F-test determines whether the explained variation is large relative to the unexplained variation. The F-test of significance is the hypothesis test for the overall linear relationship, and it comes with an associated p-value. If the F-value in the ANOVA table is large and the corresponding p-value is small, reject the null hypothesis and conclude that the explanatory variables have some value.
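The F statistic itself is a ratio of explained to unexplained variation, each divided by its degrees of freedom. The sketch below assumes fitted values from a least-squares model with an intercept; the data are hypothetical and `f_statistic` is an illustrative name. Turning the statistic into a p-value needs the F distribution, which is left out here.

```python
def f_statistic(y, y_hat, k):
    """F = (explained variation / k) / (unexplained variation / (n - k - 1))."""
    n = len(y)
    mean_y = sum(y) / n
    ssr = sum((fi - mean_y) ** 2 for fi in y_hat)           # explained variation
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))   # unexplained variation
    return (ssr / k) / (sse / (n - k - 1))


# Hypothetical observations and fitted values from y = 2.2 + 0.6 * x, x = 1..5
y = [2, 4, 5, 4, 5]
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]

f_value = f_statistic(y, y_hat, k=1)
print(f"F = {f_value:.3f}")
```

In simple linear regression, the F statistic equals the square of the t statistic of the slope, so the two tests lead to the same conclusion in that case.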

Conclusion

Regression analysis is a term used in a broad sense. At its core, however, it focuses on quantifying changes in the dependent variable associated with changes in the independent variables. This is because all regression models, linear or nonlinear, relate the dependent variable to the independent variables.

Now, share your thoughts on Twitter and LinkedIn! Agree or disagree with Saurav Singla's ideas and examples? Want to tell us your story? Tweet @SauravSingla_08 and comment Saurav_Singla right now!

Translated from: https://medium.com/swlh/isnt-linear-regression-for-machine-learning-d31543f49181
