Log Transform for Positivity

In simple terms, the log transform compresses the range of large numbers and expands the range of small numbers: the larger x is, the more slowly log(x) increases.

Figure: log transform on range(1, 1000); the x axis shows the raw value and the y axis the log-transformed value.

If you look closely at the plot above, which shows the log transform of values from 1 to 1000, you can see that the (natural) log maps [1, 1000] into roughly the [0, 7] range (ln 1000 ≈ 6.91).

Note how the x values from 200 to 1000 are compressed into the narrow band between about 5.3 and 6.9. The larger x is, the more slowly log(x) increases.
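A quick NumPy check of the numbers behind the plot, assuming the plot uses the natural log (which is what a range of about [0, 7] implies). It also measures how much of each axis the values from 200 to 1000 occupy.

```python
import numpy as np

x = np.arange(1, 1001, dtype=float)   # the values 1..1000 from the plot
y = np.log(x)                         # natural log (assumed; matches the ~[0, 7] range)

print(y[0], round(y[-1], 3))          # 0.0 6.908
print(round(np.log(200), 3))          # 5.298

# Fraction of each axis covered by x in [200, 1000]:
x_share = (1000 - 200) / (1000 - 1)
y_share = (np.log(1000) - np.log(200)) / (np.log(1000) - np.log(1))
print(f"x share: {x_share:.0%}, y share: {y_share:.0%}")   # x share: 80%, y share: 23%
```

About 80% of the input range ends up in under a quarter of the output range, which is the compression visible in the plot.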

The log is defined only for x > 0; log 0 is undefined. It is not a real number: suppose log₁₀ 0 = x, so that 10^x = 0. If you try to solve this, you will find that no real power x of 10 gives zero (even 10⁰ is 1, and 10^x only approaches 0 as x goes to negative infinity).
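In NumPy (assumed as the working library here, since the article later uses np.exp), taking the log of 0 does not raise an exception: np.log10 returns -inf for 0 and nan for negative inputs, with a RuntimeWarning. The sketch below shows this, plus np.log1p (log(1 + x)), a common workaround for non-negative data that contains zeros; log1p is not part of the original article's method.

```python
import numpy as np

values = np.array([0.0, 1.0, 10.0, 100.0])

# np.log10(0) evaluates to -inf with a "divide by zero" RuntimeWarning;
# errstate suppresses the warning so we can inspect the result.
with np.errstate(divide="ignore"):
    logged = np.log10(values)
print(logged)              # first entry is -inf; the rest are 0, 1, 2

# Negative inputs give nan (with an "invalid value" RuntimeWarning).
with np.errstate(invalid="ignore"):
    print(np.log10(-1.0))  # nan

# log1p(x) = log(1 + x) is finite at 0, so zeros map to 0.
print(np.log1p(values))
```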

The log transform is a variance-stabilizing transform: it is useful when the spread of the data grows with its magnitude, as in heavy-tailed distributions. It can make highly skewed distributions much less skewed, reducing or removing skewness and bringing the distribution closer to normal.

Using log transform as a feature engineering technique:

To reduce or remove skewness in our data and make its distribution closer to normal (a.k.a. Gaussian), we can apply a log transform to our input features (X), provided all their values are positive.
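A minimal sketch of the effect on a simulated feature, assuming NumPy and a hypothetical lognormal feature (chosen because its log is exactly normal, so the effect is easy to see). The skewness helper is written out by hand as the mean of cubed standardized values; scipy.stats.skew computes the same biased estimate by default if you prefer a library call.

```python
import numpy as np

def skewness(a):
    """Biased sample skewness: mean of the cubed standardized values."""
    a = np.asarray(a, dtype=float)
    z = (a - a.mean()) / a.std()
    return float(np.mean(z ** 3))

rng = np.random.default_rng(42)
# Hypothetical right-skewed, strictly positive feature.
x = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)

print(f"skew before log: {skewness(x):.2f}")          # large and positive
print(f"skew after log:  {skewness(np.log(x)):.2f}")  # close to 0
```

On real features the improvement is usually less dramatic, because real data is rarely exactly lognormal.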

Real-world data often has heavy-tailed distributions, either right-skewed (a long tail of a few very large values) or left-skewed (a long tail of a few very small values). Algorithms can be sensitive to such distributions and may underperform if the range is not properly normalized. The log transform helps mainly with right skew, because it compresses the large values in the upper tail.

It is common practice to apply a logarithmic transformation so that very large and very small values do not hurt the performance of a learning algorithm. The log transform shrinks the wide range of values that outliers create.

However, remember that once the log transform is applied, the transformed values no longer carry their original meaning or units, so anything read off the transformed data has to be interpreted on the log scale.

The next question: when we fit a linear regression and get a coefficient for a log-transformed independent variable X, how do we interpret that coefficient?

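For a log-transformed independent variable X (with the dependent variable left untransformed), divide the coefficient by 100. The result is the approximate change in the dependent variable, in its own units, for a 1% increase in X: a 1% increase adds log(1.01) ≈ 0.01 to log(X), so the dependent variable changes by about coefficient × 0.01, increasing or decreasing depending on the coefficient's sign.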

Example: the coefficient is 0.198. 0.198/100 = 0.00198. For every 1% increase in the independent variable, our dependent variable increases by about 0.002.
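The rule of thumb can be checked on simulated data. The sketch below uses hypothetical data with the true slope set to the example's 0.198, fits y = b0 + b1·log(x) by ordinary least squares with NumPy, and compares b1/100 with the exact effect of a 1% increase, b1·log(1.01). The two agree closely because log(1.01) ≈ 0.00995 ≈ 0.01.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: y depends on log(x) with true slope 0.198 (the example's value).
x = rng.uniform(1, 1000, size=500)
y = 3.0 + 0.198 * np.log(x) + rng.normal(0, 0.01, size=500)

# Ordinary least squares fit of y = b0 + b1 * log(x).
A = np.column_stack([np.ones_like(x), np.log(x)])
(b0, b1), *_ = np.linalg.lstsq(A, y, rcond=None)

approx = b1 / 100              # rule of thumb from the text
exact = b1 * np.log(1.01)      # exact change in predicted y for a 1% increase in x

# The same change measured directly on the fitted model at an arbitrary x0.
x0 = 250.0
change = (b0 + b1 * np.log(1.01 * x0)) - (b0 + b1 * np.log(x0))

print(f"b1 = {b1:.3f}, b1/100 = {approx:.5f}, exact = {exact:.5f}, measured = {change:.5f}")
```

The measured change is the same at every x0, which is what makes "per 1% increase" a meaningful way to read the coefficient.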

Note: I’m also attaching a link below which dives deep into interpreting log transformed features.

Using log transform on the target variable:

For example, consider a machine learning problem where you want to predict the price of a house from input features such as area and number of bedrooms.

In this problem, suppose you fit a linear regression of price (y) on X (area, number of bedrooms, ...) and optimize it with gradient descent. The dataset will contain some extreme prices (high-value properties), and because those produce the largest errors, gradient descent will focus on fitting them and produce a poor model overall. So a log transform of the target variable makes sense when you perform linear regression.

More importantly, linear regression can predict any real number, including negative values. If the model is far off, it can produce negative prices, especially for some of the cheaper houses. Real-world quantities like prices, incomes, and stock prices are positive, so it is a good idea to log-transform them before fitting a linear regression; otherwise the model can output negative predictions that make no sense.
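A minimal sketch of the positivity problem on hypothetical housing data in which price grows multiplicatively with area (the data-generating formula is an assumption chosen for illustration). Both models are fit with np.polyfit (closed-form least squares) instead of gradient descent for brevity; a converged gradient-descent fit would reach the same least-squares line. The raw-price model predicts a negative price for a small house, while the log-price model, converted back with np.exp, cannot.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: area in m^2, price grows multiplicatively with area,
# so a few large houses have very large prices (right-skewed target).
area = rng.uniform(50, 300, size=200)
price = np.exp(11 + 0.01 * area + rng.normal(0, 0.1, size=200))

# Model A: straight line fit on the raw price.
slope_raw, icpt_raw = np.polyfit(area, price, deg=1)
# Model B: straight line fit on log(price).
slope_log, icpt_log = np.polyfit(area, np.log(price), deg=1)

small_house = 30.0  # a cheap house, slightly outside the training range
pred_raw = slope_raw * small_house + icpt_raw
pred_log = np.exp(slope_log * small_house + icpt_log)  # back to price units

print(f"raw-price model: {pred_raw:,.0f}")  # negative: a nonsensical price
print(f"log-price model: {pred_log:,.0f}")  # always positive after np.exp
```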

In the example above, if you use RMSE on the raw prices as the cost function, the model focuses on the high-valued properties and performs poorly. If you instead measure the error as log(actual) - log(predicted), errors become relative to the price (a 10% miss counts the same on a cheap house as on an expensive one), so the optimization works more evenly across houses and produces a better model.
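To see why, compare the two error measures on a hypothetical pair of houses that are both over-predicted by 10%. On the raw scale the expensive house dominates; on the log scale both errors equal log(1.1). RMSE computed on log prices is used here as the log-scale cost.

```python
import numpy as np

# Hypothetical pair of houses, each over-predicted by the same 10%.
actual = np.array([100_000.0, 1_000_000.0])
predicted = actual * 1.10

raw_error = predicted - actual                  # error in price units
log_error = np.log(predicted) - np.log(actual)  # error on the log scale

print(raw_error)  # the expensive house's error is 10x larger
print(log_error)  # both are log(1.1), about 0.0953

rmse_raw = np.sqrt(np.mean(raw_error ** 2))
rmse_log = np.sqrt(np.mean(log_error ** 2))
print(f"RMSE on prices: {rmse_raw:,.0f}; RMSE on log prices: {rmse_log:.4f}")
```

Almost all of rmse_raw comes from the expensive house, which is exactly the imbalance the paragraph describes.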

On the raw scale, the model is under the most pressure to correct the large errors coming from high-valued properties, so using the log here makes sense.

Converting log predictions back to actual values.

Converting to actual predictions using np.exp: you need actual predictions, not the log of the predictions, so convert back by taking the exponential of the predicted log(price) with np.exp, which inverts the natural log (np.log).

Log loss to improve models

Logarithmic loss (related to cross-entropy) measures the performance of a classification model whose predictions are probabilities between 0 and 1. The goal of our machine learning model is to minimize this value; a perfect model has a log loss of 0. Log loss increases as the predicted probability diverges from the actual label, so predicting a probability of 0.012 when the actual label is 1 would be bad and result in a high log loss.

Figure: log loss in a binary classification setting.

In the example above, when the true label is 1 and the predicted probability is 0.1, the log loss is high, whereas when the true label is 1 and the predicted probability is 0.9, the log loss is low.
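A minimal NumPy sketch of binary log loss using the standard formula, log loss = -(1/N) Σ [y·log(p) + (1 - y)·log(1 - p)], with the natural log. The clipping constant eps is an assumption added to avoid log(0); scikit-learn's sklearn.metrics.log_loss computes the same quantity if you prefer a library call.

```python
import numpy as np

def binary_log_loss(y_true, p_pred, eps=1e-15):
    """Mean binary log loss (cross-entropy), natural log."""
    y = np.asarray(y_true, dtype=float)
    # Clip probabilities away from 0 and 1 so log() stays finite.
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

# True label 1 in every case; only the predicted probability changes.
print(round(binary_log_loss([1], [0.9]), 4))    # 0.1054 (confident and right: low loss)
print(round(binary_log_loss([1], [0.1]), 4))    # 2.3026 (confident and wrong: high loss)
print(round(binary_log_loss([1], [0.012]), 4))  # 4.4228 (even more wrong: higher loss)
```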

Log transformation in text classification (natural language processing)

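We use the tf-idf method to encode text data for machine learning models. Tf-idf applies a log transform to the inverse document frequency, idf = log(N / df), where N is the number of documents and df is the number of documents containing the word. A word that appears in every document gets idf = log(1) = 0, so its weight is effectively zeroed out, while a word that appears in very few documents gets a large idf, which boosts its weight relative to common words.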
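A minimal hand-rolled sketch on a hypothetical three-document corpus, using the plain idf = log(N / df) and raw term counts as tf. Libraries differ in the details: scikit-learn's TfidfVectorizer, for example, uses a smoothed idf by default, so words present in every document still get a nonzero weight there.

```python
import math
from collections import Counter

# Hypothetical toy corpus.
docs = [
    "the house is big",
    "the price is high",
    "the garden is small",
]
tokenized = [d.split() for d in docs]
n_docs = len(tokenized)

# Document frequency: number of documents containing each word.
df = Counter(word for doc in tokenized for word in set(doc))

# Plain (unsmoothed) inverse document frequency with a log transform.
idf = {word: math.log(n_docs / count) for word, count in df.items()}

def tfidf(doc):
    """Raw term count times idf for each word in one tokenized document."""
    tf = Counter(doc)
    return {word: tf[word] * idf[word] for word in tf}

print(idf["the"])                # 0.0: "the" appears in every document
print(round(idf["house"], 4))    # 1.0986, i.e. log(3)
print(tfidf(tokenized[0]))
```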

Please share this article if it helped you understand how important the log is to machine learning, and comment if you have any questions.

GOOD DAY!

References:
1. https://data.library.virginia.edu/interpreting-log-transformations-in-a-linear-model/
2. http://wiki.fast.ai/index.php/Log_Loss

Translated from: https://medium.com/analytics-vidhya/log-transform-for-positivity-d3e1f183c804
