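Deep Crossing: Web-Scale Modeling without Manually Crafted Combinatorial Features [paper notes]

Cross features are very effective, but manually combining features to find meaningful ones is difficult.
Deep learning removes the need to hand-craft features and can also uncover high-order features that even domain experts have not found.
The model's distinguishing aspects are its use of residual units and its feature representation.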

1 Abstract

Deep Crossing automatically combines features to produce superior models.
It achieves superior results with only a subset of the features used in the production models.

2 Sponsored Search

Sponsored search is responsible for showing ads alongside organic search results.

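Concept	Meaning
Query	A text string the user types into the search box
Keyword	A text string related to a product, specified by the advertiser to match user queries
Title	The title of a sponsored ad, specified by the advertiser to capture the user's attention
Landing page	The product web site the user reaches after clicking the corresponding ad
Match type	An option given to the advertiser on how closely the keyword should be matched by a user query, usually one of four kinds: exact, phrase, broad, and contextual
Campaign	A set of ads that share the same settings, such as budget and location targeting, often used to organize products into categories
Impression	An instance of an ad being displayed to a user. An impression is usually logged together with other information available at run time
Click	An indication of whether an impression was clicked by a user. A click is usually logged together with other information available at run time
Click through rate	Total number of clicks divided by the total number of impressions
Click Prediction	A key model of the platform that predicts the likelihood that a user clicks on a given ad for a given query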

3 Feature Representation

Simply converting campaign IDs into a one-hot vector would significantly increase the size of the model.
- One solution is to use a pair of companion features, as exemplified in the paper's feature table, where CampaignID is a one-hot representation consisting only of the top 10,000 campaigns with the highest number of clicks.
- Other campaigns are covered by CampaignIDCount, which is a numerical feature that stores per-campaign statistics such as click through rate. Such features are referred to as counting features in the following discussion.
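The companion-feature idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's pipeline: the function name `build_campaign_encoder`, the `(campaign_id, clicked)` log format, and the choice to emit an all-zero one-hot vector for campaigns outside the top set are hypothetical, and the counting feature is computed for every campaign for simplicity.

```python
from collections import Counter


def build_campaign_encoder(click_log, top_k):
    """Return an encoder mapping a campaign id to (one_hot, ctr).

    click_log is an iterable of (campaign_id, clicked) pairs (assumed format).
    """
    clicks = Counter()
    impressions = Counter()
    for campaign_id, clicked in click_log:
        impressions[campaign_id] += 1
        clicks[campaign_id] += int(clicked)

    # Only the top_k campaigns by click count get their own one-hot slot.
    top = [cid for cid, _ in clicks.most_common(top_k)]
    slot = {cid: i for i, cid in enumerate(top)}

    def encode(campaign_id):
        one_hot = [0.0] * top_k
        if campaign_id in slot:
            one_hot[slot[campaign_id]] = 1.0
        # Counting feature: per-campaign click through rate (0.0 if unseen).
        shown = impressions.get(campaign_id, 0)
        ctr = clicks.get(campaign_id, 0) / shown if shown else 0.0
        return one_hot, ctr

    return encode


# Toy log: campaign "a" has 2 clicks, "b" has 1, "c" has none.
toy_log = [("a", 1), ("a", 1), ("a", 0), ("b", 1), ("b", 0), ("c", 0), ("c", 0)]
encode = build_campaign_encoder(toy_log, top_k=2)
print(encode("a"))
print(encode("c"))
```

With `top_k=2`, only "a" and "b" receive a one-hot slot, while "c" is described solely by its counting feature, so the one-hot width stays bounded no matter how many campaigns exist.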
Deep Crossing avoids using combinatorial features. It works with both sparse and dense individual features.

4 Model Architecture

The objective function is log loss but can be easily customized to soft-max or other functions.
$$\text{logloss}=-\frac{1}{N} \sum_{i=1}^{N}\left(y_{i} \log \left(p_{i}\right)+\left(1-y_{i}\right) \log \left(1-p_{i}\right)\right) \tag{1}$$
where $N$ is the number of samples, $y_i$ is the label of sample $i$, and $p_i$ is the output of the single node in the Scoring layer.
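As a quick numerical check of Eq. (1), here is a minimal Python sketch. The function name `log_loss` and the clipping constant `eps` are assumptions added for numerical safety; the clipping is not part of the formula itself.

```python
import math


def log_loss(labels, probs, eps=1e-15):
    """Mean negative log-likelihood of binary labels under predicted probabilities."""
    if not labels or len(labels) != len(probs):
        raise ValueError("labels and probs must be non-empty and of equal length")
    total = 0.0
    for y, p in zip(labels, probs):
        # Clip p away from 0 and 1 so that log() stays finite (assumption).
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(labels)


# An uninformative prediction of 0.5 for every sample gives ln(2).
print(log_loss([1, 0], [0.5, 0.5]))
```

A confident correct prediction lowers the loss, while a confident wrong one is penalized heavily, which is why the loss suits click prediction where $p_i$ is read as a click probability.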

4.1 Embedding and Stacking Layers

The embedding layer consists of a single layer of a neural network, with the general form
$$X_{j}^{O}=\max \left(\mathbf{0}, \mathbf{W}_{j} X_{j}^{I}+\mathbf{b}_{j}\right) \tag{2}$$
where $j$ indexes the individual feature, $X_j^I$ is the $n_j$-dimensional input feature, $\mathbf{W}_j$ is an $m_j \times n_j$ matrix, and $\mathbf{b}_j$ is an $m_j$-dimensional bias vector, matching the dimension of $\mathbf{W}_j X_j^I$.
When $m_j < n_j$, the embedding reduces the dimensionality of the input feature.
The element-wise max with $\mathbf{0}$ is the rectified linear unit (ReLU).
Note that both $\{\mathbf{W}_j\}$ and $\{\mathbf{b}_j\}$ are the parameters of the network, and will be optimized together with the other parameters in the network. Unlike word2vec, the embedding is learned jointly with the rest of the model.
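Eq. (2) can be illustrated with a small pure-Python sketch. The function name `embed` and the toy weights are hypothetical; in the real model $\mathbf{W}_j$ and $\mathbf{b}_j$ are learned, not fixed by hand.

```python
def embed(weights, bias, x):
    """Compute max(0, W x + b) element-wise.

    weights: m rows of n numbers, bias: m numbers, x: n numbers.
    """
    if len(weights) != len(bias):
        raise ValueError("weights and bias must have the same number of rows")
    out = []
    for row, b in zip(weights, bias):
        if len(row) != len(x):
            raise ValueError("each weight row must match the input dimension")
        pre_activation = sum(w * v for w, v in zip(row, x)) + b
        out.append(max(0.0, pre_activation))  # ReLU
    return out


# A 4-dimensional one-hot input embedded into 2 dimensions (m_j=2 < n_j=4).
W = [[1.0, 0.0, -1.0, 0.0],
     [0.5, 0.5, 0.5, 0.5]]
b = [0.5, 0.25]
x = [0.0, 0.0, 1.0, 0.0]
print(embed(W, b, x))  # [0.0, 0.75]
```

For a one-hot input, the product $\mathbf{W}_j X_j^I$ simply selects one column of $\mathbf{W}_j$, which is why this layer acts as a learned lookup table for sparse features.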

4.2 Residual Layers

The residual layers are built from the Residual Unit of the Residual Net, with modifications.

The unique property of Residual Unit is to add back the original input feature after passing it through two layers of ReLU transformations.
$$X^{O}=\mathcal{F}\left(X^{I},\left\{\mathbf{W}_{0}, \mathbf{W}_{1}\right\},\left\{\mathbf{b}_{0}, \mathbf{b}_{1}\right\}\right)+X^{I} \tag{3}$$
where $\mathbf{W}_{0}, \mathbf{W}_{1}$ and $\mathbf{b}_{0}, \mathbf{b}_{1}$ are the parameters of the two layers, and $\mathcal{F}(\cdot)$ is the function that fits the residual $X^O - X^I$.
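The skip connection of Eq. (3) can be sketched as follows. This is a simplified illustration, not the paper's exact unit: it assumes $\mathcal{F}$ is two affine layers with a ReLU between them, and it leaves out any further activation around the addition, since Eq. (3) alone does not pin that down. The helper names `affine`, `relu`, and `residual_unit` are hypothetical.

```python
def affine(weights, bias, x):
    """Return W x + b for a row-major weight matrix."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v):
    """Element-wise max(0, v)."""
    return [max(0.0, a) for a in v]


def residual_unit(x, w0, b0, w1, b1):
    """X_out = F(X_in) + X_in, with F = affine -> ReLU -> affine (assumed form)."""
    hidden = relu(affine(w0, b0, x))
    f = affine(w1, b1, hidden)
    if len(f) != len(x):
        raise ValueError("F must map back to the input dimension for the skip add")
    return [fi + xi for fi, xi in zip(f, x)]


x_in = [1.0, 2.0]
w0 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # 2 -> 3
b0 = [0.0, 0.0, -5.0]
w1 = [[1.0, 1.0, 1.0], [0.0, 0.0, 1.0]]     # 3 -> 2
b1 = [0.0, 0.5]
print(residual_unit(x_in, w0, b0, w1, b1))  # [4.0, 2.5]
```

With all weights and biases set to zero the unit returns its input unchanged, which shows that the layers only need to learn the difference $X^O - X^I$ rather than the full mapping.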
In the Residual Net paper¹, the authors believed that fitting residuals has a numerical advantage. While the actual reason why Residual Net could go as deep as 152 layers with high performance is subject to more investigations, Deep Crossing did exhibit a few properties that might benefit from the Residual Units.
Deep Crossing was applied to a wide variety of tasks. It was also applied to training data with large differences in sample sizes. It's likely that the Residual Units are implicitly performing some kind of regularization that leads to such stability.

5 Related Work

Fig. 3 is the architecture of a modified DSSM using log loss as the objective function. The modified DSSM is more closely related to the applications of click prediction. It keeps the basic structure of DSSM on the left side of the green dashed line, but uses log loss to compare the predictions with real-world labels.

6 Conclusion

Deep Crossing demonstrated that with the recent advance in deep learning algorithms, modeling language, and GPU-based infrastructure, a nearly dummy solution exists for complex modeling tasks at large scale. Here "dummy" means a solution that needs almost no manual feature crafting.

¹ K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. arXiv preprint arXiv:1512.03385, 2015.
