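Wide&Deep is a model proposed by Google in 2016. As its name suggests, it is a hybrid model made up of a single-layer Wide part and a multi-layer Deep part. The Wide part gives the model strong "memorization", and the Deep part gives it "generalization". This structure lets the model combine the strengths of logistic regression and deep neural networks: it can quickly process and memorize a large number of historical behavior features while keeping strong expressive power. It quickly became a mainstream model that the industry rushed to adopt, it spawned many hybrid models built on the Wide&Deep structure, and its influence continues today.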

Memorization and generalization

The design goal of Wide&Deep, and its greatest value, is to have strong "memorization" and strong "generalization" at the same time.
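Memorization can be understood as the model's ability to directly learn and exploit the "co-occurrence frequency" of items or features in historical data. In general, simple models such as collaborative filtering and logistic regression have strong memorization. Because their structure is simple, the raw data can often influence the recommendation directly, producing rule-like recommendations such as "if the user clicked A, recommend B". In effect the model memorizes the distribution of the historical data and recommends from that memory. When a simple model like logistic regression encounters such a "strong feature", training pushes the corresponding weight to a very large value, which amounts to memorizing the feature directly. In a multi-layer neural network, by contrast, features pass through several layers and are repeatedly crossed with other features, so the network's memory of this strong feature ends up weaker than that of a simple model.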
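To make the "if the user clicked A, recommend B" idea concrete, here is a minimal sketch of memorization by raw co-occurrence counting. The histories and the helper name `cooccurrence_rules` are made up for illustration; this is not any particular collaborative filtering implementation.

```python
from collections import Counter
from itertools import permutations


def cooccurrence_rules(histories):
    """Estimate P(user has b | user has a) from raw co-occurrence counts."""
    item_count = Counter()
    pair_count = Counter()
    for items in histories:
        unique = set(items)
        item_count.update(unique)                       # users who have each item
        pair_count.update(permutations(unique, 2))      # ordered pairs (a, b)
    return {(a, b): n / item_count[a] for (a, b), n in pair_count.items()}


# Hypothetical click histories, one list per user
histories = [["A", "B"], ["A", "B"], ["A", "B", "C"], ["A", "C"], ["D"]]
rules = cooccurrence_rules(histories)

print(rules[("A", "B")])           # 0.75: 3 of the 4 users with A also have B
print(rules.get(("D", "B"), 0.0))  # 0.0: D and B never co-occur, so nothing was memorized
```

The second lookup shows the limit of pure memorization: a pair that never appeared together in the history gets no score at all.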
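Generalization can be understood as the model's ability to propagate feature correlations and to discover how sparse, or even never-seen, rare features relate to the final label. Matrix factorization generalizes better than collaborative filtering because it introduces latent vectors: even users or items with little data get a latent vector and therefore a data-backed recommendation score. This is a typical example of passing global information to sparse items to improve generalization. Likewise, a deep neural network combines features automatically many times over and can uncover latent patterns in the data, so that even a very sparse input feature vector yields a reasonably stable, smooth recommendation probability. This is the generalization that simple models lack.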
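The latent-vector argument can be sketched in a few lines. The vectors below are hand-set, hypothetical values rather than the result of an actual factorization; the point is only that a dot product produces a score for a user-item pair that never co-occurred.

```python
def score(user_vec, item_vec):
    """Matrix-factorization style score: dot product of two latent vectors."""
    return sum(u * v for u, v in zip(user_vec, item_vec))


# Hypothetical 2-dimensional latent vectors (not learned here)
user_vectors = {"sparse_user": [0.9, 0.1]}
item_vectors = {"rare_item": [0.8, 0.2], "other_item": [0.1, 0.9]}

# Rank all items for a user who has never interacted with either of them
ranked = sorted(
    item_vectors,
    key=lambda item: score(user_vectors["sparse_user"], item_vectors[item]),
    reverse=True,
)
print(ranked)
```

Because every user and item lives in the same latent space, the ranking is defined even when the raw co-occurrence count is zero.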

Structure of the Wide&Deep model

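Since simple models are strong at memorization and deep neural networks are strong at generalization, the direct motivation behind Wide&Deep is to fuse the two. The model structure is shown in the figure.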

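Wide&Deep connects the Wide part, which is a single input layer, with the Deep part, which consists of an Embedding layer and several hidden layers, and feeds both into the final output layer. The single-layer Wide part is good at handling large numbers of sparse id features. The Deep part uses the expressive power of neural networks to perform deep feature crossing and uncover the data patterns hidden behind the features. Finally, the output layer combines the Wide and Deep parts with a logistic regression model to form a single unified model.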
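The combination described above can be sketched as a forward pass: the wide logit and the deep logit are added, and a sigmoid turns the sum into a probability. This is a pure-Python illustration with hand-picked, hypothetical weights and feature names, not the production model.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(vec, weights, bias):
    """Fully connected ReLU layer; `weights` holds one row per output unit."""
    out = [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]
    return [max(0.0, o) for o in out]


def wide_logit(active_ids, wide_weights):
    """Linear model over sparse binary features: sum the weights of active ids."""
    return sum(wide_weights.get(i, 0.0) for i in active_ids)


def deep_logit(category_ids, embeddings, hidden_layers, out_weights):
    """Embedding lookup -> concatenation -> ReLU layers -> linear projection."""
    vec = [x for cid in category_ids for x in embeddings[cid]]
    for weights, bias in hidden_layers:
        vec = dense(vec, weights, bias)
    return sum(w * v for w, v in zip(out_weights, vec))


def wide_and_deep(active_ids, category_ids, params):
    z = (wide_logit(active_ids, params["wide"])
         + deep_logit(category_ids, params["emb"], params["hidden"], params["out"])
         + params["bias"])
    return sigmoid(z)


# Hypothetical parameters, chosen by hand for illustration
params = {
    "wide": {"installed=netflix&impression=pandora": 1.5},
    "emb": {"netflix": [0.2, -0.1], "pandora": [0.4, 0.3]},
    "hidden": [([[0.5, 0.0, 0.5, 0.0], [0.0, 1.0, 0.0, 1.0]], [0.0, 0.0])],
    "out": [1.0, -1.0],
    "bias": -0.5,
}
p = wide_and_deep(["installed=netflix&impression=pandora"], ["netflix", "pandora"], params)
print(round(p, 4))
```

Both parts contribute to a single logit, so during training the gradient of the log loss flows into the wide weights and the deep weights at the same time.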
The figure below shows in detail which features Wide&Deep feeds into the Deep part and which into the Wide part.

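The input to the Deep part is the full feature vector, including user age (Age), number of installed apps (#App Installs), device class (Device Class), installed apps (User Installed App), and the impression app (Impression App). Categorical features such as installed apps and the impression app pass through an Embedding layer into the concatenation layer, where they are joined into a 1200-dimensional embedding vector. This vector then passes through three fully connected ReLU layers and finally enters the LogLoss output layer.
The input to the Wide part consists of only two kinds of features, installed apps and the impression app. Installed apps represent the user's historical behavior, and the impression app is the candidate currently being recommended. These two features are chosen to make full use of the Wide part's strength in memorization. As in the earlier memorization example, a simple model is good at remembering the information in user behavior features and letting it influence the recommendation directly.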
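In the Wide part, the function that combines the "installed app" and "impression app" features is called the cross-product transformation. It is formally defined as:

$$\phi_k(\mathbf{x}) = \prod_{i=1}^{d} x_i^{c_{ki}}, \qquad c_{ki} \in \{0, 1\}$$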

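Here $c_{ki}$ is a Boolean variable that equals 1 when the i-th feature belongs to the k-th combined feature and 0 otherwise, and $x_i$ is the value of the i-th feature. For example, for the combined feature "(user_installed_app=netflix, impression_app=pandora)", the cross-product transformation outputs 1 only when the features user_installed_app=netflix and impression_app=pandora are both 1. Otherwise it outputs 0.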
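A minimal sketch of the cross-product transformation for binary features, following the definition above. The feature ordering and the mask are hypothetical choices made for this example.

```python
def cross_product_transform(x, c_k):
    """phi_k(x) = product over i of x_i ** c_ki, for binary x and binary mask c_k."""
    result = 1
    for x_i, c_ki in zip(x, c_k):
        result *= x_i ** c_ki  # features outside the combination contribute x_i ** 0 == 1
    return result


# Assumed feature order: [installed=netflix, installed=spotify, impression=pandora]
mask = [1, 0, 1]  # combined feature AND(installed=netflix, impression=pandora)

print(cross_product_transform([1, 0, 1], mask))  # 1: both member features are active
print(cross_product_transform([1, 1, 0], mask))  # 0: impression=pandora is missing
print(cross_product_transform([0, 0, 1], mask))  # 0: installed=netflix is missing
```

For binary inputs the product behaves like a logical AND over the features selected by the mask.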
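After the cross-product transformation layer has built the combined features, the Wide part feeds them into the final LogLoss output layer, where they take part in fitting the target together with the output of the Deep part. This completes the fusion of the Wide and Deep parts.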

Implementation

# Written against the TensorFlow 2.x tf.feature_column API
import tensorflow as tf

# ---- Load the dataset ----
# Training samples path, change to your local path
training_samples_file_path = "trainingSamples.csv"
# Test samples path, change to your local path
test_samples_file_path = "testSamples.csv"


# Load samples as a tf.data dataset
def get_dataset(file_path):
    dataset = tf.data.experimental.make_csv_dataset(
        file_path,
        batch_size=12,
        label_name='label',
        na_value="0",
        num_epochs=1,
        ignore_errors=True)
    return dataset


# Build the training dataset and the test dataset
train_dataset = get_dataset(training_samples_file_path)
test_dataset = get_dataset(test_samples_file_path)

# ---- Feature engineering ----
# Genre features vocabulary
genre_vocab = ['Film-Noir', 'Action', 'Adventure', 'Horror', 'Romance', 'War', 'Comedy', 'Western',
               'Documentary', 'Sci-Fi', 'Drama', 'Thriller', 'Crime', 'Fantasy', 'Animation', 'IMAX',
               'Mystery', 'Children', 'Musical']

GENRE_FEATURES = {
    'userGenre1': genre_vocab,
    'userGenre2': genre_vocab,
    'userGenre3': genre_vocab,
    'userGenre4': genre_vocab,
    'userGenre5': genre_vocab,
    'movieGenre1': genre_vocab,
    'movieGenre2': genre_vocab,
    'movieGenre3': genre_vocab
}

# All categorical features: vocabulary lookup followed by a 10-dimensional embedding
categorical_columns = []
for feature, vocab in GENRE_FEATURES.items():
    cat_col = tf.feature_column.categorical_column_with_vocabulary_list(
        key=feature, vocabulary_list=vocab)
    emb_col = tf.feature_column.embedding_column(cat_col, 10)
    categorical_columns.append(emb_col)

# Movie id embedding feature
movie_col = tf.feature_column.categorical_column_with_identity(key='movieId', num_buckets=1001)
movie_emb_col = tf.feature_column.embedding_column(movie_col, 10)
categorical_columns.append(movie_emb_col)

# User id embedding feature
user_col = tf.feature_column.categorical_column_with_identity(key='userId', num_buckets=30001)
user_emb_col = tf.feature_column.embedding_column(user_col, 10)
categorical_columns.append(user_emb_col)

# All numerical features are fed in directly
numerical_columns = [tf.feature_column.numeric_column('releaseYear'),
                     tf.feature_column.numeric_column('movieRatingCount'),
                     tf.feature_column.numeric_column('movieAvgRating'),
                     tf.feature_column.numeric_column('movieRatingStddev'),
                     tf.feature_column.numeric_column('userRatingCount'),
                     tf.feature_column.numeric_column('userAvgRating'),
                     tf.feature_column.numeric_column('userRatingStddev')]

# Cross feature between the current movie and a movie from the user's history.
# crossed_feature follows the way Google Play uses Wide&Deep: it crosses
# "a movie the user rated highly" with "the movie currently being rated".
# The goal is to let the model memorize rules between well-rated movies,
# that is, rules of the form "a user who likes movie A also likes movie B".
# categorical_column_with_identity uses the integer id itself as the category index
rated_movie = tf.feature_column.categorical_column_with_identity(key='userRatedMovie1', num_buckets=1001)
# crossed_column hashes the combination of the two features into 10000 buckets;
# indicator_column turns the resulting categorical column into a multi-hot vector
crossed_feature = tf.feature_column.indicator_column(
    tf.feature_column.crossed_column([movie_col, rated_movie], 10000))

# Define the inputs of the Keras model
inputs = {
    'movieAvgRating': tf.keras.layers.Input(name='movieAvgRating', shape=(), dtype='float32'),
    'movieRatingStddev': tf.keras.layers.Input(name='movieRatingStddev', shape=(), dtype='float32'),
    'movieRatingCount': tf.keras.layers.Input(name='movieRatingCount', shape=(), dtype='int32'),
    'userAvgRating': tf.keras.layers.Input(name='userAvgRating', shape=(), dtype='float32'),
    'userRatingStddev': tf.keras.layers.Input(name='userRatingStddev', shape=(), dtype='float32'),
    'userRatingCount': tf.keras.layers.Input(name='userRatingCount', shape=(), dtype='int32'),
    'releaseYear': tf.keras.layers.Input(name='releaseYear', shape=(), dtype='int32'),

    'movieId': tf.keras.layers.Input(name='movieId', shape=(), dtype='int32'),
    'userId': tf.keras.layers.Input(name='userId', shape=(), dtype='int32'),
    'userRatedMovie1': tf.keras.layers.Input(name='userRatedMovie1', shape=(), dtype='int32'),

    'userGenre1': tf.keras.layers.Input(name='userGenre1', shape=(), dtype='string'),
    'userGenre2': tf.keras.layers.Input(name='userGenre2', shape=(), dtype='string'),
    'userGenre3': tf.keras.layers.Input(name='userGenre3', shape=(), dtype='string'),
    'userGenre4': tf.keras.layers.Input(name='userGenre4', shape=(), dtype='string'),
    'userGenre5': tf.keras.layers.Input(name='userGenre5', shape=(), dtype='string'),
    'movieGenre1': tf.keras.layers.Input(name='movieGenre1', shape=(), dtype='string'),
    'movieGenre2': tf.keras.layers.Input(name='movieGenre2', shape=(), dtype='string'),
    'movieGenre3': tf.keras.layers.Input(name='movieGenre3', shape=(), dtype='string'),
}

# ---- Wide&Deep model architecture ----
# Deep part: all input features
deep = tf.keras.layers.DenseFeatures(numerical_columns + categorical_columns)(inputs)
deep = tf.keras.layers.Dense(128, activation='relu')(deep)
deep = tf.keras.layers.Dense(128, activation='relu')(deep)
# Wide part: the cross feature only
wide = tf.keras.layers.DenseFeatures([crossed_feature])(inputs)
# Concatenate both parts and apply a single sigmoid unit (logistic regression)
both = tf.keras.layers.concatenate([deep, wide])
output_layer = tf.keras.layers.Dense(1, activation='sigmoid')(both)
model = tf.keras.Model(inputs, output_layer)

# Compile the model: set the loss function, optimizer and evaluation metrics
model.compile(
    loss='binary_crossentropy',
    optimizer='adam',
    metrics=['accuracy', tf.keras.metrics.AUC(curve='ROC'), tf.keras.metrics.AUC(curve='PR')])

# Train the model
model.fit(train_dataset, epochs=5)

# Evaluate the model
test_loss, test_accuracy, test_roc_auc, test_pr_auc = model.evaluate(test_dataset)
print('\n\nTest Loss {}, Test Accuracy {}, Test ROC AUC {}, Test PR AUC {}'.format(
    test_loss, test_accuracy, test_roc_auc, test_pr_auc))

# Print some prediction results for the first test batch
predictions = model.predict(test_dataset)
for prediction, goodRating in zip(predictions[:12], list(test_dataset)[0][1][:12]):
    print("Predicted good rating: {:.2%}".format(prediction[0]),
          " | Actual rating label: ",
          ("Good Rating" if bool(goodRating) else "Bad Rating"))
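The wide feature in the code above is a hashed cross of `movieId` and `userRatedMovie1`. The sketch below illustrates that idea in plain Python: each id pair is hashed into one of 10000 buckets and the bucket is one-hot encoded. The MD5-based hash and the helper names are stand-ins chosen for this example, so the bucket indices will not match what TensorFlow computes internally.

```python
import hashlib


def crossed_bucket(movie_id, rated_movie_id, num_buckets=10000):
    """Map a (movieId, userRatedMovie1) pair to a bucket in [0, num_buckets)."""
    key = f"{movie_id}_X_{rated_movie_id}".encode("utf-8")
    # Stand-in hash for illustration; TensorFlow uses its own fingerprint function
    return int(hashlib.md5(key).hexdigest(), 16) % num_buckets


def indicator(bucket, num_buckets=10000):
    """One-hot vector with a single 1 at the bucket index."""
    vec = [0] * num_buckets
    vec[bucket] = 1
    return vec


bucket = crossed_bucket(movie_id=1, rated_movie_id=296)
wide_input = indicator(bucket)
print(bucket, sum(wide_input))
```

Hashing keeps the wide input at a fixed width, at the cost of occasional collisions between unrelated pairs. With ids below 1001 on each side there are roughly a million possible pairs sharing 10000 buckets.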

The evolution of Wide&Deep: the Deep&Cross model

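Wide&Deep did more than combine memorization and generalization. It also opened up the idea of fusing different network structures. Since then, a growing body of work has focused on improving either the Wide part or the Deep part of the model. A representative example is the Deep&Cross model (DCN for short), proposed in 2017 by researchers from Stanford University and Google.
The structure of Deep&Cross is shown in the figure. Its main idea is to replace the original Wide part with a Cross network. The design of the Deep part is essentially unchanged.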

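The Cross network is designed to strengthen the interaction between features by applying several cross layers to the input vector. Let $x_l$ be the output vector of the $l$-th cross layer. The output vector of layer $l+1$ is then:

$$x_{l+1} = x_0 x_l^{\mathsf{T}} w_l + b_l + x_l$$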

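As can be seen, the second-order part of the cross layer closely resembles the outer product operation in the PNN model. On top of it, the layer adds a weight vector $w_l$ for the outer product, the layer's own input vector $x_l$, and a bias vector $b_l$. The cross layer operation is shown in the figure.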
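A minimal pure-Python sketch of one cross layer, assuming the standard DCN form x_{l+1} = x_0 (x_l^T w_l) + b_l + x_l. The input, weight, and bias values are arbitrary numbers picked for illustration.

```python
def cross_layer(x0, xl, w, b):
    """One cross layer: x0 * (xl . w) + b + xl, with all vectors of length n."""
    scale = sum(xi * wi for xi, wi in zip(xl, w))  # scalar xl^T w
    return [x0_i * scale + b_i + xl_i for x0_i, b_i, xl_i in zip(x0, b, xl)]


# Hypothetical 2-dimensional example; the same w and b are reused for both layers
x0 = [1.0, 2.0]
w = [0.5, 0.25]
b = [0.0, 0.1]

x1 = cross_layer(x0, x0, w, b)  # first layer: its input is x0 itself
x2 = cross_layer(x0, x1, w, b)  # second layer crosses x1 with x0 again
print(x1, x2)
```

Because x_l^T w_l is a scalar, the outer product x_0 x_l^T never has to be materialized, so each layer costs time linear in the vector length. The trailing `+ xl` term is what keeps the layer input in the output.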

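The cross layer is quite restrained in adding parameters: each layer adds only an n-dimensional weight vector w (plus a bias vector b of the same size), and each layer keeps its input vector, so the output does not change drastically relative to the input. A Cross network built from several cross layers performs automatic feature crossing in place of the Wide part of Wide&Deep, which avoids much of the manual feature combination that depends on business knowledge. As in Wide&Deep, the Deep part of Deep&Cross is more expressive than the Cross part and gives the model stronger nonlinear learning capacity.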
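A quick back-of-the-envelope comparison illustrates how restrained the cross layer is. The width n = 1200 is borrowed from the embedding size mentioned earlier purely as an illustrative number, and the three-layer depth is an assumption.

```python
def cross_network_params(n, num_layers):
    """Each cross layer holds one weight vector and one bias vector of size n."""
    return num_layers * 2 * n


def dense_layer_params(n_in, n_out):
    """A fully connected layer holds an n_in x n_out matrix plus n_out biases."""
    return n_in * n_out + n_out


n = 1200
print(cross_network_params(n, 3))  # 7200
print(dense_layer_params(n, n))    # 1441200
```

Under these assumptions, three cross layers together use about 0.5% of the parameters of a single dense layer of the same width.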

The influence of Wide&Deep

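The influence of Wide&Deep is undoubtedly large. The model itself has been deployed successfully at many leading internet companies, and follow-up work that improves on it continues to this day. In fact, models such as DeepFM and NFM can all be seen as extensions of Wide&Deep.
The keys to its success are:
(1) It captures the essential characteristics of the business problem and combines the memorization of traditional models with the generalization of deep learning models.
(2) Its structure is not complicated, so it is relatively easy to implement, train, and deploy, which accelerated its adoption in industry.
It was also after Wide&Deep that more and more structures were added to recommendation models, and deep learning model architectures began to grow more diverse and more complex.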
