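Given ratings data for 1682 movies and 943 users, where each user has rated only some of the movies, the task is to recommend movies to a new user or to predict the ratings that have not yet been given.

I. Preparation

1. Load the ex8_movies.mat data

Y (1682*943): the user rating data; ratings range from 1 to 5, and unrated entries are stored as 0;

R: the indicator matrix; R(i,j)=1 means user j has rated movie i, and 0 means no rating.

The goal is to predict ratings for the movies a user has not rated, and to recommend the movies with the highest predicted ratings to that user.

To get a better feel for the matrix Y, we compute the average rating of the first movie, Toy Story, that is, mean(Y(1,R(1,:))), which gives: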

Average rating for movie 1 (Toy Story): 3.878319 / 5

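The expression above is very compact, and picking the right function matters. Computing the same value step by step also works, but it takes more effort, for example:

>> R1_num=length(find(R(1,:)))
R1_num =  452
>> avg=sum(Y(1,:))/R1_num
avg =  3.8783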
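The two ways of averaging can be compared on a small made-up matrix. This is a minimal sketch, assuming (as in the data above) that unrated entries are stored as 0 and that R is a logical matrix; the toy values are invented for illustration.

```octave
% Toy data: 2 movies x 4 users; 0 means "not rated" (values are made up).
Y = [5 0 3 4;
     0 2 0 0];
R = (Y ~= 0);                 % logical indicator: true where a rating exists

% Method 1: logical indexing keeps only the rated entries of row 1.
avg1 = mean(Y(1, R(1, :)));

% Method 2: sum the whole row (unrated entries contribute 0),
% then divide by the number of ratings.
avg2 = sum(Y(1, :)) / nnz(R(1, :));

fprintf('%.4f %.4f\n', avg1, avg2);   % both are (5+3+4)/3 = 4.0000
```

The second method only gives the right answer because missing ratings are encoded as 0; the first does not depend on that encoding.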

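In addition, imagesc(Y) is used to visualize Y, which helps to see directly how sparse the matrix is and how evenly the ratings are distributed.

2. The feature matrix X and the parameter matrix Theta

Row i of X, x(i), is the feature vector of movie i; X has one row per movie, and its number of columns is the number of features.

Row j of Theta is the parameter vector of user j; Theta has one row per user, and its number of columns is the dimension of the user parameters.

Both matrices have the same number of columns. For example, with 100 features, X is Nm*100 and Theta is Nu*100, so X*Theta' is Nm*Nu, the same size as the rating matrix.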
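The shapes can be checked with a small sketch. The sizes below are toy values chosen for illustration, not those of the real data set.

```octave
% Toy sizes, chosen only for illustration.
num_movies = 5; num_users = 4; num_features = 3;

X = randn(num_movies, num_features);      % one row per movie
Theta = randn(num_users, num_features);   % one row per user

P = X * Theta';                           % predicted ratings, num_movies x num_users
disp(size(P));

% A single prediction is the inner product of one movie row and one user row.
p23 = X(2, :) * Theta(3, :)';             % prediction for movie 2, user 3
```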

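II. The collaborative filtering algorithm

The model predicts Y(i,j) = (Theta(j))' * X(i). Learning finds the parameter vectors X(1), X(2), ..., X(nm) and Theta(1), Theta(2), ..., Theta(nu), that is, the matrices X and Theta, that minimize the error.

The cost function and gradients are described below for the unregularized and the regularized cases; the only difference is whether the lambda terms are present.

1. Collaborative filtering cost function (no regularization)

J is obtained by accumulating the error terms over all entries with R(i,j)=1.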
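Written out, the unregularized cost can be expressed as follows. This is a sketch in the usual notation, where x^{(i)} is row i of X, \theta^{(j)} is row j of Theta, and y^{(i,j)} is Y(i,j); the factor 1/2 matches the /2 in the cofiCostFunc.m code.

```latex
J\bigl(x^{(1)},\dots,x^{(n_m)},\theta^{(1)},\dots,\theta^{(n_u)}\bigr)
  = \frac{1}{2} \sum_{(i,j)\,:\,R(i,j)=1}
    \Bigl( (\theta^{(j)})^{T} x^{(i)} - y^{(i,j)} \Bigr)^{2}
```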

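Implementation tip: sum(sum(R.*M)) gives the sum of all elements of the element-wise product of R and M. R.*M is a matrix; the inner sum() returns a row vector holding the sum of each column, and the outer sum() adds up the elements of that vector.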
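A small example of the masking trick, with made-up values for M and R:

```octave
M = [1 2;
     3 4;
     5 6];
R = [1 0;
     0 1;
     1 1];

masked = R .* M;          % element-wise product: [1 0; 0 4; 5 6]
col_sums = sum(masked);   % inner sum, one value per column: [6 10]
total = sum(col_sums);    % outer sum over the row vector: 16

% Equivalent one-step form: flatten the matrix first, then sum once.
total_flat = sum(masked(:));
```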

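2. Collaborative filtering gradient (no regularization)

The gradients for X and Theta are matrices of the same size as X and Theta respectively, so every element of X and Theta has a corresponding gradient value.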
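Element by element, the unregularized gradients can be written as below. This is a sketch using the same notation as the prediction formula, with k indexing the feature dimension; each sum runs only over the entries that have a rating.

```latex
\frac{\partial J}{\partial x_k^{(i)}}
  = \sum_{j\,:\,R(i,j)=1}
    \Bigl( (\theta^{(j)})^{T} x^{(i)} - y^{(i,j)} \Bigr)\, \theta_k^{(j)}
\qquad
\frac{\partial J}{\partial \theta_k^{(j)}}
  = \sum_{i\,:\,R(i,j)=1}
    \Bigl( (\theta^{(j)})^{T} x^{(i)} - y^{(i,j)} \Bigr)\, x_k^{(i)}
```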

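The code implementing these two steps, in cofiCostFunc.m:

% Cost: squared error summed over rated entries only
J=J+sum(sum((R.*(X*Theta'-Y).^2)))/2;

% Gradient with respect to each movie's feature vector
for i=1:size(X,1)
    idx = find(R(i,:)==1);      % users who rated movie i
    Thetatemp = Theta(idx,:);
    Ytemp = Y(i,idx);
    X_grad(i,:)=(X(i,:)*Thetatemp'-Ytemp)*Thetatemp;
end

% Gradient with respect to each user's parameter vector
for j=1:size(Theta,1)
    idx = find(R(:,j)==1);      % movies rated by user j
    Xtemp = X(idx,:);
    Ytemp = Y(idx,j);
    Theta_grad(j,:)=(Xtemp*Theta(j,:)'-Ytemp)'*Xtemp;
end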
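The two loops can also be replaced by matrix products. The sketch below is an alternative formulation, not the code from the exercise; it checks on toy data (made-up Y, random X and Theta) that the vectorized gradients agree with the loop version.

```octave
% Toy problem: 4 movies, 3 users, 2 features (all values made up).
num_movies = 4; num_users = 3; num_features = 2;
X = rand(num_movies, num_features);
Theta = rand(num_users, num_features);
Y = [5 0 3;
     0 4 0;
     1 2 0;
     0 0 5];
R = (Y ~= 0);

% Vectorized: zero out the errors of unrated entries, then multiply.
E = R .* (X * Theta' - Y);
X_grad_vec = E * Theta;        % num_movies x num_features
Theta_grad_vec = E' * X;       % num_users  x num_features

% Loop version, following the structure of the loops above.
X_grad_loop = zeros(size(X));
for i = 1:num_movies
    idx = find(R(i, :) == 1);
    X_grad_loop(i, :) = (X(i, :) * Theta(idx, :)' - Y(i, idx)) * Theta(idx, :);
end
Theta_grad_loop = zeros(size(Theta));
for j = 1:num_users
    idx = find(R(:, j) == 1);
    Theta_grad_loop(j, :) = (X(idx, :) * Theta(j, :)' - Y(idx, j))' * X(idx, :);
end

% Largest absolute difference between the two versions.
diff_X = max(abs(X_grad_vec(:) - X_grad_loop(:)));
diff_Theta = max(abs(Theta_grad_vec(:) - Theta_grad_loop(:)));
fprintf('max difference: X %g, Theta %g\n', diff_X, diff_Theta);
```

Masking the error matrix with R has the same effect as selecting idx inside the loops, because unrated entries then contribute zero to every product.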

3. Regularized cost function and gradients

% Cost: squared error over rated entries plus the lambda penalty terms
J=J+sum(sum((R.*(X*Theta'-Y).^2)))/2+lambda/2*sum(sum(Theta.^2))+...
    lambda/2*sum(sum(X.^2));

for i=1:size(X,1)
    idx = find(R(i,:)==1);      % users who rated movie i
    Thetatemp = Theta(idx,:);
    Ytemp = Y(i,idx);
    X_grad(i,:)=(X(i,:)*Thetatemp'-Ytemp)*Thetatemp+lambda*X(i,:);
end

for j=1:size(Theta,1)
    idx = find(R(:,j)==1);      % movies rated by user j
    Xtemp = X(idx,:);
    Ytemp = Y(idx,j);
    Theta_grad(j,:)=(Xtemp*Theta(j,:)'-Ytemp)'*Xtemp+lambda*Theta(j,:);
end
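One way to gain confidence in the regularized gradients is a finite-difference check. The following is a minimal sketch on toy data, using anonymous functions rather than the exercise's own files; the helper names getX, getTheta and cost are made up for this example.

```octave
% Toy problem (values made up).
num_movies = 4; num_users = 3; num_features = 2; lambda = 1.5;
X = rand(num_movies, num_features);
Theta = rand(num_users, num_features);
Y = [5 0 3;
     0 4 0;
     1 2 0;
     0 0 5];
R = (Y ~= 0);

% Hypothetical helpers: recover X and Theta from one unrolled vector p.
nX = num_movies * num_features;
getX = @(p) reshape(p(1:nX), num_movies, num_features);
getTheta = @(p) reshape(p(nX+1:end), num_users, num_features);

% Regularized cost as a function of the unrolled parameters.
% sum(p.^2) equals sum(sum(X.^2)) + sum(sum(Theta.^2)).
cost = @(p) sum(sum(R .* (getX(p) * getTheta(p)' - Y).^2)) / 2 ...
            + lambda / 2 * sum(p.^2);

% Analytic gradient, same formulas as the loops but in matrix form.
E = R .* (X * Theta' - Y);
X_grad = E * Theta + lambda * X;
Theta_grad = E' * X + lambda * Theta;
grad = [X_grad(:); Theta_grad(:)];

% Central finite differences, one parameter at a time.
p0 = [X(:); Theta(:)];
numgrad = zeros(size(p0));
step = 1e-4;
for k = 1:numel(p0)
    d = zeros(size(p0));
    d(k) = step;
    numgrad(k) = (cost(p0 + d) - cost(p0 - d)) / (2 * step);
end

% Relative difference; a very small value suggests the gradients agree.
rel_diff = norm(numgrad - grad) / norm(numgrad + grad);
fprintf('relative difference: %g\n', rel_diff);
```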

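4. Learning movie ratings and making recommendations

Following the layout of Y, one column is added to Y (the ratings of one additional user), so Y and R both become 1682*944 matrices.

For this training run the number of features is 10. After mean-normalizing Y (using R to select the rated entries), X and Theta are randomly initialized and the optimizer runs for 100 iterations with lambda=10. The optimized X and Theta are then used to make recommendations for the added user.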

Y = [my_ratings Y];
R = [(my_ratings ~= 0) R];

%  Normalize Ratings
[Ynorm, Ymean] = normalizeRatings(Y, R);

%  Useful Values
num_users = size(Y, 2);
num_movies = size(Y, 1);
num_features = 10;

% Set Initial Parameters (Theta, X)
X = randn(num_movies, num_features);
Theta = randn(num_users, num_features);

initial_parameters = [X(:); Theta(:)];

% Set options for fmincg
options = optimset('GradObj', 'on', 'MaxIter', 100);

% Set Regularization
lambda = 10;
theta = fmincg (@(t)(cofiCostFunc(t, Ynorm, R, num_users, num_movies, ...
                                num_features, lambda)), ...
                initial_parameters, options);

% Unfold the returned theta back into X and Theta
X = reshape(theta(1:num_movies*num_features), num_movies, num_features);
Theta = reshape(theta(num_movies*num_features+1:end), ...
                num_users, num_features);

%% ================== Part 8: Recommendation for you ====================

%  After training the model, you can now make recommendations by computing
%  the predictions matrix.

p = X * Theta';
my_predictions = p(:,1) + Ymean;

movieList = loadMovieList();

[r, ix] = sort(my_predictions, 'descend');
fprintf('\nTop recommendations for you:\n');
for i=1:10
    j = ix(i);
    fprintf('Predicting rating %.1f for movie %s\n', my_predictions(j),  movieList{j});
end

fprintf('\n\nOriginal ratings provided:\n');
for i = 1:length(my_ratings)
    if my_ratings(i) > 0 
        fprintf('Rated %d for %s\n', my_ratings(i),movieList{i});
    end
end
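The mean normalization step in the listing above can be sketched as follows. This is an assumption about what a helper like normalizeRatings does (subtract each movie's mean over its rated entries); it is not the course file itself, and the toy values are made up.

```octave
% Toy ratings: 3 movies x 3 users, 0 means "not rated" (values made up).
Y = [5 0 3;
     0 4 0;
     1 2 0];
R = (Y ~= 0);

[m, n] = size(Y);
Ymean = zeros(m, 1);
Ynorm = zeros(m, n);
for i = 1:m
    idx = find(R(i, :));                    % users who rated movie i
    Ymean(i) = mean(Y(i, idx));             % mean over rated entries only
    Ynorm(i, idx) = Y(i, idx) - Ymean(i);   % unrated entries stay 0
end

disp(Ymean');   % 4 4 1.5
disp(Ynorm);    % rows: [1 0 -1], [0 0 0], [-0.5 0.5 0]
```

Because training runs on Ynorm, the movie means have to be added back when predicting, which is what p(:,1) + Ymean does in the listing.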

New user ratings:
Rated 4 for Toy Story (1995)
Rated 3 for Twelve Monkeys (1995)
Rated 5 for Usual Suspects, The (1995)
Rated 4 for Outbreak (1995)
Rated 5 for Shawshank Redemption, The (1994)
Rated 3 for While You Were Sleeping (1995)
Rated 5 for Forrest Gump (1994)
Rated 2 for Silence of the Lambs, The (1991)
Rated 4 for Alien (1979)
Rated 5 for Die Hard 2 (1990)
Rated 5 for Sphere (1998)

lambda=10, num_features=10, 100 iterations:

Predicting rating 5.0 for movie Aiqing wansui (1994)
Predicting rating 5.0 for movie Someone Else's America (1995)
Predicting rating 5.0 for movie Santa with Muscles (1996)
Predicting rating 5.0 for movie Saint of Fort Washington, The (1993)
Predicting rating 5.0 for movie Entertaining Angels: The Dorothy Day Story (1996)
Predicting rating 5.0 for movie They Made Me a Criminal (1939)
Predicting rating 5.0 for movie Marlene Dietrich: Shadow and Light (1996)
Predicting rating 5.0 for movie Prefontaine (1997)
Predicting rating 5.0 for movie Star Kid (1997)
Predicting rating 5.0 for movie Great Day in Harlem, A (1994)

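With the same parameters but 1000 iterations, and with lambda changed to 1.5 at both 100 and 1000 iterations, the recommended titles are largely the same; only the order of a few entries differs slightly.

For example, lambda=10, num_features=10, 1000 iterations: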

Predicting rating 5.0 for movie Star Kid (1997)
Predicting rating 5.0 for movie Entertaining Angels: The Dorothy Day Story (1996)
Predicting rating 5.0 for movie Saint of Fort Washington, The (1993)
Predicting rating 5.0 for movie Santa with Muscles (1996)
Predicting rating 5.0 for movie Prefontaine (1997)
Predicting rating 5.0 for movie Marlene Dietrich: Shadow and Light (1996)
Predicting rating 5.0 for movie They Made Me a Criminal (1939)
Predicting rating 5.0 for movie Great Day in Harlem, A (1994)
Predicting rating 5.0 for movie Someone Else's America (1995)
Predicting rating 5.0 for movie Aiqing wansui (1994)

lambda=1.5:

Predicting rating 5.0 for movie Saint of Fort Washington, The (1993)
Predicting rating 5.0 for movie Someone Else's America (1995)
Predicting rating 5.0 for movie Star Kid (1997)
Predicting rating 5.0 for movie They Made Me a Criminal (1939)
Predicting rating 5.0 for movie Marlene Dietrich: Shadow and Light (1996)
Predicting rating 5.0 for movie Santa with Muscles (1996)
Predicting rating 5.0 for movie Aiqing wansui (1994)
Predicting rating 5.0 for movie Prefontaine (1997)
Predicting rating 5.0 for movie Great Day in Harlem, A (1994)
Predicting rating 5.0 for movie Entertaining Angels: The Dorothy Day Story (1996)

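With num_features changed to 20 and lambda varied (for example 10, 3, 1.5, 0.8, 0.3), running 100 iterations each time shows that the cost decreases as lambda decreases, but the predicted ratings in the recommendation list start to exceed 5: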

lambda=3, Iteration   100 | Cost: 2.481120e+004
lambda=0.8, Iteration   100 | Cost: 1.827028e+004
lambda=0.3, Iteration   100 | Cost: 1.644490e+004
lambda=0.1, Iteration   100 | Cost: 1.637439e+004
lambda=0.03, Iteration   100 | Cost: 1.625596e+004

The recommendation list for lambda=0.1:

Top recommendations for you:
Predicting rating 12.3 for movie Don't Be a Menace to South Central While Drinking Your Juice in the Hood (1996)
Predicting rating 9.9 for movie Pather Panchali (1955)
Predicting rating 9.5 for movie Aristocats, The (1970)
Predicting rating 9.4 for movie Warriors of Virtue (1997)
Predicting rating 9.3 for movie Cérémonie, La (1995)
Predicting rating 9.2 for movie Joy Luck Club, The (1993)
Predicting rating 9.2 for movie Juror, The (1996)
Predicting rating 9.1 for movie Mouse Hunt (1997)
Predicting rating 9.0 for movie Substitute, The (1996)
Predicting rating 8.8 for movie Crossing Guard, The (1995)

The recommendation list for lambda=0.8:

Predicting rating 6.1 for movie Happy Gilmore (1996)
Predicting rating 5.9 for movie Desperado (1995)
Predicting rating 5.8 for movie Gattaca (1997)
Predicting rating 5.7 for movie Four Rooms (1995)
Predicting rating 5.7 for movie Sleepers (1996)
Predicting rating 5.6 for movie Monty Python and the Holy Grail (1974)
Predicting rating 5.5 for movie Saint, The (1997)
Predicting rating 5.4 for movie Stargate (1994)
Predicting rating 5.4 for movie Career Girls (1997)
Predicting rating 5.3 for movie In Love and War (1996)

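lambda=3 is probably the more suitable choice: the cost is fairly small, the predicted ratings in the list stay in a sensible range, and the titles differ little from those for lambda=10.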

lambda=3, feature=10, Iteration   100 | Cost: 3.070014e+004
Recommender system learning completed.

Program paused. Press enter to continue.

Top recommendations for you:
Predicting rating 5.0 for movie Prefontaine (1997)
Predicting rating 5.0 for movie Someone Else's America (1995)
Predicting rating 5.0 for movie Star Kid (1997)
Predicting rating 5.0 for movie Aiqing wansui (1994)
Predicting rating 5.0 for movie Marlene Dietrich: Shadow and Light (1996)
Predicting rating 5.0 for movie Great Day in Harlem, A (1994)
Predicting rating 5.0 for movie They Made Me a Criminal (1939)
Predicting rating 5.0 for movie Santa with Muscles (1996)
Predicting rating 5.0 for movie Entertaining Angels: The Dorothy Day Story (1996)
Predicting rating 5.0 for movie Saint of Fort Washington, The (1993)

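In short: computing predictions and recommendations still requires repeatedly adjusting the parameters and choosing the better setting based on the results. This tuning process really does take time and experience.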
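The tuning loop can be sketched on toy data as below. To stay self-contained it uses plain gradient descent with step halving instead of fmincg, skips mean normalization, and runs on a made-up 5x4 rating matrix, so its numbers are not comparable to the results above; it only illustrates recording the final cost and the largest prediction for each lambda.

```octave
% Toy ratings, 0 means "not rated" (values made up).
Y = [5 4 0 1;
     4 0 1 1;
     0 5 1 0;
     1 1 5 4;
     0 1 4 5];
R = (Y ~= 0);
[num_movies, num_users] = size(Y);
num_features = 2;
lambdas = [0.03 0.3 3 10];

% Deterministic small starting point so every lambda starts from the same place.
X0 = 0.1 * reshape(sin(1:num_movies*num_features), num_movies, num_features);
Theta0 = 0.1 * reshape(cos(1:num_users*num_features), num_users, num_features);

% Regularized cost, same form as in cofiCostFunc.m.
costf = @(X, Theta, lambda) sum(sum(R .* (X * Theta' - Y).^2)) / 2 ...
        + lambda / 2 * (sum(X(:).^2) + sum(Theta(:).^2));

initial_cost = zeros(size(lambdas));
final_cost = zeros(size(lambdas));
max_pred = zeros(size(lambdas));
for k = 1:numel(lambdas)
    lambda = lambdas(k);
    X = X0; Theta = Theta0; alpha = 0.01;
    J = costf(X, Theta, lambda);
    initial_cost(k) = J;
    for iter = 1:500
        E = R .* (X * Theta' - Y);
        Xn = X - alpha * (E * Theta + lambda * X);
        Thetan = Theta - alpha * (E' * X + lambda * Theta);
        Jn = costf(Xn, Thetan, lambda);
        if Jn < J
            X = Xn; Theta = Thetan; J = Jn;   % accept the step
        else
            alpha = alpha / 2;                % step too large: shrink it
        end
    end
    final_cost(k) = J;
    P = X * Theta';
    max_pred(k) = max(P(:));
    fprintf('lambda=%5.2f  cost=%8.4f  max prediction=%6.3f\n', ...
            lambda, J, max_pred(k));
end
```

Comparing the cost and the prediction range side by side for each lambda is the same comparison made by hand above.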
