Building a Recommendation Model with MLlib

Data Preparation

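  • The data lives in the ml-100k folder. The most important files are u.user (user attributes), u.item (movie metadata), and u.data (users' ratings of movies).
  • (1) The columns of u.user are, in order, user ID, age, gender, occupation, and ZIP code; the delimiter is "|".
  • (2) u.item contains the movie ID, title, release date, and several other attributes; the delimiter is also "|".
  • (3) u.data contains the user ID, movie ID, rating (1 to 5), and timestamp; the delimiter is a tab (\t).
  • The remaining files are described in the README.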
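As a quick check of the formats just described, here is a minimal pure-Python sketch (no Spark required) that splits one line of each kind on its delimiter. The u.data line is the first record of that file; the u.user line and the helper names parse_rating and parse_user are made up for illustration.

```python
# Pure-Python illustration of the delimiters described above (not Spark code).

def parse_rating(line):
    """Split one tab-delimited u.data line into (user, movie, rating, timestamp)."""
    user, movie, rating, timestamp = line.split('\t')
    return int(user), int(movie), float(rating), int(timestamp)

def parse_user(line):
    """Split one '|'-delimited u.user line into its five fields."""
    user, age, gender, occupation, zip_code = line.split('|')
    # The ZIP code stays a string so that leading zeros are not lost.
    return int(user), int(age), gender, occupation, zip_code

rating_line = '196\t242\t3\t881250949'   # first record of u.data
user_line = '1|24|M|technician|85711'    # hypothetical line in the u.user layout

parsed_rating = parse_rating(rating_line)
parsed_user = parse_user(user_line)
```

The same split calls are what the map steps on the RDD perform, one line at a time.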
rawData = sc.textFile('hdfs://master:9000/ml-100k/u.data')
rawData.take(5)
[u'196\t242\t3\t881250949',u'186\t302\t3\t891717742',u'22\t377\t1\t878887116',u'244\t51\t2\t880606923',u'166\t346\t1\t886397596']
rawRatings = rawData.map(lambda line: line.split('\t'))
rawRatings.take(5)
[[u'196', u'242', u'3', u'881250949'],[u'186', u'302', u'3', u'891717742'],[u'22', u'377', u'1', u'878887116'],[u'244', u'51', u'2', u'880606923'],[u'166', u'346', u'1', u'886397596']]

The recommendation module of mllib provides a Rating class that converts the data into the structure the ALS algorithm expects. The conversion looks like this:

from pyspark.mllib.recommendation import Rating
ratings = rawRatings.map(lambda line: Rating(int(line[0]),int(line[1]),float(line[2])))
ratings.take(5)
[Rating(user=196, product=242, rating=3.0),Rating(user=186, product=302, rating=3.0),Rating(user=22, product=377, rating=1.0),Rating(user=244, product=51, rating=2.0),Rating(user=166, product=346, rating=1.0)]

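The resulting RDD consists of Rating objects. As the output shows, each Rating holds three values: user, product, and rating. The fourth column of the raw data, the timestamp, is not needed in this example and is dropped, since the Rating class accepts exactly three arguments. The object is easy to use: to extract a value, write a dot (".") followed by the field name, or put the field's index in square brackets after the object.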

r_1 = ratings.first()
print r_1.user, r_1.product, r_1.rating
print r_1[0], r_1[1], r_1[2]
196 242 3.0
196 242 3.0
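
Both access styles work because, in PySpark, Rating is defined as a named tuple with the fields user, product, and rating. The sketch below imitates that with collections.namedtuple so it can be run without Spark; the name FakeRating is a stand-in invented for this example.

```python
from collections import namedtuple

# Stand-in for pyspark.mllib.recommendation.Rating (hypothetical name, no Spark needed).
FakeRating = namedtuple('FakeRating', ['user', 'product', 'rating'])

r = FakeRating(196, 242, 3.0)

by_name = (r.user, r.product, r.rating)   # attribute access
by_index = (r[0], r[1], r[2])             # positional access
user, product, rating = r                 # tuple unpacking also works
```

Tuple unpacking is often the most convenient form inside a lambda passed to map or filter.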

Modeling

from pyspark.mllib.recommendation import ALS
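# Positional arguments: rank=50 (number of latent factors), iterations=10, lambda_=0.01 (regularization)
cf_model = ALS.train(ratings, 50, 10, 0.01, nonnegative=False, seed=12345)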
cf_model
<pyspark.mllib.recommendation.MatrixFactorizationModel at 0x7fd1a8023390>

The trained model is a MatrixFactorizationModel object. Its methods can be used to extract the factor matrices and to make predictions, for example:

cf_model.userFeatures().first()
(2,array('d', [0.8257154226303101, -0.08174031972885132, -0.4485216736793518, 0.2816902697086334, 0.281324565410614, 0.2871280014514923, -0.28037557005882263, -0.5780994892120361, 0.04380865767598152, -0.03685721382498741, 0.33663856983184814, 0.8575121164321899, -0.26763004064559937, -0.22665703296661377, -0.030370648950338364, -0.4087982177734375, 0.28470417857170105, 0.17012149095535278, -0.46445152163505554, -0.39363399147987366, 0.4133472442626953, 0.0196047555655241, -0.6278623342514038, 0.8203023672103882, 0.36110371351242065, -0.3623308539390564, 0.07974052429199219, 0.3489876985549927, 0.009540693834424019, -0.1018930971622467, -0.3096586763858795, -0.08348742127418518, 0.546208918094635, 0.14119906723499298, -0.11057484149932861, 0.003356723114848137, -0.42252105474472046, 0.5306751728057861, 0.18785302340984344, 0.30044302344322205, -0.017208704724907875, 0.4387732148170471, -0.06367648392915726, 0.1654045730829239, 0.28026890754699707, -0.18949449062347412, -0.17139069736003876, -0.24911031126976013, 0.05620288848876953, -0.48843708634376526]))
cf_model.productFeatures().take(1)
[(2,array('d', [0.7962743639945984, 0.17431776225566864, -0.1990462988615036, -1.1859782934188843, -0.3959435522556305, -0.5246215462684631, 0.5594768524169922, -1.0115996599197388, 0.16964347660541534, 0.5268467664718628, -0.25127914547920227, -0.6580895185470581, 0.5533314943313599, 0.2781536877155304, -0.8546806573867798, 0.003281824290752411, -0.1445930451154709, -0.4302116930484772, -0.9390072226524353, -0.012757998891174793, -0.2912135422229767, -0.1968940943479538, -1.0604552030563354, 0.8730130195617676, 0.20659275352954865, -1.2206825017929077, -1.2894175052642822, 0.3126126825809479, 0.3025003671646118, 0.3809513449668884, -0.6017365455627441, 0.46676522493362427, -0.17819972336292267, 0.03601028397679329, 0.5260732769966125, -0.3788948357105255, -1.3027641773223877, -0.08637615293264389, -0.22254005074501038, 1.1796964406967163, 0.7695205807685852, 0.420034259557724, -0.31719499826431274, -0.43826058506965637, 0.7229039669036865, 0.1820073425769806, 0.3955117464065552, 0.4843427240848541, 0.5735178589820862, -1.1487818956375122]))]

Prediction

cf_model.predict(123, 456)
0.5189201089615622
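
For a matrix factorization model, the predicted score of a (user, product) pair is the inner product of the user's factor vector and the product's factor vector. A minimal sketch with short, made-up vectors (the real ones have 50 entries because rank=50); the helper dot is written for this example only:

```python
def dot(u, v):
    """Inner product of two equal-length factor vectors."""
    if len(u) != len(v):
        raise ValueError('factor vectors must have the same length')
    return sum(a * b for a, b in zip(u, v))

# Hypothetical 3-dimensional factors, chosen only to keep the arithmetic visible.
user_factors = [0.5, -1.0, 2.0]
product_factors = [2.0, 0.5, 1.0]

predicted = dot(user_factors, product_factors)  # 0.5*2.0 + (-1.0)*0.5 + 2.0*1.0
```

With the real model, the two vectors could presumably be fetched from userFeatures() and productFeatures() with the pair-RDD lookup method and multiplied the same way.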

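In general we care less about one specific user's rating of one specific movie than about ranking a user's predicted ratings. For that we use the predictAll method, which takes an RDD of (user, product) pairs and returns the predicted rating for each pair. For example, we can first build an RDD containing every movie that user 123 has rated: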

r_123 = ratings.filter(lambda r: r.user == 123)
user_product_pairs = r_123.map(lambda r: (r.user, r.product))
user_product_pairs.take(5)
[(123, 427), (123, 531), (123, 135), (123, 192), (123, 13)]

Predict the ratings for all of user 123's movies

est_123 = cf_model.predictAll(user_product_pairs)
est_123.take(5)
[Rating(user=123, product=14, rating=4.781807508293159),Rating(user=123, product=192, rating=4.8440853953154654),Rating(user=123, product=64, rating=3.189367663814919),Rating(user=123, product=432, rating=4.826974367452291),Rating(user=123, product=480, rating=3.4997134540180883)]
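
Since the stated goal is to rank the predicted ratings, the result still needs to be sorted. On the RDD itself this could be done with sortBy; the sketch below shows the same idea in plain Python on the five predictions printed above, so it runs without a cluster.

```python
# The five predictions shown above, as (product, predicted_rating) pairs.
predictions = [
    (14, 4.781807508293159),
    (192, 4.8440853953154654),
    (64, 3.189367663814919),
    (432, 4.826974367452291),
    (480, 3.4997134540180883),
]

# Sort by predicted rating, highest first.
ranked = sorted(predictions, key=lambda pair: pair[1], reverse=True)
ranked_products = [product for product, _ in ranked]
```

Only five rows are used here because that is what take(5) returned; the full RDD would be ranked the same way.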

Because every predicted rating is for a movie that user 123 has actually rated, we can put the two results side by side for comparison:

left = r_123.map(lambda r: (r.product, r.rating))
right = est_123.map(lambda r: (r.product, r.rating))
left.join(right).take(5)
[(704, (3.0, 3.12740470516471)),(64, (3.0, 3.189367663814919)),(132, (3.0, 3.255692030148917)),(192, (5.0, 4.8440853953154654)),(288, (3.0, 2.6833371635987526))]
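
One way to condense this comparison into a single number is the root mean squared error (RMSE) between actual and predicted ratings. The sketch below computes it in plain Python over only the five joined pairs shown above, so the value describes this tiny sample, not the model as a whole. These ratings were also part of the training data, so the error is likely optimistic.

```python
import math

# (product, (actual, predicted)) pairs copied from the join output above.
joined = [
    (704, (3.0, 3.12740470516471)),
    (64, (3.0, 3.189367663814919)),
    (132, (3.0, 3.255692030148917)),
    (192, (5.0, 4.8440853953154654)),
    (288, (3.0, 2.6833371635987526)),
]

# Squared difference between actual and predicted rating for each movie.
squared_errors = [(actual - predicted) ** 2 for _, (actual, predicted) in joined]
mse = sum(squared_errors) / len(squared_errors)
rmse = math.sqrt(mse)
```

A fairer estimate would hold out part of the ratings before training and compute the error on that held-out part.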

Recommending products and users with recommendProducts and recommendUsers

# The 5 users predicted to rate product 456 highest
topK_users = cf_model.recommendUsers(456, 5)
topK_users
[Rating(user=534, product=456, rating=4.915272594291057),Rating(user=620, product=456, rating=4.554502950952971),Rating(user=462, product=456, rating=4.548246109196231),Rating(user=283, product=456, rating=4.514564326007004),Rating(user=56, product=456, rating=4.2624704017243324)]
# The 5 products predicted to be rated highest by user 123
topK_movies = cf_model.recommendProducts(123, 5)
topK_movies
[Rating(user=123, product=287, rating=6.238110881748787),Rating(user=123, product=269, rating=6.168910818255823),Rating(user=123, product=515, rating=5.954825359796975),Rating(user=123, product=416, rating=5.921099652606939),Rating(user=123, product=503, rating=5.670748725720875)]
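
Conceptually, recommendProducts scores every product for the given user and keeps the K highest scores. The sketch below reproduces that idea in plain Python with heapq.nlargest; the factor vectors and the helper top_k_products are invented for illustration and are not MLlib's implementation. Because a score is an unbounded inner product, it can exceed the 1 to 5 rating scale, as the 6.24 in the output above does.

```python
import heapq

def top_k_products(user_vec, product_vecs, k):
    """Return the k (product_id, score) pairs with the highest inner-product score."""
    scores = (
        (pid, sum(a * b for a, b in zip(user_vec, vec)))
        for pid, vec in product_vecs.items()
    )
    return heapq.nlargest(k, scores, key=lambda pair: pair[1])

# Hypothetical 2-dimensional factors for one user and four products.
user_vec = [1.0, 2.0]
product_vecs = {
    10: [1.0, 1.0],   # score 3.0
    20: [0.5, 0.0],   # score 0.5
    30: [2.0, 2.0],   # score 6.0, above the 1 to 5 scale
    40: [0.0, 2.0],   # score 4.0
}

top2 = top_k_products(user_vec, product_vecs, 2)
```

recommendUsers follows the same pattern with the roles swapped: one product vector is scored against every user vector.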

Joining movie titles

movies = sc.textFile('hdfs://master:9000/ml-100k/u.item')
# Build a {movie ID: title} dict from the first two "|"-delimited fields of u.item
titles = movies.map(lambda line: line.split('|')).map(lambda x: (int(x[0]), x[1])).collectAsMap()
titles
{1: u'Toy Story (1995)',2: u'GoldenEye (1995)',3: u'Four Rooms (1995)',4: u'Get Shorty (1995)',5: u'Copycat (1995)',6: u'Shanghai Triad (Yao a yao yao dao waipo qiao) (1995)',7: u'Twelve Monkeys (1995)',8: u'Babe (1995)',9: u'Dead Man Walking (1995)',10: u'Richard III (1995)',11: u'Seven (Se7en) (1995)',12: u'Usual Suspects, The (1995)',13: u'Mighty Aphrodite (1995)',14: u'Postino, Il (1994)',15: u"Mr. Holland's Opus (1995)",16: u'French Twist (Gazon maudit) (1995)',17: u'From Dusk Till Dawn (1996)',18: u'White Balloon, The (1995)',19: u"Antonia's Line (1995)",20: u'Angels and Insects (1995)',21: u'Muppet Treasure Island (1996)',22: u'Braveheart (1995)',23: u'Taxi Driver (1976)',24: u'Rumble in the Bronx (1995)',25: u'Birdcage, The (1996)',26: u'Brothers McMullen, The (1995)',27: u'Bad Boys (1995)',28: u'Apollo 13 (1995)',29: u'Batman Forever (1995)',30: u'Belle de jour (1967)',31: u'Crimson Tide (1995)',32: u'Crumb (1994)',33: u'Desperado (1995)',34: u'Doom Generation, The (1995)',35: u'Free Willy 2: The Adventure Home (1995)',36: u'Mad Love (1995)',37: u'Nadja (1994)',38: u'Net, The (1995)',39: u'Strange Days (1995)',40: u'To Wong Foo, Thanks for Everything! Julie Newmar (1995)',41: u'Billy Madison (1995)',42: u'Clerks (1994)',43: u'Disclosure (1994)',44: u'Dolores Claiborne (1994)',45: u'Eat Drink Man Woman (1994)',46: u'Exotica (1994)',47: u'Ed Wood (1994)',48: u'Hoop Dreams (1994)',49: u'I.Q. 
(1994)',50: u'Star Wars (1977)',51: u'Legends of the Fall (1994)',52: u'Madness of King George, The (1994)',53: u'Natural Born Killers (1994)',54: u'Outbreak (1995)',55: u'Professional, The (1994)',56: u'Pulp Fiction (1994)',57: u'Priest (1994)',58: u'Quiz Show (1994)',59: u'Three Colors: Red (1994)',60: u'Three Colors: Blue (1993)',61: u'Three Colors: White (1994)',62: u'Stargate (1994)',63: u'Santa Clause, The (1994)',64: u'Shawshank Redemption, The (1994)',65: u"What's Eating Gilbert Grape (1993)",66: u'While You Were Sleeping (1995)',67: u'Ace Ventura: Pet Detective (1994)',68: u'Crow, The (1994)',69: u'Forrest Gump (1994)',70: u'Four Weddings and a Funeral (1994)',71: u'Lion King, The (1994)',72: u'Mask, The (1994)',73: u'Maverick (1994)',74: u'Faster Pussycat! Kill! Kill! (1965)',75: u'Brother Minister: The Assassination of Malcolm X (1994)',76: u"Carlito's Way (1993)",77: u'Firm, The (1993)',78: u'Free Willy (1993)',79: u'Fugitive, The (1993)',80: u'Hot Shots! Part Deux (1993)',81: u'Hudsucker Proxy, The (1994)',82: u'Jurassic Park (1993)',83: u'Much Ado About Nothing (1993)',84: u"Robert A. Heinlein's The Puppet Masters (1994)",85: u'Ref, The (1994)',86: u'Remains of the Day, The (1993)',87: u'Searching for Bobby Fischer (1993)',88: u'Sleepless in Seattle (1993)',89: u'Blade Runner (1982)',90: u'So I Married an Axe Murderer (1993)',91: u'Nightmare Before Christmas, The (1993)',92: u'True Romance (1993)',93: u'Welcome to the Dollhouse (1995)',94: u'Home Alone (1990)',95: u'Aladdin (1992)',96: u'Terminator 2: Judgment Day (1991)',97: u'Dances with Wolves (1990)',98: u'Silence of the Lambs, The (1991)',99: u'Snow White and the Seven Dwarfs (1937)',100: u'Fargo (1996)',101: u'Heavy Metal (1981)',102: u'Aristocats, The (1970)',103: u'All Dogs Go to Heaven 2 (1996)',104: u'Theodore Rex (1995)',105: u'Sgt. 
Bilko (1996)',106: u'Diabolique (1996)',107: u'Moll Flanders (1996)',108: u'Kids in the Hall: Brain Candy (1996)',109: u'Mystery Science Theater 3000: The Movie (1996)',110: u'Operation Dumbo Drop (1995)',111: u'Truth About Cats & Dogs, The (1996)',112: u'Flipper (1996)',113: u'Horseman on the Roof, The (Hussard sur le toit, Le) (1995)',114: u'Wallace & Gromit: The Best of Aardman Animation (1996)',115: u'Haunted World of Edward D. Wood Jr., The (1995)',116: u'Cold Comfort Farm (1995)',117: u'Rock, The (1996)',118: u'Twister (1996)',119: u'Maya Lin: A Strong Clear Vision (1994)',120: u'Striptease (1996)',121: u'Independence Day (ID4) (1996)',122: u'Cable Guy, The (1996)',123: u'Frighteners, The (1996)',124: u'Lone Star (1996)',125: u'Phenomenon (1996)',126: u'Spitfire Grill, The (1996)',127: u'Godfather, The (1972)',128: u'Supercop (1992)',129: u'Bound (1996)',130: u'Kansas City (1996)',131: u"Breakfast at Tiffany's (1961)",132: u'Wizard of Oz, The (1939)',133: u'Gone with the Wind (1939)',134: u'Citizen Kane (1941)',135: u'2001: A Space Odyssey (1968)',136: u'Mr. 
Smith Goes to Washington (1939)',137: u'Big Night (1996)',138: u'D3: The Mighty Ducks (1996)',139: u'Love Bug, The (1969)',140: u'Homeward Bound: The Incredible Journey (1993)',141: u'20,000 Leagues Under the Sea (1954)',142: u'Bedknobs and Broomsticks (1971)',143: u'Sound of Music, The (1965)',144: u'Die Hard (1988)',145: u'Lawnmower Man, The (1992)',146: u'Unhook the Stars (1996)',147: u'Long Kiss Goodnight, The (1996)',148: u'Ghost and the Darkness, The (1996)',149: u'Jude (1996)',150: u'Swingers (1996)',151: u'Willy Wonka and the Chocolate Factory (1971)',152: u'Sleeper (1973)',153: u'Fish Called Wanda, A (1988)',154: u"Monty Python's Life of Brian (1979)",155: u'Dirty Dancing (1987)',156: u'Reservoir Dogs (1992)',157: u'Platoon (1986)',158: u"Weekend at Bernie's (1989)",159: u'Basic Instinct (1992)',160: u'Glengarry Glen Ross (1992)',161: u'Top Gun (1986)',162: u'On Golden Pond (1981)',163: u'Return of the Pink Panther, The (1974)',164: u'Abyss, The (1989)',165: u'Jean de Florette (1986)',166: u'Manon of the Spring (Manon des sources) (1986)',167: u'Private Benjamin (1980)',168: u'Monty Python and the Holy Grail (1974)',169: u'Wrong Trousers, The (1993)',170: u'Cinema Paradiso (1988)',171: u'Delicatessen (1991)',172: u'Empire Strikes Back, The (1980)',173: u'Princess Bride, The (1987)',174: u'Raiders of the Lost Ark (1981)',175: u'Brazil (1985)',176: u'Aliens (1986)',177: u'Good, The Bad and The Ugly, The (1966)',178: u'12 Angry Men (1957)',179: u'Clockwork Orange, A (1971)',180: u'Apocalypse Now (1979)',181: u'Return of the Jedi (1983)',182: u'GoodFellas (1990)',183: u'Alien (1979)',184: u'Army of Darkness (1993)',185: u'Psycho (1960)',186: u'Blues Brothers, The (1980)',187: u'Godfather: Part II, The (1974)',188: u'Full Metal Jacket (1987)',189: u'Grand Day Out, A (1992)',190: u'Henry V (1989)',191: u'Amadeus (1984)',192: u'Raging Bull (1980)',193: u'Right Stuff, The (1983)',194: u'Sting, The (1973)',195: u'Terminator, The (1984)',196: u'Dead Poets Society 
(1989)',197: u'Graduate, The (1967)',198: u'Nikita (La Femme Nikita) (1990)',199: u'Bridge on the River Kwai, The (1957)',200: u'Shining, The (1980)',201: u'Evil Dead II (1987)',202: u'Groundhog Day (1993)',203: u'Unforgiven (1992)',204: u'Back to the Future (1985)',205: u'Patton (1970)',206: u'Akira (1988)',207: u'Cyrano de Bergerac (1990)',208: u'Young Frankenstein (1974)',209: u'This Is Spinal Tap (1984)',210: u'Indiana Jones and the Last Crusade (1989)',211: u'M*A*S*H (1970)',212: u'Unbearable Lightness of Being, The (1988)',213: u'Room with a View, A (1986)',214: u'Pink Floyd - The Wall (1982)',215: u'Field of Dreams (1989)',216: u'When Harry Met Sally... (1989)',217: u"Bram Stoker's Dracula (1992)",218: u'Cape Fear (1991)',219: u'Nightmare on Elm Street, A (1984)',220: u'Mirror Has Two Faces, The (1996)',221: u'Breaking the Waves (1996)',222: u'Star Trek: First Contact (1996)',223: u'Sling Blade (1996)',224: u'Ridicule (1996)',225: u'101 Dalmatians (1996)',226: u'Die Hard 2 (1990)',227: u'Star Trek VI: The Undiscovered Country (1991)',228: u'Star Trek: The Wrath of Khan (1982)',229: u'Star Trek III: The Search for Spock (1984)',230: u'Star Trek IV: The Voyage Home (1986)',231: u'Batman Returns (1992)',232: u'Young Guns (1988)',233: u'Under Siege (1992)',234: u'Jaws (1975)',235: u'Mars Attacks! (1996)',236: u'Citizen Ruth (1996)',237: u'Jerry Maguire (1996)',238: u'Raising Arizona (1987)',239: u'Sneakers (1992)',240: u'Beavis and Butt-head Do America (1996)',241: u'Last of the Mohicans, The (1992)',242: u'Kolya (1996)',243: u'Jungle2Jungle (1997)',244: u"Smilla's Sense of Snow (1997)",245: u"Devil's Own, The (1997)",246: u'Chasing Amy (1997)',247: u'Turbo: A Power Rangers Movie (1997)',248: u'Grosse Pointe Blank (1997)',249: u'Austin Powers: International Man of Mystery (1997)',250: u'Fifth Element, The (1997)',251: u'Shall We Dance? 
(1996)',252: u'Lost World: Jurassic Park, The (1997)',253: u'Pillow Book, The (1995)',254: u'Batman & Robin (1997)',255: u"My Best Friend's Wedding (1997)",256: u'When the Cats Away (Chacun cherche son chat) (1996)',257: u'Men in Black (1997)',258: u'Contact (1997)',259: u'George of the Jungle (1997)',260: u'Event Horizon (1997)',261: u'Air Bud (1997)',262: u'In the Company of Men (1997)',263: u'Steel (1997)',264: u'Mimic (1997)',265: u'Hunt for Red October, The (1990)',266: u'Kull the Conqueror (1997)',267: u'unknown',268: u'Chasing Amy (1997)',269: u'Full Monty, The (1997)',270: u'Gattaca (1997)',271: u'Starship Troopers (1997)',272: u'Good Will Hunting (1997)',273: u'Heat (1995)',274: u'Sabrina (1995)',275: u'Sense and Sensibility (1995)',276: u'Leaving Las Vegas (1995)',277: u'Restoration (1995)',278: u'Bed of Roses (1996)',279: u'Once Upon a Time... When We Were Colored (1995)',280: u'Up Close and Personal (1996)',281: u'River Wild, The (1994)',282: u'Time to Kill, A (1996)',283: u'Emma (1996)',284: u'Tin Cup (1996)',285: u'Secrets & Lies (1996)',286: u'English Patient, The (1996)',287: u"Marvin's Room (1996)",288: u'Scream (1996)',289: u'Evita (1996)',290: u'Fierce Creatures (1997)',291: u'Absolute Power (1997)',292: u'Rosewood (1997)',293: u'Donnie Brasco (1997)',294: u'Liar Liar (1997)',295: u'Breakdown (1997)',296: u'Promesse, La (1996)',297: u"Ulee's Gold (1997)",298: u'Face/Off (1997)',299: u'Hoodlum (1997)',300: u'Air Force One (1997)',301: u'In & Out (1997)',302: u'L.A. Confidential (1997)',303: u"Ulee's Gold (1997)",304: u'Fly Away Home (1996)',305: u'Ice Storm, The (1997)',306: u'Mrs. Brown (Her Majesty, Mrs. 
Brown) (1997)',307: u"Devil's Advocate, The (1997)",308: u'FairyTale: A True Story (1997)',309: u'Deceiver (1997)',310: u'Rainmaker, The (1997)',311: u'Wings of the Dove, The (1997)',312: u'Midnight in the Garden of Good and Evil (1997)',313: u'Titanic (1997)',314: u'3 Ninjas: High Noon At Mega Mountain (1998)',315: u'Apt Pupil (1998)',316: u'As Good As It Gets (1997)',317: u'In the Name of the Father (1993)',318: u"Schindler's List (1993)",319: u'Everyone Says I Love You (1996)',320: u'Paradise Lost: The Child Murders at Robin Hood Hills (1996)',321: u'Mother (1996)',322: u'Murder at 1600 (1997)',323: u"Dante's Peak (1997)",324: u'Lost Highway (1997)',325: u'Crash (1996)',326: u'G.I. Jane (1997)',327: u'Cop Land (1997)',328: u'Conspiracy Theory (1997)',329: u'Desperate Measures (1998)',330: u'187 (1997)',331: u'Edge, The (1997)',332: u'Kiss the Girls (1997)',333: u'Game, The (1997)',334: u'U Turn (1997)',335: u'How to Be a Player (1997)',336: u'Playing God (1997)',337: u'House of Yes, The (1997)',338: u'Bean (1997)',339: u'Mad City (1997)',340: u'Boogie Nights (1997)',341: u'Critical Care (1997)',342: u'Man Who Knew Too Little, The (1997)',343: u'Alien: Resurrection (1997)',344: u'Apostle, The (1997)',345: u'Deconstructing Harry (1997)',346: u'Jackie Brown (1997)',347: u'Wag the Dog (1997)',348: u'Desperate Measures (1998)',349: u'Hard Rain (1998)',350: u'Fallen (1998)',351: u'Prophecy II, The (1998)',352: u'Spice World (1997)',353: u'Deep Rising (1998)',354: u'Wedding Singer, The (1998)',355: u'Sphere (1998)',356: u'Client, The (1994)',357: u"One Flew Over the Cuckoo's Nest (1975)",358: u'Spawn (1997)',359: u'Assignment, The (1997)',360: u'Wonderland (1997)',361: u'Incognito (1997)',362: u'Blues Brothers 2000 (1998)',363: u'Sudden Death (1995)',364: u'Ace Ventura: When Nature Calls (1995)',365: u'Powder (1995)',366: u'Dangerous Minds (1995)',367: u'Clueless (1995)',368: u'Bio-Dome (1996)',369: u'Black Sheep (1996)',370: u'Mary Reilly (1996)',371: u'Bridges of 
Madison County, The (1995)',372: u'Jeffrey (1995)',373: u'Judge Dredd (1995)',374: u'Mighty Morphin Power Rangers: The Movie (1995)',375: u'Showgirls (1995)',376: u'Houseguest (1994)',377: u'Heavyweights (1994)',378: u'Miracle on 34th Street (1994)',379: u'Tales From the Crypt Presents: Demon Knight (1995)',380: u'Star Trek: Generations (1994)',381: u"Muriel's Wedding (1994)",382: u'Adventures of Priscilla, Queen of the Desert, The (1994)',383: u'Flintstones, The (1994)',384: u'Naked Gun 33 1/3: The Final Insult (1994)',385: u'True Lies (1994)',386: u'Addams Family Values (1993)',387: u'Age of Innocence, The (1993)',388: u'Beverly Hills Cop III (1994)',389: u'Black Beauty (1994)',390: u'Fear of a Black Hat (1993)',391: u'Last Action Hero (1993)',392: u'Man Without a Face, The (1993)',393: u'Mrs. Doubtfire (1993)',394: u'Radioland Murders (1994)',395: u'Robin Hood: Men in Tights (1993)',396: u'Serial Mom (1994)',397: u'Striking Distance (1993)',398: u'Super Mario Bros. (1993)',399: u'Three Musketeers, The (1993)',400: u'Little Rascals, The (1994)',401: u'Brady Bunch Movie, The (1995)',402: u'Ghost (1990)',403: u'Batman (1989)',404: u'Pinocchio (1940)',405: u'Mission: Impossible (1996)',406: u'Thinner (1996)',407: u'Spy Hard (1996)',408: u'Close Shave, A (1995)',409: u'Jack (1996)',410: u'Kingpin (1996)',411: u'Nutty Professor, The (1996)',412: u'Very Brady Sequel, A (1996)',413: u'Tales from the Crypt Presents: Bordello of Blood (1996)',414: u'My Favorite Year (1982)',415: u'Apple Dumpling Gang, The (1975)',416: u'Old Yeller (1957)',417: u'Parent Trap, The (1961)',418: u'Cinderella (1950)',419: u'Mary Poppins (1964)',420: u'Alice in Wonderland (1951)',421: u"William Shakespeare's Romeo and Juliet (1996)",422: u'Aladdin and the King of Thieves (1996)',423: u'E.T. 
the Extra-Terrestrial (1982)',424: u'Children of the Corn: The Gathering (1996)',425: u'Bob Roberts (1992)',426: u'Transformers: The Movie, The (1986)',427: u'To Kill a Mockingbird (1962)',428: u'Harold and Maude (1971)',429: u'Day the Earth Stood Still, The (1951)',430: u'Duck Soup (1933)',431: u'Highlander (1986)',432: u'Fantasia (1940)',433: u'Heathers (1989)',434: u'Forbidden Planet (1956)',435: u'Butch Cassidy and the Sundance Kid (1969)',436: u'American Werewolf in London, An (1981)',437: u"Amityville 1992: It's About Time (1992)",438: u'Amityville 3-D (1983)',439: u'Amityville: A New Generation (1993)',440: u'Amityville II: The Possession (1982)',441: u'Amityville Horror, The (1979)',442: u'Amityville Curse, The (1990)',443: u'Birds, The (1963)',444: u'Blob, The (1958)',445: u'Body Snatcher, The (1945)',446: u'Burnt Offerings (1976)',447: u'Carrie (1976)',448: u'Omen, The (1976)',449: u'Star Trek: The Motion Picture (1979)',450: u'Star Trek V: The Final Frontier (1989)',451: u'Grease (1978)',452: u'Jaws 2 (1978)',453: u'Jaws 3-D (1983)',454: u'Bastard Out of Carolina (1996)',455: u"Jackie Chan's First Strike (1996)",456: u'Beverly Hills Ninja (1997)',457: u'Free Willy 3: The Rescue (1997)',458: u'Nixon (1995)',459: u'Cry, the Beloved Country (1995)',460: u'Crossing Guard, The (1995)',461: u'Smoke (1995)',462: u'Like Water For Chocolate (Como agua para chocolate) (1992)',463: u'Secret of Roan Inish, The (1994)',464: u'Vanya on 42nd Street (1994)',465: u'Jungle Book, The (1994)',466: u'Red Rock West (1992)',467: u'Bronx Tale, A (1993)',468: u'Rudy (1993)',469: u'Short Cuts (1993)',470: u'Tombstone (1993)',471: u'Courage Under Fire (1996)',472: u'Dragonheart (1996)',473: u'James and the Giant Peach (1996)',474: u'Dr. 
Strangelove or: How I Learned to Stop Worrying and Love the Bomb (1963)',475: u'Trainspotting (1996)',476: u'First Wives Club, The (1996)',477: u'Matilda (1996)',478: u'Philadelphia Story, The (1940)',479: u'Vertigo (1958)',480: u'North by Northwest (1959)',481: u'Apartment, The (1960)',482: u'Some Like It Hot (1959)',483: u'Casablanca (1942)',484: u'Maltese Falcon, The (1941)',485: u'My Fair Lady (1964)',486: u'Sabrina (1954)',487: u'Roman Holiday (1953)',488: u'Sunset Blvd. (1950)',489: u'Notorious (1946)',490: u'To Catch a Thief (1955)',491: u'Adventures of Robin Hood, The (1938)',492: u'East of Eden (1955)',493: u'Thin Man, The (1934)',494: u'His Girl Friday (1940)',495: u'Around the World in 80 Days (1956)',496: u"It's a Wonderful Life (1946)",497: u'Bringing Up Baby (1938)',498: u'African Queen, The (1951)',499: u'Cat on a Hot Tin Roof (1958)',500: u'Fly Away Home (1996)',501: u'Dumbo (1941)',502: u'Bananas (1971)',503: u'Candidate, The (1972)',504: u'Bonnie and Clyde (1967)',505: u'Dial M for Murder (1954)',506: u'Rebel Without a Cause (1955)',507: u'Streetcar Named Desire, A (1951)',508: u'People vs. 
Larry Flynt, The (1996)',509: u'My Left Foot (1989)',510: u'Magnificent Seven, The (1954)',511: u'Lawrence of Arabia (1962)',512: u'Wings of Desire (1987)',513: u'Third Man, The (1949)',514: u'Annie Hall (1977)',515: u'Boot, Das (1981)',516: u'Local Hero (1983)',517: u'Manhattan (1979)',518: u"Miller's Crossing (1990)",519: u'Treasure of the Sierra Madre, The (1948)',520: u'Great Escape, The (1963)',521: u'Deer Hunter, The (1978)',522: u'Down by Law (1986)',523: u'Cool Hand Luke (1967)',524: u'Great Dictator, The (1940)',525: u'Big Sleep, The (1946)',526: u'Ben-Hur (1959)',527: u'Gandhi (1982)',528: u'Killing Fields, The (1984)',529: u'My Life as a Dog (Mitt liv som hund) (1985)',530: u'Man Who Would Be King, The (1975)',531: u'Shine (1996)',532: u'Kama Sutra: A Tale of Love (1996)',533: u'Daytrippers, The (1996)',534: u'Traveller (1997)',535: u'Addicted to Love (1997)',536: u'Ponette (1996)',537: u'My Own Private Idaho (1991)',538: u'Anastasia (1997)',539: u'Mouse Hunt (1997)',540: u'Money Train (1995)',541: u'Mortal Kombat (1995)',542: u'Pocahontas (1995)',543: u'Mis\ufffdrables, Les (1995)',544: u"Things to Do in Denver when You're Dead (1995)",545: u'Vampire in Brooklyn (1995)',546: u'Broken Arrow (1996)',547: u"Young Poisoner's Handbook, The (1995)",548: u'NeverEnding Story III, The (1994)',549: u'Rob Roy (1995)',550: u'Die Hard: With a Vengeance (1995)',551: u'Lord of Illusions (1995)',552: u'Species (1995)',553: u'Walk in the Clouds, A (1995)',554: u'Waterworld (1995)',555: u"White Man's Burden (1995)",556: u'Wild Bill (1995)',557: u'Farinelli: il castrato (1994)',558: u'Heavenly Creatures (1994)',559: u'Interview with the Vampire (1994)',560: u"Kid in King Arthur's Court, A (1995)",561: u"Mary Shelley's Frankenstein (1994)",562: u'Quick and the Dead, The (1995)',563: u"Stephen King's The Langoliers (1995)",564: u'Tales from the Hood (1995)',565: u'Village of the Damned (1995)',566: u'Clear and Present Danger (1994)',567: u"Wes Craven's New Nightmare 
(1994)",568: u'Speed (1994)',569: u'Wolf (1994)',570: u'Wyatt Earp (1994)',571: u'Another Stakeout (1993)',572: u'Blown Away (1994)',573: u'Body Snatchers (1993)',574: u'Boxing Helena (1993)',575: u"City Slickers II: The Legend of Curly's Gold (1994)",576: u'Cliffhanger (1993)',577: u'Coneheads (1993)',578: u'Demolition Man (1993)',579: u'Fatal Instinct (1993)',580: u'Englishman Who Went Up a Hill, But Came Down a Mountain, The (1995)',581: u'Kalifornia (1993)',582: u'Piano, The (1993)',583: u'Romeo Is Bleeding (1993)',584: u'Secret Garden, The (1993)',585: u'Son in Law (1993)',586: u'Terminal Velocity (1994)',587: u'Hour of the Pig, The (1993)',588: u'Beauty and the Beast (1991)',589: u'Wild Bunch, The (1969)',590: u'Hellraiser: Bloodline (1996)',591: u'Primal Fear (1996)',592: u'True Crime (1995)',593: u'Stalingrad (1993)',594: u'Heavy (1995)',595: u'Fan, The (1996)',596: u'Hunchback of Notre Dame, The (1996)',597: u'Eraser (1996)',598: u'Big Squeeze, The (1996)',599: u'Police Story 4: Project S (Chao ji ji hua) (1993)',600: u"Daniel Defoe's Robinson Crusoe (1996)",601: u'For Whom the Bell Tolls (1943)',602: u'American in Paris, An (1951)',603: u'Rear Window (1954)',604: u'It Happened One Night (1934)',605: u'Meet Me in St. 
Louis (1944)',606: u'All About Eve (1950)',607: u'Rebecca (1940)',608: u'Spellbound (1945)',609: u'Father of the Bride (1950)',610: u'Gigi (1958)',611: u'Laura (1944)',612: u'Lost Horizon (1937)',613: u'My Man Godfrey (1936)',614: u'Giant (1956)',615: u'39 Steps, The (1935)',616: u'Night of the Living Dead (1968)',617: u'Blue Angel, The (Blaue Engel, Der) (1930)',618: u'Picnic (1955)',619: u'Extreme Measures (1996)',620: u'Chamber, The (1996)',621: u'Davy Crockett, King of the Wild Frontier (1955)',622: u'Swiss Family Robinson (1960)',623: u'Angels in the Outfield (1994)',624: u'Three Caballeros, The (1945)',625: u'Sword in the Stone, The (1963)',626: u'So Dear to My Heart (1949)',627: u'Robin Hood: Prince of Thieves (1991)',628: u'Sleepers (1996)',629: u'Victor/Victoria (1982)',630: u'Great Race, The (1965)',631: u'Crying Game, The (1992)',632: u"Sophie's Choice (1982)",633: u'Christmas Carol, A (1938)',634: u"Microcosmos: Le peuple de l'herbe (1996)",635: u'Fog, The (1980)',636: u'Escape from New York (1981)',637: u'Howling, The (1981)',638: u'Return of Martin Guerre, The (Retour de Martin Guerre, Le) (1982)',639: u'Tin Drum, The (Blechtrommel, Die) (1979)',640: u'Cook the Thief His Wife & Her Lover, The (1989)',641: u'Paths of Glory (1957)',642: u'Grifters, The (1990)',643: u'The Innocent (1994)',644: u'Thin Blue Line, The (1988)',645: u'Paris Is Burning (1990)',646: u'Once Upon a Time in the West (1969)',647: u'Ran (1985)',648: u'Quiet Man, The (1952)',649: u'Once Upon a Time in America (1984)',650: u'Seventh Seal, The (Sjunde inseglet, Det) (1957)',651: u'Glory (1989)',652: u'Rosencrantz and Guildenstern Are Dead (1990)',653: u'Touch of Evil (1958)',654: u'Chinatown (1974)',655: u'Stand by Me (1986)',656: u'M (1931)',657: u'Manchurian Candidate, The (1962)',658: u'Pump Up the Volume (1990)',659: u'Arsenic and Old Lace (1944)',660: u'Fried Green Tomatoes (1991)',661: u'High Noon (1952)',662: u'Somewhere in Time (1980)',663: u'Being There (1979)',664: u'Paris, 
Texas (1984)',665: u'Alien 3 (1992)',666: u"Blood For Dracula (Andy Warhol's Dracula) (1974)",667: u'Audrey Rose (1977)',668: u'Blood Beach (1981)',669: u'Body Parts (1991)',670: u'Body Snatchers (1993)',671: u'Bride of Frankenstein (1935)',672: u'Candyman (1992)',673: u'Cape Fear (1962)',674: u'Cat People (1982)',675: u'Nosferatu (Nosferatu, eine Symphonie des Grauens) (1922)',676: u'Crucible, The (1996)',677: u'Fire on the Mountain (1996)',678: u'Volcano (1997)',679: u'Conan the Barbarian (1981)',680: u'Kull the Conqueror (1997)',681: u'Wishmaster (1997)',682: u'I Know What You Did Last Summer (1997)',683: u'Rocket Man (1997)',684: u'In the Line of Fire (1993)',685: u'Executive Decision (1996)',686: u'Perfect World, A (1993)',687: u"McHale's Navy (1997)",688: u'Leave It to Beaver (1997)',689: u'Jackal, The (1997)',690: u'Seven Years in Tibet (1997)',691: u'Dark City (1998)',692: u'American President, The (1995)',693: u'Casino (1995)',694: u'Persuasion (1995)',695: u'Kicking and Screaming (1995)',696: u'City Hall (1996)',697: u'Basketball Diaries, The (1995)',698: u'Browning Version, The (1994)',699: u'Little Women (1994)',700: u'Miami Rhapsody (1995)',701: u'Wonderful, Horrible Life of Leni Riefenstahl, The (1993)',702: u'Barcelona (1994)',703: u"Widows' Peak (1994)",704: u'House of the Spirits, The (1993)',705: u"Singin' in the Rain (1952)",706: u'Bad Moon (1996)',707: u'Enchanted April (1991)',708: u'Sex, Lies, and Videotape (1989)',709: u'Strictly Ballroom (1992)',710: u'Better Off Dead... 
(1985)',711: u'Substance of Fire, The (1996)',712: u'Tin Men (1987)',713: u'Othello (1995)',714: u'Carrington (1995)',715: u'To Die For (1995)',716: u'Home for the Holidays (1995)',717: u'Juror, The (1996)',718: u'In the Bleak Midwinter (1995)',719: u'Canadian Bacon (1994)',720: u'First Knight (1995)',721: u'Mallrats (1995)',722: u'Nine Months (1995)',723: u'Boys on the Side (1995)',724: u'Circle of Friends (1995)',725: u'Exit to Eden (1994)',726: u'Fluke (1995)',727: u'Immortal Beloved (1994)',728: u'Junior (1994)',729: u'Nell (1994)',730: u'Queen Margot (Reine Margot, La) (1994)',731: u'Corrina, Corrina (1994)',732: u'Dave (1993)',733: u'Go Fish (1994)',734: u'Made in America (1993)',735: u'Philadelphia (1993)',736: u'Shadowlands (1993)',737: u'Sirens (1994)',738: u'Threesome (1994)',739: u'Pretty Woman (1990)',740: u'Jane Eyre (1996)',741: u'Last Supper, The (1995)',742: u'Ransom (1996)',743: u'Crow: City of Angels, The (1996)',744: u'Michael Collins (1996)',745: u'Ruling Class, The (1972)',746: u'Real Genius (1985)',747: u'Benny & Joon (1993)',748: u'Saint, The (1997)',749: u'MatchMaker, The (1997)',750: u'Amistad (1997)',751: u'Tomorrow Never Dies (1997)',752: u'Replacement Killers, The (1998)',753: u'Burnt By the Sun (1994)',754: u'Red Corner (1997)',755: u'Jumanji (1995)',756: u'Father of the Bride Part II (1995)',757: u'Across the Sea of Time (1995)',758: u'Lawnmower Man 2: Beyond Cyberspace (1996)',759: u'Fair Game (1995)',760: u'Screamers (1995)',761: u'Nick of Time (1995)',762: u'Beautiful Girls (1996)',763: u'Happy Gilmore (1996)',764: u'If Lucy Fell (1996)',765: u'Boomerang (1992)',766: u'Man of the Year (1995)',767: u'Addiction, The (1995)',768: u'Casper (1995)',769: u'Congo (1995)',770: u'Devil in a Blue Dress (1995)',771: u'Johnny Mnemonic (1995)',772: u'Kids (1995)',773: u'Mute Witness (1994)',774: u'Prophecy, The (1995)',775: u'Something to Talk About (1995)',776: u'Three Wishes (1995)',777: u'Castle Freak (1995)',778: u'Don Juan DeMarco 
(1995)',779: u'Drop Zone (1994)',780: u'Dumb & Dumber (1994)',781: u'French Kiss (1995)',782: u'Little Odessa (1994)',783: u'Milk Money (1994)',784: u'Beyond Bedlam (1993)',785: u'Only You (1994)',786: u'Perez Family, The (1995)',787: u'Roommates (1995)',788: u'Relative Fear (1994)',789: u'Swimming with Sharks (1995)',790: u'Tommy Boy (1995)',791: u'Baby-Sitters Club, The (1995)',792: u'Bullets Over Broadway (1994)',793: u'Crooklyn (1994)',794: u'It Could Happen to You (1994)',795: u'Richie Rich (1994)',796: u'Speechless (1994)',797: u'Timecop (1994)',798: u'Bad Company (1995)',799: u'Boys Life (1995)',800: u'In the Mouth of Madness (1995)',801: u'Air Up There, The (1994)',802: u'Hard Target (1993)',803: u'Heaven & Earth (1993)',804: u'Jimmy Hollywood (1994)',805: u'Manhattan Murder Mystery (1993)',806: u'Menace II Society (1993)',807: u'Poetic Justice (1993)',808: u'Program, The (1993)',809: u'Rising Sun (1993)',810: u'Shadow, The (1994)',811: u'Thirty-Two Short Films About Glenn Gould (1993)',812: u'Andre (1994)',813: u'Celluloid Closet, The (1995)',814: u'Great Day in Harlem, A (1994)',815: u'One Fine Day (1996)',816: u'Candyman: Farewell to the Flesh (1995)',817: u'Frisk (1995)',818: u'Girl 6 (1996)',819: u'Eddie (1996)',820: u'Space Jam (1996)',821: u'Mrs. Winterbourne (1996)',822: u'Faces (1968)',823: u'Mulholland Falls (1996)',824: u'Great White Hype, The (1996)',825: u'Arrival, The (1996)',826: u'Phantom, The (1996)',827: u'Daylight (1996)',828: u'Alaska (1996)',829: u'Fled (1996)',830: u'Power 98 (1995)',831: u'Escape from L.A. (1996)',832: u'Bogus (1996)',833: u'Bulletproof (1996)',834: u'Halloween: The Curse of Michael Myers (1995)',835: u'Gay Divorcee, The (1934)',836: u'Ninotchka (1939)',837: u'Meet John Doe (1941)',838: u'In the Line of Duty 2 (1987)',839: u'Loch Ness (1995)',840: u'Last Man Standing (1996)',841: u'Glimmer Man, The (1996)',842: u'Pollyanna (1960)',843: u'Shaggy Dog, The (1959)',844: u'Freeway (1996)',845: u'That Thing You Do! 
(1996)',846: u'To Gillian on Her 37th Birthday (1996)',847: u'Looking for Richard (1996)',848: u'Murder, My Sweet (1944)',849: u'Days of Thunder (1990)',850: u'Perfect Candidate, A (1996)',851: u'Two or Three Things I Know About Her (1966)',852: u'Bloody Child, The (1996)',853: u'Braindead (1992)',854: u'Bad Taste (1987)',855: u'Diva (1981)',856: u'Night on Earth (1991)',857: u'Paris Was a Woman (1995)',858: u'Amityville: Dollhouse (1996)',859: u"April Fool's Day (1986)",860: u'Believers, The (1987)',861: u'Nosferatu a Venezia (1986)',862: u'Jingle All the Way (1996)',863: u'Garden of Finzi-Contini, The (Giardino dei Finzi-Contini, Il) (1970)',864: u'My Fellow Americans (1996)',865: u'Ice Storm, The (1997)',866: u'Michael (1996)',867: u'Whole Wide World, The (1996)',868: u'Hearts and Minds (1996)',869: u'Fools Rush In (1997)',870: u'Touch (1997)',871: u'Vegas Vacation (1997)',872: u'Love Jones (1997)',873: u'Picture Perfect (1997)',874: u'Career Girls (1997)',875: u"She's So Lovely (1997)",876: u'Money Talks (1997)',877: u'Excess Baggage (1997)',878: u'That Darn Cat! (1997)',879: u'Peacemaker, The (1997)',880: u'Soul Food (1997)',881: u'Money Talks (1997)',882: u'Washington Square (1997)',883: u'Telling Lies in America (1997)',884: u'Year of the Horse (1997)',885: u'Phantoms (1998)',886: u'Life Less Ordinary, A (1997)',887: u"Eve's Bayou (1997)",888: u'One Night Stand (1997)',889: u'Tango Lesson, The (1997)',890: u'Mortal Kombat: Annihilation (1997)',891: u'Bent (1997)',892: u'Flubber (1997)',893: u'For Richer or Poorer (1997)',894: u'Home Alone 3 (1997)',895: u'Scream 2 (1997)',896: u'Sweet Hereafter, The (1997)',897: u'Time Tracers (1995)',898: u'Postman, The (1997)',899: u'Winter Guest, The (1997)',900: u'Kundun (1997)',901: u'Mr. 
Magoo (1997)',902: u'Big Lebowski, The (1998)',903: u'Afterglow (1997)',904: u'Ma vie en rose (My Life in Pink) (1997)',905: u'Great Expectations (1998)',906: u'Oscar & Lucinda (1997)',907: u'Vermin (1998)',908: u'Half Baked (1998)',909: u'Dangerous Beauty (1998)',910: u'Nil By Mouth (1997)',911: u'Twilight (1998)',912: u'U.S. Marshalls (1998)',913: u'Love and Death on Long Island (1997)',914: u'Wild Things (1998)',915: u'Primary Colors (1998)',916: u'Lost in Space (1998)',917: u'Mercury Rising (1998)',918: u'City of Angels (1998)',919: u'City of Lost Children, The (1995)',920: u'Two Bits (1995)',921: u'Farewell My Concubine (1993)',922: u'Dead Man (1995)',923: u'Raise the Red Lantern (1991)',924: u'White Squall (1996)',925: u'Unforgettable (1996)',926: u'Down Periscope (1996)',927: u'Flower of My Secret, The (Flor de mi secreto, La) (1995)',928: u'Craft, The (1996)',929: u'Harriet the Spy (1996)',930: u'Chain Reaction (1996)',931: u'Island of Dr. Moreau, The (1996)',932: u'First Kid (1996)',933: u'Funeral, The (1996)',934: u"Preacher's Wife, The (1996)",935: u'Paradise Road (1997)',936: u'Brassed Off (1996)',937: u'Thousand Acres, A (1997)',938: u'Smile Like Yours, A (1997)',939: u'Murder in the First (1995)',940: u'Airheads (1994)',941: u'With Honors (1994)',942: u"What's Love Got to Do with It (1993)",943: u'Killing Zoe (1994)',944: u'Renaissance Man (1994)',945: u'Charade (1963)',946: u'Fox and the Hound, The (1981)',947: u'Big Blue, The (Grand bleu, Le) (1988)',948: u'Booty Call (1997)',949: u'How to Make an American Quilt (1995)',950: u'Georgia (1995)',951: u'Indian in the Cupboard, The (1995)',952: u'Blue in the Face (1995)',953: u'Unstrung Heroes (1995)',954: u'Unzipped (1995)',955: u'Before Sunrise (1995)',956: u"Nobody's Fool (1994)",957: u'Pushing Hands (1992)',958: u'To Live (Huozhe) (1994)',959: u'Dazed and Confused (1993)',960: u'Naked (1993)',961: u'Orlando (1993)',962: u'Ruby in Paradise (1993)',963: u'Some Folks Call It a Sling Blade (1993)',964: 
u'Month by the Lake, A (1995)',965: u'Funny Face (1957)',966: u'Affair to Remember, An (1957)',967: u'Little Lord Fauntleroy (1936)',968: u'Inspector General, The (1949)',969: u'Winnie the Pooh and the Blustery Day (1968)',970: u'Hear My Song (1991)',971: u'Mediterraneo (1991)',972: u'Passion Fish (1992)',973: u'Grateful Dead (1995)',974: u'Eye for an Eye (1996)',975: u'Fear (1996)',976: u'Solo (1996)',977: u'Substitute, The (1996)',978: u"Heaven's Prisoners (1996)",979: u'Trigger Effect, The (1996)',980: u'Mother Night (1996)',981: u'Dangerous Ground (1997)',982: u'Maximum Risk (1996)',983: u"Rich Man's Wife, The (1996)",984: u'Shadow Conspiracy (1997)',985: u'Blood & Wine (1997)',986: u'Turbulence (1997)',987: u'Underworld (1997)',988: u'Beautician and the Beast, The (1997)',989: u"Cats Don't Dance (1997)",990: u'Anna Karenina (1997)',991: u'Keys to Tulsa (1997)',992: u'Head Above Water (1996)',993: u'Hercules (1997)',994: u'Last Time I Committed Suicide, The (1997)',995: u'Kiss Me, Guido (1997)',996: u'Big Green, The (1995)',997: u'Stuart Saves His Family (1995)',998: u'Cabin Boy (1994)',999: u'Clean Slate (1994)',1000: u'Lightning Jack (1994)',...}
r_with_titles = r_123.map(lambda x: (titles[x.product], x.rating))
r_with_titles.sortBy(lambda x:-x[1]).take(10)
[(u'Fantasia (1940)', 5.0),(u'Postino, Il (1994)', 5.0),(u'2001: A Space Odyssey (1968)', 5.0),(u'Raging Bull (1980)', 5.0),(u'Jean de Florette (1986)', 5.0),(u'Secrets & Lies (1996)', 5.0),(u'My Fair Lady (1964)', 5.0),(u'Godfather, The (1972)', 5.0),(u'Lawrence of Arabia (1962)', 5.0),(u'Enchanted April (1991)', 5.0)]
topK_with_titles = map(lambda x:(titles[x.product], x.rating), topK_movies)
topK_with_titles
[(u"Marvin's Room (1996)", 6.238110881748787),(u'Full Monty, The (1997)', 6.168910818255823),(u'Boot, Das (1981)', 5.954825359796975),(u'Old Yeller (1957)', 5.921099652606939),(u'Candidate, The (1972)', 5.670748725720875)]

Similarity computation

sampleRDD = sc.parallelize([['A', 111], ['A', 222], ['B', 333]])
sampleRDD.lookup('A')
[111, 222]

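We use cosine similarity to measure how similar movies are to one another. For example, to compute the cosine similarity between movie 456 and every other movie, we first extract the factor vector of movie 456: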

arr = cf_model.productFeatures().lookup(456)[0]
arr
array('d', [0.029076049104332924, 0.21009714901447296, 0.0290555227547884, -0.5571964383125305, -0.3824714124202728, -0.4592842161655426, 0.6329585313796997, -0.362333744764328, -0.1305536925792694, 0.8419598340988159, -0.13552409410476685, -0.6138198971748352, 0.02604905515909195, 0.08060657978057861, -0.16706441342830658, -0.3220045566558838, 0.43676093220710754, 0.07212082296609879, 0.16547970473766327, 0.049271613359451294, -0.018478330224752426, 0.4917396306991577, -1.259914517402649, 0.30777591466903687, 0.3512609004974365, -1.1641650199890137, -0.08893561363220215, 0.5041327476501465, -0.5516676902770996, -0.13129214942455292, -0.7094163298606873, -0.095136858522892, 0.0024825106374919415, -0.5574610233306885, 0.6876130104064941, -0.14038291573524475, -0.3861311674118042, 0.08736740052700043, 0.7943630218505859, 1.0195096731185913, 0.49429407715797424, -0.07107719779014587, 0.21480131149291992, -0.572085976600647, 0.030756879597902298, 1.120257019996643, 0.012996670790016651, 0.5901889801025391, 0.9270225167274475, -0.8173779845237732])
# Same result as lookup
cf_model.productFeatures().filter(lambda x:x[0] == 456)\
.map(lambda x:x[1]).first()
array('d', [0.029076049104332924, 0.21009714901447296, 0.0290555227547884, -0.5571964383125305, -0.3824714124202728, -0.4592842161655426, 0.6329585313796997, -0.362333744764328, -0.1305536925792694, 0.8419598340988159, -0.13552409410476685, -0.6138198971748352, 0.02604905515909195, 0.08060657978057861, -0.16706441342830658, -0.3220045566558838, 0.43676093220710754, 0.07212082296609879, 0.16547970473766327, 0.049271613359451294, -0.018478330224752426, 0.4917396306991577, -1.259914517402649, 0.30777591466903687, 0.3512609004974365, -1.1641650199890137, -0.08893561363220215, 0.5041327476501465, -0.5516676902770996, -0.13129214942455292, -0.7094163298606873, -0.095136858522892, 0.0024825106374919415, -0.5574610233306885, 0.6876130104064941, -0.14038291573524475, -0.3861311674118042, 0.08736740052700043, 0.7943630218505859, 1.0195096731185913, 0.49429407715797424, -0.07107719779014587, 0.21480131149291992, -0.572085976600647, 0.030756879597902298, 1.120257019996643, 0.012996670790016651, 0.5901889801025391, 0.9270225167274475, -0.8173779845237732])

The factors of movie 456 come back as an array; to compute cosine similarity, wrap the array in a vector:

from pyspark.mllib.linalg import DenseVector
selectedVector = DenseVector(arr)
selectedVector
DenseVector([0.0291, 0.2101, 0.0291, -0.5572, -0.3825, -0.4593, 0.633, -0.3623, -0.1306, 0.842, -0.1355, -0.6138, 0.026, 0.0806, -0.1671, -0.322, 0.4368, 0.0721, 0.1655, 0.0493, -0.0185, 0.4917, -1.2599, 0.3078, 0.3513, -1.1642, -0.0889, 0.5041, -0.5517, -0.1313, -0.7094, -0.0951, 0.0025, -0.5575, 0.6876, -0.1404, -0.3861, 0.0874, 0.7944, 1.0195, 0.4943, -0.0711, 0.2148, -0.5721, 0.0308, 1.1203, 0.013, 0.5902, 0.927, -0.8174])
# Define the cosine similarity function
def cosSim(vectorA, vectorB):
    return vectorA.dot(vectorB) / (vectorA.norm(2)*vectorB.norm(2))

cosSim(selectedVector, selectedVector)
1.0
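
To check the formula without a Spark session, here is a minimal pure-Python sketch of the same calculation. The name cosine_similarity and the sample vectors are illustrative only and are not part of MLlib.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric sequences."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Illustrative vectors (assumed data, not taken from the model).
same = cosine_similarity([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])   # identical direction
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])       # perpendicular
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])       # reversed direction
```

Identical directions give a value close to 1, perpendicular vectors give 0, and reversed vectors give a value close to -1, which is why a movie compared with itself scores 1.0 above.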

Use map to apply cosSim to every movie's factor vector, returning tuples of (movie ID, cosine similarity):

sims = cf_model.productFeatures()\
.map(lambda x:(x[0], cosSim(selectedVector, DenseVector(x[1]))))
sims.take(5)
[(2, 0.63205163077832793),(4, 0.57081651505456033),(6, 0.57056619721078805),(8, 0.61730808637739021),(10, 0.55560185135898443)]

Take the 10 movies with the highest similarity (the first result is movie 456 itself, with similarity 1.0):

simsTopK = sims.top(10, lambda x:x[1])
simsTopK
[(456, 1.0),(1446, 0.77381634899106144),(249, 0.75186850153352536),(1206, 0.75042081098056868),(1028, 0.74412474419118724),(1435, 0.7440397393142627),(42, 0.73813968356865434),(1249, 0.73347222767192244),(411, 0.73245706263195443),(240, 0.7307674356843924)]

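top and takeOrdered are comparatively efficient because they only need to return the requested records and do not sort every record. sortBy is slower because it sorts all records before any are selected. The two calls below return the same records as top: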

sims.takeOrdered(10, lambda x:-x[1])
sims.sortBy(lambda x:x[1], False).take(10)
[(456, 1.0),(1446, 0.77381634899106144),(249, 0.75186850153352536),(1206, 0.75042081098056868),(1028, 0.74412474419118724),(1435, 0.7440397393142627),(42, 0.73813968356865434),(1249, 0.73347222767192244),(411, 0.73245706263195443),(240, 0.7307674356843924)]
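
The difference between bounded selection and a full sort can be sketched locally with the standard library. This is only an analogy for the RDD methods, under the assumption that a small in-memory list stands in for the sims RDD; the pairs below are made-up values.

```python
import heapq

# Hypothetical (movie_id, similarity) pairs standing in for the `sims` RDD.
pairs = [(456, 1.0), (2, 0.63), (4, 0.57), (1446, 0.77), (249, 0.75), (10, 0.56)]

# Bounded selection: only k candidates are tracked while scanning the data.
top3_heap = heapq.nlargest(3, pairs, key=lambda p: p[1])

# Full sort followed by a slice: every record is ordered before any is taken.
top3_sorted = sorted(pairs, key=lambda p: p[1], reverse=True)[:3]
```

Both approaches return the same three pairs; the first simply avoids ordering records that will be discarded.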
map( lambda x:(titles[x[0]], x[1]), simsTopK)
[(u'Beverly Hills Ninja (1997)', 1.0),(u'Bye Bye, Love (1995)', 0.77381634899106144),(u'Austin Powers: International Man of Mystery (1997)', 0.75186850153352536),(u'Amos & Andrew (1993)', 0.75042081098056868),(u'Grumpier Old Men (1995)', 0.74412474419118724),(u'Steal Big, Steal Little (1995)', 0.7440397393142627),(u'Clerks (1994)', 0.73813968356865434),(u'For Love or Money (1993)', 0.73347222767192244),(u'Nutty Professor, The (1996)', 0.73245706263195443),(u'Beavis and Butt-head Do America (1996)', 0.7307674356843924)]

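Model validation

MSE, RMSE, and MAE

MLlib's evaluation module provides the RegressionMetrics class, which computes MSE, RMSE, and MAE. It only needs an RDD of rating pairs to build the metrics object; here each pair is (actual rating, predicted rating). The class documents its input as (prediction, observation) pairs, but these three metrics are symmetric in the two values, so the order does not change the results below. Note that the predictions are made on the same ratings used for training, so the figures measure fit to the training data.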

actual = ratings.map(lambda r: ((r.user, r.product), r.rating))
prediction = cf_model.predictAll(actual.map(lambda x: x[0]))\
.map(lambda r: ((r.user, r.product), r.rating))
actual_prediction = actual.join(prediction)
actual_prediction.take(5)
[((506, 568), (5.0, 4.495298253573846)),((109, 365), (4.0, 4.068188145295891)),((621, 577), (3.0, 3.2018734286795425)),((720, 286), (5.0, 4.963056006543837)),((812, 326), (4.0, 3.999871511192283))]
from pyspark.mllib.evaluation import RegressionMetrics
metrics = RegressionMetrics(actual_prediction.map(lambda x: x[1]))
print 'MSE =', metrics.meanSquaredError
print 'RMSE =', metrics.rootMeanSquaredError
print 'MAE =', metrics.meanAbsoluteError
MSE = 0.0845211871908
RMSE = 0.290725277867
MAE = 0.204405585188
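
As a sanity check on what the three metrics mean, the sketch below computes them by hand for only the five (actual, predicted) pairs shown in the actual_prediction.take(5) output above. Because it uses five pairs and not the full data set, its values will differ from the figures just printed.

```python
import math

# The five (actual, predicted) pairs from the take(5) output above.
pairs = [
    (5.0, 4.495298253573846),
    (4.0, 4.068188145295891),
    (3.0, 3.2018734286795425),
    (5.0, 4.963056006543837),
    (4.0, 3.999871511192283),
]

errors = [actual - predicted for actual, predicted in pairs]

mse = sum(e * e for e in errors) / len(errors)    # mean squared error
rmse = math.sqrt(mse)                             # root mean squared error
mae = sum(abs(e) for e in errors) / len(errors)   # mean absolute error
```

RMSE is the square root of MSE, and MAE can never exceed RMSE, which matches the ordering of the three values reported above.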

MAP

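MLlib's evaluation module also has a RankingMetrics class, which makes it easy to compute precision at K (P@K) and MAP. The class takes an RDD of (prediction, labels) pairs, where prediction is a user's list of products ordered by the model's predicted score and labels is the list of products the user actually interacted with (here, the movies the user rated).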

productIDs = cf_model.productFeatures().map(lambda p: p[0]).collect()
productMatrix = cf_model.productFeatures().map(lambda p: p[1]).collect()

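Take the dot product of the product factor matrix with each user's factor vector to get predicted ratings, attach the corresponding movie IDs, and sort by predicted rating in descending order. After sorting, the predicted ratings are no longer needed, so only the ordered movie IDs are kept.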

import numpy as np
estRatings = cf_model.userFeatures()\
.map(lambda x: (x[0], list(np.dot(productMatrix, x[1]))))\
.map(lambda x: (x[0], zip(x[1], productIDs)))\
.map(lambda x: (x[0], sorted(x[1] ,key=(lambda m: m[0]), reverse=True)))\
.map(lambda x: (x[0], [i[1] for i in x[1]]))
estRatings.first()
(2,[778,530,519,211,521,654,528,671,491,604,936,511,770,48,408,506,482,641,87,498,923,649,191,474,1194,584,520,507,610,963,241,487,178,45,524,427,144,97,495,187,615,601,493,162,513,132,1012,509,568,489,504,381,133,855,648,492,212,194,699,199,182,216,127,516,195,124,1210,526,837,514,510,1126,57,387,454,275,655,272,251,620,55,735,100,508,589,129,523,59,223,208,44,747,529,661,283,302,612,500,488,183,423,613,82,285,152,180,656,663,525,614,107,116,96,50,603,229,471,169,316,479,79,134,242,742,311,1020,657,313,736,98,955,1142,203,606,646,694,684,1147,527,168,450,402,435,484,174,595,404,462,543,633,65,724,753,480,1203,712,318,517,1197,136,126,215,921,872,558,665,151,157,651,317,486,638,664,466,430,847,645,468,233,224,1046,477,550,591,330,685,193,378,15,618,197,502,515,421,746,531,709,205,915,566,675,1222,490,9,867,750,189,605,481,188,172,165,58,30,1074,32,198,693,185,270,428,880,414,110,632,286,705,1004,1094,356,739,972,150,33,636,22,367,570,295,226,76,269,6,64,745,900,52,909,159,201,210,445,23,496,28,357,204,690,1204,14,166,485,4,61,135,303,805,12,483,1,1152,503,273,348,588,256,153,292,806,227,60,114,499,432,721,265,354,300,277,443,965,279,740,306,209,293,299,25,111,856,304,631,297,228,255,276,161,301,237,660,727,478,1591,154,1121,1124,602,1119,310,177,494,137,125,335,239,826,1021,639,617,644,775,282,13,8,958,176,429,1286,729,616,213,380,340,732,449,969,676,470,99,1109,951,20,69,149,257,1269,512,1134,234,261,1167,903,119,879,1039,754,628,245,611,716,956,522,121,284,89,246,11,959,71,53,56,962,1172,434,647,650,441,1221,686,436,88,929,653,1284,703,338,1421,1063,95,416,382,16,883,702,622,835,1016,1454,252,898,672,1007,349,730,77,364,790,141,322,844,794,202,21,181,171,344,1592,914,1285,196,697,329,924,31,848,334,995,1056,26,576,714,463,1238,578,164,92,1098,692,1176,536,326,708,1169,980,781,460,941,546,1105,336,874,744,768,887,54,186,148,179,765,1448,207,882,86,537,1381,222,117,219,371,258,674,459,813,1268,458,501,372,845,411,553,1298,1400,39,1101,2,1050,418,925,1078,67,715,1019
,1451,549,108,268,662,966,807,113,1558,433,232,175,49,701,722,562,1070,444,942,811,1136,1161,993,796,145,403,83,305,944,881,792,73,979,1125,627,173,554,280,1042,1149,773,131,238,573,1097,1278,1439,1099,1456,973,621,190,707,977,710,1065,331,1062,619,362,876,200,94,1086,696,836,192,939,42,274,1251,323,1218,975,281,291,308,290,659,1281,945,155,365,1264,393,263,405,288,1263,399,593,1245,717,425,278,1516,298,540,1224,109,78,1258,467,1242,287,289,3,406,748,1449,1280,70,72,312,420,1184,461,679,75,533,262,575,850,624,307,865,954,396,1005,846,1470,1073,296,1123,1185,1135,623,1131,102,417,1225,791,557,821,896,19,1277,347,236,863,81,497,658,789,904,779,473,762,475,535,786,472,843,230,118,106,968,1192,829,248,206,332,832,642,928,419,946,385,585,447,1153,350,1084,170,580,749,761,961,266,267,698,560,1189,1406,351,41,917,1615,577,1303,949,1171,783,327,346,18,455,1368,563,1527,142,1450,572,1148,559,337,1295,1555,785,864,552,854,609,355,156,1009,808,713,366,934,1311,608,731,797,1025,1367,66,1091,1102,51,1643,1141,476,947,937,728,46,803,596,1605,138,902,1229,1656,505,1137,889,1331,873,986,1462,1411,24,1143,918,802,853,833,1397,1211,755,834,221,1045,680,1220,542,587,931,793,386,1428,339,1518,320,1122,40,990,464,737,1388,1092,448,652,1407,63,469,824,1288,812,1265,43,943,878,888,1193,1228,1011,782,392,1118,123,1186,400,1312,579,637,842,271,1475,1617,1205,1248,105,899,1044,328,1300,983,1048,115,967,1107,1200,800,953,333,706,1103,905,410,825,321,927,583,912,231,352,919,1138,764,1379,752,815,146,1116,91,167,629,534,877,970,1079,689,1441,1014,218,1064,532,1090,1322,1434,85,431,1333,1058,1150,249,1060,160,143,5,1512,827,518,128,630,1531,886,960,799,1139,1188,561,1195,787,1187,670,1035,1483,1444,801,10,147,1159,626,1261,922,1620,62,908,933,1262,673,564,1082,1473,734,1267,607,345,1337,1040,769,47,592,1508,1059,971,667,453,695,1282,1378,1168,831,741,1223,220,122,1468,1296,948,996,1534,751,1293,1226,823,84,938,809,1594,766,733,1299,891,1445,452,1369,415,1061,1232,395,935,1026,1436,1217,1214,394,
994,718,820,1010,1128,341,1071,691,407,1305,1372,998,885,952,1053,1355,597,859,1266,1067,625,1006,1443,700,569,163,723,1208,...])
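
The score-then-rank step can be followed on a tiny example. This is a sketch with made-up two-factor vectors (the model in this post uses 50 factors), and all names and values in it are assumptions for illustration.

```python
# Hypothetical product IDs and 2-factor vectors (assumed data).
product_ids = [10, 20, 30]
product_factors = [[0.5, 1.0], [2.0, 0.0], [1.0, 1.0]]
user_vector = [1.0, 2.0]

# Predicted score of each product = dot product with the user's factors.
scores = [sum(p * u for p, u in zip(row, user_vector)) for row in product_factors]

# Pair each score with its product ID and sort by score, highest first.
ranked = sorted(zip(scores, product_ids), key=lambda pair: pair[0], reverse=True)

# Drop the scores and keep only the ordered product IDs.
ranked_ids = [pid for _, pid in ranked]
```

Here the scores are 2.5, 2.0 and 3.0, so the ranked IDs come out as 30, 10, 20.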

Next, collect the movies each user actually rated. The result must be a key-value RDD keyed by user ID:

userMovies = ratings.map(lambda r:(r.user, r.product))\
.groupByKey()\
.mapValues(list)
userMovies.first()
(2,[237,300,100,127,285,289,304,272,278,288,286,275,302,296,292,251,50,314,297,290,312,281,13,280,303,308,307,257,316,315,301,313,279,299,298,19,277,282,111,258,295,242,283,276,1,305,14,287,291,293,294,310,309,306,25,273,10,311,269,255,284,274])

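userMovies and estRatings have the same structure, (user ID, list of movie IDs). Joining the two RDDs on user ID and keeping only the values gives the (prediction, labels) RDD that RankingMetrics expects: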

predictionAndLabels = estRatings.join(userMovies).map(lambda x: x[1])
predictionAndLabels.first()
([42,1073,474,171,188,177,60,150,180,513,530,89,462,98,199,652,168,427,523,512,127,174,55,357,663,654,56,203,318,183,198,527,286,11,346,50,273,186,97,176,179,211,639,657,170,522,603,276,511,942,169,14,45,33,64,285,484,59,856,224,641,1194,520,165,475,17,492,172,702,190,185,156,12,480,435,166,724,615,372,493,518,9,134,135,48,955,154,1142,187,195,483,311,505,661,205,269,515,61,721,410,189,659,69,116,96,509,238,963,223,648,216,246,202,197,490,919,6,847,1010,253,482,111,558,182,429,404,162,516,315,528,92,478,175,479,32,81,443,489,789,497,960,421,173,209,459,178,316,268,481,628,979,498,770,52,408,108,153,608,466,93,1007,114,129,693,234,1070,444,317,251,713,649,137,638,124,193,647,660,813,432,557,923,507,7,100,469,762,709,194,524,272,651,529,735,921,144,1021,3,632,593,526,250,903,53,504,501,279,604,136,200,922,503,181,293,301,26,544,428,746,344,463,184,67,506,510,499,525,191,1009,196,508,265,71,306,430,43,82,236,467,160,128,566,302,637,1,141,248,152,640,23,287,971,221,242,1459,1245,13,425,496,570,750,614,650,1401,1067,83,1238,1048,207,611,210,653,633,1114,546,792,1109,159,433,631,1005,993,206,864,99,607,642,1019,161,471,946,1240,612,79,531,562,24,22,636,132,671,218,117,51,305,925,262,613,115,208,616,192,708,461,1197,514,1286,487,1134,580,65,491,634,458,780,965,975,837,310,163,249,256,582,347,151,936,58,4,763,943,382,327,295,239,664,416,106,10,673,584,464,778,645,549,727,39,697,447,331,257,794,1126,753,95,854,764,896,646,952,625,396,455,255,20,630,705,622,1018,710,601,364,1011,320,537,1012,488,741,204,655,882,86,237,28,282,596,519,241,1039,80,1120,902,956,76,486,214,1226,521,924,201,576,8,844,543,436,226,1103,712,806,554,1099,1149,1098,718,939,244,441,1143,460,232,915,751,420,1017,431,126,980,1203,744,19,494,694,46,87,959,1065,228,367,1059,736,324,386,16,418,947,385,684,609,1020,303,57,448,707,1118,77,1093,215,452,15,587,793,904,1251,875,610,550,737,381,365,588,101,109,945,849,284,569,761,1113,495,536,133,167,30,1195,665,148,972,863,434,334,644,277,534,591,380,113,624,47,12
1,485,517,41,1221,1008,1478,125,619,855,31,1101,213,470,139,384,291,1267,333,1050,94,679,147,267,887,275,733,307,229,732,1176,820,928,1047,900,1062,1107,739,1172,629,345,445,502,730,1449,774,164,414,953,969,1192,818,865,824,330,1046,772,340,1524,70,606,90,589,456,297,1111,411,1124,1097,695,620,283,88,740,233,635,532,131,1131,313,157,258,618,810,1119,1112,423,595,288,656,1115,1171,696,290,941,747,1248,1085,1081,1068,621,473,107,91,298,54,1117,378,836,1218,1028,1116,745,686,966,602,1147,825,227,805,44,675,339,841,73,321,961,755,949,866,872,1110,831,358,328,476,1188,393,843,886,252,212,940,535,356,1063,343,217,977,49,930,355,389,1206,235,123,933,1161,1077,281,909,883,219,1404,729,230,105,823,1512,680,1335,592,568,1042,27,1208,967,1058,222,748,605,1129,533,1169,859,715,809,685,995,786,1312,62,240,692,155,719,1123,1198,402,1006,1296,1024,145,292,1187,274,1139,1375,1451,2,583,359,1136,245,1153,968,578,322,1298,335,371,1315,130,581,63,1056,781,962,329,874,728,472,808,898,840,158,754,465,765,964,577,1284,833,742,714,899,552,309,1269,366,1150,950,68,563,835,704,477,401,658,122,895,450,1199,1170,1086,905,1200,1045,149,1244,722,1074,1421,351,1281,929,540,689,5,412,564,319,701,280,260,674,1051,326,749,1035,25,559,1311,337,1324,888,573,690,547,829,1231,1222,312,352,1220,561,1078,1431,300,118,662,585,278,387,1592,500,403,790,845,985,102,1501,802,468,1263,782,811,867,1132,1434,698,399,720,373,1473,1015,1152,1121,270,1211,1137,1168,1184,906,1016,627,958,804,879,38,723,299,84,342,332,66,405,1204,453,1411,294,997,85,1022,556,1367,1125,120,379,1072,848,876,1243,395,853,266,388,362,912,1210,572,1069,1060,991,670,1127,1041,1301,934,415,1388,914,597,880,1157,752,1558,231,336,406,970,1479,862,725,304,289,1066,296,1052,1264,926,738,1278,551,1368,699,1230,869,350,1091,1288,419,743,1023,72,1379,892,787,119,672,1597,797,271,1166,1495,785,717,1255,574,1141,451,951,354,1004,944,1080,889,1084,143,1090,369,916,617,1178,881,1462,773,353,308,1228,1189,1025,338,1303,449,18,771,873,363,1537,1071,937,
1560,1305,555,984,1225,801,1014,815,779,264,541,1232,538,1277,1095,716,1223,1268,1273,800,548,1160,...],[273,258,286,183,50,325,1238,186,265,23,1,198,318,11,1459,313,97,191,527,302,56])
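
The grouping and join can be mimicked on plain Python data to make the resulting shape concrete. This is a local sketch under the assumption that a list and a dict stand in for the two RDDs; the IDs are invented.

```python
from collections import defaultdict

# Hypothetical (user, movie) rating pairs and per-user ranked predictions.
rated = [(1, 10), (1, 30), (2, 20)]
ranked_predictions = {1: [30, 20, 10], 2: [10, 20, 30]}

# Local counterpart of groupByKey().mapValues(list).
movies_by_user = defaultdict(list)
for user, movie in rated:
    movies_by_user[user].append(movie)

# Local counterpart of join(...).map(lambda x: x[1]):
# keep users present on both sides and drop the key.
prediction_and_labels = [
    (ranked_predictions[user], movies_by_user[user])
    for user in sorted(ranked_predictions)
    if user in movies_by_user
]
```

Each element is a (ranked movie IDs, rated movie IDs) pair with the user ID dropped, the same shape as the first record shown above.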

Compute P@K and MAP

from pyspark.mllib.evaluation import RankingMetrics
rankingMetrics = RankingMetrics(predictionAndLabels)
print 'MAP =', rankingMetrics.meanAveragePrecision
print 'PrecisionAtK =', rankingMetrics.precisionAt(20)
MAP = 0.192062343904
PrecisionAtK = 0.182025450689
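
To see what these two numbers measure, here is a pure-Python sketch using the common textbook definitions of precision at K and mean average precision. It is an assumption that RankingMetrics follows these definitions in every detail, and the function names and toy data are illustrative.

```python
def precision_at_k(predicted, relevant, k):
    """Fraction of the first k predicted items that are relevant."""
    relevant_set = set(relevant)
    hits = sum(1 for item in predicted[:k] if item in relevant_set)
    return hits / float(k)

def average_precision(predicted, relevant):
    """Mean of the precision values at each rank where a relevant item appears."""
    relevant_set = set(relevant)
    if not relevant_set:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(predicted, start=1):
        if item in relevant_set:
            hits += 1
            precision_sum += hits / float(rank)
    return precision_sum / len(relevant_set)

def mean_average_precision(pairs):
    """Average of average_precision over all (predicted, relevant) pairs."""
    return sum(average_precision(p, r) for p, r in pairs) / len(pairs)

# Toy (prediction, labels) pairs for two hypothetical users.
toy = [([1, 2, 3, 4], [1, 3]), ([5, 6, 7, 8], [8])]
toy_map = mean_average_precision(toy)
toy_p_at_2 = precision_at_k(toy[0][0], toy[0][1], 2)
```

For the first toy user the relevant items sit at ranks 1 and 3, giving an average precision of (1/1 + 2/3) / 2; for the second the only relevant item is at rank 4, giving 1/4. Relevant items ranked earlier raise both metrics.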

Model persistence

cf_model.save(sc, './cf_model')
from pyspark.mllib.recommendation import MatrixFactorizationModel
load_cf_model = MatrixFactorizationModel.load(sc, './cf_model')
load_cf_model.predict(123, 456)
0.5189201089615622
