First, we need to fetch iQiyi's TV drama ranking, found at http://v.iqiyi.com/index/dianshiju/index.html

That page lists the dramas in ranked order.

The first step is to download the page's HTML source.

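import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36'
}
url = 'http://v.iqiyi.com/index/dianshiju/index.html'
# Named resp rather than re, so it does not shadow the re module used later
resp = requests.get(url, headers=headers)
html = resp.text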

With the source in hand, we analyze its structure. Here is a small excerpt, one row of the ranking table:

<tr data-ranklist-elem="item" >
  <td class="col-title">
    <div class="rank_list">
      <i class="icon_top icon_top-3" data-ranklist-elem="rank">1</i>
      <a href="http://so.iqiyi.com/so/q_亲爱的,热爱的" class="item_name" rseat="819k1_dianshiju" target='_blank'>亲爱的,热爱的</a>
    </div>
  </td>
  <td>
    <span class="item_usrInfo">
      <a href="http://so.iqiyi.com/so/q_杨紫" rseat="819s1_dianshiju" target='_blank'>杨紫</a>/
      <a href="http://so.iqiyi.com/so/q_李现" rseat="819s1_dianshiju" target='_blank'>李现</a>/
      <a href="http://so.iqiyi.com/so/q_胡一天" rseat="819s1_dianshiju" target='_blank'>胡一天</a>
    </span>
  </td>
  <td class="col-num">
    <div class="rank_list" data-ranklist-yes="13059697">
      <span class="item_num">13,059,697</span>
      <i class="tend tend_line" title="平稳"></i>
    </div>
  </td>
  <td class="col-num">
    <div class="rank_list" data-ranklist-week="76916679">
      <span class="item_num">76,916,679</span>
      <i class="tend tend_line" title="平稳"></i>
    </div>
  </td>
  <td class="col-num">
    <div class="rank_list" data-ranklist-mon="103566713">
      <span class="item_num">103,566,713</span>
    </div>
  </td>
</tr>

We then extract the fields with this regular expression:

<tr data-ranklist-elem="item" .*?<i class=".*?" data-ranklist-elem="rank">(.*?)</i>.*? rseat=".*?" target=.*?>(.*?)</a>.*? <a href=".*?" rseat=".*?" target=.*?>(.*?)</a>.*?<a href=".*?" rseat=".*?" target=.*?>(.*?)</a>.*?<a href=".*?" rseat=".*?" target=.*?>(.*?)</a>.*?<div class="rank_list" data-ranklist-yes=".*?">.*?<span class="item_num">(.*?)</span>.*?<i class=".*?" title="(.*?)">.*?<div class="rank_list" data-ranklist-week=".*?">.*? <span class="item_num">(.*?)</span>.*?<i class=".*?" title="(.*?)">.*?<div class=".*?" data-ranklist-mon=".*?">.*?<span class="item_num">(.*?)</span>.*?
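Before pointing the pattern at the live page, it can be checked offline against the excerpt above. A minimal sketch using only the standard `re` module: `SAMPLE` is that excerpt re-indented, and the pattern is split into adjacent string literals purely for readability. Each `(.*?)` group captures one field, in order: rank, title, three actors, the `data-ranklist-yes` count and its trend, the weekly count and its trend, and the monthly count.

```python
import re

# The pattern from the post, split into pieces; they concatenate to the same regex
# (minus a trailing .*?, which can only match the empty string at the end).
PATTERN = re.compile(
    '<tr data-ranklist-elem="item" .*?'
    '<i class=".*?" data-ranklist-elem="rank">(.*?)</i>'
    '.*? rseat=".*?" target=.*?>(.*?)</a>'
    '.*? <a href=".*?" rseat=".*?" target=.*?>(.*?)</a>'
    '.*?<a href=".*?" rseat=".*?" target=.*?>(.*?)</a>'
    '.*?<a href=".*?" rseat=".*?" target=.*?>(.*?)</a>'
    '.*?<div class="rank_list" data-ranklist-yes=".*?">'
    '.*?<span class="item_num">(.*?)</span>'
    '.*?<i class=".*?" title="(.*?)">'
    '.*?<div class="rank_list" data-ranklist-week=".*?">'
    '.*? <span class="item_num">(.*?)</span>'
    '.*?<i class=".*?" title="(.*?)">'
    '.*?<div class=".*?" data-ranklist-mon=".*?">'
    '.*?<span class="item_num">(.*?)</span>',
    re.S,  # let '.' match newlines, since each row spans many lines
)

SAMPLE = """
<tr data-ranklist-elem="item" >
  <td class="col-title">
    <div class="rank_list">
      <i class="icon_top icon_top-3" data-ranklist-elem="rank">1</i>
      <a href="http://so.iqiyi.com/so/q_亲爱的,热爱的" class="item_name" rseat="819k1_dianshiju" target='_blank'>亲爱的,热爱的</a>
    </div>
  </td>
  <td>
    <span class="item_usrInfo">
      <a href="http://so.iqiyi.com/so/q_杨紫" rseat="819s1_dianshiju" target='_blank'>杨紫</a>/
      <a href="http://so.iqiyi.com/so/q_李现" rseat="819s1_dianshiju" target='_blank'>李现</a>/
      <a href="http://so.iqiyi.com/so/q_胡一天" rseat="819s1_dianshiju" target='_blank'>胡一天</a>
    </span>
  </td>
  <td class="col-num">
    <div class="rank_list" data-ranklist-yes="13059697">
      <span class="item_num">13,059,697</span>
      <i class="tend tend_line" title="平稳"></i>
    </div>
  </td>
  <td class="col-num">
    <div class="rank_list" data-ranklist-week="76916679">
      <span class="item_num">76,916,679</span>
      <i class="tend tend_line" title="平稳"></i>
    </div>
  </td>
  <td class="col-num">
    <div class="rank_list" data-ranklist-mon="103566713">
      <span class="item_num">103,566,713</span>
    </div>
  </td>
</tr>
"""

rows = PATTERN.findall(SAMPLE)
for row in rows:
    print(row)
# ('1', '亲爱的,热爱的', '杨紫', '李现', '胡一天', '13,059,697', '平稳', '76,916,679', '平稳', '103,566,713')
```

Because `findall` returns one tuple per match, the scraper can address each field by position (`item[0]` for the rank, `item[2:5]` for the actors, and so on).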

Regular expressions

Here is some background.

Regular expression syntax

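A regular expression describes a string-matching pattern. It can be used to check whether a string contains a certain substring, to replace the matched substrings, or to extract the substrings that satisfy some condition.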
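In Python, these three uses correspond to `re.search` (does a match exist?), `re.sub` (replace matches) and `re.findall` (extract matches). A minimal sketch on a fragment of the ranking markup shown earlier; the variable names are only for illustration:

```python
import re

html = '<span class="item_num">13,059,697</span> <span class="item_num">76,916,679</span>'

# 1. Check whether the string contains a substring of a given shape
has_count = re.search(r'<span class="item_num">[\d,]+</span>', html) is not None

# 2. Replace matched substrings, here the thousands separators
without_commas = re.sub(r",", "", html)

# 3. Extract the substrings that satisfy a condition (the captured group)
counts = re.findall(r'<span class="item_num">(.*?)</span>', html)

print(has_count)       # True
print(without_commas)  # <span class="item_num">13059697</span> <span class="item_num">76916679</span>
print(counts)          # ['13,059,697', '76,916,679']
```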

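For example:

  • runoo+b matches runoob, runooob, runoooooob and so on. The + means the preceding character must occur at least once (one or more times).

  • runoo*b matches runob, runoob, runoooooob and so on. The * means the preceding character may be absent or occur one or more times (zero, one, or more times).

  • colou?r matches color or colour. The ? means the preceding character may occur at most once (zero or one time).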
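The three quantifiers can be checked directly with Python's `re.fullmatch`, which succeeds only when the whole string matches the pattern. A small sketch; the word lists are just test inputs, and `full_matches` is a helper name made up for this example:

```python
import re


def full_matches(pattern, words):
    """Return the words that the pattern matches in their entirety."""
    return [w for w in words if re.fullmatch(pattern, w)]


words = ["runb", "runob", "runoob", "runooob", "runoooooob"]

plus = full_matches(r"runoo+b", words)   # + : one or more 'o' after "runo"
star = full_matches(r"runoo*b", words)   # * : zero or more 'o' after "runo"
optional = full_matches(r"colou?r", ["color", "colour", "colouur"])  # ? : zero or one 'u'

print(plus)      # ['runoob', 'runooob', 'runoooooob']
print(star)      # ['runob', 'runoob', 'runooob', 'runoooooob']
print(optional)  # ['color', 'colour']
```

Note that "runob" fails `runoo+b` but passes `runoo*b`: the only difference is whether the second "o" may be absent.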

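Regular expressions are built the same way as arithmetic expressions: metacharacters and operators combine small expressions into larger ones. A component of a regular expression can be a single character, a set of characters, a range of characters, a choice between characters, or any combination of these.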

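A regular expression is a text pattern made up of ordinary characters (such as the letters a to z) and special characters called metacharacters. The pattern describes one or more strings to look for when searching text; it acts as a template that is matched against the string being searched.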
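The scraper's pattern relies on two details of Python's `re` module that the description above does not cover: the non-greedy `.*?`, which stops at the first possible match instead of the last, and the `re.S` flag, which lets `.` match newlines. A minimal sketch of both, which also builds a larger pattern out of two small ones as just described; `count`, `trend` and `combined` are illustrative names:

```python
import re

tag = '<i class="tend tend_line" title="平稳"></i>'

# Greedy .* grabs as much as possible, then backtracks to the LAST closing quote
greedy = re.findall(r'class="(.*)"', tag)
# Non-greedy .*? stops at the FIRST closing quote
lazy = re.findall(r'class="(.*?)"', tag)

# Combine two small patterns into a larger one
count = r'<span class="item_num">(.*?)</span>'
trend = r'title="(.*?)"'
combined = count + r'.*?' + trend

cell = '<span class="item_num">13,059,697</span>\n<i class="tend tend_line" title="平稳"></i>'

# Without re.S, '.' does not match '\n', so the pattern cannot span the two lines
without_s = re.findall(combined, cell)
# With re.S, '.' also matches '\n'
with_s = re.findall(combined, cell, re.S)

print(greedy)     # ['tend tend_line" title="平稳']
print(lazy)       # ['tend tend_line']
print(without_s)  # []
print(with_s)     # [('13,059,697', '平稳')]
```

This is why every gap in the ranking pattern is written as `.*?` and the pattern is compiled with `re.S`: each table row spans many lines, and a greedy gap would run across rows.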

Full code:

import re
import json

import requests
from requests.exceptions import RequestException

# Browser-like User-Agent header sent with the request
HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36'
}

# One match per ranking row. Groups: rank, title, three actors, daily count and trend,
# weekly count and trend, monthly count. Assumes exactly three actor links per row.
# re.S lets '.' match newlines, since each row spans many lines.
PATTERN = re.compile(
    '<tr data-ranklist-elem="item" .*?'
    '<i class=".*?" data-ranklist-elem="rank">(.*?)</i>'
    '.*? rseat=".*?" target=.*?>(.*?)</a>'
    '.*? <a href=".*?" rseat=".*?" target=.*?>(.*?)</a>'
    '.*?<a href=".*?" rseat=".*?" target=.*?>(.*?)</a>'
    '.*?<a href=".*?" rseat=".*?" target=.*?>(.*?)</a>'
    '.*?<div class="rank_list" data-ranklist-yes=".*?">'
    '.*?<span class="item_num">(.*?)</span>'
    '.*?<i class=".*?" title="(.*?)">'
    '.*?<div class="rank_list" data-ranklist-week=".*?">'
    '.*? <span class="item_num">(.*?)</span>'
    '.*?<i class=".*?" title="(.*?)">'
    '.*?<div class=".*?" data-ranklist-mon=".*?">'
    '.*?<span class="item_num">(.*?)</span>',
    re.S)


def get_one_page(url):
    """Return the page HTML, or None on a non-200 status or a network error."""
    try:
        resp = requests.get(url, headers=HEADERS, timeout=10)
        if resp.status_code == 200:
            return resp.text
        return None
    except RequestException:
        return None


def parse_one_page(html):
    """Yield one dict per ranking row matched by PATTERN."""
    for item in PATTERN.findall(html):
        yield {
            'index': item[0],
            'title': item[1],
            'actor': item[2:5],  # tuple of three names; json.dumps writes it as a list
            'data-ranklist-yes': item[5],
            'tend_line': item[6],
            'data-ranklist-week': item[7],
            'tend_line1': item[8],
            # 'data-ranklist-mon': item[9],  # monthly count, captured but not saved
        }


def save_one_page(content):
    """Append one JSON object per line; ensure_ascii=False keeps Chinese text readable."""
    with open('re.txt', 'a', encoding='utf-8') as f:
        f.write(json.dumps(content, ensure_ascii=False) + '\n')


def main():
    url = 'http://v.iqiyi.com/index/dianshiju/index.html'
    html = get_one_page(url)
    if html is None:
        print('Failed to fetch', url)
        return
    for item in parse_one_page(html):
        save_one_page(item)


if __name__ == '__main__':
    main()

After running, re.txt contains one JSON object per ranking row:
{"index": "1", "title": "亲爱的,热爱的", "actor": ["杨紫", "李现", "胡一天"], "data-ranklist-yes": "13,059,697", "tend_line": "平稳", "data-ranklist-week": "76,916,679", "tend_line1": "平稳"}
{"index": "2", "title": "请赐我一双翅膀", "actor": ["鞠婧祎", "炎亚纶", "韩栋"], "data-ranklist-yes": "2,401,901", "tend_line": "平稳", "data-ranklist-week": "15,533,356", "tend_line1": "平稳"}
{"index": "3", "title": "少年派", "actor": ["张嘉译", "闫妮", "赵今麦"], "data-ranklist-yes": "1,801,400", "tend_line": "平稳", "data-ranklist-week": "15,471,047", "tend_line1": "平稳"}
{"index": "4", "title": "神犬小七3", "actor": ["姜潮", "宋妍霏", "徐可"], "data-ranklist-yes": "1,720,858", "tend_line": "上升", "data-ranklist-week": "10,596,919", "tend_line1": "上升"}
{"index": "5", "title": "流淌的美好时光", "actor": ["马天宇", "郑爽", "柴碧云"], "data-ranklist-yes": "1,454,898", "tend_line": "下降", "data-ranklist-week": "10,499,779", "tend_line1": "下降"}
{"index": "6", "title": "爱来的刚好", "actor": ["韩栋", "江铠同", "李威"], "data-ranklist-yes": "833,886", "tend_line": "平稳", "data-ranklist-week": "7,124,209", "tend_line1": "平稳"}
{"index": "7", "title": "带着爸爸去留学", "actor": ["孙红雷", "辛芷蕾", "曾舜晞"], "data-ranklist-yes": "830,028", "tend_line": "平稳", "data-ranklist-week": "7,673,427", "tend_line1": "平稳"}
{"index": "8", "title": "追球", "actor": ["范世錡", "卜冠今", "李艺彤"], "data-ranklist-yes": "807,514", "tend_line": "平稳", "data-ranklist-week": "8,054,590", "tend_line1": "平稳"}
{"index": "9", "title": "破冰行动", "actor": ["黄景瑜", "吴刚", "王劲松"], "data-ranklist-yes": "748,406", "tend_line": "平稳", "data-ranklist-week": "5,908,357", "tend_line1": "平稳"}
{"index": "10", "title": "爱情公寓4", "actor": ["娄艺潇", "陈赫", "邓家佳"], "data-ranklist-yes": "706,075", "tend_line": "上升", "data-ranklist-week": "4,692,537", "tend_line1": "上升"}
{"index": "11", "title": "陈情令", "actor": ["肖战", "王一博", "孟子义"], "data-ranklist-yes": "700,983", "tend_line": "平稳", "data-ranklist-week": "4,699,911", "tend_line1": "平稳"}
{"index": "12", "title": "时间都知道", "actor": ["唐嫣", "窦骁", "杨烁"], "data-ranklist-yes": "699,655", "tend_line": "平稳", "data-ranklist-week": "3,460,056", "tend_line1": "平稳"}
{"index": "13", "title": "归还世界给你", "actor": ["杨烁", "古力娜扎", "徐正溪"], "data-ranklist-yes": "647,017", "tend_line": "上升", "data-ranklist-week": "1,658,061", "tend_line1": "上升"}
{"index": "14", "title": "宸汐缘", "actor": ["张震", "倪妮", "李东学"], "data-ranklist-yes": "640,628", "tend_line": "平稳", "data-ranklist-week": "4,787,852", "tend_line1": "平稳"}
{"index": "15", "title": "河神", "actor": ["李现", "张铭恩", "王紫璇CiCi"], "data-ranklist-yes": "578,966", "tend_line": "下降", "data-ranklist-week": "4,259,929", "tend_line1": "下降"}
{"index": "16", "title": "长安十二时辰", "actor": ["雷佳音", "易烊千玺", "周一围"], "data-ranklist-yes": "546,752", "tend_line": "下降", "data-ranklist-week": "4,397,608", "tend_line1": "下降"}
{"index": "17", "title": "七月与安生", "actor": ["沈月", "陈都灵", "熊梓淇"], "data-ranklist-yes": "497,474", "tend_line": "上升", "data-ranklist-week": "788,476", "tend_line1": "上升"}
{"index": "18", "title": "大宋少年志", "actor": ["张新成", "周雨彤", "郑伟"], "data-ranklist-yes": "474,354", "tend_line": "下降", "data-ranklist-week": "5,339,238", "tend_line1": "下降"}
{"index": "19", "title": "香蜜沉沉烬如霜", "actor": ["杨紫", "邓伦", "陈钰琪"], "data-ranklist-yes": "464,225", "tend_line": "下降", "data-ranklist-week": "3,242,294", "tend_line1": "下降"}
{"index": "20", "title": "李三枪", "actor": ["刘恩佑", "战菁一", "高叶"], "data-ranklist-yes": "443,685", "tend_line": "上升", "data-ranklist-week": "1,569,801", "tend_line1": "上升"}
{"index": "21", "title": "九州缥缈录", "actor": ["刘昊然", "宋祖儿", "陈若轩"], "data-ranklist-yes": "441,532", "tend_line": "下降", "data-ranklist-week": "3,274,041", "tend_line1": "下降"}
{"index": "22", "title": "老九门", "actor": ["陈伟霆", "张艺兴", "赵丽颖"], "data-ranklist-yes": "418,361", "tend_line": "下降", "data-ranklist-week": "3,374,370", "tend_line1": "下降"}
{"index": "23", "title": "我的前半生", "actor": ["靳东", "马伊琍", "袁泉"], "data-ranklist-yes": "417,136", "tend_line": "下降", "data-ranklist-week": "2,765,214", "tend_line1": "下降"}
{"index": "24", "title": "天雷一部之春花秋月", "actor": ["李宏毅", "赵露思", "吴俊余"], "data-ranklist-yes": "403,066", "tend_line": "上升", "data-ranklist-week": "2,209,010", "tend_line1": "上升"}
{"index": "25", "title": "三生三世十里桃花", "actor": ["杨幂", "赵又廷", "张智尧"], "data-ranklist-yes": "384,577", "tend_line": "下降", "data-ranklist-week": "2,621,556", "tend_line1": "下降"}
{"index": "26", "title": "我们的少年时代", "actor": ["王俊凯", "王源", "易烊千玺"], "data-ranklist-yes": "373,314", "tend_line": "下降", "data-ranklist-week": "2,571,446", "tend_line1": "下降"}
{"index": "27", "title": "我们不能是朋友", "actor": ["刘以豪", "郭雪芙", "夏若妍"], "data-ranklist-yes": "372,439", "tend_line": "下降", "data-ranklist-week": "3,307,221", "tend_line1": "下降"}
{"index": "28", "title": "我要和你在一起", "actor": ["柴碧云", "孙绍龙", "万思维"], "data-ranklist-yes": "310,126", "tend_line": "上升", "data-ranklist-week": "2,046,496", "tend_line1": "上升"}
{"index": "29", "title": "灵魂摆渡3", "actor": ["于毅", "刘智扬", "肖茵"], "data-ranklist-yes": "304,648", "tend_line": "上升", "data-ranklist-week": "2,053,201", "tend_line1": "上升"}
{"index": "30", "title": "白发", "actor": ["张雪迎", "李治廷", "经超"], "data-ranklist-yes": "303,315", "tend_line": "下降", "data-ranklist-week": "3,506,497", "tend_line1": "下降"}
{"index": "31", "title": "亮剑", "actor": ["新大头儿子和小头爸爸", "王浩宇(童星)", "陈创"], "data-ranklist-yes": "287,988", "tend_line": "下降", "data-ranklist-week": "1,873,257", "tend_line1": "下降"}
{"index": "33", "title": "娘道", "actor": ["岳丽娜", "于毅", "张少华"], "data-ranklist-yes": "285,508", "tend_line": "下降", "data-ranklist-week": "1,938,314", "tend_line1": "下降"}
{"index": "34", "title": "欢乐颂2", "actor": ["刘涛", "蒋欣", "王子文"], "data-ranklist-yes": "281,111", "tend_line": "上升", "data-ranklist-week": "1,698,263", "tend_line1": "上升"}
{"index": "35", "title": "花千骨", "actor": ["霍建华", "赵丽颖", "蒋欣"], "data-ranklist-yes": "276,161", "tend_line": "下降", "data-ranklist-week": "1,997,459", "tend_line1": "下降"}
{"index": "36", "title": "武林外传", "actor": ["闫妮", "沙溢", "姚晨"], "data-ranklist-yes": "271,923", "tend_line": "下降", "data-ranklist-week": "1,901,301", "tend_line1": "下降"}
{"index": "37", "title": "神探柯晨", "actor": ["黄志忠", "吴刚", "李倩"], "data-ranklist-yes": "245,894", "tend_line": "下降", "data-ranklist-week": "2,323,772", "tend_line1": "下降"}
{"index": "38", "title": "我是特种兵之利刃出鞘", "actor": ["吴京", "徐佳", "赵荀"], "data-ranklist-yes": "244,222", "tend_line": "下降", "data-ranklist-week": "1,831,220", "tend_line1": "下降"}
{"index": "39", "title": "芸汐传", "actor": ["鞠婧祎", "张哲瀚", "米热"], "data-ranklist-yes": "243,418", "tend_line": "下降", "data-ranklist-week": "1,755,279", "tend_line1": "下降"}
{"index": "40", "title": "哥哥姐姐的花样年华", "actor": ["王雅捷", "王挺", "周扬"], "data-ranklist-yes": "221,093", "tend_line": "下降", "data-ranklist-week": "3,032,053", "tend_line1": "下降"}
{"index": "41", "title": "奋斗吧,少年!", "actor": ["彭昱畅", "董力", "张逸杰"], "data-ranklist-yes": "220,023", "tend_line": "上升", "data-ranklist-week": "381,625", "tend_line1": "上升"}
{"index": "42", "title": "微微一笑很倾城", "actor": ["郑爽", "杨洋", "毛晓彤"], "data-ranklist-yes": "219,655", "tend_line": "下降", "data-ranklist-week": "1,605,033", "tend_line1": "下降"}
{"index": "43", "title": "三国演义", "actor": ["鲍国安", "唐国强", "孙彦军"], "data-ranklist-yes": "219,400", "tend_line": "下降", "data-ranklist-week": "1,560,102", "tend_line1": "下降"}
{"index": "44", "title": "杉杉来了", "actor": ["张翰", "赵丽颖", "黄宥明"], "data-ranklist-yes": "217,775", "tend_line": "下降", "data-ranklist-week": "1,569,643", "tend_line1": "下降"}
{"index": "45", "title": "动物管理局", "actor": ["陈赫", "王子文", "唐晓天"], "data-ranklist-yes": "213,471", "tend_line": "下降", "data-ranklist-week": "2,242,207", "tend_line1": "下降"}
{"index": "46", "title": "鸡毛飞上天", "actor": ["张译", "殷桃", "高姝瑶"], "data-ranklist-yes": "203,069", "tend_line": "下降", "data-ranklist-week": "1,744,331", "tend_line1": "下降"}
{"index": "47", "title": "都挺好", "actor": ["姚晨", "倪大红", "郭京飞"], "data-ranklist-yes": "201,766", "tend_line": "下降", "data-ranklist-week": "1,469,292", "tend_line1": "下降"}
{"index": "48", "title": "甄嬛传", "actor": ["孙俪", "陈建斌", "蔡少芬"], "data-ranklist-yes": "197,187", "tend_line": "下降", "data-ranklist-week": "1,510,209", "tend_line1": "下降"}
{"index": "49", "title": "火蓝刀锋", "actor": ["杨志刚", "郑凯", "赫子铭"], "data-ranklist-yes": "186,544", "tend_line": "平稳", "data-ranklist-week": "1,118,312", "tend_line1": "平稳"}
{"index": "50", "title": "夜空中最闪亮的星", "actor": ["黄子韬", "吴倩", "牛骏峰"], "data-ranklist-yes": "184,997", "tend_line": "平稳", "data-ranklist-week": "1,267,903", "tend_line1": "平稳"}
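In the output above, rank 31 (亮剑) lists "新大头儿子和小头爸爸", which looks like a show title rather than an actor, and rank 32 is missing entirely. That is consistent with the single pattern assuming exactly three actor links per row: when a row has fewer, the `.*?` gaps appear to run on into the next row and swallow it. A minimal sketch of a more tolerant approach, still using only `re`: first cut the page into `<tr data-ranklist-elem="item">` rows, then search for each field inside one row, so a missing actor cannot pull data from a neighbor. The function `parse_rows` is a hypothetical name, the sample is a trimmed version of the row markup from the excerpt, and its second row (no actors, title "EXAMPLE") is made-up test data.

```python
import re

# Cut the page into ranking rows first, so every field search stays inside one row
ROW_RE = re.compile(r'<tr data-ranklist-elem="item".*?</tr>', re.S)


def parse_rows(html):
    """Yield one dict per ranking row, tolerating any number of actor links."""
    for row in ROW_RE.findall(html):
        rank = re.search(r'data-ranklist-elem="rank">(.*?)</i>', row)
        title = re.search(r'class="item_name"[^>]*>(.*?)</a>', row)
        usr = re.search(r'<span class="item_usrInfo">(.*?)</span>', row, re.S)
        actors = re.findall(r'<a [^>]*>(.*?)</a>', usr.group(1)) if usr else []
        # Pad so rows with missing cells still produce every key
        counts = (re.findall(r'<span class="item_num">(.*?)</span>', row) + [None] * 3)[:3]
        trends = (re.findall(r'<i class="tend[^"]*" title="(.*?)"', row) + [None] * 2)[:2]
        yield {
            'index': rank.group(1) if rank else None,
            'title': title.group(1) if title else None,
            'actor': actors,
            'data-ranklist-yes': counts[0],
            'tend_line': trends[0],
            'data-ranklist-week': counts[1],
            'tend_line1': trends[1],
            'data-ranklist-mon': counts[2],
        }


# Trimmed row markup; row 2 (no actor links) is hypothetical test data
SAMPLE = """
<tr data-ranklist-elem="item" >
  <td class="col-title">
    <i class="icon_top icon_top-3" data-ranklist-elem="rank">1</i>
    <a href="#" class="item_name" rseat="819k1_dianshiju" target='_blank'>亲爱的,热爱的</a>
  </td>
  <td><span class="item_usrInfo">
    <a href="#" rseat="819s1_dianshiju" target='_blank'>杨紫</a>/
    <a href="#" rseat="819s1_dianshiju" target='_blank'>李现</a>/
    <a href="#" rseat="819s1_dianshiju" target='_blank'>胡一天</a>
  </span></td>
  <td class="col-num"><span class="item_num">13,059,697</span><i class="tend tend_line" title="平稳"></i></td>
  <td class="col-num"><span class="item_num">76,916,679</span><i class="tend tend_line" title="平稳"></i></td>
  <td class="col-num"><span class="item_num">103,566,713</span></td>
</tr>
<tr data-ranklist-elem="item" >
  <td class="col-title">
    <i class="icon_top" data-ranklist-elem="rank">2</i>
    <a href="#" class="item_name" rseat="819k1_dianshiju" target='_blank'>EXAMPLE</a>
  </td>
  <td><span class="item_usrInfo"></span></td>
  <td class="col-num"><span class="item_num">100</span><i class="tend tend_line" title="上升"></i></td>
  <td class="col-num"><span class="item_num">700</span><i class="tend tend_line" title="上升"></i></td>
  <td class="col-num"><span class="item_num">3,000</span></td>
</tr>
"""

rows = list(parse_rows(SAMPLE))
for r in rows:
    print(r['index'], r['title'], r['actor'])
# 1 亲爱的,热爱的 ['杨紫', '李现', '胡一天']
# 2 EXAMPLE []
```

Since `parse_rows` yields dicts with the same keys as `parse_one_page` (plus the monthly count), it could be swapped into `main` without changing `save_one_page`. This has only been checked against the trimmed sample, not the live page.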
