Scraping Streamer Information from All of Huya TV

"""
Created by Young on 2019/1/16 17:00
"""
import json as js
import re

import requests
from bs4 import BeautifulSoup

# Put a browser User-Agent string here.
headers = {'user-agent': ''}


# My first attempt was wrong: this version can only scrape a single page.
def parsing_webpage(url):
    wb_data = requests.get(url, headers=headers)
    wb_data.encoding = "utf-8"  # avoid garbled text
    # wb_data.text is already a str, so from_encoding is not needed here.
    soup = BeautifulSoup(wb_data.text, 'lxml')
    rooms = soup.find('ul', class_='live-list clearfix')
    single_rooms = rooms.find_all('li', class_='game-live-item')
    for single_room in single_rooms:
        room_title = single_room.find_all('a', class_='title new-clickstat')[0].get_text()
        nick_title = single_room.find_all('i', class_='nick')[0].get_text()
        room_popularity = single_room.find_all('i', class_='js-num')[0].get_text()
        print({"room_title": room_title, "nick_title": nick_title, "room_popularity": room_popularity})


# Regex-based scraping of the JSON endpoint.
def parsing_json(true_url):
    wb_data = requests.get(true_url, headers=headers)
    wb_data.encoding = "utf-8"  # avoid garbled text
    temps = js.loads(wb_data.text)
    datas = str(temps)  # the regexes below match the dict's string form
    introductions = re.findall(" 'introduction': '(.*?)', 'recommendStatus': ", datas, re.S)
    total_counts = re.findall(" 'totalCount': '(.*?)', 'roomName': ", datas, re.S)
    nicks = re.findall(" 'nick': '(.*?)', 'avatar180': ", datas, re.S)
    # Distinct loop names so the result lists are not shadowed.
    for introduction, total_count, nick in zip(introductions, total_counts, nicks):
        data = {
            'introduction': introduction,
            'popularity': total_count,
            'streamer': nick,
        }
        print(data)


def main():
    for i in range(1, 30):  # pages 1 through 29
        # A list (not a set) keeps the request order fixed.
        urls = [
            'https://www.huya.com/cache.php?m=LiveList&do=getLiveListByPage&gameId=1&tagAll=0&page={}'.format(i),  # LoL
            'https://www.huya.com/cache.php?m=LiveList&do=getLiveListByPage&gameId=279&tagAll=0&page={}'.format(i),  # PUBG
        ]
        for url in urls:
            parsing_json(url)


if __name__ == '__main__':
    main()
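The regular expressions in parsing_json run against the string form of the parsed dictionary, so they depend on key order and on how quotes appear in each value. Since the response is already decoded as JSON, one possible alternative is to walk the parsed structure and pick out every dictionary that carries the three keys the regexes look for (introduction, totalCount, nick). The sketch below is only an illustration: the helper names iter_rooms and extract_rooms and the sample payload are hypothetical, and it deliberately makes no assumption about where the room records sit inside the real response.

```python
import json

# The three keys that the regexes in parsing_json look for.
ROOM_KEYS = {"introduction", "totalCount", "nick"}


def iter_rooms(node):
    """Yield every dict in a parsed JSON tree that has all of ROOM_KEYS."""
    if isinstance(node, dict):
        if ROOM_KEYS.issubset(node):
            yield node
        else:
            for value in node.values():
                yield from iter_rooms(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_rooms(item)


def extract_rooms(payload):
    """Return one summary dict per room found anywhere in the payload."""
    return [
        {
            "introduction": room["introduction"],
            "popularity": room["totalCount"],
            "streamer": room["nick"],
        }
        for room in iter_rooms(payload)
    ]


# Hypothetical sample for illustration only; the real response layout
# is an assumption and may differ.
sample_text = """
{"data": {"totalCount": 2, "datas": [
    {"nick": "streamer_a", "introduction": "ranked games", "totalCount": "12345"},
    {"nick": "streamer_b", "introduction": "it's late", "totalCount": "678"}
]}}
"""

if __name__ == "__main__":
    for row in extract_rooms(json.loads(sample_text)):
        print(row)
```

To use it in the script, the three re.findall calls could be replaced with a loop over extract_rooms(temps). Values that contain apostrophes, such as the second sample record, come through unchanged, which is the case most likely to trip up the regex version.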


If you have questions, leave a comment below and I will reply when I see it.
