Scraping the Azur Lane Wiki

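I visit this site all the time to look up character information, and I really wanted to download everything about my favorite characters, artwork included. There are far too many characters to do that by hand, so I turned to a crawler.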

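1. Sort the characters by faction so they are easy for me to find.

2. Download each character's dialogue lines (a personal hobby).

3. Download each character's regular artwork and chibi (Q-version) artwork.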
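Goal 1 can be pictured as cutting the index page into one chunk per faction, then reading the faction name and the character links out of each chunk. The following is a minimal sketch of that idea, not the script itself: the HTML fragment, the faction and character names in it, and the helper `group_by_camp` are all made up for illustration, and only the divider string and the two regular expressions are taken from the script in this post.

```python
import re

# Hypothetical index-page fragment; the divider image separates factions.
SAMPLE_HTML = (
    '<div>page header</div>'
    '<img alt="分割线.png" src="divider.png">'
    '<ul><li><b><a href="/blhx/Eagle_Union" title="Eagle Union">Eagle Union</a></b></li></ul>'
    '<p><a href="http://wiki.joyme.com/blhx/Enterprise" title="Enterprise">Enterprise</a></p>'
    '<p><a href="http://wiki.joyme.com/blhx/Laffey" title="Laffey">Laffey</a></p>'
    '<img alt="分割线.png" src="divider.png">'
    '<ul><li><b><a href="/blhx/Royal_Navy" title="Royal Navy">Royal Navy</a></b></li></ul>'
    '<p><a href="http://wiki.joyme.com/blhx/Hood" title="Hood">Hood</a></p>'
)

DIVIDER = '<img alt="分割线.png"'
CAMP_RE = re.compile(
    r'<ul><li><b><a href=".*?" title=".*?">(.*?)</a></b>.*?</li></ul>', re.S
)
CHAR_RE = re.compile(r'<p><a href="(.*?)" title=".*?">')


def group_by_camp(html):
    """Map each faction name to the character URLs found in the same chunk."""
    groups = {}
    for chunk in html.split(DIVIDER):
        names = CAMP_RE.findall(chunk)
        if not names:
            continue  # e.g. the page header before the first divider
        groups[names[0]] = CHAR_RE.findall(chunk)
    return groups


camps = group_by_camp(SAMPLE_HTML)
for camp, urls in camps.items():
    print(camp, len(urls))
```

Reading the name and the links from the same chunk keeps each faction paired with its own characters, so no separate index counter is needed to line up two lists.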

Here is the code:

import requests
import re
import os
import time


# Get the faction names
def get_camp(html):
    patten = re.findall(r'<ul><li><b><a href=".*?" title=".*?">(.*?)</a></b>.*?</li></ul>', html, re.S)
    return patten


# Create a folder for each faction
def make_camp_file(list1):
    camp_path = []
    for camp_name in list1:
        file_path = "C:\\Users\\16609\\Desktop\\blhxS\\" + camp_name[0]
        camp_path.append(file_path)
        os.makedirs(file_path, exist_ok=True)
    return camp_path


# Extract the character URLs belonging to one faction
def get_Char_url(html):
    Char_url = []
    patten = re.findall(r'<p><a href="(.*?)" title=".*?">', html)
    for url in patten:
        # As printed, this replace is a no-op; the first argument was presumably
        # an escaped form of ':' that was lost when the post was republished.
        url = url.replace(':', ':')
        Char_url.append(url)
    return Char_url


# Chibi (Q-version) artwork
def get_Q_Pic(response, path):
    demo = re.findall(r'<div class="qchar-container" data-ship-name="(.*?)">(.*?)</div>', response.text, re.S)
    if not demo:
        return  # this page has no chibi container
    Q_Lipainted = demo[0][1]
    Q_name = re.findall(r'alt="(.*?)"', Q_Lipainted)
    Q_url = re.findall(r'src="(.*?)"', Q_Lipainted)
    Name_Url = dict(zip(Q_name, Q_url))
    for i in Name_Url:
        Resp_pic = requests.get(Name_Url[i])
        with open(path + i, 'wb') as f:
            f.write(Resp_pic.content)


# Character dialogue lines
def get_info(response, path, file_name):
    Info = re.findall(r'<tr data-key=".*?">(.*)</tr>', response.text, re.S)
    if not Info:
        return  # this page has no dialogue table
    Info_Speak = re.findall(r'<th>(.*?)</th>.*?data-lang="zh">.(.*?)</p>', Info[0], re.S)
    Info_Speak_dict = {}
    for i in Info_Speak:
        Info_Speak_dict[i[0]] = i[1]
    with open(path + file_name + '.txt', 'a+', encoding='utf-8') as f:
        for i in Info_Speak_dict:
            f.write(i + ":" + Info_Speak_dict[i] + '\n')


# Fetch the HTML of every character page and save its contents
def get_html(char_url, path):
    for url in char_url:
        response = Session.get(url)
        Name = re.findall(r'http://wiki.joyme.com/blhx/(.+)', url)
        file_path = path + '\\' + Name[0] + '\\'
        # Create the character folder up front so the image downloads
        # do not fail when a page has no dialogue lines.
        os.makedirs(file_path, exist_ok=True)
        get_info(response, file_path, Name[0])
        get_pic(response, file_path)
        get_Q_Pic(response, file_path)


# Character artwork
def get_pic(response, path):
    Pic = re.findall(r'<div class="tab_con.*?" style=".*?">.*?<img alt="(.*?)" src="(.*?)".*?</div>', response.text, re.S)
    Pic_dict = {}
    for i in Pic:
        i = list(i)
        if i[0] == '':
            i[0] = 'Q_GIF.gif'  # fallback file name for images without alt text
        Pic_dict[i[0]] = i[1]
    for i in Pic_dict:
        LB_pic_resp = requests.get(Pic_dict[i])
        with open(path + i, 'wb') as f:
            f.write(LB_pic_resp.content)


# Azur Lane wiki page that lists the characters by faction
url = 'http://wiki.joyme.com/blhx/%E8%88%B0%E5%A8%98%E5%9B%BE%E9%89%B4'
Session = requests.session()
response = Session.get(url)

# Split the page into one chunk per faction
camp = re.split('<img alt="分割线.png"', response.text)
list1 = []
for camp_name in camp:
    # Get the faction name and put it into list1
    s = get_camp(camp_name)
    if s:
        list1.append(s)

# Pass the faction list in to create the folders with matching names
camp_path = make_camp_file(list1)

# Drop the first chunk of the split index page: it is the top of the page
# and contains no character names or URLs
camp.pop(0)

camp_char = []
for i in camp:
    Char_url = get_Char_url(i)
    camp_char.append(Char_url)

for urls, path in zip(camp_char, camp_path):
    get_html(urls, path)
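The script imports `time` but never pauses between requests, and it writes whatever name it finds in an `alt` attribute straight into a Windows path. As a hedged sketch of how the download step could be made a little more forgiving, the helper below cleans the file name, skips a file when the fetch fails, and can wait between downloads. `safe_name`, `save_file`, and the `fetch` callable are hypothetical names introduced here; they are not part of the original script, and the demo uses a stand-in fetcher instead of the network.

```python
import re
import tempfile
import time
from pathlib import Path

# Characters that Windows does not allow in file names.
_ILLEGAL = re.compile(r'[\\/:*?"<>|]')


def safe_name(name):
    """Replace characters that are invalid in Windows file names."""
    cleaned = _ILLEGAL.sub('_', name).strip()
    return cleaned or 'unnamed'


def save_file(fetch, url, folder, name, delay=0.0):
    """Save fetch(url) as folder/name; return the path, or None on failure."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    target = folder / safe_name(name)
    try:
        data = fetch(url)
    except OSError as exc:  # requests' exceptions derive from OSError
        print(f'skipped {url}: {exc}')
        return None
    target.write_bytes(data)
    if delay:
        time.sleep(delay)  # small pause between downloads
    return target


# Stand-in fetcher for the demo. With the script's session it might look like:
#   def fetch(url):
#       resp = Session.get(url, timeout=10)
#       resp.raise_for_status()
#       return resp.content
def fake_fetch(url):
    if url.endswith('missing.png'):
        raise OSError('404')
    return b'\x89PNG demo bytes'


demo_dir = Path(tempfile.mkdtemp())
saved = save_file(fake_fetch, 'http://example.invalid/a.png',
                  demo_dir / 'Eagle Union', 'Enterprise: Q.png')
failed = save_file(fake_fetch, 'http://example.invalid/missing.png',
                   demo_dir, 'missing.png')
print(saved, failed)
```

Passing the fetcher in as a function keeps the helper independent of any one HTTP library and makes it easy to try out without touching the wiki.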

Reposted from: https://www.cnblogs.com/MaGnet/p/10542857.html
