Python 3 example: scraping torrent links

This article uses Python 3 and is built on urllib and BeautifulSoup.

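The idea is as follows. The project is split into a manager, a URL manager, a downloader, a parser and an HTML output generator. Each component has its own job, and the manager schedules them. At the end, the parsed torrent links are written into an HTML file for display; they could just as well be saved to a plain file. The final result is shown in the figure.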
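The URL manager is part of the design (the constructor creates `url_manager.UrlManager()`), but its code is not shown and the single-page crawl never calls it. A minimal sketch, assuming the common two-set design: one set of URLs still to crawl and one of URLs already crawled. The class body and all method names here are hypothetical.

```python
class UrlManager(object):
    """Hypothetical URL manager: tracks URLs still to crawl and URLs already crawled."""

    def __init__(self):
        self.new_urls = set()  # waiting to be downloaded
        self.old_urls = set()  # already handed out for downloading

    def add_new_url(self, url):
        # Ignore empty values and anything seen before, so no page is fetched twice
        if url is None:
            return
        if url not in self.new_urls and url not in self.old_urls:
            self.new_urls.add(url)

    def add_new_urls(self, urls):
        for url in urls or []:
            self.add_new_url(url)

    def has_new_url(self):
        return len(self.new_urls) != 0

    def get_new_url(self):
        # Move one URL from "to crawl" to "crawled" and return it
        url = self.new_urls.pop()
        self.old_urls.add(url)
        return url


manager = UrlManager()
manager.add_new_urls(["http://example.com/a", "http://example.com/b", "http://example.com/a"])
crawled = []
while manager.has_new_url():
    url = manager.get_new_url()
    crawled.append(url)
    manager.add_new_url(url)  # already crawled, so this is ignored
print(sorted(crawled))  # ['http://example.com/a', 'http://example.com/b']
```

With sets, the duplicate `/a` is dropped on insertion and a crawled URL is never re-queued, which is what keeps a multi-page crawler from looping.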

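First, initialize the URL manager, downloader, parser and HTML output generator in the constructor of the manager class SpiderMain. The code is as follows: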

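import html_downloader
import html_outputer
import html_parser
import url_manager


class SpiderMain(object):
    def __init__(self):
        # One object per component, each defined in its own module
        self.urls = url_manager.UrlManager()
        self.downloader = html_downloader.HtmlDownloader()
        self.parser = html_parser.HtmlParser()
        self.outputer = html_outputer.HtmlOutputer()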
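Note that the constructor stores the parser component in `self.parser`. In Python an attribute set on the instance takes precedence over a plain method of the same name defined on the class, so a SpiderMain method that is also called `parser` can no longer be called through the instance. A minimal, self-contained illustration with hypothetical class names:

```python
class Parser:
    def parse(self):
        return "parsed"


class Spider:
    def __init__(self):
        # Stored in the instance __dict__, so it wins over the method below
        self.parser = Parser()

    def parser(self, url):  # same name as the attribute set in __init__
        return "crawl " + url


spider = Spider()
print(callable(spider.parser))       # False: this is the Parser object, not the method
print(type(spider.parser).__name__)  # Parser
print(Spider.parser(spider, "x"))    # crawl x  (the method is only reachable via the class)
```

Calling `spider.parser("x")` would raise `TypeError: 'Parser' object is not callable`, so the method that drives the crawl needs a different name from the component attributes.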

Then, in the main block, set the root URL and start downloading, parsing and output.

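from urllib.parse import quote

if __name__ == '__main__':
    url = "http://www.btany.com/search/桃谷绘里香-first-asc-1"
    # Handle the Chinese search keyword: percent-encode it, but leave '/', ':', '?' and '=' unescaped
    root_url = quote(url, safe='/:?=')
    obj_spider = SpiderMain()
    obj_spider.craw(root_url)

The downloader fetches the page, the parser parses the downloaded HTML, and finally the outputer writes the result. That completes the manager's framework logic:

def craw(self, root_url):
    # Named craw rather than parser: __init__ assigns self.parser, and that
    # instance attribute would hide a method with the same name
    html = self.downloader.download(root_url)
    datas = self.parser.parserTwo(html)
    self.outputer.output_html3(datas)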
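The `quote(url, safe='/:?=')` call matters because urllib sends the request line as ASCII, so a URL containing Chinese characters has to be percent-encoded first. A small sketch of what `urllib.parse.quote` does with and without the extra safe characters, using a hypothetical example.com URL in place of the real search address:

```python
from urllib.parse import quote

# Hypothetical search URL with a Chinese keyword (not the article's real query)
url = "http://example.com/search/中文-first-asc-1"

# Keep '/', ':', '?' and '=' as URL delimiters; percent-encode the non-ASCII keyword as UTF-8
print(quote(url, safe='/:?='))
# http://example.com/search/%E4%B8%AD%E6%96%87-first-asc-1

# With the default safe='/', the ':' after the scheme is escaped too and the URL breaks
print(quote(url))
# http%3A//example.com/search/%E4%B8%AD%E6%96%87-first-asc-1
```

Letters, digits and `_.-~` are never escaped, which is why `-first-asc-1` survives unchanged in both calls.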

The downloader code is as follows:

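import urllib.request


class HtmlDownloader(object):
    def download(self, url):
        if url is None:
            return None
        # Pretend to be a desktop Firefox browser
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0'}
        req = urllib.request.Request(url=url, headers=headers)
        response = urllib.request.urlopen(req)
        # urlopen already raises HTTPError for error statuses; this guards against other non-200 replies
        if response.getcode() != 200:
            return None
        return response.read()  # raw bytes; the parser decodes them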

The headers dict imitates a browser's request header. Without it, the HTML page cannot be downloaded.
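To see why the header matters, the sketch below runs a hypothetical local server that, like many sites, refuses requests carrying urllib's default User-Agent (`Python-urllib/3.x`). The article does not show how btany.com actually filters requests; the server, the `fetch` helper and the 403 response are assumptions for illustration. The browser string is the one used in the downloader above.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class UAFilterHandler(BaseHTTPRequestHandler):
    """Hypothetical server: refuses urllib's default User-Agent, serves everyone else."""

    def do_GET(self):
        ua = self.headers.get('User-Agent', '')
        if ua.startswith('Python-urllib'):
            self.send_response(403)
            self.end_headers()
            return
        body = b'<html>ok</html>'
        self.send_response(200)
        self.send_header('Content-Type', 'text/html; charset=utf-8')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


# Ignore proxy environment variables so requests really go to 127.0.0.1
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def fetch(url, user_agent=None):
    """Return (status, body); body is None when the server answers with an HTTP error."""
    headers = {'User-Agent': user_agent} if user_agent else {}
    req = urllib.request.Request(url, headers=headers)
    try:
        with opener.open(req, timeout=5) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        return err.code, None


server = HTTPServer(('127.0.0.1', 0), UAFilterHandler)  # port 0: let the OS pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = 'http://127.0.0.1:%d/' % server.server_address[1]

BROWSER_UA = 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0'
blocked = fetch(url)              # sent with urllib's default User-Agent
allowed = fetch(url, BROWSER_UA)  # sent with the browser string from the downloader
print(blocked, allowed)           # (403, None) (200, b'<html>ok</html>')

server.shutdown()
server.server_close()
```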

The parser code is as follows:

import re

from bs4 import BeautifulSoup


class HtmlParser(object):
    # Parse the torrent search-result page
    def parserTwo(self, html):
        if html is None:
            return []  # an empty list lets the outputer still write an (empty) table
        soup = BeautifulSoup(html, 'html.parser', from_encoding='utf-8')
        return self._get_data(soup)

    # Bundle each torrent's title, magnet link and Thunder link into a dict
    def _get_data(self, soup):
        res_datas = []
        titles = soup.find_all('a', href=re.compile(r"/detail"))
        magnets = soup.find_all('a', href=re.compile(r"magnet"))
        thunders = soup.find_all('a', href=re.compile(r"thunder"))
        # zip() stops at the shortest list, so a missing link cannot raise IndexError
        for title, magnet, thunder in zip(titles, magnets, thunders):
            res_datas.append({
                'title': title.get_text(),
                'cl': magnet.get('href'),   # magnet link
                'xl': thunder.get('href'),  # Thunder (Xunlei) link
            })
        return res_datas

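Inspecting the downloaded HTML shows that the torrent information sits in <a> tags: the title links to a /detail page, and the other two links have href values containing magnet and thunder. The parser matches each kind with a regular expression on href and extracts those links.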
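The pairing logic of `_get_data` can be tried without network access. The sketch below swaps in the standard library's `html.parser.HTMLParser` for BeautifulSoup and runs on a made-up HTML fragment (the real page's markup is not shown in the article). It collects every <a> tag, filters the hrefs with the same three patterns, and pairs them by position. `LinkCollector`, `get_data` and the sample links are hypothetical.

```python
import re
from html.parser import HTMLParser

# Hypothetical fragment of a search-result page; the real site's markup may differ
SAMPLE = """
<div class="item">
  <a href="/detail/1">Sample torrent 1</a>
  <a href="magnet:?xt=urn:btih:1111">magnet</a>
  <a href="thunder://placeholder1">thunder</a>
</div>
<div class="item">
  <a href="/detail/2">Sample torrent 2</a>
  <a href="magnet:?xt=urn:btih:2222">magnet</a>
  <a href="thunder://placeholder2">thunder</a>
</div>
"""


class LinkCollector(HTMLParser):
    """Record (href, link text) for every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self._href = dict(attrs).get('href')
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._href is not None:
            self.links.append((self._href, ''.join(self._text).strip()))
            self._href = None


def get_data(page):
    collector = LinkCollector()
    collector.feed(page)
    collector.close()

    # Same idea as _get_data: search each href for a pattern, then pair by position
    def pick(pattern):
        return [(href, text) for href, text in collector.links if re.search(pattern, href)]

    titles, magnets, thunders = pick(r"/detail"), pick(r"magnet"), pick(r"thunder")
    return [{'title': t[1], 'cl': m[0], 'xl': x[0]}
            for t, m, x in zip(titles, magnets, thunders)]


rows = get_data(SAMPLE)
print(rows[0])
# {'title': 'Sample torrent 1', 'cl': 'magnet:?xt=urn:btih:1111', 'xl': 'thunder://placeholder1'}
```

Pairing by position assumes every result row has exactly one link of each kind. If one row lacked a Thunder link, every later row would be mismatched, so grouping links per result container would be sturdier.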

Finally, the outputer writes an HTML file. The code is as follows:

import html


class HtmlOutputer(object):
    def __init__(self):
        self.datas = []

    def collect_data(self, data):
        if data is None:
            return
        self.datas.append(data)

    # Write the results as an HTML table
    def output_html3(self, datas):
        with open('output.html', 'w', encoding='utf-8') as fout:
            fout.write("<html>")
            fout.write('<head><meta http-equiv="content-type" content="text/html;charset=utf-8"></head>')
            fout.write("<body>")
            fout.write("<table border=1>")
            for data in datas:
                fout.write("<tr>")
                # Escape the text so characters such as < or & in a title cannot break the page
                fout.write("<td>%s</td>" % html.escape(data['title']))
                fout.write("<td>%s</td>" % html.escape(data['cl']))
                fout.write("<td>%s</td>" % html.escape(data['xl']))
                fout.write("</tr>")
            fout.write("</table>")
            fout.write("</body>")
            fout.write("</html>")

This article originally appeared on the WeChat official account csdn2299 (程序员学府).