Scraping Movie Heaven (电影天堂) with a Scrapy Spider

Crawling with Scrapy's CrawlSpider

Target site: http://www.dytt8.net
Create the project: scrapy startproject <project name>
Generate a CrawlSpider: scrapy genspider -t crawl <spider name> <spider domain>
Run from the terminal: scrapy crawl <spider name>
Python regular expressions reference (for the re.sub call in the spider): https://www.runoob.com/python/python-reg-expressions.html

Spider file

# -*- coding: utf-8 -*-
import re

import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule


class DySpider(CrawlSpider):
    name = 'dy'
    allowed_domains = ['www.dytt8.net']
    start_urls = ['http://www.dytt8.net/']

    # Extract links to movie detail pages and pass each response to parse_item
    rules = (
        Rule(LinkExtractor(allow=r'dytt8\.net/html/gndy/dyzz/\d+/\d+\.html'), callback='parse_item'),
    )

    def parse_item(self, response):
        item = {}
        item['title'] = response.xpath('//div[@class="title_all"]/h1/font/text()').get()
        # Default to an empty string so re.sub does not fail when the node is missing
        item['datetime'] = response.xpath('//div[@class="co_content8"]/ul/text()').get(default='')
        # Strip the line breaks surrounding the publish date
        item['datetime'] = re.sub(r'\r\n', '', item['datetime'])
        item['download'] = response.xpath('//div[@class="co_area2"]//tbody//td/a/@href').get()
        yield item
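The two regular expressions in the spider can be checked outside Scrapy. The sketch below is only an illustration: it assumes LinkExtractor applies its allow pattern as a regex search over the absolute URL, and the sample URLs, the sample date string, and the helper names is_detail_page and clean_datetime are hypothetical, not taken from the site or the original project.

```python
import re

# Same pattern as the `allow` argument of the spider's Rule
DETAIL_PAGE = re.compile(r'dytt8\.net/html/gndy/dyzz/\d+/\d+\.html')


def is_detail_page(url):
    """Return True when the URL matches the Rule's allow pattern (hypothetical helper)."""
    return DETAIL_PAGE.search(url) is not None


def clean_datetime(raw):
    """Remove Windows line breaks, as the re.sub call in parse_item does (hypothetical helper)."""
    return re.sub(r'\r\n', '', raw)


# Hypothetical sample URLs, made up for illustration
sample_urls = [
    'http://www.dytt8.net/html/gndy/dyzz/20190101/12345.html',  # detail page
    'http://www.dytt8.net/html/gndy/dyzz/index.html',           # list page
    'http://www.dytt8.net/html/tv/hytv/20190101/12345.html',    # different section
]
matches = [is_detail_page(url) for url in sample_urls]

# Hypothetical raw text node, wrapped in line breaks
cleaned = clean_datetime('\r\nPublished: 2019-01-01\r\n')
```

Only the first URL has the two numeric path segments under /html/gndy/dyzz/, so it is the only one the Rule would send to parse_item.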

pipelines.py (item pipeline file)

# -*- coding: utf-8 -*-
# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://docs.scrapy.org/en/latest/topics/item-pipeline.html
import pymysql


class MoviesPipeline(object):
    def __init__(self):
        # PyMySQL 1.0+ accepts connection parameters as keyword arguments only
        self.db = pymysql.connect(host='127.0.0.1', user='root', password='root', database='spider')
        self.cursor = self.db.cursor()
        # Create the table once at startup instead of on every item
        sql = 'create table if not exists movies (title varchar(255) not null, datetime varchar(255) not null, download varchar(255) not null)'
        self.cursor.execute(sql)

    def process_item(self, item, spider):
        # Parameterized insert; the driver escapes the values
        sql_insert = 'insert into movies(title, datetime, download) values (%s, %s, %s)'
        args = (item['title'], item['datetime'], item['download'])
        self.cursor.execute(sql_insert, args)
        return item

    def close_spider(self, spider):
        # Commit all inserts and release the connection when the crawl ends
        self.db.commit()
        self.db.close()
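As the comment at the top of pipelines.py notes, the pipeline only runs once it is registered in the ITEM_PIPELINES setting. A minimal sketch of that entry in settings.py follows. The module path assumes the project is named movies, which is a guess based on the class name MoviesPipeline; replace it with the actual project name.

```python
# settings.py (fragment)
# Assumption: the Scrapy project package is called "movies".
# The integer sets the run order among pipelines; lower values run first,
# and Scrapy's convention is to keep them in the 0-1000 range.
ITEM_PIPELINES = {
    'movies.pipelines.MoviesPipeline': 300,
}
```

With a single pipeline the exact number does not matter; 300 is the value used in Scrapy's generated settings template.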

Run result: [screenshot not preserved]
