Python: getting the current web page. Hi, how do I get the URL of the current page with Python Scrapy?

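Short answer: inside any Scrapy callback, `response.url` is the URL of the page currently being parsed. I won't explain much beyond that; here is the code: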

```python
# -*- coding: utf-8 -*-
# Updated for Python 3 and current Scrapy: scrapy.Spider replaces BaseSpider,
# response.xpath() replaces HtmlXPathSelector, response.urljoin() replaces urljoin_rfc.
import scrapy

from datacrawler.items import bbsItem  # Item class from the author's own project


class bbsSpider(scrapy.Spider):
    name = "bbs"
    allowed_domains = ["bbs.nju.edu.cn"]
    start_urls = [""]  # the start URL is left empty in the original post

    def parseContent(self, content):
        # Split the post header by fixed character offsets around known labels
        authorIndex = content.index('信区')
        author = content[4:authorIndex - 2]
        boardIndex = content.index('标题')
        board = content[authorIndex + 4:boardIndex - 2]
        timeIndex = content.index('南京大学小百合站 (')
        time = content[timeIndex + 10:timeIndex + 34]  # 24 characters of timestamp
        content = content[timeIndex + 38:]  # post body after the header
        return (author, board, time, content)

    def parse2(self, response):
        # Callback for a single post page; response.url is that post's URL
        item = response.meta['item']
        content = response.xpath(
            '/html/body/center/table[1]//tr[2]/td/textarea/text()').get()
        author, board, time, body = self.parseContent(content)
        item['author'] = author
        item['board'] = board
        item['time'] = time
        item['content'] = body
        return item

    def parse(self, response):
        # Callback for the list page; response.url is the list page's URL
        titles = response.xpath(
            '/html/body/center/table/tr[position()>1]/td[3]/a/text()').getall()
        urls = response.xpath(
            '/html/body/center/table/tr[position()>1]/td[3]/a/@href').getall()
        # Take at most the first 10 posts (indexing 0..9 directly
        # raises IndexError when the page lists fewer than 10)
        for href, title in list(zip(urls, titles))[:10]:
            item = bbsItem()
            # Resolve the relative href against the current page URL
            item['link'] = response.urljoin(href)
            item['title'] = title[:-1]  # drop the trailing character, as in the original
            yield scrapy.Request(item['link'], meta={'item': item},
                                 callback=self.parse2)
```
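The URL in `response.url` is also the base for every relative link found on that page. In Scrapy, `response.urljoin(href)` resolves a link against it (for HTML responses it also honors a `<base href>` tag). A minimal standard-library sketch of the same resolution with `urllib.parse.urljoin`; the list-page URL and the hrefs are hypothetical:

```python
from urllib.parse import urljoin

# Hypothetical URL of the page being parsed (what response.url would hold)
current_url = "http://bbs.nju.edu.cn/list?board=Python"

# Hypothetical hrefs as they might appear in the page's <a> tags
hrefs = ["post?id=1", "/top10", "http://bbs.nju.edu.cn/post?id=2"]

# Resolve each href against the current page URL
resolved = [urljoin(current_url, h) for h in hrefs]
print(resolved)

# Joining against an empty base leaves a relative href unchanged
unresolved = urljoin("", "post?id=1")
print(unresolved)
```

A relative href joined against an empty base string comes back unchanged, with no scheme or host, and Scrapy rejects a Request whose URL has no scheme. That is why the current page URL is the right base.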
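The numbers in `parseContent` are label lengths plus separator characters. For example, `timeIndex + 10` skips exactly `len('南京大学小百合站 (')`, which is 10 characters. Hard-coded offsets like these break as soon as the page layout shifts by one character. As an alternative (a swapped-in technique, not the author's method), a hypothetical helper `text_between` can slice between two labels and take the skip length from `len()`. The sample header below is made up for illustration; the real page format may differ:

```python
def text_between(text, start_label, end_label=None):
    """Return the text after start_label, up to the next end_label.

    Hypothetical helper: the skip length comes from len(start_label)
    instead of a hard-coded offset. Raises ValueError if a label is
    missing, like str.index in parseContent.
    """
    start = text.index(start_label) + len(start_label)
    if end_label is None:
        return text[start:]  # everything after the label
    end = text.index(end_label, start)
    return text[start:end]


# Made-up header for illustration only; the real page layout may differ
header = "信区: Pictures\n标题: hello\n发信站: 南京大学小百合站 (Thu Jan  1 00:00:00 2015)\n\nbody"

board = text_between(header, "信区: ", "\n")
posted = text_between(header, "南京大学小百合站 (", ")")
body = text_between(header, "\n\n")
print(board)   # Pictures
print(posted)  # Thu Jan  1 00:00:00 2015
print(body)    # body
```

With this sample, the extracted timestamp is 24 characters long, which matches the `timeIndex + 10 : timeIndex + 34` slice in the spider.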
