How does a Python crawler get from one page to the next? How do I get the next page's URL and content?

I scraped the first page with BeautifulSoup, but I don't know how to scrape the remaining pages.

The URL of the first page looks like this:

http://gdemba.gicp.net:82/interunit/ListMain.asp?FirstEnter=Yes&Style=0000100003&UID={A270A117-76A7-4059-AB8F-B11AC370240B}&TimeID=39116.81

Clicking a GIF image button labelled "后翻一页" (next page) jumps to the next page.

The URL of the second page looks like this:

http://gdemba.gicp.net:82/interunit/ListMain.asp?Keywords=&Style=0000100003&DateLowerLimit= 2000-1-1&DateUpperLimit= 2015-9-11&DateLowerLimitModify= 2000-1-1&DateUpperLimitModify= 2015-9-11&Classification1=0&Classification2=0&Classification3=0&Classification4=0&Classification6=0&Classification7=0&Classification8=0&Class=&Department=001&CreatorName=&CreatorTypeID=&UID={A270A117-76A7-4059-AB8F-B11AC370240B}&SortField=&CustormCondition=&PageNo=2&TimeID=39453.14

How can I work out the pattern in these URLs?

The markup for the "后翻一页" (next page) link is:

onclick="javascript:window.location.href = "ListMain.asp?Keywords=&Style=0000100003&DateLowerLimit= 2000-1-1&DateUpperLimit= 2015-9-11&DateLowerLimitModify= 2000-1-1&DateUpperLimitModify=
; ">
WIDTH="16" HEIGHT="16">

How do I get the next page's URL and its content?

I can add more information if needed.

Additional code:

import urllib
import urllib2
import cookielib
import re
import csv
import codecs
from bs4 import BeautifulSoup

# Keep cookies across requests so the login session carries over.
cookie = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie))

# Log in by POSTing the credentials.
postdata = urllib.urlencode({
    'LoginName':'02',
    'Password':'dc20150820if'
})
req = urllib2.Request(
    url = 'http://gdemba.gicp.net:82/VerifyUser.asp',
    data = postdata
)
result = opener.open(req)
for item in cookie:
    print 'Cookie：Name = '+item.name
    print 'Cookie：Value = '+item.value

# Fetch the first list page and parse it (the site serves a GB encoding).
result = opener.open('http://gdemba.gicp.net:82/interunit/ListMain.asp?FirstEnter=Yes&Style=0000100003&UID={4C10B953-C0F3-4114-8341-81EF93DE7C55}&TimeID=49252.53')
info = result.read()
soup = BeautifulSoup(info, from_encoding="gb18030")
table = soup.find(id='Table11')
print table

client = ""
tag = ""
tel = ""
catalogue = ""
region = ""
client_type = ""
email = ""
creater = ""
department = ""
action = ""

# Python 2's csv module expects the file to be opened in binary mode.
f = open('table.csv', 'wb')
csv_writer = csv.writer(f)
td = re.compile('td')

# Write one CSV row for every table row that has exactly 10 cells.
for row in table.find_all("tr"):
    cells = row.find_all("td")
    if len(cells) == 10:
        client = cells[0].text
        tag = cells[1].text
        tel = cells[2].text
        catalogue = cells[3].text
        region = cells[4].text
        client_type = cells[5].text
        email = cells[6].text
        creater = cells[7].text
        department = cells[8].text
        action = cells[9].text
        csv_writer.writerow([x.encode('utf-8') for x in [client, tag, tel, catalogue, region, client_type, email, creater, department, action]])
f.close()

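Doesn't the <span> contain

onclick="javascript:window.location.href=xxxx"

? That statement is the jump itself. In your example it jumps to

ListMain.asp?Keywords=....

If you're going to write crawlers, I'd suggest learning some HTML and JS.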

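Update: grabbing the next page's URL

# Find the "next page" button by its title attribute.
next_page_tag = soup.find(title='后翻一页')
next_page_onclick = next_page_tag['onclick']
# The target URL is the single-quoted string inside the onclick handler.
next_page_url = re.search("'(.+)'", next_page_onclick).group(1)
next_page_url = 'http://gdemba.gicp.net:82/interunit/' + next_page_url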
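To get every page rather than just the second one, the same extraction can be repeated until no "后翻一页" button is found. Below is a rough Python 3 sketch of that loop using only the standard library. The names `NextLinkFinder`, `next_page_url`, `crawl` and the `fetch` callback are made up for illustration, the button is assumed to carry `title="后翻一页"` as in the snippet above, and `fetch` is assumed to return already-decoded HTML text. The two fake pages stand in for real responses.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

NEXT_TITLE = "后翻一页"  # title of the "next page" button (taken from the answer above)
HREF_RE = re.compile(r"""location\.href\s*=\s*['"]([^'"]+)['"]""")


class NextLinkFinder(HTMLParser):
    """Remember the onclick target of the first tag whose title is NEXT_TITLE."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.target is None and attrs.get("title") == NEXT_TITLE:
            match = HREF_RE.search(attrs.get("onclick") or "")
            if match:
                self.target = match.group(1).strip()


def next_page_url(html, current_url):
    """Return the absolute URL of the next page, or None if there is no button."""
    finder = NextLinkFinder()
    finder.feed(html)
    return urljoin(current_url, finder.target) if finder.target else None


def crawl(start_url, fetch, max_pages=100):
    """Yield (url, html) per page, following the next-page button each time."""
    url, seen = start_url, set()
    # Stop on a missing button, a repeated URL, or the safety limit.
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        html = fetch(url)
        yield url, html
        url = next_page_url(html, url)


# Hypothetical stand-in for the real site: two fake pages keyed by URL.
BASE = "http://gdemba.gicp.net:82/interunit/"
pages = {
    BASE + "ListMain.asp?PageNo=1":
        "<img title=\"后翻一页\" onclick=\"javascript:window.location.href = 'ListMain.asp?PageNo=2';\">",
    BASE + "ListMain.asp?PageNo=2": "<p>last page</p>",
}
for url, html in crawl(BASE + "ListMain.asp?PageNo=1", pages.__getitem__):
    print(url)
```

With the logged-in opener from the question, `fetch` would wrap `opener.open(url).read()` plus a decode step. The `seen` set and `max_pages` are only there so that a last page which still links to itself cannot cause an endless loop.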

PageNo is simply the page number!!
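If that is right, the URL for any page can be built by changing only that one parameter of the second-page URL. Here is a small Python 3 sketch of the idea. The helper name `with_page_no` is made up, and the example URL is a shortened version of the one in the question. Whether the server accepts an arbitrary PageNo without the other parameters is an assumption that would need checking.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def with_page_no(url, page_no):
    """Return a copy of url whose PageNo query parameter is set to page_no."""
    parts = urlsplit(url)
    # keep_blank_values keeps empty parameters such as "Keywords=".
    params = parse_qsl(parts.query, keep_blank_values=True)
    params = [(key, value) for key, value in params if key != "PageNo"]
    params.append(("PageNo", str(page_no)))
    return urlunsplit(parts._replace(query=urlencode(params)))


# Shortened, illustrative version of the second-page URL from the question.
page2 = ("http://gdemba.gicp.net:82/interunit/ListMain.asp"
         "?Keywords=&Style=0000100003&Department=001&PageNo=2")
page3 = with_page_no(page2, 3)
print(page3)
```

Note that `urlencode` percent-encodes characters such as `{`, `}` and spaces, so a rebuilt URL containing the UID value will not be byte-for-byte identical to the one the browser shows.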

As the poster above said, the onclick already contains the address of the next page. Extract it with BeautifulSoup, prepend the host, and that should do it.
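Instead of gluing the host on by hand, the standard library can resolve the relative address against the URL of the page it was found on. A minimal Python 3 sketch, with both URLs shortened for readability (in Python 2 the same function lives in the `urlparse` module):

```python
from urllib.parse import urljoin

# URL of the page that was just fetched (query string shortened).
current_url = "http://gdemba.gicp.net:82/interunit/ListMain.asp?FirstEnter=Yes&Style=0000100003"
# Relative target as it would appear inside the onclick attribute (shortened).
relative = "ListMain.asp?Keywords=&Style=0000100003&PageNo=2"

# urljoin keeps the scheme, host, port and directory of current_url.
next_url = urljoin(current_url, relative)
print(next_url)
```

This also keeps working if the site ever emits an absolute path or a full URL in the onclick, which plain string concatenation would not.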

What I do is grab the total number of pages first, then use a while loop.
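Roughly, that approach could look like the Python 3 sketch below. How this particular site displays its page count is not shown in the thread, so the "共 N 页" pattern is purely a guess, and `total_pages`, `crawl_all`, `url_for_page` and `fetch` are made-up names. The fake site stands in for real responses.

```python
import re

# Hypothetical: assumes the page shows its total as text like "共 12 页".
TOTAL_RE = re.compile(r"共\s*(\d+)\s*页")


def total_pages(html):
    """Return the total page count found in html, or 1 if none is shown."""
    match = TOTAL_RE.search(html)
    return int(match.group(1)) if match else 1


def crawl_all(url_for_page, fetch):
    """Fetch page 1, read the total, then fetch the rest in a while loop."""
    first = fetch(url_for_page(1))
    pages = [first]
    last = total_pages(first)
    page_no = 2
    while page_no <= last:
        pages.append(fetch(url_for_page(page_no)))
        page_no += 1
    return pages


def url_for_page(page_no):
    """Build a (shortened, illustrative) list URL for the given page number."""
    return "ListMain.asp?PageNo=%d" % page_no


# Fake three-page site keyed by URL, standing in for real HTTP requests.
fake_site = {url_for_page(n): "<td>row %d</td> 共 3 页" % n for n in (1, 2, 3)}
result = crawl_all(url_for_page, fake_site.__getitem__)
print(len(result))
```

Compared with following the next-page button, this needs the total to be present and parseable on the first page, but it makes it easy to resume from a given page number.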
