Complete examples of scraping web pages with Scrapy and storing them in MongoDB:

https://github.com/rmax/scrapy-redis

https://github.com/geekan/scrapy-examples # Multifarious Scrapy examples.

https://github.com/DormyMo/scrappy # Scrapy best practice; this repository uses https://github.com/rmax/scrapy-redis, though not the latest version

https://realpython.com/blog/python/web-scraping-with-scrapy-and-mongodb/

https://realpython.com/blog/python/web-scraping-and-crawling-with-scrapy-and-mongodb/

https://github.com/sebdah/scrapy-mongodb

https://github.com/xiyouMc/WebHubBot

https://github.com/Chyroc/WechatSogou # WeChat official-account crawler interface based on Sogou WeChat search

https://github.com/gnemoug/distribute_crawler # A distributed web crawler built with Scrapy, Redis, MongoDB and Graphite: a MongoDB cluster for storage, Redis for distribution, and Graphite for displaying crawler status

https://github.com/aivarsk/scrapy-proxies # Random proxy middleware for Scrapy

https://github.com/scrapinghub/portia # Scrapy with a visual interface

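HttpUnit: an integration testing tool focused on testing web applications. Its helper classes let a tester interact with a server from Java code and process the server's response as either text or a DOM object. HttpUnit also provides a simulated servlet container, so you can test a servlet's internal code without deploying it.

Selenium WebDriver: a testing framework that drives a browser (the HTML is rendered, so the JavaScript in the page is executed) and lets you query the resulting DOM elements. Selenium 2 also bundled a headless driver based on HtmlUnit (a different project from HttpUnit).

Jsoup: fetches only the static HTML text of a page and does not render it, so the JavaScript in the page is not executed. If you only need to extract elements from the page's HTML text, Jsoup is enough.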
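
To make the "static HTML only" limitation concrete, here is a minimal sketch in Python. It uses the standard library's html.parser as a stand-in for a static parser such as Jsoup (which is a Java library); the page content and the names DivTextCollector and static_div_text are made up for illustration.

```python
from html.parser import HTMLParser

# Hypothetical page: the <div> is empty in the raw HTML and would only be
# filled in by the inline script once a browser executes it.
RAW_HTML = """
<html><body>
  <div id="guessyou"></div>
  <script>document.getElementById("guessyou").textContent = "loaded by JS";</script>
</body></html>
"""


class DivTextCollector(HTMLParser):
    """Collects the text found inside <div id="guessyou">."""

    def __init__(self):
        super().__init__()
        self.inside_target = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "div" and dict(attrs).get("id") == "guessyou":
            self.inside_target = True

    def handle_endtag(self, tag):
        if tag == "div":
            self.inside_target = False

    def handle_data(self, data):
        if self.inside_target:
            self.chunks.append(data)


def static_div_text(html):
    """Return the text a static parser sees inside the target div."""
    parser = DivTextCollector()
    parser.feed(html)
    return "".join(parser.chunks).strip()


# The script was never executed, so the div is still empty.
print(repr(static_div_text(RAW_HTML)))
```

A static parser only sees what the server sent; anything a script would add later requires a tool that actually runs the JavaScript.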

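selendroid: a UI automation testing framework for native Android apps. Tests are written with the Selenium 2 client API. Selendroid works on both emulators and real devices, and can be integrated as a Selenium Grid node for scaling and parallel testing. As with Selenium, you can locate nodes, fill in forms, select elements and perform other interactions. http://selendroid.io; https://github.com/selendroid/selendroid .

If you are not scraping through browser simulation, I recommend Scrapy + BeautifulSoup4 as the crawling and parsing tools. Scrapy does not support JavaScript rendering natively; you need to install scrapy-splash separately (https://github.com/scrapy-plugins/scrapy-splash, Scrapy+Splash for JavaScript integration). For how to debug a Scrapy project in PyCharm, see my other article.

There is also a more advanced setup: Scrapy + Selenium + berserkJS + BeautifulSoup4 can be pieced together into a dynamic crawler that fetches, renders and interacts with pages automatically. I do not recommend it, because it is too hard to integrate; scrapy-splash, mentioned above, is enough.

Selenium also released AndroidDriver for Android, which is worth a look. It seems to be no longer maintained, though I am not certain.

Selenium does not ship a browser of its own; it has to be combined with a third-party browser. Here PhantomJS is used in place of a real browser. There is also a tool called berserkJS, an improved version based on PhantomJS.

PhantomJS is a WebKit-based, server-side JavaScript API. It provides full web support without needing a browser, it is fast, and it natively supports various web standards: DOM handling, CSS selectors, JSON, Canvas and SVG. PhantomJS can be used for page automation, network monitoring, page screenshots and headless testing.

Combining Selenium with PhantomJS gives you a very powerful crawler that can handle cookies, JavaScript, headers and anything else you need.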

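Installation:

Selenium has a Python library that can be installed with pip. PhantomJS is a full-featured "headless" browser, not a Python library, so it is not installed like other Python libraries and cannot be installed with pip.

Some people ask: why not use a regular browser instead of PhantomJS, which has no interface? The answer: it is more efficient!

Install Selenium and PhantomJS:

$ pip install selenium

Then download PhantomJS from http://phantomjs.org/download.html and read on to see how to use it:

How to use PhantomJS in Python: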

https://stackoverflow.com/questions/13287490/is-there-a-way-to-use-phantomjs-in-python

The easiest way to use PhantomJS in python is via Selenium. The simplest installation method is

Install NodeJS
Using Node's package manager install phantomjs: npm -g install phantomjs-prebuilt
install selenium (in your virtualenv, if you are using that)

After installation, you can use PhantomJS as simply as this:

# Requires Selenium 3.x or earlier: webdriver.PhantomJS was removed in Selenium 4
from selenium import webdriver

driver = webdriver.PhantomJS() # or add to your PATH
driver.set_window_size(1024, 768) # optional
driver.get('https://google.com/')
driver.save_screenshot('screen.png') # save a screenshot to disk
sbtn = driver.find_element_by_css_selector('button.gbqfba')
sbtn.click()

If your system path environment variable isn't set correctly, you'll need to specify the exact path as an argument to webdriver.PhantomJS(). Replace this:

driver = webdriver.PhantomJS() # or add to your PATH

... with the following:

driver = webdriver.PhantomJS(executable_path='/usr/local/lib/node_modules/phantomjs/lib/phantom/bin/phantomjs')
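
When the location of the binary differs between machines, the PATH lookup can be done in code before the driver is created. The following is only a sketch using the standard library; resolve_phantomjs and its explicit_path parameter are hypothetical names, not part of Selenium.

```python
import os
import shutil


def resolve_phantomjs(explicit_path=None):
    """Return a usable path to the phantomjs binary, or None if not found.

    explicit_path is a hypothetical override, e.g. an npm install location.
    """
    if explicit_path and os.path.isfile(explicit_path) and os.access(explicit_path, os.X_OK):
        return explicit_path
    # shutil.which searches the directories listed in the PATH variable.
    return shutil.which("phantomjs")


path = resolve_phantomjs()
if path is None:
    print("phantomjs not found on PATH; pass executable_path explicitly")
else:
    print("phantomjs found at", path)
```

The returned path, when not None, is what you would hand to webdriver.PhantomJS(executable_path=...).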

References:

http://selenium-python.readthedocs.org/en/latest/api.html
How do I set a proxy for phantomjs/ghostdriver in python webdriver?
http://python.dzone.com/articles/python-testing-phantomjs

My own example code using PhantomJS:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.keys import Keys

browser = webdriver.PhantomJS(executable_path='/home/hzh/hzh/soft/phantomjs/bin/phantomjs')
browser.set_window_size(1120, 720)
browser.get("https://baidu.com/")
# Type the query into the search box
browser.find_element_by_xpath(".//*[@id='kw']").send_keys("hzh")
# browser.find_element_by_xpath(".//*[@id='kw']").send_keys(Keys.ENTER)
# Click the search button
browser.find_element_by_xpath(".//*[@id='su']").click()
delay = 5 # seconds
try:
    # Wait up to `delay` seconds for an element of the result page to appear
    myElem = WebDriverWait(browser, delay).until(EC.presence_of_element_located((By.XPATH, ".//*[@id='help']/a[1]")))
    print("success get page")
except TimeoutException:
    print("Loading took too much time!")
print(browser.current_url)
browser.save_screenshot('/home/hzh/screen.png')
browser.quit()
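
The explicit wait used above boils down to polling a condition until it holds or a deadline passes. The sketch below illustrates that idea with the standard library only; it is not Selenium's implementation, and wait_until and element_present are hypothetical names. (Selenium's WebDriverWait polls every 0.5 seconds by default and raises its own TimeoutException.)

```python
import time


def wait_until(condition, timeout, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll_interval)


# Stand-in for "is the element present yet?": becomes true on the third check.
attempts = {"count": 0}


def element_present():
    attempts["count"] += 1
    return "element" if attempts["count"] >= 3 else None


print(wait_until(element_present, timeout=2, poll_interval=0.01))
```

Polling with a deadline is preferable to a fixed sleep: the code continues as soon as the page is ready and still fails cleanly when it never becomes ready.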

Changing the PhantomJS user agent:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.keys import Keys

def init_phantomjs_driver(*args, **kwargs):
    headers = {
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language': 'zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3',
        'Accept-Encoding': 'gzip',
        'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36',
        'Connection': 'keep-alive',
    }
    # items() yields (header name, value); enumerate() would yield (index, name)
    for key, value in headers.items():
        webdriver.DesiredCapabilities.PHANTOMJS['phantomjs.page.customHeaders.{}'.format(key)] = value
    webdriver.DesiredCapabilities.PHANTOMJS['phantomjs.page.settings.userAgent'] = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36'
    # Forward keyword arguments such as service_args so the proxy settings take effect
    driver = webdriver.PhantomJS(executable_path='/home/hzh/hzh/soft/phantomjs/bin/phantomjs', **kwargs)
    driver.set_window_size(1120, 720)
    return driver

service_args = ['--proxy=127.0.0.1:9999', '--proxy-type=http', '--ignore-ssl-errors=true']
browser = init_phantomjs_driver(service_args=service_args)
browser.get("https://www.huobi.com/")
# browser.find_element_by_xpath(".//*[@id='kw']").send_keys("hzh")
# browser.find_element_by_xpath(".//*[@id='kw']").send_keys(Keys.ENTER)
# browser.find_element_by_xpath(".//*[@id='su']").click()
delay = 5 # seconds
try:
    myElem = WebDriverWait(browser, delay).until(EC.presence_of_element_located((By.XPATH, ".//*[@id='doc_body']/div[7]/div/div[1]/div[1]")))
    print("success get page")
except TimeoutException:
    print("Loading took too much time!")
print(browser.current_url)
browser.save_screenshot('/home/hzh/screen.png')
browser.quit()
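
The mapping from HTTP headers to PhantomJS capability keys can be checked without launching a browser. This is a small sketch on a plain dict; build_phantomjs_capabilities is a hypothetical helper, while the key prefixes are the ones used in the code above.

```python
def build_phantomjs_capabilities(headers, user_agent):
    """Return a capabilities dict carrying custom headers and a user agent."""
    caps = {}
    # dict.items() yields (header name, header value) pairs;
    # enumerate() would yield (index, header name) instead.
    for name, value in headers.items():
        caps["phantomjs.page.customHeaders.{}".format(name)] = value
    caps["phantomjs.page.settings.userAgent"] = user_agent
    return caps


headers = {"Accept-Language": "zh-CN,zh;q=0.8", "Connection": "keep-alive"}
caps = build_phantomjs_capabilities(headers, "Mozilla/5.0 (X11; Linux x86_64)")
for key in sorted(caps):
    print(key, "=", caps[key])
```

Each capability key should end with the header's name (for example customHeaders.Connection); a key ending in a number means the loop iterated over indexes rather than header names.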

Using Firefox with Selenium:

1. Download geckodriver
2. Copy geckodriver to /usr/local/bin

Then use it like this:

from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

firefox_capabilities = DesiredCapabilities.FIREFOX
firefox_capabilities['marionette'] = True
firefox_capabilities['binary'] = '/home/hzh/hzh/soft/firefox'

# Either start Firefox with an existing profile plus extensions ...
profile = webdriver.FirefoxProfile('/home/hzh/.mozilla/firefox/f3dcxoyp.default')
profile.add_extension("/home/hzh/.mozilla/firefox/f3dcxoyp.default/extensions/xpath_finder@xpath_finder.com.xpi")
profile.add_extension("/home/hzh/.mozilla/firefox/f3dcxoyp.default/extensions/FireXPath@pierre.tholence.com.xpi")
profile.add_extension("/home/hzh/.mozilla/firefox/f3dcxoyp.default/extensions/firefinder@robertnyman.com.xpi")
driver = webdriver.Firefox(firefox_profile=profile, capabilities=firefox_capabilities)
# ... or start it with the capabilities only (this rebinds driver, so use one of the two)
driver = webdriver.Firefox(capabilities=firefox_capabilities)

Using Firefox's multiple tabs with Selenium (the first method is recommended; the second has not been verified):

1. One way:

import time
from selenium.webdriver.common.keys import Keys

test_link = browser.find_element_by_xpath(".//*[@id='doc_head']/div/div[3]/ul/li[1]/a")
# Save the window opener (the current window; not to be mistaken for a tab, they are not the same)
main_window = browser.current_window_handle
time.sleep(2)
# Open the link in a new tab by sending key strokes on the element
# Use: Keys.CONTROL + Keys.SHIFT + Keys.RETURN to open tab on top of the stack
test_link.send_keys(Keys.CONTROL + Keys.RETURN)  # Ctrl+Enter on a link opens it in a new tab
time.sleep(2)
# Get the list of window handles
tabs = browser.window_handles
print(len(tabs))
# Use the list of window handles to switch between windows
browser.switch_to_window(tabs[1])
test_link2 = browser.find_element_by_xpath(".//*[@id='doc_body']/div[4]/div[2]/div[1]/h2")
print(test_link2.text)
time.sleep(2)
# Switch back to original window
browser.switch_to_window(main_window)

2. Another way:

from time import sleep
import selenium.webdriver.support.ui as ui
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

browser = webdriver.Firefox()
browser.get('https://www.google.com?q=python#q=python')
first_result = ui.WebDriverWait(browser, 15).until(lambda browser: browser.find_element_by_class_name('rc'))
first_link = first_result.find_element_by_tag_name('a')
# Save the window opener (the current window; not to be mistaken for a tab, they are not the same)
main_window = browser.current_window_handle
# Open the link in a new tab by sending key strokes on the element
# Use: Keys.CONTROL + Keys.SHIFT + Keys.RETURN to open tab on top of the stack
first_link.send_keys(Keys.CONTROL + Keys.RETURN)  # Ctrl+Enter on a link opens it in a new tab
# Switch tab to the new tab, which we will assume is the next one on the right
browser.find_element_by_tag_name('body').send_keys(Keys.CONTROL + Keys.TAB)  # Ctrl+Tab in the first window switches to the next tab
# Put focus on current window which will, in fact, put focus on the current visible tab
browser.switch_to_window(main_window)
# Do whatever you have to do on this page; we will just go to sleep for now
sleep(2)
# Close current tab
browser.find_element_by_tag_name('body').send_keys(Keys.CONTROL + 'w')  # close this tab
# Put focus on current window which will be the window opener
browser.switch_to_window(main_window)

If CTRL+W does not close the tab, you can do this:

curWindowHndl = browser.current_window_handle
elem.send_keys(Keys.CONTROL + Keys.ENTER) #open link in new tab keyboard shortcut
sleep(2) #wait until new tab finishes loading
browser.switch_to_window(browser.window_handles[1]) #assuming new tab is at index 1
browser.close() #closes new tab
browser.switch_to_window(curWindowHndl)
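
The snippets above assume the new tab is at index 1 of window_handles. A slightly more defensive approach, sketched here on plain lists, is to compare the handle list taken before opening the tab with the list taken afterwards. new_handles is a hypothetical helper and the handle strings are invented.

```python
def new_handles(before, after):
    """Return the handles present in `after` but not in `before`, in order."""
    seen = set(before)
    return [handle for handle in after if handle not in seen]


# Hypothetical handle strings; real ones come from browser.window_handles.
handles_before = ["main-window"]
handles_after = ["main-window", "tab-2"]
print(new_handles(handles_before, handles_after))  # ['tab-2']
```

In a real session you would capture browser.window_handles before sending Ctrl+Enter and again afterwards, then switch to the first handle this helper returns.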

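Using scrapy-splash

http://scrapy-cookbook.readthedocs.io/zh_CN/latest/scrapy-12.html

So far we have only scraped static pages, that is, pages whose content is fully present as soon as the link is opened. But most web pages today are dynamic. On frequently visited sites such as JD.com and Taobao, product lists are produced by JavaScript and rendered via Ajax, so the page you get by downloading a link contains asynchronously loaded content, and the earlier approach cannot retrieve that content at all.

Rendering and processing pages with JavaScript is very common, and handling a JavaScript-heavy page is a frequent problem in Scrapy crawler development. This article shows how to use scrapy-splash in a Scrapy crawler to handle the JavaScript in a page.

Introduction to scrapy-splash

scrapy-splash uses Splash to integrate JavaScript with Scrapy, so that Scrapy can scrape dynamic pages.

Splash is a JavaScript rendering service: a lightweight browser with an HTTP API, built on Twisted and Qt and written in Python. So you first need a running Splash instance.

Installing Docker

The official site recommends installing Splash as a Docker container, so you need to install Docker first.

Following the official installation guide, I chose to install on Ubuntu 12.04 LTS.

Upgrade the kernel; Docker requires kernel 3.13.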

$ sudo apt-get update
$ sudo apt-get install linux-image-generic-lts-trusty
$ sudo reboot

Install the CA certificates

$ sudo apt-get install apt-transport-https ca-certificates

Add the new GPG key

$ sudo apt-key adv --keyserver hkp://p80.pool.sks-keyservers.net:80 --recv-keys 58118E89F3A912897C070ADBF76221572C52609D

Open /etc/apt/sources.list.d/docker.list (create it if it does not exist), delete any existing content, and add the following line

deb https://apt.dockerproject.org/repo ubuntu-precise main

Update APT

$ sudo apt-get update
$ sudo apt-get purge lxc-docker
$ apt-cache policy docker-engine

Install

$ sudo apt-get install docker-engine

Start the Docker service

$ sudo service docker start

Verify that it started successfully

$ sudo docker run hello-world

This command downloads a test image and runs it in a container; it prints a message and then exits.

Installing Splash

Pull the image

$ sudo docker pull scrapinghub/splash

Start the container

$ sudo docker run -p 5023:5023 -p 8050:8050 -p 8051:8051 scrapinghub/splash

Splash is now available at 0.0.0.0:8050 (http), 8051 (https) and 5023 (telnet).
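
Once the container is running, Splash's HTTP API can be called directly. The sketch below only builds the URL for the render.html endpoint with its url and wait parameters; it assumes Splash is reachable on localhost port 8050, and build_render_url is a hypothetical helper name.

```python
from urllib.parse import urlencode


def build_render_url(splash_base, target_url, wait=0.5):
    """Build a GET URL for Splash's render.html endpoint."""
    query = urlencode({"url": target_url, "wait": wait})
    return "{}/render.html?{}".format(splash_base.rstrip("/"), query)


# Assumes Splash listens on localhost:8050 (the port mapping used above).
print(build_render_url("http://localhost:8050", "http://www.jd.com/"))
```

Opening the printed URL in a browser, or fetching it with any HTTP client, should return the page's HTML after JavaScript has run, which is a quick way to confirm that the Splash instance works before wiring it into Scrapy.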

Installing scrapy-splash

Install with pip

$ pip install scrapy-splash

Configuring scrapy-splash

Add the following to your Scrapy project's settings.py

SPLASH_URL = 'http://192.168.203.92:8050'

Add the Splash middlewares, also in settings.py via DOWNLOADER_MIDDLEWARES, and change the priority of HttpCompressionMiddleware

DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}

By default HttpProxyMiddleware has priority 750 and must come after the Splash middlewares, which is why they are given 723 and 725.
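
To see the resulting order, the priorities can simply be sorted. This sketch adds HttpProxyMiddleware with its Scrapy default of 750 for comparison; in_request_order is a hypothetical helper, not a Scrapy function.

```python
DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    # Scrapy's default priority, listed here only for comparison.
    "scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware": 750,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}


def in_request_order(middlewares):
    """Return middleware names sorted by ascending priority number."""
    return [name for name, _ in sorted(middlewares.items(), key=lambda kv: kv[1])]


# Lower numbers sit closer to the engine, so their process_request runs first.
for name in in_request_order(DOWNLOADER_MIDDLEWARES):
    print(DOWNLOADER_MIDDLEWARES[name], name)
```

The Splash middlewares therefore see each request before the proxy middleware does, and HttpCompressionMiddleware, moved to 810, sits closest to the downloader.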

Set Splash's own duplicate filter

DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
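
Why a Splash-aware duplicate filter matters: two requests for the same URL may carry different Splash arguments (a different wait, a different script) and should not be treated as duplicates. The toy sketch below illustrates that idea only; it is not the algorithm scrapy-splash uses, and both function names are hypothetical.

```python
import hashlib
import json


def plain_fingerprint(method, url):
    """Toy fingerprint over the method and URL only."""
    return hashlib.sha1("{} {}".format(method, url).encode("utf-8")).hexdigest()


def splash_aware_fingerprint(method, url, splash_options=None):
    """Toy fingerprint that also covers the Splash options, if any."""
    base = plain_fingerprint(method, url)
    if not splash_options:
        return base
    # sort_keys makes the serialisation independent of dict ordering.
    payload = json.dumps(splash_options, sort_keys=True)
    return hashlib.sha1((base + payload).encode("utf-8")).hexdigest()


url = "http://www.jd.com/"
quick = splash_aware_fingerprint("GET", url, {"args": {"wait": 0.5}})
slow = splash_aware_fingerprint("GET", url, {"args": {"wait": 5}})
print(plain_fingerprint("GET", url) == plain_fingerprint("GET", url))  # True
print(quick == slow)  # False
```

A filter keyed only on method and URL would drop the second request as a duplicate; including the Splash options keeps both.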

If you use Scrapy's HTTP cache, you also need to specify a custom cache storage backend. scrapy-splash provides a subclass of scrapy.contrib.httpcache.FilesystemCacheStorage

HTTPCACHE_STORAGE = 'scrapy_splash.SplashAwareFSCacheStorage'

If you use a different cache storage, you need to subclass it and replace all scrapy.utils.request.request_fingerprint calls with scrapy_splash.splash_request_fingerprint

Using scrapy-splash

SplashRequest

The simplest way to render a request is scrapy_splash.SplashRequest; this is usually what you should use

yield SplashRequest(url, self.parse_result,
    args={
        # optional; parameters passed to Splash HTTP API
        'wait': 0.5,

        # 'url' is prefilled from request url
        # 'http_method' is set to 'POST' for POST requests
        # 'body' is set to request body for POST requests
    },
    endpoint='render.json',  # optional; default is render.html
    splash_url='<url>',      # optional; overrides SPLASH_URL
    slot_policy=scrapy_splash.SlotPolicy.PER_DOMAIN,  # optional
)

Alternatively, you can pass the splash key in the meta of a regular Scrapy request to achieve the same effect

yield scrapy.Request(url, self.parse_result, meta={
    'splash': {
        'args': {
            # set rendering arguments here
            'html': 1,
            'png': 1,

            # 'url' is prefilled from request url
            # 'http_method' is set to 'POST' for POST requests
            # 'body' is set to request body for POST requests
        },

        # optional parameters
        'endpoint': 'render.json',  # optional; default is render.json
        'splash_url': '<url>',      # optional; overrides SPLASH_URL
        'slot_policy': scrapy_splash.SlotPolicy.PER_DOMAIN,
        'splash_headers': {},       # optional; a dict with headers sent to Splash
        'dont_process_response': True,  # optional, default is False
        'dont_send_headers': True,  # optional, default is False
        'magic_response': False,    # optional, default is True
    }
})

About the Splash API: SplashRequest is a convenient utility for filling in request.meta['splash']

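meta['splash']['args'] contains the arguments sent to Splash.
meta['splash']['endpoint'] specifies the Splash endpoint to use; with SplashRequest the default is render.html.
meta['splash']['splash_url'] overrides the Splash URL configured in settings.py.
meta['splash']['splash_headers'] lets you add or change the HTTP headers sent to the Splash server; note that this does not change the headers sent to the remote website.
meta['splash']['dont_send_headers']: set it to True if you do not want to pass headers to Splash.
meta['splash']['slot_policy'] lets you customise how concurrency and politeness are maintained for Splash requests.
meta['splash']['dont_process_response']: when set to True, SplashMiddleware does not replace the default scrapy.Response with a custom subclass. By default a SplashResponse subclass such as SplashTextResponse is returned.
meta['splash']['magic_response'] defaults to True; scrapy-splash then automatically sets some Response attributes, such as response.headers and response.body, from the data Splash returns.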
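
The meta layout listed above can be assembled with a small helper. This is only a sketch on plain dicts: splash_meta is a hypothetical function, not part of scrapy-splash, and it covers just a few of the keys.

```python
def splash_meta(args, endpoint="render.json", splash_url=None, splash_headers=None):
    """Return a meta dict whose 'splash' key follows the layout listed above."""
    options = {"args": dict(args), "endpoint": endpoint}
    if splash_url is not None:
        options["splash_url"] = splash_url          # overrides SPLASH_URL
    if splash_headers:
        options["splash_headers"] = dict(splash_headers)  # headers for Splash itself
    return {"splash": options}


meta = splash_meta({"html": 1, "png": 1})
print(meta)
```

The returned dict is what you would pass as the meta argument of scrapy.Request; optional keys are left out unless a value is supplied, so the library defaults apply.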

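To submit a form request through Splash, use scrapy_splash.SplashFormRequest; it is used the same way as SplashRequest.

Responses

scrapy-splash returns different Response subclasses for different Splash requests

SplashResponse: binary responses, e.g. responses to /render.png
SplashTextResponse: text responses, e.g. responses to /render.html
SplashJsonResponse: JSON responses, e.g. responses to /render.json or to /execute when a Lua script is used

If you only want standard Response objects, set meta['splash']['dont_process_response']=True

All of these Responses set response.url to the URL of the original request (that is, the URL of the page you want to render) rather than the URL of the Splash endpoint. The actual URL is available as response.real_url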

Session handling

Splash itself is stateless, so to support sessions with scrapy-splash you must write a Lua script and use the /execute endpoint

function main(splash)
    splash:init_cookies(splash.args.cookies)

    -- ... your script

    return {
        cookies = splash:get_cookies(),
        -- ... other results, e.g. html
    }
end

The standard Scrapy cookies argument can be used with SplashRequest to add cookies to the current Splash cookiejar

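A worked example

Next I demonstrate usage with a real example: scraping the asynchronously loaded content on the JD.com home page.

When the JD.com home page opens, only the navigation menu is loaded; the rest of the home page content is loaded asynchronously, including the "猜你喜欢" ("Guess you like") section further down. I will scrape the four characters "猜你喜欢" to show the difference between plain Scrapy and Scrapy using Splash to load asynchronous content.

First, a simple test spider that does not use Splash:

import logging

import scrapy


class TestSpider(scrapy.Spider):
    name = "test"
    allowed_domains = ["jd.com"]
    start_urls = [
        "http://www.jd.com/"
    ]

    def parse(self, response):
        # Log message: "simple test that fetches the JD.com home page directly"
        logging.info(u'---------我这个是简单的直接获取京东网首页测试---------')
        guessyou = response.xpath('//div[@id="guessyou"]/div[1]/h2/text()').extract_first()
        logging.info(u"find：%s" % guessyou)
        logging.info(u'---------------success----------------')

The output: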

2016-04-18 14:42:44 test_spider.py[line:20] INFO ---------我这个是简单的直接获取京东网首页测试---------
2016-04-18 14:42:44 test_spider.py[line:22] INFO find：None
2016-04-18 14:42:44 test_spider.py[line:23] INFO ---------------success----------------

The text "猜你喜欢" was not found.

Next, I scrape it using Splash

import logging

import scrapy
from scrapy_splash import SplashRequest


class JsSpider(scrapy.Spider):
    name = "jd"
    allowed_domains = ["jd.com"]
    start_urls = [
        "http://www.jd.com/"
    ]

    def start_requests(self):
        splash_args = {
            'wait': 0.5,
        }
        for url in self.start_urls:
            # Route the request through Splash so the page is rendered first
            yield SplashRequest(url, self.parse_result, endpoint='render.html', args=splash_args)

    def parse_result(self, response):
        # Log message: "scraping asynchronously loaded JD.com home page content with Splash"
        logging.info(u'----------使用splash爬取京东网首页异步加载内容-----------')
        guessyou = response.xpath('//div[@id="guessyou"]/div[1]/h2/text()').extract_first()
        logging.info(u"find：%s" % guessyou)
        logging.info(u'---------------success----------------')

The output:

2016-04-18 14:42:51 js_spider.py[line:36] INFO ----------使用splash爬取京东网首页异步加载内容-----------
2016-04-18 14:42:51 js_spider.py[line:38] INFO find：猜你喜欢
2016-04-18 14:42:51 js_spider.py[line:39] INFO ---------------success----------------

The result now contains "猜你喜欢", which shows that the asynchronously loaded content was scraped successfully!
