Python Web Scraping: selenium

Environment Setup

Install selenium: pip install selenium
Download the browser driver:
- http://chromedriver.storage.googleapis.com/index.html
Check which driver version matches which browser version:
- http://blog.csdn.net/huilan_same/article/details/51896672
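
Because the driver has to match the installed browser, a quick check before launching can save a failed start. The sketch below assumes the rule used from ChromeDriver 73 onward, where the driver's major version equals Chrome's major version; older releases need the mapping table linked above. `major` and `same_major` are hypothetical helper names, and the version strings are illustrative rather than read from a real installation.

```python
def major(version: str) -> int:
    """Return the leading (major) component of a dotted version string."""
    return int(version.strip().split(".")[0])


def same_major(browser_version: str, driver_version: str) -> bool:
    """True when browser and driver share a major version.

    Assumption: Chrome/ChromeDriver 73+ align on the major version.
    """
    return major(browser_version) == major(driver_version)


# Illustrative version strings (hypothetical, not from a real machine)
print(same_major("114.0.5735.199", "114.0.5735.90"))
print(same_major("114.0.5735.199", "113.0.5672.63"))
```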

Basic Usage

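from selenium import webdriver
from time import sleep

# Note: executable_path and find_element_by_* are the Selenium 3 API;
# Selenium 4 uses a Service object and find_element(By.ID, ...) instead.

# Instantiate the browser driver
bro = webdriver.Chrome(executable_path='./chromedriver.exe')
bro.get('https://www.baidu.com')
sleep(2)

# Locate the search box and type the query
tag_input = bro.find_element_by_id('kw')
tag_input.send_keys('人民币')
sleep(2)

# Locate the search button and click it
btn = bro.find_element_by_id('su')
btn.click()
sleep(2)

# Close the browser
bro.quit()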
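
The fixed sleep(2) calls pause for the same time whether or not the page is ready. A common alternative is to poll for a condition with a timeout, which is the idea behind selenium's WebDriverWait. The following is a minimal standard-library sketch of that idea, not selenium's implementation; `wait_until` and its parameters are hypothetical names.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Poll `condition` until it returns a truthy value or `timeout` expires.

    Returns the truthy value; raises TimeoutError otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)
```

With a driver such as bro, this could be called as wait_until(lambda: bro.find_elements_by_id('kw')), because the plural find_elements_* methods (Selenium 3 API, as in the example above) return an empty list instead of raising when nothing matches.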

Example: Xueqiu (xueqiu.com)

from selenium import webdriver
from time import sleep

bro = webdriver.Chrome(executable_path='./chromedriver.exe')
bro.get('https://xueqiu.com/')
sleep(5)

# Run JavaScript to scroll to the bottom of the page, four times
js = 'window.scrollTo(0,document.body.scrollHeight)'
for _ in range(4):
    bro.execute_script(js)
    sleep(2)

# Locate the "load more" button and click it
a_tag = bro.find_element_by_xpath('//*[@id="app"]/div[3]/div/div[1]/div[2]/div[2]/a')
a_tag.click()
sleep(5)

# Get the current (dynamically rendered) page source
print(bro.page_source)
bro.quit()
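
The example scrolls a fixed four times, which may be too few or too many depending on how much content the page loads. One possible variation is to keep scrolling until the page height stops growing. This is a sketch under assumptions: `scroll_to_end`, `pause`, and `max_rounds` are hypothetical names, and `driver` only needs an `execute_script` method like the one used above.

```python
import time


def scroll_to_end(driver, pause=2.0, max_rounds=10):
    """Scroll down until the page height stops growing.

    `driver` is any object with an `execute_script(js)` method, such as
    a selenium WebDriver. Returns the number of scrolls performed.
    """
    get_height = "return document.body.scrollHeight"
    scroll = "window.scrollTo(0, document.body.scrollHeight)"

    last_height = driver.execute_script(get_height)
    rounds = 0
    while rounds < max_rounds:
        driver.execute_script(scroll)
        rounds += 1
        time.sleep(pause)  # give lazily loaded content time to arrive
        new_height = driver.execute_script(get_height)
        if new_height == last_height:
            break  # nothing new was loaded
        last_height = new_height
    return rounds
```

In the script above, the scrolling loop could then be replaced by scroll_to_end(bro). The max_rounds cap keeps the loop from running forever on pages that never stop loading.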

PhantomJS is a headless browser with no visible UI (no installation required). It is no longer maintained and is not recommended.

from selenium import webdriver
from time import sleep

bro = webdriver.PhantomJS(executable_path=r'\phantomjs-2.1.1-windows\bin\phantomjs.exe')
bro.get('https://xueqiu.com/')
sleep(2)

# Take a screenshot
bro.save_screenshot('./1.png')

# Run JavaScript to scroll to the bottom of the page, four times
js = 'window.scrollTo(0,document.body.scrollHeight)'
for _ in range(4):
    bro.execute_script(js)
    sleep(2)
bro.save_screenshot('./2.png')

# a_tag = bro.find_element_by_xpath('//*[@id="app"]/div[3]/div/div[1]/div[2]/div[2]/a')
# bro.save_screenshot('./2.png')
# a_tag.click()
sleep(2)

# Get the current (dynamically rendered) page source
print(bro.page_source)
bro.quit()

Headless Chrome

from selenium import webdriver
from time import sleep
from selenium.webdriver.chrome.options import Options

# Create an options object that makes Chrome start without a visible window
chrome_options = Options()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--disable-gpu')

bro = webdriver.Chrome(executable_path='./chromedriver.exe', options=chrome_options)
bro.get('https://www.baidu.com')
sleep(2)
bro.save_screenshot('1.png')

# Locate the search box and type the query
tag_input = bro.find_element_by_id('kw')
tag_input.send_keys('人民币')
sleep(2)

# Locate the search button and click it
btn = bro.find_element_by_id('su')
btn.click()
sleep(2)

print(bro.page_source)
bro.quit()
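
When several scripts share the same headless setup, the arguments can be collected in one helper. The sketch below is an illustration rather than part of the original post: `configure_headless` is a hypothetical name, and the `--window-size` argument is an added assumption, included because headless Chrome otherwise starts with a small default viewport that affects screenshots.

```python
def configure_headless(options, width=1920, height=1080):
    """Add headless-mode arguments to a Chrome options object.

    `options` is any object exposing `add_argument(str)`, such as
    selenium's `Options`. Returns the same object for convenience.
    """
    for arg in ("--headless", "--disable-gpu", f"--window-size={width},{height}"):
        options.add_argument(arg)
    return options
```

With selenium installed, it would be used as chrome_options = configure_headless(Options()) before the options object is passed to webdriver.Chrome.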

Action Chains

from selenium import webdriver
from time import sleep
from selenium.webdriver import ActionChains

bro = webdriver.Chrome(executable_path='./chromedriver.exe')
url = 'https://www.runoob.com/try/try.php?filename=jqueryui-api-droppable'
bro.get(url=url)

# If the target element is inside an iframe, switch into that frame
# with switch_to before locating the element
bro.switch_to.frame('iframeResult')
source_tag = bro.find_element_by_id('draggable')

# Create an action chain object
action = ActionChains(bro)
action.click_and_hold(source_tag)
for i in range(4):
    # perform() starts executing the action chain
    action.move_by_offset(20, 0).perform()
    sleep(1)
bro.quit()
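
The loop above moves the element four times by 20 pixels. When the total distance is not a multiple of the step, the offsets can be computed first. A small sketch, with `step_offsets` as a hypothetical helper name:

```python
def step_offsets(total, step):
    """Split a non-negative distance into offsets of at most `step` pixels."""
    if total < 0:
        raise ValueError("total must be non-negative")
    if step <= 0:
        raise ValueError("step must be positive")
    full, rest = divmod(total, step)
    return [step] * full + ([rest] if rest else [])


print(step_offsets(85, 20))  # [20, 20, 20, 20, 5]
```

Each value could then be passed to action.move_by_offset(dx, 0).perform() in place of the fixed 20.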

Avoiding selenium Detection

Many large sites now detect selenium. For example, when you visit a site such as Taobao in a regular browser, window.navigator.webdriver is undefined, whereas under selenium it is true.

Setting a Chromedriver launch option gets around this check. Before starting Chromedriver, enable Chrome's experimental option excludeSwitches with the value ['enable-automation'].

from selenium.webdriver import Chrome
from selenium.webdriver import ChromeOptions

option = ChromeOptions()
option.add_experimental_option('excludeSwitches', ['enable-automation'])
driver = Chrome(options=option)
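
Since the result may depend on the Chrome and Chromedriver versions in use, it can be worth reading window.navigator.webdriver from the page rather than assuming the option took effect. The sketch below uses only `execute_script`; `automation_flag` and `looks_automated` are hypothetical helper names.

```python
def automation_flag(driver):
    """Return the page's `navigator.webdriver` value.

    `driver` is any object with `execute_script(js)`, such as a
    selenium WebDriver. A JavaScript `undefined` comes back as None.
    """
    return driver.execute_script("return window.navigator.webdriver")


def looks_automated(driver):
    """True when the page can see that a WebDriver is in control."""
    return automation_flag(driver) is True
```

Calling looks_automated(driver) after driver.get(...) gives a quick indication of whether a site could still use this property to identify the session.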

Reposted from: https://www.cnblogs.com/z1115230598/p/10987165.html
