Learning Python web scraping: selenium.webdriver

Use case: scraping data from dynamic pages.

ChromeDriver (the driver program for Google Chrome) download page:
http://chromedriver.storage.googleapis.com/index.html

1 Creating a browser

Instantiate a browser

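from selenium import webdriver

# Point Selenium at the driver binary explicitly.
# executable_path is the Selenium 3 keyword; Selenium 4 passes the path through a Service object instead.
browser = webdriver.Chrome(executable_path='chromedriver.exe')

# Or let Selenium find the driver on PATH; there is one class per browser.
browser = webdriver.Chrome()
browser = webdriver.Firefox()
browser = webdriver.Edge()
browser = webdriver.PhantomJS()  # PhantomJS support was removed in Selenium 4
browser = webdriver.Safari()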

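2 Locating elements

Note: find_element_by_xxx returns the first element that matches; find_elements_by_xxx returns all matching elements. Both families belong to the Selenium 3 API; Selenium 4 deprecates and later removes them in favor of find_element(By.XXX, value) and find_elements(By.XXX, value).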

from selenium import webdriver
import time
from selenium.webdriver.common.keys import Keys


class Douban(object):
    def __init__(self):
        self.url = 'https://accounts.douban.com/passport/login?source=book'
        # Create the browser
        self.driver = webdriver.Chrome()

    def LogIn(self):
        # Have the browser request the URL
        self.driver.get(self.url)
        time.sleep(3)
        # Save a screenshot of the page
        self.driver.save_screenshot('123.png')
        # Click the "log in with account and password" tab (located by class name)
        self.driver.find_element_by_class_name('account-tab-account').click()
        # Type the phone-number account (located by id)
        self.driver.find_element_by_id('username').send_keys('your-phone-number')
        time.sleep(2)
        self.driver.find_element_by_id('username').send_keys(Keys.TAB)
        # Type the password (located by name)
        self.driver.find_element_by_name('password').send_keys('your-password')
        # Click the login link (located by link text)
        self.driver.find_element_by_link_text('登录豆瓣').click()
        '''
        # (located by XPath)
        self.driver.find_element_by_xpath("//*[@]")
        self.driver.find_element_by_xpath("//*[@name='wd']")
        self.driver.find_element_by_xpath("//input[@]")
        self.driver.find_element_by_xpath("/html/body/form/span/input")
        self.driver.find_element_by_xpath("//span[@]/input")
        self.driver.find_element_by_xpath("//form[@]/span/input")
        self.driver.find_element_by_xpath("//input[@ and @name='wd']")
        # (located by tag name)
        self.driver.find_element_by_tag_name("input")
        '''
        time.sleep(3)
        self.driver.save_screenshot('345.png')
        # Print the cookies
        print(self.driver.get_cookies())


if __name__ == '__main__':
    douban = Douban()
    douban.LogIn()
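
The find_element_by_xxx helpers in this section come from the Selenium 3 API. As a rough sketch of how the same lookups map onto the Selenium 4 form find_element(by, value): the strategy strings below are the values of the selenium.webdriver.common.by.By constants (By.ID is "id", By.CLASS_NAME is "class name", and so on). The helper find_first and the FakeDriver stand-in are hypothetical names, introduced only so the sketch runs without a browser.

```python
# Strategy strings as defined by selenium.webdriver.common.by.By
LOCATOR_STRATEGIES = {
    "id": "id",                  # By.ID
    "name": "name",              # By.NAME
    "class_name": "class name",  # By.CLASS_NAME
    "tag_name": "tag name",      # By.TAG_NAME
    "link_text": "link text",    # By.LINK_TEXT
    "xpath": "xpath",            # By.XPATH
}


def find_first(driver, how, value):
    """Hypothetical helper: Selenium 4 style equivalent of find_element_by_<how>(value)."""
    if how not in LOCATOR_STRATEGIES:
        raise ValueError(f"unsupported locator: {how!r}")
    return driver.find_element(LOCATOR_STRATEGIES[how], value)


class FakeDriver:
    """Stand-in that records calls, so the mapping can be checked offline."""

    def __init__(self):
        self.calls = []

    def find_element(self, by, value):
        self.calls.append((by, value))
        return f"<element {by}={value}>"


fake = FakeDriver()
find_first(fake, "id", "username")
find_first(fake, "class_name", "account-tab-account")
print(fake.calls)
```

With a real Selenium 4 driver, find_first(driver, "id", "username") would perform the same lookup as find_element_by_id('username') in the login code.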

3 Controlling the browser

from selenium import webdriver
from time import sleep

# 1. Create a Chrome browser object; this opens a new browser window
browser = webdriver.Chrome(executable_path="chromedriver.exe")
# 2. Have the browser request the URL
browser.get("https://www.baidu.com/")
sleep(3)
# 3. Refresh the page
browser.refresh()
# 4. Set the window size
browser.set_window_size(1400, 800)
# 5. Find the "新闻" (News) link by its text and click it
element = browser.find_element_by_link_text("新闻")
element.click()
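
The fixed sleep(3) in this section always waits the full three seconds, and still fails if the page needs longer. A common alternative is to poll for a condition until a timeout. The sketch below is plain Python; wait_until is a hypothetical helper and not part of Selenium (Selenium ships its own WebDriverWait class built on the same idea).

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Call condition() repeatedly until it returns a truthy value or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(interval)


# Offline demonstration: the condition becomes true on the third poll
attempts = []


def ready():
    attempts.append(1)
    return len(attempts) >= 3


wait_until(ready, timeout=2.0, interval=0.01)
print(len(attempts))
```

With a real driver the condition could be something like lambda: browser.find_elements_by_link_text("新闻"), which returns an empty (falsy) list until the link is present.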

4 Executing JavaScript

from selenium import webdriver
from time import sleep

# 1. Open Baidu
drive = webdriver.Chrome(executable_path='chromedriver.exe')
drive.get('https://www.baidu.com')

# 2. Search
drive.find_element_by_id('kw').send_keys('python')
drive.find_element_by_id('su').click()

# 3. Sleep 2 s so the results have time to load
sleep(2)

# 4. Set the scrollbar position through JavaScript
drive.execute_script('window.scrollTo(0, 500)')
# drive.execute_script('window.scrollTo(0, document.body.scrollHeight)')  # scroll to the very bottom
sleep(2)
drive.close()
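
On pages that load more content as you scroll, a single scrollTo call is often not enough. One possible approach, sketched here under that assumption, is to keep scrolling until document.body.scrollHeight stops growing. scroll_to_bottom and the FakePage stand-in are hypothetical names; only execute_script is a real driver method.

```python
import time


def scroll_to_bottom(driver, pause=1.0, max_rounds=20):
    """Scroll down until document.body.scrollHeight stops growing; return the rounds used."""
    last_height = driver.execute_script("return document.body.scrollHeight")
    for round_no in range(1, max_rounds + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
        time.sleep(pause)  # give lazily loaded content time to arrive
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return round_no
        last_height = new_height
    return max_rounds


class FakePage:
    """Stand-in driver whose page grows twice and then stops."""

    def __init__(self):
        self.heights = [1000, 2000, 3000, 3000]
        self.scrolls = 0

    def execute_script(self, script):
        if script.startswith("return"):
            # Report the height reached after the scrolls performed so far
            return self.heights[min(self.scrolls, len(self.heights) - 1)]
        self.scrolls += 1


page = FakePage()
rounds = scroll_to_bottom(page, pause=0)
print(rounds, page.scrolls)
```

With the real browser from this section the call would be scroll_to_bottom(drive); max_rounds keeps an endlessly growing page from looping forever.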

5 Getting the page source

from selenium import webdriver
from time import sleep

# 1. Open Baidu
drive = webdriver.Chrome(executable_path='chromedriver.exe')
drive.get('https://www.baidu.com')

# 2. Search
drive.find_element_by_id('kw').send_keys('python')
drive.find_element_by_id('su').click()

# 3. Sleep 2 s so the results have time to load
sleep(2)

# 4. Read the source of the rendered page
text = drive.page_source
print(text)
drive.close()
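
page_source is just a string of HTML, so the next step is usually to parse it. As a small sketch using only the standard library, the class below collects the links in a page. LinkCollector is a hypothetical name, and sample_source is made-up HTML standing in for drive.page_source.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


# Made-up HTML standing in for drive.page_source
sample_source = (
    '<div><a href="/a">First <b>result</b></a>'
    '<a>no href</a><a href="/b">Second</a></div>'
)
collector = LinkCollector()
collector.feed(sample_source)
print(collector.links)
```

In a real run, collector.feed(drive.page_source) would take the place of the sample string.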

6 Working with cookies

from selenium import webdriver

drive = webdriver.Chrome(executable_path='chromedriver.exe')
drive.get('https://www.cnblogs.com/')

# 1. Print the cookies
print(drive.get_cookies())

# 2. Add a cookie
dic = {'name': 'name', 'value': 'python'}
drive.add_cookie(dic)
print(drive.get_cookies())

# 3. Loop over the cookies and print each one
for cookie in drive.get_cookies():
    print(f"{cookie['name']}---{cookie['value']}\n")

drive.close()
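
get_cookies() returns a list of dicts, one per cookie, each with at least a name and a value. A frequent follow-up is to reuse those cookies outside the browser. The two helpers below are hypothetical names and the sample list is made up; they only reshape the data and make no network calls.

```python
def cookies_to_dict(cookies):
    """Turn the list of cookie dicts from get_cookies() into {name: value}."""
    return {cookie["name"]: cookie["value"] for cookie in cookies}


def cookies_to_header(cookies):
    """Format the same cookies as the value of an HTTP Cookie request header."""
    return "; ".join(f"{c['name']}={c['value']}" for c in cookies)


# Made-up sample in the shape get_cookies() returns
sample_cookies = [
    {"name": "name", "value": "python", "domain": "www.cnblogs.com", "path": "/"},
    {"name": "sessionid", "value": "abc123", "domain": "www.cnblogs.com", "path": "/"},
]
print(cookies_to_dict(sample_cookies))
print(cookies_to_header(sample_cookies))
```

The dict form is the shape that HTTP clients such as requests accept for their cookies argument, so a session logged in through the browser can be continued with plain HTTP requests.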

7 Headless Chrome

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# 1. Create an options object that makes Chrome start without a visible window
chrome_options = Options()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--disable-gpu')

# 2. Create the browser object (options= replaces the deprecated chrome_options= keyword)
drive = webdriver.Chrome(executable_path='chromedriver.exe', options=chrome_options)

# 3. Request the page and read its source
drive.get('https://www.cnblogs.com/')
page_text = drive.page_source
print(page_text)
drive.close()
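
When several scripts share the same startup switches, it can help to build the argument list in one place. This is a sketch, not a required pattern: chrome_arguments and build_options are hypothetical helpers, and --window-size is an extra Chrome switch that the original example does not use. selenium is imported only inside build_options, so the list-building part runs without it.

```python
def chrome_arguments(headless=True, window_size=None):
    """Return the command-line switches to hand to Options.add_argument()."""
    args = []
    if headless:
        args += ["--headless", "--disable-gpu"]
    if window_size is not None:
        width, height = window_size
        args.append(f"--window-size={width},{height}")
    return args


def build_options(**kwargs):
    """Create a Chrome Options object; selenium is imported only when this is called."""
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for arg in chrome_arguments(**kwargs):
        options.add_argument(arg)
    return options


print(chrome_arguments(window_size=(1400, 800)))
```

The result of build_options(window_size=(1400, 800)) can then be passed as options= when creating the Chrome object.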

8 Keeping Selenium from being detected

from selenium import webdriver
from selenium.webdriver import ChromeOptions

# 1. Create a ChromeOptions object
option = ChromeOptions()
option.add_experimental_option('excludeSwitches', ['enable-automation'])

# 2. Pass the ChromeOptions instance to the Chrome constructor
driver = webdriver.Chrome(executable_path='chromedriver.exe', options=option)

# 3. Send the request
driver.get('https://www.taobao.com/')
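
One signal that pages can read is the navigator.webdriver property, which the WebDriver specification sets to true in a browser under automation. Whether excluding the enable-automation switch changes that value may depend on the Chrome version; treat that as an assumption to verify in your own setup. The sketch below shows one way to check it. automation_flag and the FakeDriver stand-in are hypothetical names; only execute_script is a real driver method.

```python
def automation_flag(driver):
    """Return navigator.webdriver as scripts on the current page would see it."""
    return driver.execute_script("return navigator.webdriver")


class FakeDriver:
    """Stand-in that answers the script with a preset value, for an offline run."""

    def __init__(self, flag):
        self.flag = flag
        self.scripts = []

    def execute_script(self, script):
        self.scripts.append(script)
        return self.flag


exposed = FakeDriver(True)
hidden = FakeDriver(None)
print(automation_flag(exposed))
print(automation_flag(hidden))
```

Against the real browser, automation_flag(driver) after driver.get(...) shows whether the option had the intended effect.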
