Python Web Scraping with Selenium

Selenium: Personal Study Notes

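Preparation

Step 1: Install the Chrome browser

Step 2: Download the ChromeDriver release that matches your browser version (the author's version is 76.0.3809.100)

ChromeDriver download page

Download for the author's version

Step 3: Configure the environment variable

(On Windows) Copy the ChromeDriver executable directly into Python's Scripts directory (this assumes the Scripts directory is already on PATH)

Step 4: Verify the installation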

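Run the chromedriver command directly in cmd. If the installation succeeded, ChromeDriver starts and prints a startup message that includes its version number.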
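
The same check can be scripted. The following is a minimal sketch, not part of the original notes: locate_driver and same_build are hypothetical helper names, and comparing only the first three version components assumes ChromeDriver's documented versioning scheme, in which a driver supports the Chrome builds that share its major.minor.build prefix.

```python
import shutil


def locate_driver(name="chromedriver"):
    """Return the full path of the driver executable if it is on PATH, else None."""
    return shutil.which(name)


def same_build(chrome_version, driver_version):
    """Compare major.minor.build, ignoring the final patch component."""
    return chrome_version.split(".")[:3] == driver_version.split(".")[:3]


# Report whether the driver can be found from the current environment.
path = locate_driver()
print(path if path else "chromedriver was not found on PATH")
```

Under that assumption, any ChromeDriver whose version starts with 76.0.3809 would pass same_build for the author's Chrome 76.0.3809.100.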

1. Basic Selenium usage

from selenium import webdriver

browser = webdriver.Chrome()
browser.get('https://www.baidu.com')
input_ = browser.find_element_by_id('kw')  # the search box
input_.send_keys('Python')
browser.close()

2. Declaring the browser object

from selenium import webdriver

# Each constructor starts a different browser; normally you pick just one.
browser = webdriver.Chrome()
browser = webdriver.Firefox()
browser = webdriver.Edge()
browser = webdriver.PhantomJS()
browser = webdriver.Safari()

3. Visiting a page

from selenium import webdriver

browser = webdriver.Chrome()
browser.get('https://www.taobao.com')
print(browser.page_source)  # HTML source of the current page
browser.close()

4. Finding nodes

Single node

from selenium import webdriver

# Locate the same node in three different ways
browser = webdriver.Chrome()
browser.get('https://www.taobao.com')
input_first = browser.find_element_by_id('q')
# input_first_1 = browser.find_element(By.ID, 'q')  # needs: from selenium.webdriver.common.by import By
input_second = browser.find_element_by_css_selector('#q')
input_third = browser.find_element_by_xpath('//*[@id="q"]')
print(input_first, input_second, input_third, sep='\n')
browser.close()

Methods for getting a single node

find_element_by_id()
find_element_by_name()
find_element_by_xpath()
find_element_by_link_text()
find_element_by_partial_link_text()
find_element_by_tag_name()
find_element_by_class_name()
find_element_by_css_selector()

find_element()  # generic method
It takes two arguments: the locator strategy and the value to look for.
For example:
find_element_by_id(id) == find_element(By.ID, id)
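
To make the equivalence concrete, here is a small sketch that translates the legacy method names listed above into (strategy, value) pairs for find_element(). to_locator is a hypothetical helper, not a Selenium API; the strategy strings are the values behind By.ID, By.NAME and so on. The generic form is also the one that still works in Selenium 4, where the find_element_by_* helpers were removed.

```python
# Legacy method suffix -> locator strategy string (the value of the matching By constant).
STRATEGIES = {
    "id": "id",
    "name": "name",
    "xpath": "xpath",
    "link_text": "link text",
    "partial_link_text": "partial link text",
    "tag_name": "tag name",
    "class_name": "class name",
    "css_selector": "css selector",
}

PREFIX = "find_element_by_"


def to_locator(legacy_method, value):
    """Turn e.g. ('find_element_by_id', 'q') into the tuple ('id', 'q')."""
    if not legacy_method.startswith(PREFIX):
        raise ValueError("not a find_element_by_* method: %r" % legacy_method)
    suffix = legacy_method[len(PREFIX):]
    if suffix not in STRATEGIES:
        raise ValueError("unknown locator strategy: %r" % suffix)
    return (STRATEGIES[suffix], value)
```

With a running browser, browser.find_element(*to_locator('find_element_by_id', 'q')) would then locate the same node as browser.find_element_by_id('q').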

Multiple nodes

from selenium import webdriver

browser = webdriver.Chrome()
browser.get('https://www.taobao.com')
lis = browser.find_elements_by_css_selector('.service-bd li')
print(lis)
browser.close()

# Same as the single-node methods, with an "s" added to "element"; the result is a list.

5. Interacting with nodes

from selenium import webdriver
import time

browser = webdriver.Chrome()
browser.get('https://www.taobao.com')
input_ = browser.find_element_by_id('q')
input_.send_keys('跳蛛')  # type text ("jumping spider")
time.sleep(1)
input_.clear()  # clear the text
input_.send_keys('蜥蜴')  # type text ("lizard")
button = browser.find_element_by_class_name('btn-search')
button.click()
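
The type, clear and click steps above can be wrapped into one reusable function. This is only a sketch: search is a hypothetical name, and the locators are passed in as (strategy, value) tuples such as (By.ID, 'q') so that the function is not tied to one site.

```python
import time


def search(driver, box_locator, button_locator, keyword, pause=0.0):
    """Type a keyword into a search box and click the search button."""
    box = driver.find_element(*box_locator)
    box.clear()            # remove any text left over from a previous search
    box.send_keys(keyword)
    if pause:
        time.sleep(pause)  # optional pause so the typing is visible
    driver.find_element(*button_locator).click()
```

For the Taobao page used in the notes, a call might look like search(browser, (By.ID, 'q'), (By.CLASS_NAME, 'btn-search'), '蜥蜴').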

6. Action chains

from selenium import webdriver
from selenium.webdriver import ActionChains

browser = webdriver.Chrome()
url = 'http://www.runoob.com/try/try.php?filename=jqueryui-api-droppable'
browser.get(url)
browser.switch_to.frame('iframeResult')  # the demo is inside an iframe
source = browser.find_element_by_css_selector('#draggable')
target = browser.find_element_by_css_selector('#droppable')
actions = ActionChains(browser)
actions.drag_and_drop(source, target)  # queue the drag-and-drop
actions.perform()  # run the queued actions

7. Executing JavaScript

from selenium import webdriver

browser = webdriver.Chrome()
browser.get('https://www.zhihu.com/explore')
browser.execute_script('window.scrollTo(0, document.body.scrollHeight)')  # scroll to the bottom
browser.execute_script('alert("To Bottom")')
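
execute_script() also hands back whatever the script returns, which makes it possible to scroll repeatedly on pages that load more content as you go. A hedged sketch follows; scroll_to_bottom, pause and max_rounds are names chosen here for illustration, and whether a given page actually grows while scrolling is an assumption.

```python
import time


def scroll_to_bottom(driver, pause=1.0, max_rounds=20):
    """Scroll down until the page height stops changing; return the final height."""
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
        time.sleep(pause)  # give lazily loaded content time to arrive
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break          # nothing new was loaded, so this is the bottom
        last_height = new_height
    return last_height
```

The max_rounds limit is there so that an endlessly growing page cannot keep the loop running forever.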

8. Getting node information

from selenium import webdriver

# Get an attribute
browser = webdriver.Chrome()
browser.get('https://www.baidu.com')
logo = browser.find_element_by_id('su')
print(logo)
print(logo.get_attribute('class'))
browser.close()

# Get the text value
browser = webdriver.Chrome()
url = 'https://www.baidu.com'
browser.get(url)
input_ = browser.find_element_by_class_name('mnav')
print(input_.text)
browser.close()

# Get the id, location, tag name and size
browser = webdriver.Chrome()
url = 'https://www.zhihu.com/explore'
browser.get(url)
input_ = browser.find_element_by_xpath('//*[@id="Popover1-toggle"]')
print(input_.tag_name)
print(input_.location)
print(input_.size)
print(input_.id)
print(input_.__class__)
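
The properties printed above can be gathered into a single dictionary, which is convenient when logging or saving scraped nodes. This is a sketch with a hypothetical helper name, describe. Note that the id property printed in the notes is Selenium's internal reference to the node, not the HTML id attribute; the HTML attribute is read with get_attribute('id'), as done here.

```python
def describe(element, attributes=("id", "class")):
    """Collect the commonly inspected properties of a node into a dict."""
    info = {
        "tag_name": element.tag_name,
        "text": element.text,
        "location": element.location,  # dict with 'x' and 'y'
        "size": element.size,          # dict with 'height' and 'width'
    }
    for name in attributes:
        info[name] = element.get_attribute(name)  # None if the attribute is absent
    return info
```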

9. Switching frames

After Selenium opens a page, it operates in the top-level frame by default. If the page contains child frames, the nodes inside them cannot be found from there; you first have to switch into the child frame with switch_to.frame().

Example:

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException

browser = webdriver.Chrome()
url = 'http://www.runoob.com/try/try.php?filename=jqueryui-api-droppable'
browser.get(url)
browser.switch_to.frame('iframeResult')  # switch into the child frame
try:
    logo = browser.find_element_by_class_name('logo')  # try to find the logo node inside the frame
except NoSuchElementException:
    print('NO LOGO')
browser.switch_to.parent_frame()  # switch back to the parent frame
logo = browser.find_element_by_class_name('logo')
print(logo)
print(logo.text)
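
It is easy to forget the switch back to the parent frame, especially when a lookup inside the frame raises an exception. One possible way to guard against that is a context manager; in_frame is a hypothetical helper written for these notes, built only on the switch_to.frame() and switch_to.parent_frame() calls used above.

```python
from contextlib import contextmanager


@contextmanager
def in_frame(driver, frame_reference):
    """Switch into a child frame for the duration of a with-block."""
    driver.switch_to.frame(frame_reference)
    try:
        yield driver
    finally:
        driver.switch_to.parent_frame()  # always return to the parent frame
```

Code inside "with in_frame(browser, 'iframeResult'):" then sees the child frame's nodes, and the browser is back in the parent frame once the block ends, whether or not it raised.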

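10. Waiting

get() returns once the basic page frame has finished loading, so page_source at that moment is not necessarily the fully loaded page. If the page issues additional Ajax requests, their results may not be in the source yet. In such cases we need to wait for a while to make sure the nodes have loaded.

Implicit wait
With an implicit wait, when a node being looked up is not immediately present, Selenium keeps polling the DOM for up to the configured time before giving up. The default is 0.

Example:

from selenium import webdriver

browser = webdriver.Chrome()
browser.implicitly_wait(10)  # implicit wait, in seconds
url = 'https://pixabay.com/zh/images/search/%E8%B7%B3%E8%9B%9B/'
browser.get(url)
browser.close()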

Explicit wait

Specify the node to look for and a maximum wait time. If the node loads within that time, it is returned; otherwise a timeout exception (TimeoutException) is raised.

Example:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

browser = webdriver.Chrome()
url = 'https://www.taobao.com'
browser.get(url)
wait = WebDriverWait(browser, 10)  # explicit wait, at most 10 seconds
input_ = wait.until(EC.presence_of_element_located((By.ID, 'q')))  # wait condition
button_ = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, '.btn-search')))
print(input_, button_, sep='\n')
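
Conceptually, an explicit wait is a polling loop with a deadline. The sketch below illustrates that idea in plain Python; it is a simplified stand-in, not Selenium's implementation. The real WebDriverWait additionally passes the driver to the condition, ignores NoSuchElementException by default, and raises Selenium's TimeoutException rather than the built-in TimeoutError used here. The name wait_until is hypothetical; the 0.5 second poll interval mirrors WebDriverWait's default.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result  # hand back whatever the condition produced
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f seconds" % timeout)
        time.sleep(poll)
```

This also shows why until() can return the node itself: the wait returns the condition's own truthy result.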

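Wait conditions and their meanings

Wait condition	Meaning
title_is	The title is exactly the given text
title_contains	The title contains the given text
presence_of_element_located	The node is present in the DOM; takes a locator tuple such as (By.ID, 'p')
visibility_of_element_located	The node is visible; takes a locator tuple
visibility_of	The node is visible; takes a node object
presence_of_all_elements_located	At least one matching node is present; returns the list of matches
text_to_be_present_in_element	The node's text contains the given text
text_to_be_present_in_element_value	The node's value contains the given text
frame_to_be_available_and_switch_to_it	The frame is available, and the driver switches to it
invisibility_of_element_located	The node is not visible
element_to_be_clickable	The node is clickable
staleness_of	Waits until a node is no longer attached to the DOM; useful for detecting that the page has refreshed
element_to_be_selected	The node is selected; takes a node object
element_located_to_be_selected	The node is selected; takes a locator tuple
element_selection_state_to_be	Takes a node object and a state; returns True if they match, otherwise False
element_located_selection_state_to_be	Takes a locator tuple and a state; returns True if they match, otherwise False
alert_is_present	An alert is present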
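
When none of the built-in conditions fits, until() accepts any callable that takes the driver and returns a truthy value once the wait should end. As an illustration, here is a hypothetical custom condition, element_count_at_least, which is not part of expected_conditions; it passes once enough matching nodes exist.

```python
def element_count_at_least(locator, minimum):
    """Build a wait condition that passes once at least `minimum` nodes match."""
    def condition(driver):
        elements = driver.find_elements(*locator)
        # Return the nodes on success so that until() hands them back to the caller.
        return elements if len(elements) >= minimum else False
    return condition
```

It would be used like the built-in ones, for example wait.until(element_count_at_least((By.CSS_SELECTOR, '.service-bd li'), 5)), where the selector is the one from the multiple-nodes example and the count of 5 is arbitrary.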

11. Back and forward

import time
from selenium import webdriver

browser = webdriver.Chrome()
browser.get('https://www.baidu.com')
browser.get('https://www.taobao.com')
browser.get('https://www.python.org')
browser.back()  # go back
time.sleep(1)
browser.forward()  # go forward
browser.close()

12. Cookies

from selenium import webdriver

browser = webdriver.Chrome()
browser.get('https://www.zhihu.com/explore')
print(browser.get_cookies())  # cookies visible to the current page
browser.add_cookie({'name': 'duoban', 'domain': 'www.zhihu.com', 'value': 'germey'})
print(browser.get_cookies())  # now includes the added cookie
browser.delete_all_cookies()
print(browser.get_cookies())  # after deleting all cookies
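
A common use of get_cookies() and add_cookie() is to carry a session from one run to the next. The sketch below stores the cookies as JSON; save_cookies, load_cookies and the file path are hypothetical names, and it assumes every cookie dictionary returned by the browser is JSON-serializable. Before calling load_cookies, the browser must already be on a page of the cookies' domain, because add_cookie() rejects cookies for other domains.

```python
import json


def save_cookies(driver, path):
    """Write the current session's cookies to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(driver.get_cookies(), f)


def load_cookies(driver, path):
    """Add cookies from a JSON file to the session; return how many were added."""
    with open(path, encoding="utf-8") as f:
        cookies = json.load(f)
    for cookie in cookies:
        driver.add_cookie(cookie)
    return len(cookies)
```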

13. Tab management

import time
from selenium import webdriver

browser = webdriver.Chrome()
browser.get('https://baidu.com')
browser.execute_script('window.open()')  # open a new tab
print(browser.window_handles)  # handles of all currently open tabs
browser.switch_to.window(browser.window_handles[1])  # the argument is a tab handle
browser.get('https://mail.qq.com')
time.sleep(1)
browser.switch_to.window(browser.window_handles[0])
browser.get('https://translate.google.cn/')
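
The example above picks tabs by their position in window_handles. A slightly more defensive variant, sketched below, finds the new tab by comparing the handles before and after window.open(), so it does not depend on the order of the list. open_in_new_tab is a hypothetical helper name.

```python
def open_in_new_tab(driver, url):
    """Open url in a new tab, switch to it, and return the new tab's handle."""
    before = set(driver.window_handles)
    driver.execute_script("window.open()")
    # The new tab is whichever handle was not present before window.open().
    new_handles = [h for h in driver.window_handles if h not in before]
    if not new_handles:
        raise RuntimeError("no new tab was opened")
    driver.switch_to.window(new_handles[0])
    driver.get(url)
    return new_handles[0]
```

Keeping the returned handle makes it easy to switch back later with browser.switch_to.window(handle).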

14. Exception handling

from selenium import webdriver
from selenium.common.exceptions import TimeoutException, NoSuchElementException

browser = webdriver.Chrome()
try:
    browser.get('https://www.baidu.com')
except TimeoutException:
    print('Time out')
try:
    browser.find_element_by_id('help')
except NoSuchElementException:
    print('No Element')
finally:
    browser.close()

Reposted from: https://www.cnblogs.com/duoban/p/11366570.html
