Python Web Scraping: Learning the selenium Module

Scraping workflow

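Import the webdriver package from the selenium module
Instantiate a webdriver
Prepare the URL
Open the page
Locate the target element
Perform actions on it
Extract the information you need
Close the browser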
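
The steps above can be sketched as one small function. This is a minimal sketch rather than a definitive implementation: `scrape_text` and `driver_factory` are hypothetical names introduced here, and the driver is passed in as a factory so the same code can run with `Chrome` or any other WebDriver class. The `try`/`finally` ensures the browser is closed even if one of the steps fails.

```python
def scrape_text(driver_factory, url, locator):
    """Run the workflow once and return the located element's text.

    driver_factory: a callable returning a WebDriver, e.g. selenium.webdriver.Chrome
    locator: a (strategy, value) pair such as (By.CSS_SELECTOR, '#s-top-left>a')
    """
    driver = driver_factory()                    # instantiate the webdriver
    try:
        driver.get(url)                          # open the page
        element = driver.find_element(*locator)  # locate the element
        return element.text                      # extract the information
    finally:
        driver.quit()                            # always close the browser
```

With Selenium installed, a call might look like `scrape_text(Chrome, "https://www.baidu.com", (By.CSS_SELECTOR, "#s-top-left>a"))`.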

New-style example

# New version (Selenium 4 style)
from selenium.webdriver import Chrome
from selenium.webdriver.common.by import By
import time
# Create a driver object (this launches the browser)
web = Chrome()
# Open the page
web.get("https://www.baidu.com")
# Locate an element
el = web.find_element(By.CSS_SELECTOR, '#s-top-left>a')  # locate with a CSS selector
print(el)
time.sleep(1)       # pause for one second
web.quit()          # close the browser

Old-style example

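# Old version (the find_element_by_* methods were removed in Selenium 4.3)
from selenium import webdriver
import time
# Instantiate the driver
web = webdriver.Chrome()
# Open Baidu
web.get('https://www.baidu.com/')
title = web.find_element_by_xpath('//*[@id="s-top-left"]/a[1]')
print(title.text)   # print the element's text
time.sleep(1)       # pause for one second
web.close()   # close the current window (quit() ends the whole session)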

Common element locator methods

New methods

web.find_element(By.ID, 's-top-left')  # locate by element id
web.find_element(By.CLASS_NAME, 'mnav')    # locate by class name
web.find_element(By.TAG_NAME, 'a')  # locate by tag name
web.find_element(By.NAME, 'description')   # locate by the name attribute
web.find_element(By.LINK_TEXT, '新闻')   # locate a link by its full text
web.find_element(By.PARTIAL_LINK_TEXT, '新')  # locate a link by part of its text
web.find_elements(By.XPATH, '//*[@id="s-top-left"]/a')    # XPath; find_elements returns a list of all matches
web.find_element(By.CSS_SELECTOR, '#s-top-left>a')  # locate with a CSS selector
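
`find_element` raises `NoSuchElementException` when nothing matches, while `find_elements` returns a list that is simply empty in that case. A small helper built on `find_elements` can therefore return `None` instead of raising. The name `find_first_or_none` is hypothetical and used only for illustration.

```python
def find_first_or_none(driver, by, value):
    """Return the first element matching the locator, or None if there is none."""
    matches = driver.find_elements(by, value)   # empty list when nothing matches
    return matches[0] if matches else None
```

For example, `find_first_or_none(web, By.CLASS_NAME, 'mnav')` would give the first matching element or `None`.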

Old methods

find_element_by_id(id)    # locate by the element's id
find_element_by_class_name(class_name)    # locate by the class attribute
find_element_by_tag_name(tag_name)    # locate by tag name
find_element_by_css_selector(css_selector)    # locate with a CSS selector
find_element_by_name(name)    # locate by the name attribute
find_element_by_link_text(link_text)    # locate a link by its full text
find_element_by_partial_link_text(partial_link_text)    # locate a link by part of its text
find_element_by_xpath(xpath)    # locate with an XPath expression
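
When porting old code, each `find_element_by_*` call maps onto one locator strategy. The table below is a sketch of that mapping. The strategy strings are the values of the corresponding `By` constants (for example, `By.ID` is `"id"`), and `OLD_TO_NEW` and `to_locator` are hypothetical names used only for this example.

```python
# Old method name -> locator strategy string (the value of the By constant)
OLD_TO_NEW = {
    "find_element_by_id": "id",                                # By.ID
    "find_element_by_class_name": "class name",                # By.CLASS_NAME
    "find_element_by_tag_name": "tag name",                    # By.TAG_NAME
    "find_element_by_css_selector": "css selector",            # By.CSS_SELECTOR
    "find_element_by_name": "name",                            # By.NAME
    "find_element_by_link_text": "link text",                  # By.LINK_TEXT
    "find_element_by_partial_link_text": "partial link text",  # By.PARTIAL_LINK_TEXT
    "find_element_by_xpath": "xpath",                          # By.XPATH
}


def to_locator(old_method, value):
    """Translate an old method name plus its argument into a (strategy, value) pair."""
    return (OLD_TO_NEW[old_method], value)
```

The resulting pair can be unpacked straight into the new call, as in `web.find_element(*to_locator("find_element_by_id", "s-top-left"))`.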

Performing actions

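from selenium.webdriver.common.keys import Keys
# Locate the element
search_box = web.find_elements(By.CSS_SELECTOR, '#s-top-left>input')[0]
# Type text into the element
search_box.send_keys('Python')
# Press Enter
search_box.send_keys(Keys.ENTER)
# Click the element
search_box.click()
# Clear the element's text
search_box.clear()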
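
In practice these actions are often combined into one step: clear the field, type the query, then press Enter. A minimal sketch, assuming `element` is an input that has already been located and `enter_key` is `Keys.ENTER` from `selenium.webdriver.common.keys`. The function name `submit_query` is hypothetical.

```python
def submit_query(element, text, enter_key):
    """Clear an input element, type `text` into it, and press Enter."""
    element.clear()               # remove any existing text first
    element.send_keys(text)       # type the query
    element.send_keys(enter_key)  # submit; pass Keys.ENTER here
```

Clearing before typing, rather than after submitting, avoids touching the element once Enter has triggered a page load, at which point the old element reference may have gone stale.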

Executing JavaScript code

# JavaScript can be executed directly in the page
web.execute_script('alert("JavaScript executed directly")')
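
`execute_script` can also return a value to Python and accept extra arguments, which appear inside the script as `arguments[0]`, `arguments[1]`, and so on. A short sketch of both uses. The helper names are hypothetical.

```python
def page_title_via_js(driver):
    """Ask the browser for document.title through JavaScript."""
    return driver.execute_script("return document.title;")


def scroll_into_view(driver, element):
    """Scroll the page until `element` is visible."""
    driver.execute_script("arguments[0].scrollIntoView();", element)
```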

Getting an element's text and attributes

# Locate the element
name = web.find_element(By.ID, 's')
# Get the element's text
print(name.text)
# Get one of the element's attributes
print(name.get_attribute('href'))
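
The same two calls extend naturally to a list of elements, for example the links returned by `find_elements`. A minimal sketch: `links_to_records` is a hypothetical helper, and each item only needs a `.text` property and a `get_attribute` method.

```python
def links_to_records(elements):
    """Collect the visible text and href attribute of each element."""
    return [
        {"text": el.text, "href": el.get_attribute("href")}
        for el in elements
    ]
```

It could be called as `links_to_records(web.find_elements(By.CSS_SELECTOR, '#s-top-left>a'))`.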

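Day 64 of my daily check-in. If you are interested in Python and big data, you are welcome to join the discussion; feedback is appreciated!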
