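Taking the comments from Mr.E. and Arran into account, I made the list fully traversable with CSS selectors. The tricky parts were the structure and marking of my particular list (classes that change, and so on), and building the required selectors on the fly while keeping them in memory during the traversal.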

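I handled waiting for several elements by searching for anything that is not in the loading state. You can also use the ":nth-child" selector, like this:

# inside a for loop with enumerate, where i is the loop index
selector.append(' > li:nth-child(%i)' % (i + 1))  # identify the child <li> by its position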
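To make the selector bookkeeping concrete, here is a minimal sketch of building the selector for one child from a list of selector parts. The helper name `child_selector` is hypothetical (it is not part of the answer's code); the root selector `ul.ltr` and the ` > li:nth-child(n) > a` pattern are taken from the code in this answer.

```python
def child_selector(list_selector, index):
    """Return the CSS selector of the <a> inside the index-th <li> of a list.

    list_selector: list of selector parts that, joined, address the <ul>.
    index: 0-based position, as produced by enumerate().
    """
    parts = list(list_selector)                         # copy, so the caller's list is untouched
    parts.append(' > li:nth-child(%i)' % (index + 1))   # :nth-child is 1-based
    parts.append(' > a')                                # the link that gets clicked
    return ''.join(parts)


base = ['ul.ltr']  # root list selector
for i in range(3):
    print(child_selector(base, i))
```

Working on a copy avoids the append/delete pairs that the full solution has to keep balanced by hand, at the cost of rebuilding the string for every child.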

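Here is my heavily commented solution, as an example:

from selenium.common.exceptions import InvalidElementStateException, TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# WAIT_LONG_TIME, double_click, process_category_case and process_page_case
# are defined elsewhere in the project and are not shown here.


def parse_crippled_shifted_list(driver, frame, selector, level=1, parent_id=0, path=None):
    """
    Traversal of an HTML list of special structure (you can't know whether an element
    has a sub-list unless you enter it).
    Supports starting from a remembered list element.

    Nested lists have classes "closed" and "last closed" when closed and "open" and
    "last open" when opened (on <li>).
    Elements themselves have classes "leaf" and "last leaf" in both cases.
    Nested lists are situated in the <li> element as a <ul> list. Each <ul> appears
    after clicking <a> in each <li>.
    If you click

    driver - WebDriver; frame - frame of the list; selector - selector to the current list (<ul>);
    level - level of depth, just for console output formatting; parent_id - id of the parent category (in DB);
    path - remaining path in categories (ORM objects) to the target category to start with.
    """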

    # Add current level list elements.
    # This selector matches every link except those still loading,
    # which is exactly what has to be excluded.
    selector.append(' > li > a:not([class=loading])')

    # Wait for child list to load
    try:
        query = WebDriverWait(driver, WAIT_LONG_TIME).until(
            EC.presence_of_all_elements_located((By.CSS_SELECTOR, ''.join(selector))))
    except TimeoutException:
        print("%s timed out" % ''.join(selector))
    else:
        # List is loaded
        del selector[-1]  # selector correction: delete the last part, which only served to detect loaded content
        selector.append(' > li')
        children = driver.find_elements(By.CSS_SELECTOR, ''.join(selector))  # fetch list elements

        # Walk the whole list
        for i, child in enumerate(children):
            del selector[-1]  # delete non-unique li tag selector
            if selector[-1] != ' > ul' and selector[-1] != 'ul.ltr':
                del selector[-1]

            selector.append(' > li:nth-child(%i)' % (i + 1))  # identify child <li> by its position
            selector.append(' > a')  # add 'li > a' reference to click
            child_link = driver.find_element(By.CSS_SELECTOR, ''.join(selector))

            # If we parse freely further (no need to start from a remembered position)
            if not path:
                # Open child
                try:
                    double_click(driver, child_link)
                except InvalidElementStateException as e:
                    print("\n\nERROR\n", e.msg, "\n\n")
                else:
                    # Determine its type
                    del selector[-1]  # delete changed and already useless link reference
                    # If <li> is a category, it now has a <ul> child and class="open".
                    # Checking by class takes priority, because <li> exists for sure.
                    current_li = driver.find_element(By.CSS_SELECTOR, ''.join(selector))

                    # Category case - BRANCH
                    if current_li.get_attribute('class') in ('open', 'last open'):
                        new_parent_id = process_category_case(child_link, parent_id, level)  # add category to DB
                        selector.append(' > ul')  # forward to nested list

                        # Wait for nested list to load
                        try:
                            query = WebDriverWait(driver, WAIT_LONG_TIME).until(
                                EC.presence_of_all_elements_located((By.CSS_SELECTOR, ''.join(selector))))
                        except TimeoutException:
                            print("\t" * level, "%s timed out (%i secs). Failed to load nested list." %
                                  (''.join(selector), WAIT_LONG_TIME))
                        # Parse nested list
                        else:
                            parse_crippled_shifted_list(driver, frame, selector, level + 1, new_parent_id)

                    # Page case - LEAF
                    elif current_li.get_attribute('class') in ('leaf', 'last leaf'):
                        process_page_case(driver, child_link, level)
                    else:
                        raise Exception('Damn! Alien class: %s' % current_li.get_attribute('class'))

            # If it's required to continue from a specified category
            else:
                # Check if it's the required category
                if child_link.text == path[0].name:
                    # Open required category
                    try:
                        double_click(driver, child_link)
                    except InvalidElementStateException as e:
                        print("\n\nERROR\n", e.msg, "\n\n")
                    else:
                        # This element of the list must always be a category (have a nested list)
                        del selector[-1]  # delete changed and already useless link reference
                        # If <li> is a category, it now has a <ul> child and class="open".
                        # Checking by class takes priority, because <li> exists for sure.
                        current_li = driver.find_element(By.CSS_SELECTOR, ''.join(selector))

                        # Category case - BRANCH
                        if current_li.get_attribute('class') in ('open', 'last open'):
                            selector.append(' > ul')  # forward to nested list

                            # Wait for nested list to load
                            try:
                                query = WebDriverWait(driver, WAIT_LONG_TIME).until(
                                    EC.presence_of_all_elements_located((By.CSS_SELECTOR, ''.join(selector))))
                            except TimeoutException:
                                print("\t" * level, "%s timed out (%i secs). Failed to load nested list." %
                                      (''.join(selector), WAIT_LONG_TIME))
                            # Process this nested list
                            else:
                                last = path.pop(0)
                                if len(path) > 0:  # If more to parse
                                    print("\t" * level, "Going deeper to: %s" % ''.join(selector))
                                    parse_crippled_shifted_list(driver, frame, selector, level + 1,
                                                                parent_id=last.id, path=path)
                                else:  # Current is required
                                    print("\t" * level, "Returning target category: ", ''.join(selector))
                                    path = None
                                    parse_crippled_shifted_list(driver, frame, selector, level + 1, last.id, path=None)

                        # Page case - LEAF
                        elif current_li.get_attribute('class') == 'leaf':
                            pass
                        else:
                            print("dummy")

    del selector[-2:]
