Python Shanbay wordbook: scraping a Shanbay wordbook with a Python script

This is a script for scraping a Shanbay (扇贝) wordbook.

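It writes its output to a file named out.txt. Because the file is opened by a relative name, out.txt is created in the current working directory. That is the .py file's directory only if you run the script from there.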

It relies mainly on the selenium library (WebDriver).
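If you want out.txt next to the script no matter where it is launched from, one option is to build the path from the script's own location with pathlib. This is a minimal sketch assuming Python 3. The helper names output_path and write_output are hypothetical and not part of the original script.

```python
from pathlib import Path


def output_path(script_file: str, name: str = "out.txt") -> Path:
    """Return a path for `name` in the same directory as `script_file`."""
    # resolve() makes the path absolute, so the result no longer
    # depends on the current working directory.
    return Path(script_file).resolve().parent / name


def write_output(text: str, script_file: str) -> Path:
    """Write `text` as UTF-8 next to the script and return the path used."""
    path = output_path(script_file)
    path.write_text(text, encoding="utf-8")
    return path
```

In the script, the final write could then become write_output(s, __file__).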

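Usage:

Change two settings before running.

The path to the WebDriver driver, assigned to Path near the top of the script. The code uses geckodriver, the driver provided for Firefox:

Path = r'C:\Users\pc\Downloads\geckodriver-v0.19.1-win64\geckodriver.exe'

The root URL of the wordbook page, assigned to rootdir:

rootdir = "https://www.shanbay.com/wordbook/6403/"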

Run it, and praise the sun.
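Instead of editing the source each time, both settings could be taken from the command line, defaulting to the values above. This is a sketch, not part of the original script, and the flag names --driver and --rootdir are hypothetical. It uses parse_known_args because unittest.main() also reads sys.argv, so the leftover arguments can be handed to it.

```python
import argparse

# Defaults copied from the original script; adjust them to your machine.
DEFAULT_DRIVER = r"C:\Users\pc\Downloads\geckodriver-v0.19.1-win64\geckodriver.exe"
DEFAULT_ROOTDIR = "https://www.shanbay.com/wordbook/6403/"


def parse_config(argv=None):
    """Parse the hypothetical --driver/--rootdir flags.

    Returns (namespace, remaining_args); unknown arguments are left in
    remaining_args instead of causing an error.
    """
    parser = argparse.ArgumentParser(description="Scrape a Shanbay wordbook.")
    parser.add_argument("--driver", default=DEFAULT_DRIVER,
                        help="path to the geckodriver executable")
    parser.add_argument("--rootdir", default=DEFAULT_ROOTDIR,
                        help="root URL of the wordbook page")
    return parser.parse_known_args(argv)
```

In the script's main block, one might call args, rest = parse_config(), assign Path = args.driver and rootdir = args.rootdir, then run unittest.main(argv=[sys.argv[0]] + rest).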

Source code:

# coding=utf-8
# Python 3 / Selenium 4 version. The original used Selenium 3 calls
# (executable_path=, find_element_by_xpath, switch_to_alert) that are
# deprecated or removed in Selenium 4, plus the Python 2-only
# reload(sys) / sys.setdefaultencoding trick.
import re
import unittest

from selenium import webdriver
from selenium.common.exceptions import NoAlertPresentException
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.service import Service

# Path to geckodriver, the WebDriver driver for Firefox.
Path = r'C:\Users\pc\Downloads\geckodriver-v0.19.1-win64\geckodriver.exe'

# Root URL of the wordbook page.
rootdir = "https://www.shanbay.com/wordbook/6403/"


class ShanbeiWord(unittest.TestCase):

    def setUp(self):
        self.driver = webdriver.Firefox(service=Service(executable_path=Path))
        self.driver.implicitly_wait(30)
        self.verificationErrors = []
        self.accept_next_alert = True

    def test_shanbei_word(self):
        s = " "
        driver = self.driver
        i = 1
        # Open entries 1..11 of the list on the wordbook's root page.
        while i < 12:
            driver.get(rootdir)
            xpath = ("/html/body/div[3]/div/div[1]/div/div[4]/div[7]"
                     f"/div[{i}]/div[1]/table/tbody/tr/td[1]/a")
            driver.find_element(By.XPATH, xpath).click()
            i = i + 1
            j = 1
            # Save the current page, then click ">" to go to the next one
            # (9 clicks in total per entry).
            while j < 10:
                s = s + driver.page_source
                driver.find_element(By.LINK_TEXT, ">").click()
                j = j + 1
                print(str(i) + "+++" + str(j))
            # Save the last page reached.
            s = s + driver.page_source
        # Pass 1: keep the rest of each line after a "g>" (for example the
        # end of a <strong> tag).
        s = str(re.findall(r'g>.*', s))
        # Pass 2: keep the fragments between a ">" and the next "<".
        s = str(re.findall(r'>.*?<', s))
        # Write with an explicit UTF-8 encoding; the file is closed on exit.
        with open("out.txt", "w", encoding="utf-8") as f:
            f.write(s)

    def is_element_present(self, how, what):
        try:
            self.driver.find_element(by=how, value=what)
        except NoSuchElementException:
            return False
        return True

    def is_alert_present(self):
        try:
            # Accessing switch_to.alert raises if no alert is open.
            self.driver.switch_to.alert
        except NoAlertPresentException:
            return False
        return True

    def close_alert_and_get_its_text(self):
        try:
            alert = self.driver.switch_to.alert
            alert_text = alert.text
            if self.accept_next_alert:
                alert.accept()
            else:
                alert.dismiss()
            return alert_text
        finally:
            self.accept_next_alert = True

    def tearDown(self):
        self.driver.quit()
        self.assertEqual([], self.verificationErrors)


if __name__ == "__main__":
    unittest.main()
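The two re.findall passes at the end of the test are the actual extraction step. The standalone sketch below applies the same two patterns to an invented HTML sample (the real Shanbay markup may differ), with one change: the second pattern gets a capturing group and is applied to each first-pass match separately. In the original, str() of the first result list is searched as one string, so a match can run across list items, and every pair of adjacent tags yields a "><" fragment. The function name extract_entries is hypothetical.

```python
import re
from typing import List


def extract_entries(page_source: str) -> List[List[str]]:
    """Two-pass regex extraction, in the spirit of the original script."""
    entries = []
    # Pass 1: like r'g>.*' in the script, grab from the first "g>" on a
    # line (the end of "<strong>" in the sample) to the end of that line.
    for line in re.findall(r"g>.*", page_source):
        # Pass 2: like r'>.*?<', but the group keeps only the text between
        # tags; adjacent tags such as "</td><td>" give empty strings.
        texts = [t.strip() for t in re.findall(r">(.*?)<", line)]
        entries.append([t for t in texts if t])
    return entries


sample = (
    "<tr><td><strong>abandon</strong></td><td>v. 放弃</td></tr>\n"
    "<tr><td><strong>ability</strong></td><td>n. 能力</td></tr>\n"
)
print(extract_entries(sample))
```

With this sample each matched line yields a [word, definition] pair; on real pages the lists will contain whatever text sits between tags on those lines.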
