对于爬虫，现在网上的教程也越来越多，方法也是五花八门，甚至出现了APP等爬虫软件。

下面我们将爬取著名的企查查

对于企查查网，我们需要做一些营业执照等信息的话，就需要公司的具体信息，例如公司名称，法定代表人，成立日期，注册金额等基础信息，如下图。

`本文最最最要`：

def main():while True:option = webdriver.ChromeOptions()option.add_experimental_option('excludeSwitches', ['enable-automation'])  # webdriver防检测option.add_argument("--disable-blink-features=AutomationControlled")option.add_argument("--no-sandbox")option.add_argument("--disable-dev-usage")option.add_experimental_option("prefs", {"profile.managed_default_content_settings.images": 2})driver = webdriver.Chrome(executable_path=r'D:\chromedriver.exe',options=option)

设置参数如下：

ChromeOptions:是一个配置 chrome 启动是属性的类,就是初始化
binary_location：设置 chrome 二进制文件位置
add_argument：添加启动参数
add_extension、add_encoded_extension：添加扩展应用
add_experimental_option：添加实验性质的设置参数
debugger_address：设置调试器地址

下载chromedriver.exe驱动

根据你的谷歌浏览器的版本下载chromedriver，注意这里我们下载的版本要与自己的谷歌浏览器的版本相对应，再引入他的路径

点击事件

针对事件点击的话，这个就需要定位到你要模拟点击的标签，不然会报错，或者找不到。
注意：

反应等待时间设置，不然再下一个事件点击之前，定位不了。
标签精准定位

driver.delete_all_cookies()
url = "https://www.qcc.com/weblogin?back=%2F" #https://www.qcc.com/weblogin?back=%2F
driver.get(url)
time.sleep(10)# 点击密码登入
driver.find_element_by_xpath('/html/body/div[1]/div[3]/div/div[2]/div[1]/div[2]/a').click()
time.sleep(1)# 输入账号密码
driver.find_element_by_xpath('/html/body/div[1]/div[3]/div/div[2]/div[3]/form/div[1]/input').send_keys(username)
driver.find_element_by_xpath('/html/body/div[1]/div[3]/div/div[2]/div[3]/form/div[2]/input').send_keys(password)

滑块验证

滑块验证，我们需要定位到滑块的具体标签，因为本人再学习的时候定位到了滑块所在的大盒子标签，这样就会导致验证失败

对于滑动距离这个一般是308

button = driver.find_element_by_xpath('/html/body/div[1]/div[3]/div/div[2]/div[3]/form/div[3]/div/div/div[1]/span')
#滑动滑块
ActionChains(driver).click_and_hold(button).perform()
ActionChains(driver).move_by_offset(xoffset=308, yoffset=0).perform()
ActionChains(driver).release().perform()

话不多说，上代码

完整代码

import time
import pandas as pd
from selenium import webdriver
from selenium.webdriver import ActionChainsa = []
def login(driver):driver.delete_all_cookies()url = "https://www.qcc.com/weblogin?back=%2F" #https://www.qcc.com/weblogin?back=%2Fdriver.get(url)time.sleep(10)# 点击密码登入driver.find_element_by_xpath('/html/body/div[1]/div[3]/div/div[2]/div[1]/div[2]/a').click()time.sleep(1)# 输入账号密码#driver.find_element_by_id('nameNormal').send_keys(username)  # /html/body/div[1]/div[3]/div/div[2]/div[3]/form/div[1]/inputdriver.find_element_by_xpath('/html/body/div[1]/div[3]/div/div[2]/div[3]/form/div[1]/input').send_keys(username)driver.find_element_by_xpath('/html/body/div[1]/div[3]/div/div[2]/div[3]/form/div[2]/input').send_keys(password)button = driver.find_element_by_xpath('/html/body/div[1]/div[3]/div/div[2]/div[3]/form/div[3]/div/div/div[1]/span')#滑动滑块ActionChains(driver).click_and_hold(button).perform()ActionChains(driver).move_by_offset(xoffset=308, yoffset=0).perform()ActionChains(driver).release().perform()time.sleep(2)driver.find_element_by_xpath('/html/body/div[1]/div[3]/div/div[2]/div[3]/form/div[4]/button/strong').click()# 点击登录time.sleep(0.5)url_a = ['https://www.qcc.com/web/search?key=it&p={}&filter=%7B%22rchain%22%3A%5B%7B%22pr%22%3A%22GD%22%7D%5D%7D','https://www.qcc.com/web/search?key=it&p={}&filter=%7B%22rchain%22%3A%5B%7B%22pr%22%3A%22BJ%22%7D%5D%7D','https://www.qcc.com/web/search?key=it&p={}&filter=%7B%22rchain%22%3A%5B%7B%22pr%22%3A%22JS%22%7D%5D%7D','https://www.qcc.com/web/search?key=it&p={}&filter=%7B%22rchain%22%3A%5B%7B%22pr%22%3A%22SH%22%7D%5D%7D','https://www.qcc.com/web/search?key=it&p={}&filter=%7B%22rchain%22%3A%5B%7B%22pr%22%3A%22ZJ%22%7D%5D%7D','https://www.qcc.com/web/search?key=it&p={}&filter=%7B%22rchain%22%3A%5B%7B%22pr%22%3A%22SC%22%7D%5D%7D','https://www.qcc.com/web/search?key=it&p={}&filter=%7B%22rchain%22%3A%5B%7B%22pr%22%3A%22SD%22%7D%5D%7D','https://www.qcc.com/web/search?key=it&p={}&filter=%7B%22rchain%22%3A%5B%7B%22pr%22%3A%22HB%22%7D%5D%7D','https://www.qcc.com/web/search?key=it&p={}&searchIndex=%7B%22scope%22%3A%22it%22%7D']num = 1for r in url_a:for j in range(1,6):driver.get(r.format(j)) #https://www.qcc.com/web/search?key=itfor i in range(1,20):d = driver.find_element_by_xpath('/html/body/div[1]/div[2]/div[2]/div[4]/div/div[2]/div/table/tr[{}]/td[3]/div/a[1]'.format(i))print('第{}条----->>>'.format(num),d.get_attribute("href"))num += 1a.append(d.get_attribute("href"))time.sleep(5)da = pd.DataFrame(a)da.to_csv('./data.csv')def main():while True:option = webdriver.ChromeOptions()option.add_experimental_option('excludeSwitches', ['enable-automation'])  # webdriver防检测option.add_argument("--disable-blink-features=AutomationControlled")option.add_argument("--no-sandbox")option.add_argument("--disable-dev-usage")option.add_experimental_option("prefs", {"profile.managed_default_content_settings.images": 2})driver = webdriver.Chrome(executable_path=r'根据你的谷歌浏览器的版本下载chromedriver，引入他的路径',options=option)driver.set_page_load_timeout(15)login(driver)# jugesd(driver)if __name__ == '__main__':username = '账号'password = '密码'headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.182 Safari/537.36'}main()

结果展示

希望这篇文章对你有用！
谢谢点赞评论！

selenium 轻松模拟登录企查查，获取企业详细信息链接相关推荐

日常学习|从企查查获取企业工商信息
这里临时快速写出来的简陋代码,能实现简单的功能我找到了天眼查.企查查和国家企业信用信息公示系统这三个可以获取企业工商信息的网站一开始是想要从'国家企业信用信息公示系统'入手的,可惜我对js的代码不 ...
python爬虫从企查查获取企业信息-手工绕开企查查的登录验证
想要从企查查爬取企业信息,如果没有登录直接检索,邮箱.电话都被隐藏了: 上面的图片是之前截的图,今天再次检索,好像又可见了: 不过单击查看详情时,还是会被隐藏: 不管怎么说,只要企查查想限制登录,总会 ...
【CURL】模拟登录网站并获取用户信息
模拟登录网站并获取用户信息前言这次来做的是通过代码的方式,模拟登录秒拍官网,获取登录用户的信息. 前后端分离本文的标题是CURL,传统的网站是PHP代码直接渲染项目视图,通过表单提交到控制器直接 ...
使用selenium+chrome模拟登录天眼查将数据保存至excel
之前的博客中,为大家带来了使用selenium进行模拟登录,这次呢,为大家带来的是使用selenium进行模拟登录天眼查,然后抓取天眼查的前五页信息,由于不是会员,只能抓取前五页,你要是想要抓取更多, ...
解决selenium + chromedriver模拟登录被淘宝反爬
解决selenium + chromedriver模拟登录被淘宝反爬问题使用 selenium + webdriver 模拟淘宝登录时,出现滑动验证码反爬,尝试程序模拟滑动滑块,以及在程序打开的 ...
12.通过微信小程序端访问企查查（采集工商信息）
12.通过微信小程序端访问企查查(采集工商信息) 需要注意的问题: 一.1.微信端访问企查查小程序需要登录.2.访问抓包获取的url是有时效性的过一段时间就不能用了. http://xcx.qicha ...
python 模拟浏览器selenium_Python使用Selenium模块模拟浏览器抓取斗鱼直播间信息示例...
本文实例讲述了Python使用Selenium模块模拟浏览器抓取斗鱼直播间信息.分享给大家供大家参考,具体如下: import time from multiprocessing import Poo ...
python采集直播间数据_Python使用Selenium模块模拟浏览器抓取斗鱼直播间信息示例...
本文实例讲述了Python使用Selenium模块模拟浏览器抓取斗鱼直播间信息.分享给大家供大家参考,具体如下: import time from multiprocessing import Poo ...
如何批量获取企业工商信息？
java,python,PHP如何采集企业工商信息,销售团队如何批量获取企业工商信息,包括:营业执照信息.主要人员信息.股东及出资信息.变更信息.股权出质等级信息.企业经营异常信息.分支机构信息等打 ...
利用爬虫快速获取企业招聘信息
基于JSoup的HTML解析(以58同城为例) House.java package com.newer.spider;public class House {String room;String d ...

selenium 轻松模拟登录企查查，获取企业详细信息链接

`本文最最最要`：

下载chromedriver.exe驱动

点击事件

滑块验证

完整代码

结果展示

selenium 轻松模拟登录企查查，获取企业详细信息链接相关推荐

最新文章

热门文章

selenium 轻松模拟登录企查查，获取企业详细信息链接

本文最最最要：

下载chromedriver.exe驱动

点击事件

滑块验证

完整代码

结果展示

selenium 轻松模拟登录企查查，获取企业详细信息链接相关推荐

最新文章

热门文章

`本文最最最要`：