Scraping website links from fofa with Selenium

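Last time, the HTML that fofa returned had already been processed and could not be used. This time we scrape the website links that fofa lists directly, and then crawl those IP sites ourselves.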

A logged-in account can only view 5 pages of results, so we only scrape 5 pages.

import time
import pandas as pd
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# app holds the titles we want to search for
app=["海康威视-视频监控","NUUO-视频监控","北京银河伟业-视频监控","大华-视频监控","Brickcom-视频监控","DASAN_Networks-视频监控","ACTi-视频监控","Vicon-视频监控","雄迈-视频监控"]
# Start chromedriver. It can be downloaded online; here it sits in the Chrome install folder.
# Selenium 4 takes the driver path through a Service object.
browser=webdriver.Chrome(service=Service(executable_path='C:\\Program Files (x86)\\Google\\Chrome\\Application\\chromedriver.exe'))
# Open any results page first
browser.get("https://fofa.so/result?q=app%3D%22%E6%B5%B7%E5%BA%B7%E5%A8%81%E8%A7%86-%E8%A7%86%E9%A2%91%E7%9B%91%E6%8E%A7%22&qbase64=YXBwPSLmtbflurflqIHop4Yt6KeG6aKR55uR5o6nIg%3D%3D")
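The URL opened above carries the query twice: percent-encoded in `q` and base64-encoded in `qbase64`. As a minimal sketch, assuming fofa keeps accepting these same two parameters, a URL of that shape can be built for any title in `app`. The helper name `build_fofa_url` and its `base` parameter are hypothetical and not part of the original script.

```python
import base64
from urllib.parse import urlencode


def build_fofa_url(app_name, base="https://fofa.so/result"):
    """Return a fofa result URL for the query app="<app_name>" (hypothetical helper)."""
    query = f'app="{app_name}"'
    # qbase64 carries the same query text, base64-encoded from its UTF-8 bytes.
    qbase64 = base64.b64encode(query.encode("utf-8")).decode("ascii")
    # urlencode percent-encodes both values, including the '=' padding of the base64 string.
    return base + "?" + urlencode({"q": query, "qbase64": qbase64})


print(build_fofa_url("海康威视-视频监控"))
```

For the first title in `app`, this reproduces the URL used in the script, so `browser.get(build_fofa_url(app[0]))` would open the same page.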

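Next, log in on the page that just opened. Logging in is what makes 5 pages of results available; with a paid membership you can scrape more.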
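If the whole script is run in one go rather than cell by cell, it needs to stop here until the manual login is finished. One simple way, sketched under that assumption, is to block on a console prompt. The function name `wait_for_manual_login` and its `prompt` parameter are hypothetical.

```python
def wait_for_manual_login(prompt=input):
    """Block until the user confirms that they have logged in (hypothetical helper)."""
    # `prompt` defaults to the built-in input(); it is a parameter only so that
    # the pause can be swapped out, for example in a test.
    prompt("Log in to fofa in the browser window, then press Enter here... ")
    return True
```

Calling `wait_for_manual_login()` right after `browser.get(...)` would hold the script until Enter is pressed in the console.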

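Then define a function that collects the links from each results page, and run it once for every search term: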

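def search():
    """Collect the result links from up to 5 pages of the current search."""
    html_text = []
    for j in range(5):
        # Get every website link on the current page.
        # find_elements("xpath", ...) replaces find_elements_by_xpath, which newer Selenium releases removed.
        input_name = browser.find_elements("xpath", '//div[@class="re-domain"]/a[@target="_blank"]')
        # print(len(input_name))  # number of links
        for link in input_name:
            html_text.append(str(link.get_attribute("href")))
        # Once the whole page is collected, move to the next page. My account can only view 5 pages, hence 5.
        time.sleep(2)  # pause briefly; scraping too fast raises errors
        if j != 4:
            # Click the next-page button
            browser.find_element("xpath", '//div[@id="will_page"]/a[@class="next_page"]').click()
            try:
                # Check whether the page has changed; if not, wait a little longer
                WebDriverWait(browser, 10).until(EC.text_to_be_present_in_element(("class name", "current"), str(j + 2)))
            except Exception:
                pass
    return html_text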

html_text = []
for name in app:
    input_box = browser.find_element("id", "q")
    input_box.clear()
    input_box.send_keys(f'app="{name}"')
    browser.find_element("id", "search_button").click()
    WebDriverWait(browser, 10).until(EC.presence_of_element_located(("class name", "left-total-item")))
    html_text.append(search())
    time.sleep(10)

# Flatten the per-search lists into a single list of links
final = []
for links in html_text:
    for link in links:
        final.append(link)
df = pd.DataFrame(final, columns=["text"])
df.to_csv("watchdog_ip.csv", index=False)
browser.quit()
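The nested loop that builds `final` flattens a list of lists, one inner list per search term. As an alternative sketch, the standard library's `itertools.chain.from_iterable` does the same in one line. The links below are placeholder data for illustration, not real fofa results.

```python
from itertools import chain

# Placeholder data standing in for html_text: one list of links per search term.
html_text = [
    ["http://192.0.2.1", "http://192.0.2.2:8080"],
    [],  # a search that returned no links
    ["http://198.51.100.7"],
]

# Concatenate the inner lists in order, keeping duplicates.
final = list(chain.from_iterable(html_text))
print(final)
```

The resulting `final` can be passed to `pd.DataFrame(final, columns=["text"])` exactly as in the script.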
