Essential SEO site-analysis tool: source code for querying and exporting Baidu keyword search results

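Here are two simple scripts that scrape Baidu search results so you can find and study competitor sites. Enter a keyword and the number of result pages to fetch, and the script collects the sites that rank for that keyword. Both versions are provided as a reference; I hope they help.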
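As a rough illustration of how a keyword and a page count map onto Baidu result pages, the sketch below builds the search URLs. It uses the same `wd`, `ie` and `pn` query parameters as the two scripts in this post. The helper name `build_search_urls` is hypothetical, and percent-encoding the keyword with `urllib.parse.urlencode` is an addition that the original scripts do not make.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.baidu.com/s"  # same endpoint the scripts in this post use


def build_search_urls(keyword, pages):
    """Return one search URL per result page (hypothetical helper)."""
    urls = []
    for page in range(pages):
        # Baidu paginates by result offset: pn=0, 10, 20, ...
        query = urlencode({"wd": keyword, "ie": "UTF-8", "pn": page * 10})
        urls.append(f"{BASE_URL}?{query}")
    return urls


if __name__ == "__main__":
    for url in build_search_urls("工业设计", 3):
        print(url)
```

Encoding the keyword up front keeps the URL valid even when the keyword contains spaces or characters such as `&`.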

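Version 1

Features

Reads cookies from a file and picks one at random for each page request
Exported results exclude Baidu's own products
Exports data to Excel
Includes a simple multithreading example for reference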
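Version 1 appears to exclude Baidu's own products by matching only the ordinary result containers in its XPath. If you would rather filter explicitly on the resolved address, a minimal sketch might look like the following. The function names are hypothetical, and treating every `baidu.com` subdomain as a Baidu product is an assumption, not something the original script does.

```python
from urllib.parse import urlparse


def is_baidu_owned(url):
    """True if the URL's host is baidu.com or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return host == "baidu.com" or host.endswith(".baidu.com")


def drop_baidu_results(results):
    """Keep only (title, url) pairs that point outside baidu.com."""
    return [(title, url) for title, url in results if not is_baidu_owned(url)]


if __name__ == "__main__":
    # Made-up sample rows, for illustration only
    sample = [
        ("Baike entry", "https://baike.baidu.com/item/x"),
        ("Competitor", "https://www.example.com/"),
    ]
    print(drop_baidu_results(sample))
```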

# -*- coding: utf-8 -*-
# Baidu search result scraper
# author / WeChat: huguo00289
import random
import threading
import time

import requests
import xlsxwriter
from fake_useragent import UserAgent
from lxml import etree


class Baidu_search():
    def __init__(self):
        self.url = "https://www.baidu.com/s?wd="
        self.ua = UserAgent()
        self.search_datas = []

    # Read cookie.txt (one cookie string per line) and pick one at random
    def get_cookies(self):
        with open("cookie.txt", "r", encoding="utf-8") as f:
            cookies = f.readlines()
        cookie = random.choice(cookies)
        cookie = cookie.strip()
        return cookie

    # Fetch one results page and collect (title, url) pairs
    def get_search_objects(self, search_url):
        headers = {
            "User-Agent": self.ua.random,
            'Cookie': self.get_cookies(),
        }
        html = requests.get(search_url, headers=headers, timeout=8).content.decode("utf-8")
        time.sleep(2)
        req = etree.HTML(html)
        h3s = req.xpath('//div[@class="result c-container new-pmd"]/h3[@class="t"]/a')
        hrefs = req.xpath('//div[@class="result c-container new-pmd"]/h3[@class="t"]/a/@href')
        for h3, href in zip(h3s, hrefs):
            h3 = h3.xpath('.//text()')
            h3 = ''.join(h3)
            href = self.get_website_url(href)
            data = h3, href
            self.search_datas.append(data)
            print(data)

    # Resolve a Baidu redirect link to the real address
    def get_website_url(self, baidu_url):
        r = requests.head(baidu_url, stream=True, timeout=8)
        # Fall back to the Baidu link when the response carries no redirect
        website_url = r.headers.get('Location', baidu_url)
        return website_url

    # Write the collected results to an Excel file
    def write_to_xlsx(self, file_name):
        workbook = xlsxwriter.Workbook(f'{file_name}_{time.strftime("%Y-%m-%d", time.localtime())}.xlsx')  # create the Excel file
        worksheet = workbook.add_worksheet(file_name)
        title = ['Title', 'URL']  # header row
        worksheet.write_row('A1', title)
        for index, data in enumerate(self.search_datas):
            num0 = str(index + 2)
            row = 'A' + num0
            worksheet.write_row(row, data)
        workbook.close()
        print("Search results written to the Excel file.")

    def main(self, keyword, num):
        for i in range(0, num):
            print(f'Querying page {i+1} of Baidu search results..')
            ym = i * 10
            search_url = f"{self.url}{keyword}&ie=UTF-8&pn={ym}"
            self.get_search_objects(search_url)
        self.write_to_xlsx(keyword)

    # Multithreaded variant: one thread per results page
    def Thread_main(self, keyword, num):
        threadings = []
        for i in range(0, num):
            print(f'Querying page {i+1} of Baidu search results..')
            ym = i * 10
            search_url = f"{self.url}{keyword}&ie=UTF-8&pn={ym}"
            t = threading.Thread(target=self.get_search_objects, args=(search_url,))
            threadings.append(t)
            t.start()
        for x in threadings:
            x.join()
        print("Multithreaded Baidu search query finished")
        print(self.search_datas)


if __name__ == '__main__':
    keyword = "工业设计"  # "industrial design"
    num = 10
    spider = Baidu_search()
    spider.main(keyword, num)
    # spider.Thread_main(keyword, num)
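In `Thread_main`, every thread appends to the same list, so the order of the collected rows depends on which page finishes first. If page order matters, one option is `concurrent.futures.ThreadPoolExecutor`, whose `map` returns results in input order. The sketch below is an alternative to the original threading code, not a copy of it. `fetch_page` is a hypothetical stand-in that returns made-up rows; in practice it would fetch and parse one results page.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_page(page):
    """Stand-in for a real page fetch; returns fake (title, url) rows."""
    return [(f"title {page}-{i}", f"https://example.com/{page}/{i}") for i in range(2)]


def collect_results(pages, max_workers=4):
    """Fetch pages concurrently and return rows in page order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, regardless of finish order
        per_page = list(pool.map(fetch_page, range(pages)))
    return [row for rows in per_page for row in rows]


if __name__ == "__main__":
    for row in collect_results(3):
        print(row)
```

Capping `max_workers` also limits how many requests hit the search engine at once, which the one-thread-per-page approach does not.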

Version 2

Features

The cookie is fixed and does not change
Nearly all result data is output, including each result's rank
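The Version 2 script as published prints each result's rank, title and URL to the console. If you want to keep those rows, a minimal sketch using only the standard library could serialize them to CSV. The function name `rows_to_csv` and the column names are assumptions made for illustration.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize (rank, title, url) tuples to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["rank", "title", "url"])
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Made-up sample row, for illustration only
    print(rows_to_csv([("1", "Example site", "https://www.example.com/")]))
```

Writing the returned text to a file opened with `encoding="utf-8-sig"` usually lets Excel display Chinese titles correctly.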

# -*- coding: UTF-8 -*-
# Keyword lookup of Baidu search results
# 20191121 by WeChat: huguo00289
import time

import requests
from bs4 import BeautifulSoup
from fake_useragent import UserAgent

# The original listing used Cookie without defining it;
# paste your own Baidu cookie string here.
Cookie = ""


def ua():
    ua = UserAgent()
    return ua.random


headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3',
    # 'br' dropped: requests only decodes Brotli when a brotli package is installed
    'Accept-Encoding': 'gzip, deflate',
    'Accept-Language': 'zh-CN,zh;q=0.9',
    'Cache-Control': 'max-age=0',
    'Connection': 'keep-alive',
    'Cookie': Cookie,
    'Host': 'www.baidu.com',
    'Referer': 'https://www.baidu.com/?tn=48021271_6_hao_pg',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': ua(),
    # 'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36',
}


# Resolve a Baidu redirect link to the real URL
def get_trueurl(url):
    try:
        r = requests.head(url, stream=True, timeout=10)
        zsurl = r.headers['Location']
    except (requests.RequestException, KeyError):
        # No redirect or request failed: keep the Baidu link
        zsurl = url
    return zsurl


# Fetch the page HTML
def get_response(url):
    # Optional proxy (disabled in the original):
    # proxy = '120.83.105.195:9999'
    # proxies = {'http': 'http://' + proxy, 'https': 'https://' + proxy}
    # response = requests.get(url, headers=headers, proxies=proxies, timeout=10)
    response = requests.get(url, headers=headers, timeout=10)
    print(f'Status code: {response.status_code}')
    time.sleep(2)
    response.encoding = 'utf-8'
    req = response.text
    return req


# Query the search results and print rank, title and real URL
def get_bdpm(keyword, num):
    for i in range(0, int(num)):
        print(f'Querying page {i + 1} of search results...')
        ym = i * 10
        url = f"https://www.baidu.com/s?wd={keyword}&ie=UTF-8&pn={ym}"
        req = get_response(url)
        soup = BeautifulSoup(req, 'lxml')
        divs = soup.find('div', id="content_left").find_all('div')
        for div in divs:
            if 'class="result' in str(div):
                try:
                    pm = div['id']  # the div id holds the rank
                except KeyError:
                    pm = ''
                title = div.find('a').get_text()
                title = title.strip()
                href = div.find('a')['href']
                zsurl = get_trueurl(href)
                print(pm, title, zsurl)
        time.sleep(5)


if __name__ == '__main__':
    while True:
        keyword = input('Enter the keyword to query: ')
        num = input('Enter the number of pages to query: ')
        try:
            get_bdpm(keyword, num)
        # AttributeError: results container missing (e.g. blocked page);
        # ValueError: page count is not a number
        except (AttributeError, IndexError, ValueError) as e:
            print(e)
            print("Query failed!")

WeChat official account: 二爷记

Python source code and tools, shared from time to time
