[Python Crawler] Write a crawler that scrapes your own blog and can boost its view count

Preparation

Install the external packages:

pip install bs4
pip install requests
pip install virtualenv (this one is probably not needed)
pip install lxml
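
A quick way to confirm the installation worked is to check that the modules can be found before running the scripts. The sketch below is only an illustration: the helper name missing_modules and the REQUIRED_MODULES list are hypothetical, and virtualenv is left out because the scripts never import it.

```python
# Sketch: report which of the required modules cannot be imported.
from importlib.util import find_spec

# Import names of the packages installed above (assumed list; virtualenv
# is omitted because the scripts in this post never import it).
REQUIRED_MODULES = ["bs4", "requests", "lxml"]


def missing_modules(names):
    """Return the top-level module names that are not importable here."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All required modules are importable.")
```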

Step 1: Scrape the post links from your blog's home page

Code

# coding: utf-8
import requests
from bs4 import BeautifulSoup

# Browser-like User-Agent sent with every request
HEADERS = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) '
                  'AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Ubuntu Chromium/44.0.2403.89 '
                  'Chrome/44.0.2403.89 '
                  'Safari/537.36'
}


def get_blog_info():
    """Print the link of every post listed on the blog home page."""
    html = get_page(blog_url)
    soup = BeautifulSoup(html, 'lxml')
    article_list = soup.find('main')
    if article_list is None:
        # The download failed or the page layout has changed
        return
    article_item = article_list.find_all('p', attrs={'class': 'content'})
    for ai in article_item:
        title = ai.a.text
        link = ai.a['href']
        # print(title)
        print(link)
        # write_to_file(title + '\t')
        # write_to_file(link + '\n')


def get_page(url):
    """Download a page and return its HTML, or "" if the request fails."""
    try:
        response = requests.get(url, headers=HEADERS, timeout=10)
        return response.text
    except requests.RequestException:
        return ""


def write_to_file(content):
    """Append text to article.txt."""
    with open('article.txt', 'a', encoding='utf-8') as f:
        f.write(content)


if __name__ == '__main__':
    blog_url = "https://blog.csdn.net/sinat_42483341?t=1"
    get_blog_info()
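
If the two commented-out write_to_file calls are enabled, each line of article.txt holds a title, a tab, and a link. A minimal sketch of reading those links back into a list could look like this. The helper name parse_links is hypothetical, and it assumes the link is always the last tab-separated field on the line.

```python
def parse_links(lines):
    """Extract links from lines written as 'title<TAB>link'.

    A line without a tab is treated as a bare link; blank lines are skipped.
    """
    links = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        # Keep only the text after the last tab, which is the link
        links.append(line.rsplit("\t", 1)[-1])
    return links
```

Because an open file iterates over its lines, the file can be passed in directly: `with open('article.txt', encoding='utf-8') as f: blog_urls = parse_links(f)`.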

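Step 2: Visit these links with requests

Each link is only requested; nothing is done with the response. This is equivalent to visiting every one of your posts once.

Code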

# coding: utf-8
import requests


def get_page():
    """Request every URL in blog_urls once; the responses are discarded."""
    headers = {
        'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) '
                      'AppleWebKit/537.36 (KHTML, like Gecko) '
                      'Ubuntu Chromium/44.0.2403.89 '
                      'Chrome/44.0.2403.89 '
                      'Safari/537.36'
    }
    for blog_url in blog_urls:
        try:
            requests.get(blog_url, headers=headers, timeout=10)
            print("url=" + blog_url)
        except requests.RequestException:
            # Skip a failed request and carry on with the next link
            continue


if __name__ == '__main__':
    blog_urls = [
        'https://blog.csdn.net/sinat_42483341/article/details/91826523',
        'https://blog.csdn.net/sinat_42483341/article/details/89931215',
        'https://blog.csdn.net/sinat_42483341/article/details/89034286',
        'https://blog.csdn.net/sinat_42483341/article/details/88849892',
        'https://blog.csdn.net/sinat_42483341/article/details/95871910',
        'https://blog.csdn.net/sinat_42483341/article/details/95768679',
        'https://blog.csdn.net/sinat_42483341/article/details/95495296',
        'https://blog.csdn.net/sinat_42483341/article/details/95043847',
        'https://blog.csdn.net/sinat_42483341/article/details/95014941',
        'https://blog.csdn.net/sinat_42483341/article/details/94969983',
        'https://blog.csdn.net/sinat_42483341/article/details/94492282',
        'https://blog.csdn.net/sinat_42483341/article/details/94443619',
        'https://blog.csdn.net/sinat_42483341/article/details/94388710',
        'https://blog.csdn.net/sinat_42483341/article/details/94296696',
        'https://blog.csdn.net/sinat_42483341/article/details/94133323',
        'https://blog.csdn.net/sinat_42483341/article/details/94053208',
        'https://blog.csdn.net/sinat_42483341/article/details/94050774',
        'https://blog.csdn.net/sinat_42483341/article/details/93769801',
        'https://blog.csdn.net/sinat_42483341/article/details/93746360',
        'https://blog.csdn.net/sinat_42483341/article/details/93739451',
    ]
    get_page()
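
One possible variation on the loop above handles each link separately, so a single failed request does not stop the rest, and waits a little between requests. This is a sketch and not part of the original script: the function name visit_all, the injectable fetch and sleep parameters, and the 1 to 3 second pause are all assumptions made for illustration.

```python
import random
import time


def visit_all(urls, fetch, min_delay=1.0, max_delay=3.0, sleep=time.sleep):
    """Call fetch(url) for every URL and return the URLs that succeeded.

    fetch is any callable that raises an exception on failure. A random
    pause of min_delay to max_delay seconds is taken between requests.
    """
    visited = []
    for i, url in enumerate(urls):
        try:
            fetch(url)
        except Exception as exc:
            # Report the failure and move on to the next link
            print("failed: " + url + " (" + str(exc) + ")")
        else:
            visited.append(url)
            print("url=" + url)
        if i < len(urls) - 1:
            # No pause is needed after the last URL
            sleep(random.uniform(min_delay, max_delay))
    return visited
```

With requests, fetch could be `lambda url: requests.get(url, headers=headers, timeout=10).raise_for_status()`, reusing the User-Agent headers from the script above. Passing fetch in as a parameter also makes the loop easy to try out with a stand-in function and no network access.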

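Step 3: (to be written)

To boost the view count automatically, you obviously should not have to paste the URL strings into the list by hand every time. The links should be collected into the list automatically and then visited one by one. I did not get around to writing that part in Python, but a Java version is already written and can serve as a reference:

[Java Crawler] Writing a crawler for practice: boosting CSDN view counts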
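
As a rough idea of what the Python version of this step might look like, the sketch below collects the article links automatically from the HTML of the blog page. It is not the author's implementation: it uses a regular expression modelled on the article URLs listed in Step 2 instead of the BeautifulSoup selector from Step 1, the names ARTICLE_URL and extract_article_links are hypothetical, and it assumes every article link appears on the page that was fetched (pagination is not handled).

```python
import re

# Pattern modelled on the article URLs listed in Step 2 (an assumption
# about how links appear in the page HTML).
ARTICLE_URL = re.compile(
    r"https://blog\.csdn\.net/sinat_42483341/article/details/\d+"
)


def extract_article_links(html):
    """Return the distinct article URLs found in html, in first-seen order."""
    # dict.fromkeys drops duplicates while keeping the original order
    return list(dict.fromkeys(ARTICLE_URL.findall(html)))
```

Passing the text returned by get_page from Step 1 to extract_article_links would produce the list that Step 2 hard-codes as blog_urls, which can then be visited one by one.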
