(一)基础:通过简单HTTP请求和正则进行数据爬取解析
(一)基础:通过简单HTTP请求和正则进行数据爬取解析
文章目录
- (一)基础:通过简单HTTP请求和正则进行数据爬取解析
- 发送简单http请求
- 如何使用代理ip发送请求
- 常用的User-Agent
发送简单http请求
package main import ( "fmt" "io/ioutil" "net/http" ) func main() { url := "https://www.kuaidaili.com/free/" //新建一个默认的http客户端 client := http.DefaultClient req, _ := http.NewRequest("GET", url, nil) //设置user-agent req.Header.Set("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36")req.Close = true //发送请求,获取响应 resp, _ := client.Do(req) if resp.StatusCode!=200{ fmt.Println("Http status code:", resp.StatusCode) return } defer resp.Body.Close() byteList,_:=ioutil.ReadAll(resp.Body) frameHtml:=string(byteList) fmt.Println(frameHtml)//然后就可以用正则提取需要的内容了 }
如何使用代理ip发送请求
在进行爬虫时,我们通常不会使用自己的IP,毕竟还不想吃国家饭哈哈,而且一些网站有IP限制会需要多个IP换着使用,这个时候就需要使用IP代理了。
IP:="http://163.XXX.XXX.XXX:9999" url,_:=url.Parse(IP)client:=&http.Client{Transport:&http.Transport{Proxy:http.ProxyURL(url),},}
常用的User-Agent
User-Agent又叫用户代理,在请求头中,提供你使用的浏览器的信息。由四部分组成:【Mozilla/5.0】【操作系统版本】【引擎版本】【浏览器版本】,为什么大家都使用
Mozilla/5.0
呢,这要从很久很久说起…当一个小趣闻吧https://www.zhihu.com/question/19553117。有一些网络收集的常用的User-Agent,大家直接使用就好。
func GetRandomUserAgent() string {r := rand.New(rand.NewSource(time.Now().UnixNano()))return userAgentList[r.Intn(len(userAgentList))] }var userAgentList = []string{"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36","Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36","Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36","Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36","Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/603.2.4 (KHTML, like Gecko) Version/10.1.1 Safari/603.2.4","Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36","Mozilla/5.0 (Windows NT 10.0; WOW64; rv:54.0) Gecko/20100101 Firefox/54.0","Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36","Mozilla/5.0 (Windows NT 6.1; WOW64; rv:54.0) Gecko/20100101 Firefox/54.0","Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:54.0) Gecko/20100101 Firefox/54.0","Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36","Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko","Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36","Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36","Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:54.0) Gecko/20100101 Firefox/54.0","Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36","Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36","Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36","Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:54.0) Gecko/20100101 Firefox/54.0","Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.79 Safari/537.36 Edge/14.14393","Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.109 Safari/537.36","Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/603.2.5 (KHTML, like Gecko) Version/10.1.1 Safari/603.2.5","Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.104 Safari/537.36","Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko","Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36 Edge/15.15063","Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36","Mozilla/5.0 (Windows NT 6.3; WOW64; rv:54.0) Gecko/20100101 Firefox/54.0","Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36","Mozilla/5.0 (iPad; CPU OS 10_3_2 like Mac OS X) AppleWebKit/603.2.4 (KHTML, like Gecko) Version/10.0 Mobile/14F89 Safari/602.1","Mozilla/5.0 (Windows NT 6.1; rv:54.0) Gecko/20100101 Firefox/54.0","Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36","Mozilla/5.0 (Windows NT 10.0; WOW64; rv:53.0) Gecko/20100101 Firefox/53.0","Mozilla/5.0 (Windows NT 6.1; WOW64; rv:53.0) Gecko/20100101 Firefox/53.0","Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36","Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:54.0) Gecko/20100101 Firefox/54.0","Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36","Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.109 Safari/537.36","Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.109 Safari/537.36","Mozilla/5.0 (X11; Linux x86_64; rv:54.0) Gecko/20100101 Firefox/54.0","Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.104 Safari/537.36","Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36","Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0","Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko","Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_4) AppleWebKit/603.1.30 (KHTML, like Gecko) Version/10.1 Safari/603.1.30","Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:54.0) Gecko/20100101 Firefox/54.0","Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.86 Safari/537.36","Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36","Mozilla/5.0 (Windows NT 5.1; rv:52.0) Gecko/20100101 Firefox/52.0","Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.109 Safari/537.36","Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0","Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.86 Safari/537.36","Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/58.0.3029.110 Chrome/58.0.3029.110 Safari/537.36","Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/603.2.5 (KHTML, like Gecko) Version/10.1.1 Safari/603.2.5","Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.104 Safari/537.36","Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.104 Safari/537.36","Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0","Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36","Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:53.0) Gecko/20100101 Firefox/53.0","Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.86 Safari/537.36","Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36","Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.86 Safari/537.36","Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.86 Safari/537.36 OPR/46.0.2597.32","Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/59.0.3071.109 Chrome/59.0.3071.109 Safari/537.36","Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:53.0) Gecko/20100101 Firefox/53.0","Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36","Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; rv:11.0) like Gecko","Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36 OPR/45.0.2552.898","Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36","Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36","Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36","Mozilla/5.0 (Windows NT 6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.1","Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36 OPR/46.0.2597.39","Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:54.0) Gecko/20100101 Firefox/54.0","Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/601.7.7 (KHTML, like Gecko) Version/9.1.2 Safari/601.7.7","Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/602.4.8 (KHTML, like Gecko) Version/10.0.3 Safari/602.4.8","Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36","Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; Touch; rv:11.0) like Gecko","Mozilla/5.0 (Windows NT 6.1; rv:52.0) Gecko/20100101 Firefox/52.0","Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.106 Safari/537.36", }
(一)基础:通过简单HTTP请求和正则进行数据爬取解析相关推荐
- 16-爬虫之scrapy框架手动请求发送实现全站数据爬取03
scrapy的手动请求发送实现全站数据爬取 yield scrapy.Reques(url,callback) 发起的get请求 callback指定解析函数用于解析数据 yield scrapy.F ...
- python如何爬虫股票数据_简单爬虫:东方财富网股票数据爬取(python_017)
需求:将东方财富网行情中心的股票数据爬取下来,包括上证指数.深圳指数.上证A股.深圳A股.新股.中小板.创业板 等 一.目标站点分析 东方财富网的行情中心页面包含了所有股票信息.在左侧的菜单栏中包含了 ...
- python爬虫基础Ⅱ——Ajax数据爬取、带参请求:QQ音乐歌单、QQ音乐评论
文章目录 基础爬虫部分Ⅱ Ajax技术 json 1. Network 2. XHR怎么请求? 3. 什么是json? 4. json数据如何解析? 带参数请求 1. 复习 2. params 3. ...
- 简单爬虫Ajax数据爬取——今日头条图片爬取
一.Ajax简介 什么是Ajax? Ajax 即"Asynchronous Javascript And XML"(异步 JavaScript 和 XML),是指一种创建交互式网页 ...
- 爬虫入门—数据解析基础 bs4库使用之红楼梦全文文本爬取
爬虫入门-数据解析基础 bs4库使用之红楼梦全文文本爬取 Author: Labyrinthine Leo Init_time: 2021.02.23 Key Words: Spider.Beau ...
- 豆瓣新书速递数据爬取与简单数据处理 | 豆瓣爬虫 python pandas
豆瓣新书速递数据爬取与简单数据处理 概要 数据爬取 爬取豆瓣平台提供的数据,存储到本地 json 文件. 数据说明 URL 豆瓣新书速推 HTML https://book.douban.com/la ...
- requests库(正则提取)爬取千图网
requests库(正则提取)爬取千图网 首先分析网页结构 打开千图网的网址搜索春节 打开网页源代码,发现跳转链接存在网页源代码里 接下来我们就利用正则表达式去提取 正则表达式最主要的就是找到你想要信 ...
- Python数据爬取超详细讲解(零基础入门,老年人都看的懂)
转载于:https://www.bilibili.com/video/BV12E411A7ZQ?spm_id_from=333.337.search-card.all.click 本文是根据视频教程记 ...
- 免费python课程排行榜-Python基础练习(一)中国大学定向排名爬取
说好的要从练习中学习爬虫的基础操作,所以就先从容易爬取的静态网页开始吧! 今天要爬取的是最好大学网上的2018年中国大学排名.我个人认为这个是刚接触爬虫时用来练习的一个很不错的网页了. 在说这个练习之 ...
最新文章
- k8s:koolshare软路由安装及k8s基本环境配置
- linux之systemctl设置自定义服务
- 详解3D点云分割网络 Cylindrical and Asymmetrical 3D Convolution Networksfor LiDAR Segmentation
- 每天10分钟就能练出流利口语
- c语言手机教程,【图片】【教程】手机c语言入门与手机c编程【mrp吧】_百度贴吧...
- c语言入门经典课后作业,C语言入门经典习题答案.doc
- Percona XtraBackup 安装介绍篇
- 《Python预测之美》送书活动,拿走不谢~
- Graphical Model(概率图模型)的浅见
- linux数字小键盘,银行工作者必备!小郭数字小键盘练习软件:免费数字键小键盘指法练习...
- c语言自动安装软件,VC++(c语言程序下载安装)
- java把date转化成yyyymmdd_jquery 将当前时间转换成yyyymmdd格式的实现方法
- 在OpenWrt中使用SmartDNS加速DNS解析
- linux cat命令什么意思
- 分析方法论_用户生命周期的建立
- java实现getch_Java中是否有C++中的getch()等效项? - java
- 05JS实现弹性相册
- [转]在计算机领域做研究的一些想法
- 2022西式面点师(高级)操作证考试题模拟考试平台操作
- 微信公众号开发 - 开发环境搭建