(一)基础:通过简单HTTP请求和正则进行数据爬取解析

文章目录

  • (一)基础:通过简单HTTP请求和正则进行数据爬取解析
    • 发送简单http请求
    • 如何使用代理ip发送请求
    • 常用的User-Agent
  • 发送简单http请求

    package main
    import (                                                                               "fmt"                                                                             "io/ioutil"                                                                       "net/http"
    )       func main() {                                                                     url := "https://www.kuaidaili.com/free/"                                     //新建一个默认的http客户端                                                       client := http.DefaultClient                                                 req, _ := http.NewRequest("GET", url, nil)                                   //设置user-agent                                                               req.Header.Set("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36")req.Close = true                                                             //发送请求,获取响应                                                           resp, _ := client.Do(req)                                                                                                                          if resp.StatusCode!=200{                                                     fmt.Println("Http status code:", resp.StatusCode)                         return                                                                  }                                                                                                                                                  defer resp.Body.Close()                                                      byteList,_:=ioutil.ReadAll(resp.Body)                                         frameHtml:=string(byteList)      fmt.Println(frameHtml)//然后就可以用正则提取需要的内容了
    }
    
  • 如何使用代理ip发送请求

    在进行爬虫时,我们通常不会使用自己的IP,毕竟还不想吃国家饭哈哈,而且一些网站有IP限制会需要多个IP换着使用,这个时候就需要使用IP代理了。

     IP:="http://163.XXX.XXX.XXX:9999"    url,_:=url.Parse(IP)client:=&http.Client{Transport:&http.Transport{Proxy:http.ProxyURL(url),},}
    
  • 常用的User-Agent

    User-Agent又叫用户代理,在请求头中,提供你使用的浏览器的信息。由四部分组成:【Mozilla/5.0】【操作系统版本】【引擎版本】【浏览器版本】,为什么大家都使用Mozilla/5.0呢,这要从很久很久说起…当一个小趣闻吧https://www.zhihu.com/question/19553117。

    有一些网络收集的常用的User-Agent,大家直接使用就好。

    func GetRandomUserAgent() string {r := rand.New(rand.NewSource(time.Now().UnixNano()))return userAgentList[r.Intn(len(userAgentList))]
    }var userAgentList = []string{"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36","Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36","Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36","Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36","Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/603.2.4 (KHTML, like Gecko) Version/10.1.1 Safari/603.2.4","Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36","Mozilla/5.0 (Windows NT 10.0; WOW64; rv:54.0) Gecko/20100101 Firefox/54.0","Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36","Mozilla/5.0 (Windows NT 6.1; WOW64; rv:54.0) Gecko/20100101 Firefox/54.0","Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:54.0) Gecko/20100101 Firefox/54.0","Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36","Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko","Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36","Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36","Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:54.0) Gecko/20100101 Firefox/54.0","Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36","Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36","Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36","Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:54.0) Gecko/20100101 Firefox/54.0","Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.79 Safari/537.36 Edge/14.14393","Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.109 Safari/537.36","Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/603.2.5 (KHTML, like Gecko) Version/10.1.1 Safari/603.2.5","Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.104 Safari/537.36","Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko","Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36 Edge/15.15063","Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36","Mozilla/5.0 (Windows NT 6.3; WOW64; rv:54.0) Gecko/20100101 Firefox/54.0","Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36","Mozilla/5.0 (iPad; CPU OS 10_3_2 like Mac OS X) AppleWebKit/603.2.4 (KHTML, like Gecko) Version/10.0 Mobile/14F89 Safari/602.1","Mozilla/5.0 (Windows NT 6.1; rv:54.0) Gecko/20100101 Firefox/54.0","Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36","Mozilla/5.0 (Windows NT 10.0; WOW64; rv:53.0) Gecko/20100101 Firefox/53.0","Mozilla/5.0 (Windows NT 6.1; WOW64; rv:53.0) Gecko/20100101 Firefox/53.0","Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36","Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:54.0) Gecko/20100101 Firefox/54.0","Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36","Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.109 Safari/537.36","Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.109 Safari/537.36","Mozilla/5.0 (X11; Linux x86_64; rv:54.0) Gecko/20100101 Firefox/54.0","Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.104 Safari/537.36","Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36","Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0","Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko","Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_4) AppleWebKit/603.1.30 (KHTML, like Gecko) Version/10.1 Safari/603.1.30","Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:54.0) Gecko/20100101 Firefox/54.0","Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.86 Safari/537.36","Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36","Mozilla/5.0 (Windows NT 5.1; rv:52.0) Gecko/20100101 Firefox/52.0","Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.109 Safari/537.36","Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0","Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.86 Safari/537.36","Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/58.0.3029.110 Chrome/58.0.3029.110 Safari/537.36","Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/603.2.5 (KHTML, like Gecko) Version/10.1.1 Safari/603.2.5","Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.104 Safari/537.36","Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.104 Safari/537.36","Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0","Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36","Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:53.0) Gecko/20100101 Firefox/53.0","Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.86 Safari/537.36","Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36","Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.86 Safari/537.36","Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.86 Safari/537.36 OPR/46.0.2597.32","Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/59.0.3071.109 Chrome/59.0.3071.109 Safari/537.36","Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:53.0) Gecko/20100101 Firefox/53.0","Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36","Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; rv:11.0) like Gecko","Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36 OPR/45.0.2552.898","Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36","Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36","Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36","Mozilla/5.0 (Windows NT 6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.1","Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36 OPR/46.0.2597.39","Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:54.0) Gecko/20100101 Firefox/54.0","Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/601.7.7 (KHTML, like Gecko) Version/9.1.2 Safari/601.7.7","Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/602.4.8 (KHTML, like Gecko) Version/10.0.3 Safari/602.4.8","Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36","Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; Touch; rv:11.0) like Gecko","Mozilla/5.0 (Windows NT 6.1; rv:52.0) Gecko/20100101 Firefox/52.0","Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.106 Safari/537.36",
    }
    

(一)基础:通过简单HTTP请求和正则进行数据爬取解析相关推荐

  1. 16-爬虫之scrapy框架手动请求发送实现全站数据爬取03

    scrapy的手动请求发送实现全站数据爬取 yield scrapy.Reques(url,callback) 发起的get请求 callback指定解析函数用于解析数据 yield scrapy.F ...

  2. python如何爬虫股票数据_简单爬虫:东方财富网股票数据爬取(python_017)

    需求:将东方财富网行情中心的股票数据爬取下来,包括上证指数.深圳指数.上证A股.深圳A股.新股.中小板.创业板 等 一.目标站点分析 东方财富网的行情中心页面包含了所有股票信息.在左侧的菜单栏中包含了 ...

  3. python爬虫基础Ⅱ——Ajax数据爬取、带参请求:QQ音乐歌单、QQ音乐评论

    文章目录 基础爬虫部分Ⅱ Ajax技术 json 1. Network 2. XHR怎么请求? 3. 什么是json? 4. json数据如何解析? 带参数请求 1. 复习 2. params 3. ...

  4. 简单爬虫Ajax数据爬取——今日头条图片爬取

    一.Ajax简介 什么是Ajax? Ajax 即"Asynchronous Javascript And XML"(异步 JavaScript 和 XML),是指一种创建交互式网页 ...

  5. 爬虫入门—数据解析基础 bs4库使用之红楼梦全文文本爬取

    爬虫入门-数据解析基础 bs4库使用之红楼梦全文文本爬取 Author: Labyrinthine Leo   Init_time: 2021.02.23 Key Words: Spider.Beau ...

  6. 豆瓣新书速递数据爬取与简单数据处理 | 豆瓣爬虫 python pandas

    豆瓣新书速递数据爬取与简单数据处理 概要 数据爬取 爬取豆瓣平台提供的数据,存储到本地 json 文件. 数据说明 URL 豆瓣新书速推 HTML https://book.douban.com/la ...

  7. requests库(正则提取)爬取千图网

    requests库(正则提取)爬取千图网 首先分析网页结构 打开千图网的网址搜索春节 打开网页源代码,发现跳转链接存在网页源代码里 接下来我们就利用正则表达式去提取 正则表达式最主要的就是找到你想要信 ...

  8. Python数据爬取超详细讲解(零基础入门,老年人都看的懂)

    转载于:https://www.bilibili.com/video/BV12E411A7ZQ?spm_id_from=333.337.search-card.all.click 本文是根据视频教程记 ...

  9. 免费python课程排行榜-Python基础练习(一)中国大学定向排名爬取

    说好的要从练习中学习爬虫的基础操作,所以就先从容易爬取的静态网页开始吧! 今天要爬取的是最好大学网上的2018年中国大学排名.我个人认为这个是刚接触爬虫时用来练习的一个很不错的网页了. 在说这个练习之 ...

最新文章

  1. k8s:koolshare软路由安装及k8s基本环境配置
  2. linux之systemctl设置自定义服务
  3. 详解3D点云分割网络 Cylindrical and Asymmetrical 3D Convolution Networksfor LiDAR Segmentation
  4. 每天10分钟就能练出流利口语
  5. c语言手机教程,【图片】【教程】手机c语言入门与手机c编程【mrp吧】_百度贴吧...
  6. c语言入门经典课后作业,C语言入门经典习题答案.doc
  7. Percona XtraBackup 安装介绍篇
  8. 《Python预测之美》送书活动,拿走不谢~
  9. Graphical Model(概率图模型)的浅见
  10. linux数字小键盘,银行工作者必备!小郭数字小键盘练习软件:免费数字键小键盘指法练习...
  11. c语言自动安装软件,VC++(c语言程序下载安装)
  12. java把date转化成yyyymmdd_jquery 将当前时间转换成yyyymmdd格式的实现方法
  13. 在OpenWrt中使用SmartDNS加速DNS解析
  14. linux cat命令什么意思
  15. 分析方法论_用户生命周期的建立
  16. java实现getch_Java中是否有C++中的getch()等效项? - java
  17. 05JS实现弹性相册
  18. [转]在计算机领域做研究的一些想法
  19. 2022西式面点师(高级)操作证考试题模拟考试平台操作
  20. 微信公众号开发 - 开发环境搭建

热门文章

  1. 全面了解MAC OS X系统(以 Mac OS 9为例)
  2. 数据采集的特点有哪些?
  3. 亲子关系-《非暴力亲子沟通》书中的精髓:父母如何用正确的沟通方法与孩子交流,从而改善亲子关系,促进孩子的健康成长。
  4. 荣耀v30没用鸿蒙,荣耀V30Pro:没赶上鸿蒙系统,但还能再战!
  5. 新浪微博商业数据API、通用API
  6. 平台推广服务团队有哪些角色
  7. 讲体育教师资格证注意事项
  8. Python模块PyAutoIt调用AutoIT
  9. 微信小程序实现黑白化
  10. Android引入Apollo(阿波罗)