Python Web Scraping Example: Scraping Furniture Company Data from 乌托家 (wutuojia.com)

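　　This example scrapes furniture company data from the 乌托家 website. It uses the requests module to fetch pages and XPath expressions (via lxml) to extract the fields. The code is as follows: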

# Author:K
import os

import requests
from lxml import etree

HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.108 Safari/537.36'
}

SAVE_DIR = 'H:/乌托家'
FILE_PATH = SAVE_DIR + '/乌托家家具公司.txt'


def parse_page(url):
    """Scrape one listing page, follow each merchant link, and save the company info."""
    # A timeout keeps one stalled request from hanging the whole crawl.
    response = requests.get(url=url, headers=HEADERS, timeout=10)
    tree = etree.HTML(response.text)
    li_list = tree.xpath('//ul[@class="rec-commodity-ul targetElement"]/li')
    for li in li_list:
        try:
            merchant_href = li.xpath('.//div[@class="impression"]/a/@href')[0]
            merchant_name = li.xpath('.//div[@class="impression"]/a/text()')[0]
            commodity_name = li.xpath('.//div[@class="material"]/a/text()')[0]
            # print(merchant_href, merchant_name, commodity_name)

            # Fetch the merchant detail page and locate the company info block.
            detail_page_text = requests.get(url=merchant_href, headers=HEADERS, timeout=10).text
            detail_tree = etree.HTML(detail_page_text)
            div_infos = detail_tree.xpath('//div[@class="brand-r"]')
            for div in div_infos:
                brand_name = div.xpath('./div[4]/dl/dd/text()')[0]
                addr = div.xpath('.//p/text()')[0]
                phone = div.xpath('.//dd[2]/text()')[0]
                # print(brand_name, addr, phone)

                # Persistent storage. 'a+' creates the file if it does not exist
                # ('r+' would raise FileNotFoundError on the first run), and the
                # with-block closes the file on every code path.
                with open(FILE_PATH, 'a+', encoding='utf-8') as fp:
                    fp.seek(0)
                    saved_text = fp.read()
                    # Skip brands already saved; keep only addresses in Guangdong (广东).
                    if brand_name not in saved_text and '广东' in addr:
                        fp.write(brand_name + '   ' + addr + '    ' + phone + '\n\n')
                        print(brand_name, 'scraped successfully!')
        except Exception as e:
            # A missing field or a failed request skips this item instead of stopping the crawl.
            print(e)


def get_page():
    # Listing pages 1 through 412.
    for page in range(1, 413):
        url = 'http://www.wutuojia.com/item/list.html?page=' + str(page)
        parse_page(url)


def main():
    get_page()


if __name__ == '__main__':
    # Make sure the output directory exists before writing.
    os.makedirs(SAVE_DIR, exist_ok=True)
    main()
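
The persistence step above re-reads the entire output file for every company before deciding whether to write. A minimal sketch of one alternative, assuming the saved brand names fit comfortably in memory: load the names once into a set and check membership there. The helper names `load_seen` and `save_company`, the `province` parameter, and the sample records are hypothetical and not part of the original script. Note that this compares exact brand names, whereas the original does a substring search over the whole file.

```python
import os
import tempfile


def load_seen(path):
    """Return the set of brand names already stored in the output file."""
    seen = set()
    if os.path.exists(path):
        with open(path, encoding='utf-8') as fp:
            for line in fp:
                line = line.strip()
                if line:
                    # Assumption: the brand name is the first field,
                    # separated from the address by three spaces.
                    seen.add(line.split('   ', 1)[0])
    return seen


def save_company(path, seen, brand_name, addr, phone, province='广东'):
    """Append one record if it is new and its address contains `province`.

    Returns True when a line was written, False when the record was skipped.
    """
    if brand_name in seen or province not in addr:
        return False
    with open(path, 'a', encoding='utf-8') as fp:
        fp.write(brand_name + '   ' + addr + '    ' + phone + '\n\n')
    seen.add(brand_name)
    return True


if __name__ == '__main__':
    # Demonstration with made-up records written to a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'companies.txt')
        seen = load_seen(path)
        print(save_company(path, seen, 'Brand A', '广东省佛山市', '000-0000'))  # new record
        print(save_company(path, seen, 'Brand A', '广东省佛山市', '000-0000'))  # duplicate
        print(save_company(path, seen, 'Brand B', '浙江省杭州市', '111-1111'))  # other province
```

In the scraper, `load_seen` would be called once before the page loop and `save_company` once per extracted company, so the file is read a single time per run rather than once per record.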

Reposted from: https://www.cnblogs.com/KisInfinite/p/10952938.html
