Scraping detailed JD.com phone product information with selenium/requests, part 1: selenium

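Preface

I'm still a student myself, so the code may be a bit verbose. We all build up progress a little at a time. Many places could be simplified, but I want to understand where each piece of syntax applies, so I use as much different syntax as I can. Where something can be simplified I'll mark it in the comments, and I hope you'll get hands-on too. I study big data, so I believe in one principle: everything should be reduced from complex to simple. Condensing 100 down to 99 and eventually to 1 is a hard process, so stick with it and keep thinking.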

Data

## Scraping the product description for later analysis

    import csv
    import json
    import re
    import time

    from selenium import webdriver

    # Path to the ChromeDriver executable
    driver_path = r'F:\Pycharm\chromedriver.exe'
    # Create Chrome options; not loading images saves time
    options = webdriver.ChromeOptions()
    options.add_experimental_option('prefs', {'profile.managed_default_content_settings.images': 2})
    # Start the browser with these options.
    # Note: executable_path and the find_element(s)_by_* methods used below are the
    # Selenium 3 API. They were deprecated in Selenium 4 and removed in later 4.x
    # releases, where the equivalents are a Service object and find_element(By.XPATH, ...).
    driver = webdriver.Chrome(executable_path=driver_path, options=options)
    # List that stores one dict per product
    data_list = []


    def start_s():
        with open('data_csv.csv', 'r', encoding='utf-8') as f:
            reader = csv.reader(f)
            next(reader)  # skip the header row
            for i in reader:
                # The product URL is in the third column
                st = i[2]
                url = st
                print('=====' + url + '====')
                driver.get(url)
                # Sleeping is necessary for many pages, and adding headers is sometimes
                # needed too, because many large sites have anti-scraping measures.
                time.sleep(5)
                lits = driver.find_elements_by_class_name('p-parameter')
                print(lits)
                for li in lits:
                    # goods_name = li.find_elements_by_xpath('.//ul[@class="parameter2 p-parameter-list"]/li')[0].text
                    try:
                        goods_name = li.find_element_by_xpath('.//ul[@class="parameter2 p-parameter-list"]/li[contains(text(), "商品名称")]').text
                        pattern = r"商品名称:"
                        goods_name = re.sub(pattern, "", goods_name, flags=re.S)
                    except Exception:
                        goods_name = None
                    try:
                        goods_id = li.find_element_by_xpath('.//ul[@class="parameter2 p-parameter-list"]/li[contains(text(), "商品编号")]').text
                        pattern = r"商品编号:"
                        goods_id = re.sub(pattern, "", goods_id, flags=re.S)
                    except Exception:
                        goods_id = None
                    try:
                        goods_clear_weight = li.find_element_by_xpath('.//ul[@class="parameter2 p-parameter-list"]/li[contains(text(), "商品毛重")]').text
                        pattern = r"商品毛重:"
                        goods_clear_weight = re.sub(pattern, "", goods_clear_weight, flags=re.S)
                    except Exception:
                        # Fixed: the original reset goods_id here instead of goods_clear_weight
                        goods_clear_weight = None
                    try:
                        goods_creat_place = li.find_element_by_xpath('.//ul[@class="parameter2 p-parameter-list"]/li[contains(text(), "商品产地")]').text
                        pattern = r"商品产地:"
                        goods_creat_place = re.sub(pattern, "", goods_creat_place, flags=re.S)
                    except Exception:
                        goods_creat_place = None
                    # Fixed: from here through goods_front_camera_pixel the original
                    # except branches named the variable without assigning None to it.
                    try:
                        goods_cpu_type = li.find_element_by_xpath('.//ul[@class="parameter2 p-parameter-list"]/li[contains(text(), "CPU型号")]').text
                        pattern = r"CPU型号:"
                        goods_cpu_type = re.sub(pattern, "", goods_cpu_type, flags=re.S)
                    except Exception:
                        goods_cpu_type = None
                    try:
                        goods_run_ram_size = li.find_element_by_xpath('.//ul[@class="parameter2 p-parameter-list"]/li[contains(text(), "运行内存")]').text
                        pattern = r"运行内存:"
                        goods_run_ram_size = re.sub(pattern, "", goods_run_ram_size, flags=re.S)
                    except Exception:
                        goods_run_ram_size = None
                    try:
                        goods_phone_ram_size = li.find_element_by_xpath('.//ul[@class="parameter2 p-parameter-list"]/li[contains(text(), "机身存储")]').text
                        pattern = r"机身存储:"
                        goods_phone_ram_size = re.sub(pattern, "", goods_phone_ram_size, flags=re.S)
                    except Exception:
                        goods_phone_ram_size = None
                    try:
                        goods_cd_type = li.find_element_by_xpath('.//ul[@class="parameter2 p-parameter-list"]/li[contains(text(), "存储卡")]').text
                        pattern = r"存储卡:"
                        goods_cd_type = re.sub(pattern, "", goods_cd_type, flags=re.S)
                    except Exception:
                        goods_cd_type = None
                    try:
                        goods_camera_number = li.find_element_by_xpath('.//ul[@class="parameter2 p-parameter-list"]/li[contains(text(), "摄像头数量")]').text
                        pattern = r"摄像头数量:"
                        goods_camera_number = re.sub(pattern, "", goods_camera_number, flags=re.S)
                    except Exception:
                        goods_camera_number = None
                    try:
                        goods_back_camera_pixel = li.find_element_by_xpath('.//ul[@class="parameter2 p-parameter-list"]/li[contains(text(), "后摄主摄像素")]').text
                        pattern = r"后摄主摄像素:"
                        goods_back_camera_pixel = re.sub(pattern, "", goods_back_camera_pixel, flags=re.S)
                    except Exception:
                        goods_back_camera_pixel = None
                    try:
                        goods_front_camera_pixel = li.find_element_by_xpath('.//ul[@class="parameter2 p-parameter-list"]/li[contains(text(), "前摄主摄像素")]').text
                        pattern = r"前摄主摄像素:"
                        goods_front_camera_pixel = re.sub(pattern, "", goods_front_camera_pixel, flags=re.S)
                    except Exception:
                        goods_front_camera_pixel = None
                    try:
                        goods_screen_size = li.find_element_by_xpath('.//ul[@class="parameter2 p-parameter-list"]/li[contains(text(), "主屏幕尺寸")]').text
                        # Unescaped parentheses form a regex group; escape them if the
                        # page text contains literal ASCII parentheses.
                        pattern = r"主屏幕尺寸(英寸):"
                        goods_screen_size = re.sub(pattern, "", goods_screen_size, flags=re.S)
                    except Exception:
                        goods_screen_size = None
                    try:
                        goods_resolving_power = li.find_element_by_xpath('.//ul[@class="parameter2 p-parameter-list"]/li[contains(text(), "分辨率")]').text
                        pattern = r"分辨率:"
                        goods_resolving_power = re.sub(pattern, "", goods_resolving_power, flags=re.S)
                    except Exception:
                        goods_resolving_power = None
                    try:
                        goods_screen_scala = li.find_element_by_xpath('.//ul[@class="parameter2 p-parameter-list"]/li[contains(text(), "屏幕比例")]').text
                        pattern = r"屏幕比例:"
                        goods_screen_scala = re.sub(pattern, "", goods_screen_scala, flags=re.S)
                    except Exception:
                        goods_screen_scala = None
                    try:
                        goods_front_camera_compose = li.find_element_by_xpath('.//ul[@class="parameter2 p-parameter-list"]/li[contains(text(), "屏幕前摄组合")]').text
                        pattern = r"屏幕前摄组合:"
                        goods_front_camera_compose = re.sub(pattern, "", goods_front_camera_compose, flags=re.S)
                    except Exception:
                        goods_front_camera_compose = None
                    try:
                        goods_charge_type = li.find_element_by_xpath('.//ul[@class="parameter2 p-parameter-list"]/li[contains(text(), "充电器")]').text
                        pattern = r"充电器:"
                        goods_charge_type = re.sub(pattern, "", goods_charge_type, flags=re.S)
                    except Exception:
                        goods_charge_type = None
                    try:
                        goods_wifi = li.find_element_by_xpath('.//ul[@class="parameter2 p-parameter-list"]/li[contains(text(), "热点")]').text
                        pattern = r"热点:"
                        goods_wifi = re.sub(pattern, "", goods_wifi, flags=re.S)
                    except Exception:
                        goods_wifi = None
                    try:
                        goods_operating_system = li.find_element_by_xpath('.//ul[@class="parameter2 p-parameter-list"]/li[contains(text(), "操作系统")]').text
                        pattern = r"操作系统:"
                        goods_operating_system = re.sub(pattern, "", goods_operating_system, flags=re.S)
                    except Exception:
                        goods_operating_system = None
                    try:
                        # The brand sits in a separate list with a different class
                        goods_phone_name = li.find_element_by_xpath('.//ul[@class="p-parameter-list"]/li[contains(text(), "品牌")]').text
                        pattern = r"品牌:"
                        goods_phone_name = re.sub(pattern, "", goods_phone_name, flags=re.S)
                    except Exception:
                        goods_phone_name = None
                    print(goods_name)
                    data_dict = {}
                    # I hope everyone gets into the habit of using English names here; it really matters
                    data_dict['name'] = goods_name
                    data_dict['goods_id'] = goods_id
                    data_dict['goods_clear_weight'] = goods_clear_weight
                    data_dict['goods_creat_place'] = goods_creat_place
                    data_dict['goods_cpu_type'] = goods_cpu_type
                    data_dict['goods_run_ram_size'] = goods_run_ram_size
                    data_dict['goods_phone_ram_size'] = goods_phone_ram_size
                    data_dict['goods_cd_type'] = goods_cd_type
                    # data_dict['goods_id'] = goods_id
                    data_dict['goods_camera_number'] = goods_camera_number
                    data_dict['goods_back_camera_pixel'] = goods_back_camera_pixel
                    data_dict['goods_front_camera_pixel'] = goods_front_camera_pixel
                    data_dict['goods_screen_size'] = goods_screen_size
                    data_dict['goods_resolving_power'] = goods_resolving_power
                    data_dict['goods_screen_scala'] = goods_screen_scala
                    data_dict['goods_front_camera_compose'] = goods_front_camera_compose
                    data_dict['goods_charge_type'] = goods_charge_type
                    data_dict['goods_wifi'] = goods_wifi
                    # Fixed: the key was misspelled 'goods_operating_syste'
                    data_dict['goods_operating_system'] = goods_operating_system
                    data_dict['goods_phone_name'] = goods_phone_name
                    data_list.append(data_dict)
                    print(data_dict)
                    print('I was here')
                    # Stop after the first parameter block on the page. The indentation of
                    # this break is assumed; the original listing lost its whitespace.
                    break


    def main():
        start_s()
        # Write the data to a JSON file. 'w' replaces the original 'a+' so that a
        # second run does not append another array and leave invalid JSON.
        with open('data_detils_json.json', 'w', encoding='utf-8') as f:
            json.dump(data_list, f, ensure_ascii=False, indent=4)
        print('JSON file written')
        with open('data_detils_csv.csv', 'w', encoding='utf-8', newline='') as f:
            # Header row
            title = data_list[0].keys()
            # Create the writer
            writer = csv.DictWriter(f, title)
            # Write the header
            writer.writeheader()
            # Write all rows at once
            writer.writerows(data_list)
        print('CSV file written')


    if __name__ == '__main__':
        main()
        driver.quit()
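The preface says many places could be simplified, and the script's per-field try/except blocks all follow the same shape: find the line containing a label, then strip the label. As a rough sketch of one way to collapse them (not the author's code), the logic can be written as a pure function over the text lines. The names `value_after_colon` and `parse_parameters` are hypothetical, and treating everything after the first colon as the value is an assumption about the page text.

```python
import re


def value_after_colon(text):
    """Return what follows the first ASCII or full-width colon in text.

    If there is no colon, return the stripped text unchanged.
    """
    parts = re.split(r"[:：]", text, maxsplit=1)
    return parts[1].strip() if len(parts) == 2 else text.strip()


def parse_parameters(lines, labels):
    """Map each label to the value of the first line containing it, else None."""
    result = {}
    for label in labels:
        match = next((line for line in lines if label in line), None)
        result[label] = value_after_colon(match) if match is not None else None
    return result


# Hypothetical sample lines standing in for the text of each <li> element.
sample_lines = ["商品名称:Phone X", "商品编号:100012", "屏幕比例:19.5:9"]
parsed = parse_parameters(sample_lines, ["商品名称", "商品编号", "屏幕比例", "商品毛重"])
print(parsed)
```

With Selenium, `lines` could be the `.text` of every `li` under the parameter block, which would replace the nineteen XPath lookups with one. Splitting only on the first colon keeps values such as `19.5:9` intact.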

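Summary

Scraping this way is slow. That is the limit of my ability for now; I'm still learning multithreading and scrapy, and once I've learned them I'll share what I find so we can all improve together.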
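For readers curious about the multithreading mentioned here, the following is a minimal sketch using the standard library's `ThreadPoolExecutor`. It is not part of the original post: `scrape_all` and `fake_scrape` are hypothetical names, and `fake_scrape` stands in for a real page scraper so the example runs without a browser. With Selenium, a single driver should not be shared across threads, so a real `scrape_one` would need to create or own its own driver.

```python
from concurrent.futures import ThreadPoolExecutor


def scrape_all(urls, scrape_one, max_workers=4):
    """Run scrape_one(url) for every URL on a small thread pool.

    Results are returned in the same order as urls; a page whose
    scrape raises an exception yields None instead.
    """
    def safe(url):
        try:
            return scrape_one(url)
        except Exception:
            return None

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(safe, urls))


def fake_scrape(url):
    """Stand-in scraper (hypothetical): fails for URLs containing 'bad'."""
    if "bad" in url:
        raise ValueError("simulated failure")
    return {"url": url}


results = scrape_all(
    ["https://example.com/1", "https://example.com/bad", "https://example.com/3"],
    fake_scrape,
)
print(results)
```

Keeping `max_workers` small is a deliberate choice in this sketch, since a site with anti-scraping measures is likely to react badly to many simultaneous requests.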
