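Colleagues who do user research often need to search various platforms for information on competing and similar products, especially the reviews users leave after purchase, and then analyze it. A friend of mine once collected this data by hand, copying reviews from web pages one at a time, which took a great deal of time. So I wrote this demo for that friend to use.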

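First, a few jar packages need to be added: gson.jar and poi.jar.

<!-- Parses the returned review data -->
<dependency>
    <groupId>com.google.code.gson</groupId>
    <artifactId>gson</artifactId>
    <version>2.2.4</version>
</dependency>
<!-- Writes the data to an Excel file -->
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi</artifactId>
    <version>4.0.1</version>
</dependency>

The code below also imports Apache HttpClient (org.apache.http.*) and jsoup (org.jsoup.*), so those two libraries must be on the classpath as well.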

The implementation code follows.

First, define the interface, a simple wrapper around Excel output:

package com.zybank.spring.gson.framework.service;

import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;

import com.zybank.spring.gson.framework.bean.ExcelBean;

/**
 * @author:zhangfd
 * @version 1.0.0
 * @date 2018-12-05 09:42:40
 * @description
 */
public interface ExcelInterface {

    public HSSFSheet createSheet(ExcelBean excelBean);

    public void writeExcel(String destUrl, HSSFWorkbook workbook);
}

package com.zybank.spring.gson.framework.bean;

import java.io.Serializable;
import java.util.List;

import org.apache.poi.hssf.usermodel.HSSFWorkbook;

/**
 * @author:zhangfd
 * @version 1.0.0
 * @date 2018-12-05 09:43:30
 * @description
 */
public class ExcelBean implements Serializable {

    private static final long serialVersionUID = 1L;

    private String sheetName;        // name of the sheet to create
    private List<String> columnName; // column names
    private HSSFWorkbook workbook;   // workbook that the sheet belongs to

    public String getSheetName() {
        return sheetName;
    }

    public void setSheetName(String sheetName) {
        this.sheetName = sheetName;
    }

    public List<String> getColumnName() {
        return columnName;
    }

    public void setColumnName(List<String> columnName) {
        this.columnName = columnName;
    }

    public HSSFWorkbook getWorkbook() {
        return workbook;
    }

    public void setWorkbook(HSSFWorkbook workbook) {
        this.workbook = workbook;
    }
}

The class that implements the interface:

package com.zybank.spring.gson.framework.service.imp;

import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

import org.apache.poi.hssf.usermodel.HSSFCell;
import org.apache.poi.hssf.usermodel.HSSFRow;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;

import com.zybank.spring.gson.framework.bean.ExcelBean;
import com.zybank.spring.gson.framework.service.ExcelInterface;

/**
 * A simple wrapper around Excel file output.
 *
 * @author zhangfd
 */
public class ExcelService implements ExcelInterface {

    /**
     * Creates one sheet in the Excel file, writes the column titles into the
     * first row, and sets the width of each column.
     *
     * @return the new sheet, so that the caller can fill in the actual content
     */
    @Override
    public HSSFSheet createSheet(ExcelBean excelBean) {
        if (null == excelBean) throw new RuntimeException("The request parameter must not be null");
        // Fail fast with a clear message instead of a NullPointerException below
        if (null == excelBean.getWorkbook()) throw new RuntimeException("The workbook must not be null");
        HSSFWorkbook workbook = excelBean.getWorkbook();
        HSSFSheet sheet = workbook.createSheet(excelBean.getSheetName());
        HSSFRow row = sheet.createRow(0);
        List<String> columnNameList = excelBean.getColumnName();
        for (int i = 0; i < columnNameList.size(); i++) {
            // Column widths are measured in 1/256 of a character, so this is 20 characters
            sheet.setColumnWidth(i, 20 * 256);
            HSSFCell cell = row.createCell(i);
            cell.setCellValue(columnNameList.get(i));
        }
        return sheet;
    }

    /**
     * @param destUrl  path of the Excel file to write, including the file name
     * @param workbook the workbook that holds all the created sheets
     */
    @Override
    public void writeExcel(String destUrl, HSSFWorkbook workbook) {
        // Save the file to the given location; try-with-resources closes the stream even on failure
        try (FileOutputStream fos = new FileOutputStream(destUrl)) {
            workbook.write(fos);
            System.out.println("Write succeeded");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        HSSFWorkbook workbook = new HSSFWorkbook();
        ExcelInterface excel = new ExcelService();
        ExcelBean excelBean = new ExcelBean();
        excelBean.setWorkbook(workbook);

        excelBean.setSheetName("sheet1");
        String[] s = {"colum1", "colum2", "colum1", "colum3"};
        excelBean.setColumnName(Arrays.asList(s));
        HSSFSheet sheet = excel.createSheet(excelBean);
        // Write a single row, for testing only
        HSSFRow row1 = sheet.createRow(1);
        row1.createCell(0).setCellValue("zhansan");
        row1.createCell(1).setCellValue("zhansan1");
        row1.createCell(2).setCellValue("zhansan2");
        row1.createCell(3).setCellValue("zhansan3");

        excelBean.setSheetName("sheet2");
        String[] s1 = {"colum11", "colum12", "colum11", "colum13"};
        excelBean.setColumnName(Arrays.asList(s1));
        sheet = excel.createSheet(excelBean);
        // Write a single row, for testing only
        row1 = sheet.createRow(1);
        row1.createCell(0).setCellValue("lisi");
        row1.createCell(1).setCellValue("lisi1");
        row1.createCell(2).setCellValue("lisi2");
        row1.createCell(3).setCellValue("lisi3");

        excel.writeExcel("C:\\Users\\lenovo\\Desktop\\fileName.xls", workbook);
    }
}

Next, define the interface for parsing a single product page:

package com.zybank.spring.gson.framework.service;

import java.io.IOException;
import java.util.List;

import org.apache.http.HttpException;

import com.zybank.spring.gson.framework.bean.DJCommentSummary;
import com.zybank.spring.gson.framework.bean.DJCommentsBean;

/**
 * @author:zhangfd
 * @version 1.0.0
 * @date 2018-12-06 11:21:53
 * @description
 */
public interface AnalySiteInfoInterface {

    public List<DJCommentsBean> analySiteCommentsInfo(String path) throws HttpException, IOException;

    public DJCommentSummary analySiteProductCommentSummaryInfo(String path) throws HttpException, IOException;
}

package com.zybank.spring.gson.framework.bean;

import java.io.Serializable;

/**
 * @author:zhangfd
 * @version 1.0.0
 * @date 2018-12-06 11:22:28
 * @description
 */
public class DJCommentsBean implements Serializable {

    private static final long serialVersionUID = 1L;

    private String userId;               // user id
    private String creationTime;         // review time
    private String content;              // review text
    private String nickname;             // member name
    private String userLevelName;        // member level
    private String score;                // number of stars
    private String productName;          // product name
    private String color;                // product color
    private String afterUserCommentTime; // follow-up review time
    private String hAfterUserComment;    // follow-up review text
    // Client used to post the review, e.g. "来自京东iPhone客户端" (from the JD iPhone client)
    private String userClientShow;
    // Whether the review was posted from a phone: true means yes, false means no
    private String isMobile;
    private String productSize;

    public String getUserId() { return userId; }
    public void setUserId(String userId) { this.userId = userId; }

    public String getCreationTime() { return creationTime; }
    public void setCreationTime(String creationTime) { this.creationTime = creationTime; }

    public String getContent() { return content; }
    public void setContent(String content) { this.content = content; }

    public String getNickname() { return nickname; }
    public void setNickname(String nickname) { this.nickname = nickname; }

    public String getUserLevelName() { return userLevelName; }
    public void setUserLevelName(String userLevelName) { this.userLevelName = userLevelName; }

    public String getScore() { return score; }
    public void setScore(String score) { this.score = score; }

    public String getProductName() { return productName; }
    public void setProductName(String productName) { this.productName = productName; }

    public String getColor() { return color; }
    public void setColor(String color) { this.color = color; }

    public String getAfterUserCommentTime() { return afterUserCommentTime; }
    public void setAfterUserCommentTime(String afterUserCommentTime) { this.afterUserCommentTime = afterUserCommentTime; }

    public String gethAfterUserComment() { return hAfterUserComment; }
    public void sethAfterUserComment(String hAfterUserComment) { this.hAfterUserComment = hAfterUserComment; }

    public String getUserClientShow() { return userClientShow; }
    public void setUserClientShow(String userClientShow) { this.userClientShow = userClientShow; }

    public String getIsMobile() { return isMobile; }
    public void setIsMobile(String isMobile) { this.isMobile = isMobile; }

    public String getProductSize() { return productSize; }
    public void setProductSize(String productSize) { this.productSize = productSize; }
}

package com.zybank.spring.gson.framework.bean;

import java.io.Serializable;
import java.util.Map;

/**
 * @author:zhangfd
 * @version 1.0.0
 * @date 2018-12-06 16:53:03
 * @description
 */
public class DJCommentSummary implements Serializable {

    private static final long serialVersionUID = 1L;

    // Hot review tag statistics: the key is the tag, the value is the number of reviews with that tag
    private Map<String, String> hotCommentTagStatistics;
    // Product number
    private String productId;
    // Overall score
    private String averageScore;
    // Neutral review count
    private String generalCount;
    private String generalCountStr;
    // Displayed review count
    private String showCount;
    private String showCountStr;
    // Positive review count
    private String goodCount;
    private String goodCountStr;
    // Follow-up review count
    private String afterCount;
    private String afterCountStr;
    // Video review count
    private String videoCount;
    private String videoCountStr;
    // Negative review count
    private String poorCount;
    private String poorCountStr;
    // Total review count
    private String commentCount;
    private String commentCountStr;
    // Default positive review count
    private String defaultGoodCount;
    private String defaultGoodCountStr;
    // Positive rate
    private String goodRate;
    private String goodRateShow;
    // Neutral rate
    private String generalRate;
    private String generalRateShow;
    // Negative rate
    private String poorRate;
    private String poorRateShow;
    private String goodRateStyle;
    private String poorRateStyle;
    private String generalRateStyle;
    /*
     * private String skuId;
     * private String skuIds;
     * private String oneYear;
     */

    public String getProductId() { return productId; }
    public void setProductId(String productId) { this.productId = productId; }

    public String getAverageScore() { return averageScore; }
    public void setAverageScore(String averageScore) { this.averageScore = averageScore; }

    public String getGeneralCount() { return generalCount; }
    public void setGeneralCount(String generalCount) { this.generalCount = generalCount; }

    public String getGeneralCountStr() { return generalCountStr; }
    public void setGeneralCountStr(String generalCountStr) { this.generalCountStr = generalCountStr; }

    public String getShowCount() { return showCount; }
    public void setShowCount(String showCount) { this.showCount = showCount; }

    public String getShowCountStr() { return showCountStr; }
    public void setShowCountStr(String showCountStr) { this.showCountStr = showCountStr; }

    public String getGoodCount() { return goodCount; }
    public void setGoodCount(String goodCount) { this.goodCount = goodCount; }

    public String getGoodCountStr() { return goodCountStr; }
    public void setGoodCountStr(String goodCountStr) { this.goodCountStr = goodCountStr; }

    public String getAfterCount() { return afterCount; }
    public void setAfterCount(String afterCount) { this.afterCount = afterCount; }

    public String getAfterCountStr() { return afterCountStr; }
    public void setAfterCountStr(String afterCountStr) { this.afterCountStr = afterCountStr; }

    public String getVideoCount() { return videoCount; }
    public void setVideoCount(String videoCount) { this.videoCount = videoCount; }

    public String getVideoCountStr() { return videoCountStr; }
    public void setVideoCountStr(String videoCountStr) { this.videoCountStr = videoCountStr; }

    public String getPoorCount() { return poorCount; }
    public void setPoorCount(String poorCount) { this.poorCount = poorCount; }

    public String getPoorCountStr() { return poorCountStr; }
    public void setPoorCountStr(String poorCountStr) { this.poorCountStr = poorCountStr; }

    public String getCommentCount() { return commentCount; }
    public void setCommentCount(String commentCount) { this.commentCount = commentCount; }

    public String getCommentCountStr() { return commentCountStr; }
    public void setCommentCountStr(String commentCountStr) { this.commentCountStr = commentCountStr; }

    public String getDefaultGoodCount() { return defaultGoodCount; }
    public void setDefaultGoodCount(String defaultGoodCount) { this.defaultGoodCount = defaultGoodCount; }

    public String getDefaultGoodCountStr() { return defaultGoodCountStr; }
    public void setDefaultGoodCountStr(String defaultGoodCountStr) { this.defaultGoodCountStr = defaultGoodCountStr; }

    public String getGoodRate() { return goodRate; }
    public void setGoodRate(String goodRate) { this.goodRate = goodRate; }

    public String getGoodRateShow() { return goodRateShow; }
    public void setGoodRateShow(String goodRateShow) { this.goodRateShow = goodRateShow; }

    public String getGeneralRate() { return generalRate; }
    public void setGeneralRate(String generalRate) { this.generalRate = generalRate; }

    public String getGeneralRateShow() { return generalRateShow; }
    public void setGeneralRateShow(String generalRateShow) { this.generalRateShow = generalRateShow; }

    public String getPoorRate() { return poorRate; }
    public void setPoorRate(String poorRate) { this.poorRate = poorRate; }

    public String getPoorRateShow() { return poorRateShow; }
    public void setPoorRateShow(String poorRateShow) { this.poorRateShow = poorRateShow; }

    public String getGoodRateStyle() { return goodRateStyle; }
    public void setGoodRateStyle(String goodRateStyle) { this.goodRateStyle = goodRateStyle; }

    public String getPoorRateStyle() { return poorRateStyle; }
    public void setPoorRateStyle(String poorRateStyle) { this.poorRateStyle = poorRateStyle; }

    public String getGeneralRateStyle() { return generalRateStyle; }
    public void setGeneralRateStyle(String generalRateStyle) { this.generalRateStyle = generalRateStyle; }

    public Map<String, String> getHotCommentTagStatistics() { return hotCommentTagStatistics; }
    public void setHotCommentTagStatistics(Map<String, String> hotCommentTagStatistics) { this.hotCommentTagStatistics = hotCommentTagStatistics; }
}

The implementation class:

package com.zybank.spring.gson.framework.service.imp;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.http.HttpEntity;
import org.apache.http.HttpException;
import org.apache.http.HttpStatus;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

import com.google.gson.JsonArray;
import com.google.gson.JsonElement;
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import com.zybank.spring.gson.framework.bean.DJCommentsBean;
import com.zybank.spring.gson.framework.bean.DJCommentSummary;
import com.zybank.spring.gson.framework.service.AnalySiteInfoInterface;

/**
 * @author:zhangfd
 * @version 1.0.0
 * @date 2018-12-06 11:26:08
 * @description
 */
public class AnalySiteInfo implements AnalySiteInfoInterface {

    @Override
    public List<DJCommentsBean> analySiteCommentsInfo(String path) throws HttpException, IOException {
        CloseableHttpResponse response = connectionDJ(path);
        if (null == response) throw new RuntimeException("Failed to connect to the DJ site");
        HttpEntity entity = response.getEntity();
        InputStream input = entity.getContent();
        BufferedReader br = new BufferedReader(new InputStreamReader(input, "GBK"));
        String line = null;
        String productName = "";
        List<DJCommentsBean> beanList = new ArrayList<DJCommentsBean>();
        while ((line = br.readLine()) != null) {
            // Strip the JSONP wrapper: the first 26 characters (the length of
            // "fetchJSON_comment98vv7206(") and the last two characters
            String reqBody = line.substring(26, line.length() - 2);
            JsonParser parser = new JsonParser();
            try {
                JsonObject json = parser.parse(reqBody).getAsJsonObject();
                JsonArray commentsJsonArray = json.getAsJsonArray("comments");
                int length = commentsJsonArray.size();
                for (int i = 0; i < length; i++) {
                    JsonObject comment = commentsJsonArray.get(i).getAsJsonObject();
                    // toString() returns raw JSON, so string values keep their surrounding
                    // quotes; the substring calls further down strip them
                    // Review text
                    String content = comment.get("content").toString();
                    // User number, e.g. 430221650
                    String userId = comment.get("id").toString();
                    // Creation time, e.g. 2018-09-30 09:36:38
                    String creationTime = comment.get("creationTime").toString();
                    // Client used to post the review, e.g. "来自京东iPhone客户端" (from the JD iPhone client)
                    String userClientShow = comment.get("userClientShow").toString();
                    // Whether the review was posted from a phone: true means yes, false means no
                    String isMobile = comment.get("isMobile").toString();
                    // Product color
                    String productColor = comment.get("productColor").toString();
                    // Member name
                    String nickname = comment.get("nickname").toString();
                    // Member level
                    String userLevelName = comment.get("userLevelName").toString();
                    // Star rating
                    String score = comment.get("score").toString();
                    String referenceName = comment.get("referenceName").toString();
                    if (null != referenceName && referenceName.length() > 0) {
                        String[] ls = referenceName.split(" ");
                        productName = ls[0] /* + ls[1] + ls[2] */;
                    }
                    String productSize = comment.get("productSize").toString();
                    // Follow-up review, if there is one
                    JsonElement afterUserComment = comment.get("afterUserComment");
                    String afterUserCommentTime = null;
                    String hAfterUserComment = null;
                    if (null != afterUserComment) {
                        afterUserCommentTime = afterUserComment.getAsJsonObject().get("created").toString();
                        hAfterUserComment = afterUserComment.getAsJsonObject().get("hAfterUserComment")
                                .getAsJsonObject().get("content").toString();
                    }

                    DJCommentsBean jingDongBean = new DJCommentsBean();
                    jingDongBean.setUserId(userId);
                    // Drop the opening quote and keep "yyyy-MM-dd HH:mm"
                    jingDongBean.setCreationTime(creationTime.substring(1, 17));
                    jingDongBean.setContent(content.substring(1, content.length() - 1));
                    jingDongBean.setNickname(nickname.substring(1, nickname.length() - 1));
                    jingDongBean.setUserLevelName(userLevelName);
                    jingDongBean.setScore("star" + score);
                    jingDongBean.setColor(productColor.substring(1, productColor.length() - 1));
                    jingDongBean.setProductName(productName.substring(1, productName.length()));
                    if (null != afterUserCommentTime) {
                        jingDongBean.setAfterUserCommentTime(afterUserCommentTime.substring(1, 17));
                    } else {
                        jingDongBean.setAfterUserCommentTime("");
                    }
                    if (null != hAfterUserComment) {
                        jingDongBean.sethAfterUserComment(hAfterUserComment.substring(1, hAfterUserComment.length() - 1));
                    } else {
                        jingDongBean.sethAfterUserComment("");
                    }
                    jingDongBean.setIsMobile(isMobile);
                    jingDongBean.setUserClientShow(userClientShow);
                    jingDongBean.setProductSize(productSize);
                    beanList.add(jingDongBean);
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
        return beanList;
    }

    @Override
    public DJCommentSummary analySiteProductCommentSummaryInfo(String path) throws HttpException, IOException {
        CloseableHttpResponse response = connectionDJ(path);
        if (null == response) throw new RuntimeException("Failed to connect to the DJ site");
        HttpEntity entity = response.getEntity();
        InputStream input = entity.getContent();
        BufferedReader br = new BufferedReader(new InputStreamReader(input, "GBK"));
        String line = null;
        DJCommentSummary dJProductCommentSummary = null;
        while ((line = br.readLine()) != null) {
            // Strip the JSONP wrapper, as in analySiteCommentsInfo
            String reqBody = line.substring(26, line.length() - 2);
            JsonParser parser = new JsonParser();
            try {
                JsonObject json = parser.parse(reqBody).getAsJsonObject();
                JsonElement element = json.get("productCommentSummary");
                dJProductCommentSummary = new DJCommentSummary();
                JsonObject summary = element.getAsJsonObject();
                dJProductCommentSummary.setProductId(summary.get("productId").toString());
                dJProductCommentSummary.setAverageScore(summary.get("averageScore").toString());
                dJProductCommentSummary.setGeneralCount(summary.get("generalCount").toString());
                dJProductCommentSummary.setShowCount(summary.get("showCount").toString());
                dJProductCommentSummary.setGoodCount(summary.get("goodCount").toString());
                dJProductCommentSummary.setAfterCount(summary.get("afterCount").toString());
                dJProductCommentSummary.setVideoCount(summary.get("videoCount").toString());
                dJProductCommentSummary.setPoorCount(summary.get("poorCount").toString());
                dJProductCommentSummary.setCommentCount(summary.get("commentCount").toString());
                dJProductCommentSummary.setDefaultGoodCount(summary.get("defaultGoodCount").toString());
                dJProductCommentSummary.setGoodRate(summary.get("goodRate").toString());
                dJProductCommentSummary.setGeneralRate(summary.get("generalRate").toString());
                dJProductCommentSummary.setPoorRate(summary.get("poorRate").toString());

                Map<String, String> maps = new HashMap<String, String>();
                JsonArray commentsJsonArray = json.getAsJsonArray("hotCommentTagStatistics");
                int length = commentsJsonArray.size();
                for (int i = 0; i < length; i++) {
                    JsonElement element01 = commentsJsonArray.get(i);
                    // Tag name, with the surrounding quotes stripped
                    String name = element01.getAsJsonObject().get("name").toString();
                    name = name.substring(1, name.length() - 1);
                    // Number of reviews carrying this tag
                    String count = element01.getAsJsonObject().get("count").toString();
                    maps.put(name, count);
                }
                dJProductCommentSummary.setHotCommentTagStatistics(maps);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
        return dJProductCommentSummary;
    }

    /**
     * Sends a GET request and returns the response, or null when the status is not 200.
     */
    private CloseableHttpResponse connectionDJ(String path) throws HttpException, IOException {
        HttpGet httpGet = new HttpGet(path);
        httpGet.setHeader("Connection", "keep-alive");
        httpGet.setHeader("User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:6.0.2) Gecko/20100101 Firefox/6.0.2");
        CloseableHttpClient httpCilent = HttpClients.createDefault();
        CloseableHttpResponse response = httpCilent.execute(httpGet);
        int statusCode = response.getStatusLine().getStatusCode();
        if (statusCode == HttpStatus.SC_OK) {
            return response;
        }
        return null;
    }
}
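
The parsing code above removes the JSONP wrapper with a fixed offset, `substring(26, line.length() - 2)`, which only works while the callback name stays exactly `fetchJSON_comment98vv7206`. As a rough sketch of a less brittle alternative, the helper below cuts at the first `(` and the last `)` instead. The class name `JsonpUnwrapper` and its method are hypothetical and not part of the original demo, and the sketch assumes the whole response is a single `callback(...)` expression.

```java
/**
 * Hypothetical helper (not part of the original demo): extracts the JSON
 * payload from a JSONP response such as callbackName({...});
 */
final class JsonpUnwrapper {

    private JsonpUnwrapper() {
    }

    /**
     * Returns the text between the first '(' and the last ')'.
     * Input that already starts with '{' or '[', or that has no such pair of
     * parentheses, is assumed to be plain JSON and is returned trimmed.
     */
    public static String unwrap(String jsonp) {
        String text = jsonp.trim();
        if (text.startsWith("{") || text.startsWith("[")) {
            return text;
        }
        int open = text.indexOf('(');
        int close = text.lastIndexOf(')');
        if (open < 0 || close <= open) {
            return text;
        }
        return text.substring(open + 1, close).trim();
    }
}
```

With this in place, `String reqBody = JsonpUnwrapper.unwrap(line);` could stand in for the fixed-offset call, so a change to the `callback` query parameter would no longer break parsing.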

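Below is a demo that parses the review information for a single product. The response contains other fields as well, which you can pick from according to your own needs.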

package com.zybank.spring.gson.framework;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

import org.apache.poi.hssf.usermodel.HSSFRow;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;

import com.zybank.spring.gson.framework.bean.DJCommentSummary;
import com.zybank.spring.gson.framework.bean.DJCommentsBean;
import com.zybank.spring.gson.framework.bean.ExcelBean;
import com.zybank.spring.gson.framework.service.AnalySiteInfoInterface;
import com.zybank.spring.gson.framework.service.ExcelInterface;
import com.zybank.spring.gson.framework.service.imp.AnalySiteInfo;
import com.zybank.spring.gson.framework.service.imp.ExcelService;

/**
 * @author:zhangfd
 * @version:1.0
 * @date:2018
 * @description:
 */
public class GsonTest6 {

    public static List<DJCommentsBean> gsonBeanList = new ArrayList<DJCommentsBean>();

    public static void getProductInfo(String productId) throws Exception {
        // String jingDongUrl = "https://item.jd.com/" + productId + ".html";
        /*
         * String productId = jingDongUrl.substring(jingDongUrl.lastIndexOf("/") + 1); // product number
         * productId = productId.substring(0, productId.length() - 5);
         */
        String sortType = "5"; // 5 = recommended order, 6 = sorted by time
        String destUrl = "C:\\Users\\lenovo\\Desktop\\excel\\" + productId + ".xls";
        HSSFWorkbook workbook = new HSSFWorkbook();
        ExcelInterface excel = new ExcelService();
        // 1. Fetch the review summary and write it to its own sheet
        DJCommentSummary commentSummary = selectAll(productId, sortType, 10, "1", "1");
        createSheetForCommentSumary(workbook, excel, commentSummary);
        int totoalCount = 500;
        for (int j = 0; j < 6; j++) {
            // score is the review type (3 = positive, 2 = neutral, 1 = negative,
            // 4 = with photos, 0 = all reviews, 5 = follow-up, 7 = video)
            String score = j + "";
            if ("3".equals(score)) {
                if (null != commentSummary && null != commentSummary.getGoodCount())
                    totoalCount = Integer.valueOf(commentSummary.getGoodCount()) + 10;
            } else if ("2".equals(score)) {
                if (null != commentSummary && null != commentSummary.getGeneralCount())
                    totoalCount = Integer.valueOf(commentSummary.getGeneralCount()) + 10;
            } else if ("1".equals(score)) {
                if (null != commentSummary && null != commentSummary.getPoorCount())
                    totoalCount = Integer.valueOf(commentSummary.getPoorCount()) + 10;
            } else if ("0".equals(score)) {
                if (null != commentSummary && null != commentSummary.getCommentCount())
                    totoalCount = Integer.valueOf(commentSummary.getCommentCount()) + 10;
            } else if ("4".equals(score)) {
                if (null != commentSummary && null != commentSummary.getVideoCount())
                    totoalCount = Integer.valueOf(commentSummary.getVideoCount()) + 10;
            } else if ("5".equals(score)) {
                if (null != commentSummary && null != commentSummary.getAfterCount())
                    totoalCount = Integer.valueOf(commentSummary.getAfterCount()) + 10;
            }
            // Cap the download at 1000 reviews so that not too much is fetched at once
            totoalCount = totoalCount > 1000 ? 1000 : totoalCount;
            // 2. Query one review type for the product; the results are collected in gsonBeanList
            selectAll(productId, sortType, totoalCount, score, "0");
            // 3. Put the query results into a sheet
            createSheetForComment(score, workbook, excel);
        }
        // 4. Write all the created sheets to the Excel file
        excel.writeExcel(destUrl, workbook);
    }

    /**
     * @param productId   product number
     * @param sortType    sort order of the reviews
     * @param totoalCount number of reviews to fetch, at 10 per page
     * @param score       review type
     * @param operType    "0" fetches reviews into gsonBeanList, "1" returns the review summary
     * @throws Exception
     */
    private static DJCommentSummary selectAll(String productId, String sortType, int totoalCount, String score,
            String operType) throws Exception {
        AnalySiteInfoInterface analySite = new AnalySiteInfo();
        String page = "0"; // page to query, page 0 by default
        if ("0".equals(operType)) {
            for (int i = 0; i < totoalCount / 10; i++) {
                page = i + "";
                String path = "https://sclub.jd.com/comment/productPageComments.action?callback=fetchJSON_comment98vv7206&productId="
                        + productId + "&score=" + score + "&sortType=" + sortType + "&page=" + page
                        + "&pageSize=10&isShadowSku=0&fold=1";
                List<DJCommentsBean> ff = analySite.analySiteCommentsInfo(path);
                gsonBeanList.addAll(ff);
            }
        } else if ("1".equals(operType)) {
            String path = "https://sclub.jd.com/comment/productPageComments.action?callback=fetchJSON_comment98vv7206&productId="
                    + productId + "&score=" + score + "&sortType=" + sortType + "&page=" + page
                    + "&pageSize=10&isShadowSku=0&fold=1";
            DJCommentSummary commentSummary = analySite.analySiteProductCommentSummaryInfo(path);
            return commentSummary;
        }
        return null;
    }

    private static void createSheetForCommentSumary(HSSFWorkbook workbook, ExcelInterface excel,
            DJCommentSummary commentSummary) {
        ExcelBean excelBean = new ExcelBean();
        excelBean.setWorkbook(workbook);
        excelBean.setSheetName("Review summary");
        List<String> lables = new ArrayList<String>();
        lables.add(0, "Product name");
        lables.add(1, "Positive rate");
        lables.add(2, "Neutral rate");
        lables.add(3, "Negative rate");
        Map<String, String> maps = null;
        if (null != commentSummary) maps = commentSummary.getHotCommentTagStatistics();
        if (null != maps) {
            // One extra column per hot review tag
            Object[] keySet = maps.keySet().toArray();
            for (int j = 0; j < keySet.length; j++) {
                lables.add(j + 4, keySet[j].toString());
            }
        }
        excelBean.setColumnName(lables);
        HSSFSheet sheet = excel.createSheet(excelBean);
        HSSFRow row1 = sheet.createRow(1);
        if (null != commentSummary) {
            row1.createCell(0).setCellValue(commentSummary.getProductId());
            row1.createCell(1).setCellValue(commentSummary.getGoodRate());
            row1.createCell(2).setCellValue(commentSummary.getGeneralRate());
            row1.createCell(3).setCellValue(commentSummary.getPoorRate());
        }
        for (int i = 4; i < lables.size(); i++) {
            if (null != maps) row1.createCell(i).setCellValue(maps.get(lables.get(i)));
        }
    }

    private static void createSheetForComment(String score, HSSFWorkbook workbook, ExcelInterface excel) {
        ExcelBean excelBean = new ExcelBean();
        excelBean.setWorkbook(workbook);
        // ScoreType is not shown in this article; it supplies the sheet name for a score code
        excelBean.setSheetName(ScoreType.getNameByIndex(score));
        String[] s = {"Member name", "Star rating", "Review time", "Product name", "Product color",
                "Review text", "Follow-up review time", "Follow-up review text"};
        excelBean.setColumnName(Arrays.asList(s));
        HSSFSheet sheet = excel.createSheet(excelBean);
        if (null != gsonBeanList && gsonBeanList.size() > 0) {
            for (int i = 0; i < gsonBeanList.size(); i++) {
                HSSFRow row1 = sheet.createRow(i + 1);
                DJCommentsBean user = gsonBeanList.get(i);
                // Create the cells and set their values
                row1.createCell(0).setCellValue(user.getNickname());
                row1.createCell(1).setCellValue(user.getScore());
                row1.createCell(2).setCellValue(user.getCreationTime());
                row1.createCell(3).setCellValue(user.getProductName());
                row1.createCell(4).setCellValue(user.getColor());
                row1.createCell(5).setCellValue(user.getContent());
                row1.createCell(6).setCellValue(user.getAfterUserCommentTime());
                row1.createCell(7).setCellValue(user.gethAfterUserComment());
            }
        }
        // Reset the shared list so that the next review type starts empty
        gsonBeanList = new ArrayList<DJCommentsBean>();
    }
}
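
`createSheetForComment` calls `ScoreType.getNameByIndex(score)`, but the article never shows that class. The enum below is only a guess at what it might look like, built from the score codes listed in the comment inside `getProductInfo`. The constant names, the sheet names, and the fallback for unknown codes are all assumptions.

```java
/**
 * Hypothetical reconstruction of ScoreType, which the demo uses but does not show.
 * Maps a score code to the name of the sheet that holds those reviews.
 */
enum ScoreType {
    ALL("0", "All reviews"),
    POOR("1", "Negative reviews"),
    GENERAL("2", "Neutral reviews"),
    GOOD("3", "Positive reviews"),
    PHOTO("4", "Reviews with photos"),
    AFTER("5", "Follow-up reviews"),
    VIDEO("7", "Video reviews");

    private final String index;
    private final String sheetName;

    ScoreType(String index, String sheetName) {
        this.index = index;
        this.sheetName = sheetName;
    }

    /** Returns the sheet name for a score code such as "3". */
    public static String getNameByIndex(String index) {
        for (ScoreType type : values()) {
            if (type.index.equals(index)) {
                return type.sheetName;
            }
        }
        // Assumed fallback: still produce a distinct sheet name for an unknown code
        return "score" + index;
    }
}
```

Each name is distinct, which matters here because every score code gets its own sheet in the same workbook.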

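Below is the search demo: given a keyword, it finds the top-ranked products in the search results and then calls the class above in a loop, producing the review data for several products in one run.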

package com.zybank.spring.json;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import com.zybank.spring.gson.framework.GsonTest6;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Attributes;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

/**
 * @author:zhangfd
 * @version 1.0.0
 * @date 2018-12-12 17:02:53
 * @description
 */
public class JsonTest1 {

    public static void main(String[] args) throws Exception {
        // Just enter the keyword here ("手机" means mobile phone)
        String keyword = "手机";
        // 1 = price, high to low; 2 = price, low to high; 3 = sales, high to low;
        // 4 = review count, high to low; 5 = new arrivals
        String psort = "3";
        String serachUrl = "https://search.jd.com/Search?keyword=" + keyword
                + "&enc=utf-8&qrst=1&rt=1&stop=1&vt=2&cid2=653&cid3=655&click=0";
        if (null != psort && psort.length() > 0) serachUrl = serachUrl + "&psort=" + psort;
        // Find every product on the first page of results for the keyword
        List<String> productList = getProductList(serachUrl);
        for (String productId : productList) {
            // Fetch the reviews and other data for this product and write them to an Excel file
            GsonTest6.getProductInfo(productId);
            Thread.sleep(100);
        }
    }

    public static List<String> getProductList(String serachUrl) throws IOException {
        Document doc = Jsoup.connect(serachUrl).get();
        Elements el = doc.select("div[id=J_goodsList]");
        Elements urls = el.select("li[class=gl-item]");
        List<String> productList = new ArrayList<String>();
        for (Element e : urls) {
            // The data-sku attribute of each list item holds the product id
            Attributes attr = e.attributes();
            productList.add(attr.get("data-sku"));
        }
        return productList;
    }
}
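
The search URL above is built by concatenating the raw keyword, which here contains non-ASCII characters. A minimal sketch of percent-encoding the keyword first is shown below, so that the result does not depend on the HTTP library doing the encoding. The class `SearchUrlBuilder` and its method are hypothetical and not part of the original demo. The fixed query parameters are copied from the demo, and `URLEncoder.encode(String, Charset)` requires Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (not part of the original demo) that builds the search URL. */
final class SearchUrlBuilder {

    private static final String BASE = "https://search.jd.com/Search?keyword=";
    // Fixed parameters copied unchanged from the demo's search URL
    private static final String FIXED_PARAMS = "&enc=utf-8&qrst=1&rt=1&stop=1&vt=2&cid2=653&cid3=655&click=0";

    private SearchUrlBuilder() {
    }

    /**
     * Builds the search URL with the keyword percent-encoded as UTF-8.
     * The psort parameter is appended only when it is non-empty.
     */
    public static String build(String keyword, String psort) {
        StringBuilder url = new StringBuilder(BASE)
                .append(URLEncoder.encode(keyword, StandardCharsets.UTF_8))
                .append(FIXED_PARAMS);
        if (psort != null && !psort.isEmpty()) {
            url.append("&psort=").append(psort);
        }
        return url.toString();
    }
}
```

In `main`, `String serachUrl = SearchUrlBuilder.build(keyword, psort);` would then take the place of the string concatenation. Note that `URLEncoder` uses form encoding, so a space in the keyword becomes `+`.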
