Web crawler: scraping the top 15 Weibo hot topics

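This post uses Java + WebDriver + TestNG to fetch the top 15 Weibo hot topics, including each topic's rank, title, read count and a link to its content, and writes them to a txt file.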

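Prerequisites:

A Java environment is installed, and the Selenium WebDriver and TestNG JAR files have been added to the project. Because the code drives Internet Explorer, IEDriverServer.exe must also be available, either on the PATH or through the webdriver.ie.driver system property.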

The code is as follows:

Step 1: Create the PublicModel class, which provides the initialization method and the method that writes results to a txt file.

package com.ustc.publics;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;

import org.openqa.selenium.WebDriver;
import org.openqa.selenium.ie.InternetExplorerDriver;

public class PublicModel {
    public static WebDriver driver;

    /**
     * Initialization: starts Internet Explorer and maximizes the window.
     */
    public static void initModel() {
        driver = new InternetExplorerDriver();
        /* Optional implicit wait (needs import java.util.concurrent.TimeUnit):
           driver.manage().timeouts().implicitlyWait(3, TimeUnit.SECONDS); */
        driver.manage().window().maximize();
    }

    /**
     * Writes a list of topics to a txt file, one topic per line.
     *
     * @param hotTopics
     *            list of HashMaps, one map per topic
     * @param file
     *            file name without the .txt extension
     * @throws IOException
     */
    public static void writeContent(ArrayList<HashMap<String, String>> hotTopics, String file) throws IOException {
        /* File path: <project dir>/result/<file>.txt */
        File resultDir = new File(System.getProperty("user.dir"), "result");
        resultDir.mkdirs(); // create result/ if it does not exist yet
        File target = new File(resultDir, file + ".txt");

        /* Write each HashMap in the list as one line. UTF-8 keeps the Chinese text readable,
           and try-with-resources closes the file even if a write fails. */
        try (BufferedWriter out = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(target), StandardCharsets.UTF_8))) {
            for (HashMap<String, String> topic : hotTopics) {
                out.write(topic.toString());
                out.write('\n');
            }
        }
    }
}
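For comparison, the same write step can be sketched with java.nio.file: Files.createDirectories creates the result directory when it is missing, and Files.write encodes every line explicitly as UTF-8 instead of relying on the platform default charset. This is an alternative, not the post's method; the class name TopicFileWriterSketch and the method writeTopics are hypothetical, and the one-map-per-line format simply mirrors writeContent.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: an alternative to PublicModel.writeContent built on java.nio.file.
public class TopicFileWriterSketch {

    /**
     * Writes one topic per line to baseDir/result/<file>.txt as UTF-8
     * and returns the path of the written file.
     */
    public static Path writeTopics(Path baseDir, List<? extends Map<String, String>> topics, String file)
            throws IOException {
        Path resultDir = baseDir.resolve("result");
        // Does nothing if the directory already exists
        Files.createDirectories(resultDir);
        Path target = resultDir.resolve(file + ".txt");

        // Same line format as writeContent: the map's toString(), e.g. "{rank=1, title=...}"
        List<String> lines = new ArrayList<>();
        for (Map<String, String> topic : topics) {
            lines.add(topic.toString());
        }
        // Creates or truncates the file, appends the platform line separator after each line,
        // and closes the file before returning
        Files.write(target, lines, StandardCharsets.UTF_8);
        return target;
    }
}
```

In the test class this would be called as writeTopics(Paths.get(System.getProperty("user.dir")), hotTopics, "blogtopic"). Note that Files.write ends lines with the platform separator (\r\n on Windows), whereas writeContent always writes \n.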

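Step 2: Create the BlogTopic class, which extends PublicModel and fetches the top 15 Weibo hot topics, including each topic's rank, title, read count and a link to its content.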

package com.ustc.base;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;

import org.openqa.selenium.By;
import org.openqa.selenium.WebElement;
import org.testng.annotations.AfterClass;
import org.testng.annotations.BeforeClass;
import org.testng.annotations.Test;

import com.ustc.publics.PublicModel;

public class BlogTopic extends PublicModel {

    @BeforeClass
    public void setUp() {
        initModel();
    }

    /**
     * Fetches the Weibo hot topics (rank, title, read count, link to the topic page)
     * and writes them to a txt file.
     * @throws Exception
     */
    @Test
    public void getHotTopic() throws Exception {
        String url = "http://d.weibo.com/100803?cfs=&Pl_Discover_Pt6Rank__5_filter=hothtlist_type%3D1#_0";
        driver.get(url);
        /* Root node of the hot-topic list */
        WebElement rootNode = driver.findElement(By.id("Pl_Discover_Pt6Rank__5"))
                .findElement(By.cssSelector("ul[class^='pt_ul']"));
        List<WebElement> nodes = rootNode.findElements(By.cssSelector("li[class^='pt_li']"));
        /* For each topic, collect its rank, title, read count and link */
        ArrayList<HashMap<String, String>> hotTopics = new ArrayList<HashMap<String, String>>();
        for (WebElement node : nodes) {
            // LinkedHashMap (a HashMap subclass) keeps the fields in insertion order in the output
            HashMap<String, String> topic = new LinkedHashMap<String, String>();
            WebElement titleLink = node.findElement(By.className("S_txt1"));
            topic.put("rank", node.findElement(By.cssSelector("span[class^='DSC_topicon']")).getText());
            topic.put("title", titleLink.getText());
            topic.put("reads", node.findElement(By.className("number")).getText());
            topic.put("link", titleLink.getAttribute("href"));
            hotTopics.add(topic);
        }
        /* Write the collected topics to result/blogtopic.txt */
        writeContent(hotTopics, "blogtopic");
    }

    @AfterClass
    public void quit() {
        driver.quit();
    }
}
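Two details of the scraping loop are easy to miss: a plain HashMap does not guarantee iteration order, so the fields in each output line can appear in any order, and the loop keeps every matching li element instead of stopping at 15. The sketch below isolates both ideas in plain Java, without Selenium. It assumes the values have already been read from the page; the RawTopic class, the English field names and the limit parameter are hypothetical stand-ins, not part of the original code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;

// Hypothetical sketch: turn already-scraped rows into at most `limit` ordered topic maps.
public class TopTopicsSketch {

    // Stand-in for the four values read from one <li> node via WebDriver
    static final class RawTopic {
        final String rank, title, reads, link;

        RawTopic(String rank, String title, String reads, String link) {
            this.rank = rank;
            this.title = title;
            this.reads = reads;
            this.link = link;
        }
    }

    /** Keeps at most `limit` topics; each map lists its fields in insertion order. */
    public static ArrayList<HashMap<String, String>> topTopics(List<RawTopic> rows, int limit) {
        ArrayList<HashMap<String, String>> result = new ArrayList<>();
        for (RawTopic row : rows) {
            if (result.size() >= limit) {
                break; // stop once the first `limit` topics are collected
            }
            // LinkedHashMap still fits writeContent's ArrayList<HashMap<String, String>> parameter,
            // and its toString() prints entries in the order they were put
            HashMap<String, String> topic = new LinkedHashMap<>();
            topic.put("rank", row.rank);
            topic.put("title", row.title);
            topic.put("reads", row.reads);
            topic.put("link", row.link);
            result.add(topic);
        }
        return result;
    }
}
```

With a limit of 15, a page that happens to list more than 15 entries still yields exactly the first 15, and every line written to the file starts with the rank.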

Step 3: Configure the testng.xml file

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE suite SYSTEM "http://testng.org/testng-1.0.dtd">
<suite name="Suite" parallel="false">
  <test name="Test">
    <classes>
      <class name="com.ustc.base.BlogTopic"/>    <!-- 1: Weibo hot topics -->
    </classes>
  </test> <!-- Test -->
</suite> <!-- Suite -->

Result of running testng.xml:
a file named blogtopic.txt is generated in the result directory of the project, with one topic per line.
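Because each line of blogtopic.txt is just a map's toString() output, in the form "{key=value, key=value}", a later processing step has to parse that format to get the fields back. A rough sketch, assuming that no key contains "=" and no value contains the two-character sequence ", " (a title with a comma followed by a space would be split incorrectly); the class name TopicLineParser is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for post-processing lines of blogtopic.txt.
public class TopicLineParser {

    /**
     * Parses one line written by Map.toString(), e.g. "{rank=1, title=abc}".
     * Assumes keys contain no '=' and values contain no ", " sequence.
     */
    public static Map<String, String> parse(String line) {
        String body = line.trim();
        // Strip the surrounding braces added by toString()
        if (body.startsWith("{") && body.endsWith("}")) {
            body = body.substring(1, body.length() - 1);
        }
        Map<String, String> topic = new LinkedHashMap<>();
        if (body.isEmpty()) {
            return topic;
        }
        for (String pair : body.split(", ")) {
            // Split on the first '=' only, so URLs with query strings keep their own '=' signs
            int eq = pair.indexOf('=');
            if (eq < 0) {
                continue; // skip fragments that do not look like key=value
            }
            topic.put(pair.substring(0, eq), pair.substring(eq + 1));
        }
        return topic;
    }
}
```

A sturdier design would write a delimiter-separated line with a fixed column order in the first place; this parser only suits the toString() format the post produces.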

Reposted from: https://www.cnblogs.com/miaomiaokaixin/p/5974288.html
