A quick Node crawler

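The example below uses Node to crawl Baidu Chuanke (百度传课), collect information about its free video courses, and download each course's cover image.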

const fs = require('fs');
const fetch = require('node-fetch'); // assumes node-fetch v2 (CommonJS; res.body is a Node stream)
const cheerio = require('cheerio');
const URL = require('url');

const url = 'https://chuanke.baidu.com/course/72351176577777664__cost_asc___2.html';
const protocol = URL.parse(url).protocol;
const host = URL.parse(url).host;
const fsName = 'java_course.txt';
const imgPath = './images/';
let page = 1;

startSpider();

function startSpider() {
  console.log('Crawler starting...');
  fs.mkdirSync(imgPath, { recursive: true }); // make sure the image folder exists
  fs.writeFileSync(fsName, 'Baidu Chuanke' + '\n');
  itemSpider(url);
}

// Crawl a single page
async function itemSpider(url) {
  try {
    console.log('Current page', url);
    let html = await fetch(url).then(res => res.text());
    fs.appendFileSync(fsName, 'Page ' + page + '\n');
    page += 1;
    let $ = cheerio.load(html);
    queryData($); // process the data
    if (page < 50) {
      setTimeout(() => {
        spiderNext($); // continue with the next page
      }, 1000);
    }
  } catch (exception) {
    console.log('Error:', exception);
  }
}

function queryData($) {
  try {
    let panels = $('.item-panel');
    panels.each((index, item) => {
      let title = $(item).find('.item-title a').text();
      // href and src are assumed to be protocol-relative ("//host/path")
      let href = protocol + $(item).find('.item-title a').attr('href');
      let price = $(item).find('.price span').text();
      let text = title + ' (' + price + ') ' + href + '\n';
      let src = protocol + $(item).find('img').attr('src');
      downImg(title, src);
      fs.appendFile(fsName, text, (err) => {
        if (!err) {
          console.log(title);
        }
      });
    });
  } catch (exception) {
    console.log(exception);
  }
}

function spiderNext($) {
  let next = $('.ck-page .next').attr('href');
  if (!next) {
    console.log('No next page, done.');
    return;
  }
  itemSpider(protocol + '//' + host + next);
}

function downImg(title, src) {
  fetch(src)
    .then(res => {
      res.body.pipe(fs.createWriteStream(imgPath + title + '.jpg'));
    })
    .catch(exception => console.log(exception)); // try/catch cannot catch a rejected promise
}
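The script builds links by string concatenation and uses the course title directly as a file name. Both can break: a link may not be in the expected form, and a title may contain characters such as `/` or `:` that are not allowed in file names. Below is a minimal sketch of two helpers that could guard against this. `resolveHref` and `safeFileName` are hypothetical names, not part of the original script, and the sample paths passed to them are made up for illustration.

```javascript
// Hypothetical helpers, not part of the original script.

// Resolve an href found in a page against that page's own URL.
// Works for absolute, protocol-relative ("//host/path") and relative links.
// Uses the global WHATWG URL class available in modern Node versions.
function resolveHref(href, baseUrl) {
  if (!href) return null; // e.g. no "next" link on the last page
  try {
    return new URL(href, baseUrl).href;
  } catch (err) {
    return null; // href could not be parsed as a URL
  }
}

// Replace whitespace and characters that are invalid in file names
// on common file systems with a single underscore.
function safeFileName(title) {
  return title.trim().replace(/[\\/:*?"<>|\s]+/g, '_');
}

const base = 'https://chuanke.baidu.com/course/72351176577777664__cost_asc___2.html';

// The paths below are illustrative only.
console.log(resolveHref('//chuanke.baidu.com/v1.html', base));
console.log(resolveHref('/course/page2.html', base));
console.log(resolveHref(undefined, base));
console.log(safeFileName('Java: basics / part 1'));
```

In the crawler, `resolveHref` would stand in for the `protocol + ...` concatenations, and `safeFileName(title)` would replace the raw title when building the image path.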
