Python multithreading + Queue (making a crawler faster)

# For learning purposes only. If this infringes any rights, please leave a comment and it will be removed.

import threading
import time
from queue import Queue

import requests
from bs4 import BeautifulSoup

q = Queue()

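'''
Queue usage:
.queue   inspect the queue's contents (the underlying deque)
.get()   remove and return an item from the queue
.put()   add an item to the queue
'''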

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36"
}

# Producer
def req_list_page():
    '''
    Request the article list pages and put each post link on the queue.
    :return: None
    '''
    print("Producer")
    print("Fetching the list pages and adding each link to the queue")
    for i in range(1, 3):
        url = 'https://club.autohome.com.cn/o/bbs/forum-c-4410-{}.html#pvareaid=6830274'.format(i)
        res = requests.get(url, headers=headers).text
        soup = BeautifulSoup(res, 'lxml')
        for item in soup.select('dl[class="list_dl"]'):
            # print(item)
            try:
                link = 'https://club.autohome.com.cn' + item.select('dt > a')[0].get('href')
                q.put(link)
            except (IndexError, TypeError):
                # Skip entries with no <a> under <dt>, or with no href attribute
                pass
    print("Initial queue state")
    # print(q.queue)
    print(q.qsize())
    time.sleep(3)

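# Consumer
from queue import Empty  # raised by get_nowait() when the queue has no items

def req_info_page():
    '''
    Take links off the queue until it is empty.
    :return: None
    '''
    while True:
        # Stop once the queue has no more items. get_nowait() avoids the race
        # where q.empty() is False but another thread takes the last item
        # before this thread calls q.get(), which would then block forever.
        try:
            link = q.get_nowait()
        except Empty:
            break
        print("Consumer")
        print("Requested: {}".format(link))
        # print(q.queue)
        print(q.qsize())
        time.sleep(1)  # stands in for the time spent requesting the detail page
    print("Consumer finished")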

if __name__ == '__main__':
    start = time.time()
    # Create the threads: one producer and five consumers
    product = threading.Thread(target=req_list_page)
    consume1 = threading.Thread(target=req_info_page)
    consume2 = threading.Thread(target=req_info_page)
    consume3 = threading.Thread(target=req_info_page)
    consume4 = threading.Thread(target=req_info_page)
    consume5 = threading.Thread(target=req_info_page)
    # Start the threads
    # Collect all links first (producer); join() waits until it has finished
    product.start()
    product.join()
    # Consume the queue with five threads at once
    consume1.start()
    consume2.start()
    consume3.start()
    consume4.start()
    consume5.start()
    # Wait for every consumer thread to finish before the main thread continues
    # (join() blocks the caller; it does not make a thread a daemon)
    consume1.join()
    consume2.join()
    consume3.join()
    consume4.join()
    consume5.join()
    end = time.time()
    print('Total time elapsed:', end - start)
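
The script above joins the producer before any consumer starts, so fetching the list pages and processing the links never overlap. A minimal sketch of one alternative follows, in which the consumers start first and block on `q.get()` while the producer is still adding links. It is not the original author's method. The names `producer`, `consumer`, `run`, and `SENTINEL`, and the `example.com` URLs, are hypothetical, and `time.sleep` stands in for the real `requests.get` call.

```python
import threading
import time
from queue import Queue

SENTINEL = None  # hypothetical marker that tells a consumer to stop


def producer(q, links, n_consumers):
    """Put every link on the queue, then one sentinel per consumer."""
    for link in links:
        q.put(link)
    for _ in range(n_consumers):
        q.put(SENTINEL)


def consumer(q, results, lock):
    """Take links until a sentinel arrives; get() blocks while the queue is empty."""
    while True:
        link = q.get()
        if link is SENTINEL:
            break
        time.sleep(0.01)  # assumption: placeholder for requesting the detail page
        with lock:
            results.append(link)


def run(links, n_consumers=5):
    """Start the consumers first, then the producer, and wait for all of them."""
    q = Queue()
    results = []
    lock = threading.Lock()
    workers = [
        threading.Thread(target=consumer, args=(q, results, lock))
        for _ in range(n_consumers)
    ]
    for worker in workers:
        worker.start()
    prod = threading.Thread(target=producer, args=(q, links, n_consumers))
    prod.start()
    prod.join()
    for worker in workers:
        worker.join()
    return results


if __name__ == "__main__":
    demo_links = ["https://example.com/page/{}".format(i) for i in range(20)]
    fetched = run(demo_links)
    print(len(fetched))
```

Because each consumer receives exactly one sentinel, no thread has to guess whether an empty queue means "finished" or "not filled yet", which is the ambiguity an emptiness check runs into once producer and consumers work at the same time.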
