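Python WeChat article message interface: fetching article (image-and-text) messages from the WeChat Official Account API material library with Python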

# -*- coding:utf-8 -*-
import os
import urllib.parse
from html.parser import HTMLParser

import requests
from bs4 import BeautifulSoup
from pymongo import MongoClient

class ContentHtmlParser(HTMLParser):
    """
    Strip HTML tags, keeping only the text content.
    """

    def __init__(self):
        super().__init__()
        self.text = ""

    def handle_data(self, data):
        # Called for every run of text between tags.
        self.text += data

    def get_text(self):
        return self.text

# "ip" is a placeholder; replace it with the MongoDB host.
mongo_client = MongoClient("ip", 27017)
mongo_db = mongo_client["gongzhonghao"]

def get_words():
    """Load the sensitive-word list from words.txt."""
    words = []
    with open("words.txt", encoding="utf-8") as words_file:
        for line in words_file:
            # A line holds one word, or several separated by "、".
            for p in line.split("、"):
                p = p.strip()
                if p:  # skip blanks: an empty word would match every article
                    words.append(p)
    return words

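def get_articles(clt):
    """Return the first news_item of every entry stored in collection clt."""
    articles = []
    collection = mongo_db[clt]
    # Only the first document in the collection is read.
    doc = collection.find_one()
    if doc is None:
        return articles
    for it in doc["items"]:
        # Only the first article of each item is kept.
        content = it["content"]["news_item"][0]
        articles.append(content)
    return articles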

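def download(dir_path, file_name, url):
    """Download url to dir_path/file_name, skipping files that already exist."""
    os.makedirs(dir_path, exist_ok=True)
    path = os.path.join(dir_path, file_name)
    # Check before requesting, so an existing file costs no network round trip.
    if os.path.exists(path):
        return
    try:
        resp = requests.get(url, timeout=30)
        with open(path, "wb") as f:
            f.write(resp.content)
    except Exception:
        # Log the failing URL and carry on with the remaining downloads.
        print(url)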

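def find_images(content):
    """Return the data-src URL of every <img> tag in the article HTML."""
    imgs = []
    c = urllib.parse.unquote(content)
    img_labels = BeautifulSoup(c, "html.parser").find_all("img")
    for img in img_labels:
        src = img.get("data-src")
        if src:  # skip <img> tags that have no data-src attribute
            imgs.append(src)
    return imgs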

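def get_suffix(url):
    """Derive a file extension from the text after the last "=" in url."""
    try:
        suffix = url[url.rindex("=") + 1:]
        if suffix == "jpeg" or suffix == "other":
            return ".jpg"
        return "." + suffix
    except (ValueError, AttributeError):
        # No "=" in the URL, or url is None: fall back to .jpg.
        return ".jpg"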

def filter_content(content):
    parser = ContentHtmlParser()
    parser.feed(content)
    return parser.get_text()

def check_jinyongci(content):
    """Return the sensitive words ("jinyongci") that occur in the article text."""
    fc = filter_content(content)
    words = get_words()
    invalids = []
    for w in words:
        if w in fc:
            invalids.append(w)
    return invalids

def save_jinyongci(clt, title, invalids):
    """Append the sensitive words found in an article to clt/invalid.txt."""
    if len(invalids) == 0:
        return
    file = os.path.join(clt, "invalid.txt")
    with open(file, "a", encoding="utf-8") as f:
        f.write("Title: " + title)
        f.write("\nSensitive words: ")
        f.write("、".join(invalids))
        f.write("\n\n")

if __name__ == "__main__":
    clt = "xxx"  # collection name, also used as the output directory
    if not os.path.exists(clt):
        os.mkdir(clt)
    articles = get_articles(clt)
    print(clt + ": " + str(len(articles)) + " articles in total")
    for i, article in enumerate(articles):
        print("Processing article " + str(i))
        title = article["title"]
        thumb_url = article["thumb_url"]
        content = article["content"]
        # Download the cover image
        # path = os.path.join(clt, title)
        fname = str(i) + "_" + title.replace("|", "")
        download(clt, fname + get_suffix(thumb_url), thumb_url)
        # Find the images in the article
        imgs = find_images(content)
        for index, img in enumerate(imgs):
            download(clt, fname + "_" + str(index) + get_suffix(img), img)
        # Find the sensitive words in the article
        invalids = check_jinyongci(content)
        print(invalids, '----', title)
        save_jinyongci(clt, title, invalids)
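
The `get_suffix` helper above takes whatever follows the last `=` in the URL, which only works while the format argument is the final query parameter. Below is a minimal sketch of a more tolerant variant. It assumes the image format is carried in a `wx_fmt` query parameter, which is inferred from the `jpeg` / `other` values the script checks and is not stated in the original. The name `suffix_from_url` and the sample URLs are hypothetical.

```python
from urllib.parse import parse_qs, urlparse


def suffix_from_url(url, param="wx_fmt", default=".jpg"):
    """Return a file extension read from a query parameter of url.

    `param` defaults to "wx_fmt", which is an assumption about the URL format.
    """
    if not url:
        return default
    # parse_qs maps each query key to a list of values; blank values are dropped.
    query = parse_qs(urlparse(url).query)
    values = query.get(param)
    if not values:
        return default
    fmt = values[0].lower()
    # Same mapping as get_suffix: "jpeg" and "other" are saved as .jpg.
    if fmt in ("jpeg", "other"):
        return ".jpg"
    return "." + fmt


if __name__ == "__main__":
    # Hypothetical URLs, for illustration only.
    print(suffix_from_url("https://example.com/img/640?wx_fmt=png&from=appmsg"))
    print(suffix_from_url("https://example.com/img/640"))
```

Because the parameter is looked up by name, extra arguments after it (or a missing query string) no longer produce an odd extension; the function simply falls back to `.jpg`.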
