【水汐のpython】Scraping an overseas doujinshi site with Python to fetch a cover image and its info

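It's the holidays, and instead of doing anything productive I've been idling around browsing NSFW stuff every day, so I might as well write this down. The import block is inherited boilerplate and most of it isn't actually used; add only what you need.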

import base64
import datetime
import http.cookiejar
import os
import pdb
import re
import socket
import socketserver
import time
import urllib
import urllib.request
from io import BytesIO
from json import loads
from urllib import parse
from urllib import request

import requests  # install with: pip install requests
from bs4 import BeautifulSoup
from lxml import etree
from PIL import Image
from requests.packages.urllib3.exceptions import InsecureRequestWarning

# Disable the insecure-request warning
requests.packages.urllib3.disable_warnings(InsecureRequestWarning)
### Fetch the doujinshi cover + title
def Getbenzi():
    proxies = {'https': 'https://127.0.0.1:1080', 'http': 'http://127.0.0.1:1080'}
    # The headers are required; without them the request fails with:
    # UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 8974: invalid start byte
    headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36'}
    google_url = 'https://nhentai.net/random/'
    opener = request.build_opener(request.ProxyHandler(proxies))
    request.install_opener(opener)
    req = request.Request(google_url, headers=headers)
    response = request.urlopen(req)
    # print(response.read().decode())
    tree = etree.HTML(response.read().decode())
    a = tree.xpath('//div[@id="cover"]/a/img/@data-src')[0]
    print(a)
    b = tree.xpath('//*[@id="info"]/h2/span[2]/text()')[0]
    print(b)
    print('over')
    # URL of the remote image
    img_url = a
    # Download the remote image; the second argument is the local file name to save to
    urllib.request.urlretrieve(img_url, 'D:/pic.jpg')
    # c = getwww()  ### this line is commented out
    # with open("D:/pic.jpg", "rb") as f:  # open in binary mode
    #     base64_data = base64.b64encode(f.read())  # encode with base64
    #     print(base64_data)
    # print(c)
    return a + b
############################################################################
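The two lookups in Getbenzi() can be tried offline, without the proxy or the network. The following is only a minimal sketch: it swaps lxml for the standard library's xml.etree.ElementTree, which understands a small XPath subset but only parses well-formed XML, so it runs against a hand-written fragment rather than a real page. The fragment, the example.invalid URL, and the helper name extract_cover_and_title are all assumptions made for illustration; the real page markup may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment, shaped only to match the XPath expressions in Getbenzi().
SAMPLE_PAGE = """
<html><body>
  <div id="cover"><a href="#"><img data-src="https://example.invalid/cover.jpg" /></a></div>
  <div id="info"><h2><span>[Circle] </span><span>Sample Title</span><span /></h2></div>
</body></html>
"""


def extract_cover_and_title(markup):
    """Return (cover_url, title) from a well-formed page fragment."""
    root = ET.fromstring(markup)
    # Same paths as in Getbenzi(), written in ElementTree's XPath subset.
    img = root.find(".//div[@id='cover']/a/img")
    span = root.find(".//*[@id='info']/h2/span[2]")
    if img is None or span is None:
        # Getbenzi() would raise IndexError on [0] in this situation.
        raise ValueError("cover image or title span not found")
    return img.get("data-src"), span.text


cover_url, title = extract_cover_and_title(SAMPLE_PAGE)
print(cover_url)
print(title)
```

Raising a descriptive error when either element is missing makes a layout change on the site easier to spot than a bare IndexError.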
######################### Everything below can be omitted #########################
### The getwww() below is optional; it relies on an API from lcy that
### uploads the image to a China-hosted URL and returns its MD5 code on QQ. Not made public for now.
## Get the image URL
def getwww():
    session = requests.session()
    files = {"file": ('pic.jpg', open('D://pic.jpg', 'rb'), 'image/jpeg')}
    result = session.post('https://icy???', files=files)
    # result = requests.post('https://icy???', files=files)  # works
    print(result.text)
    print('ok')
    return getpiccode(result.text)
## Get the image MD5
def getpiccode(www):
    session = requests.session()
    urls = {'url': www}
    request = session.post('https://icyicy???', urls)
    print(request.text)
    return request.text
def socketrun():
    socket.setdefaulttimeout(2)
    s = socket.socket()
    s.connect(("localhost", 9955))
    print("C:input data (with 'end' for exit the program)")
    goon = True
    while goon:
        print("C:-------------------------------------")
        print("C:Please input data:")
        indata = input() + "\n"  # waits for Enter; the typed text itself is not sent
        bbb = Getbenzi() + "\n"
        s.send(bbb.encode())  # must add "\n"
        # data = s.recv(1024).strip('\n')
        # if "end" != data:
        #     print("C:receive result:" + data.encode())
        # else:
        #     goon = False
        #     print("C:end...")
    s.close()
    return
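The `must add "\n"` comment in socketrun() suggests that the receiver on port 9955 reads one message per line, though the post does not show that side. As a rough sketch of that framing, the example below uses socket.socketpair() in place of the real server. The helper names send_line and recv_line are hypothetical, and sendall() is used instead of send() so that a long message is not cut short.

```python
import socket


def send_line(sock, text):
    """Send one message terminated by a newline."""
    sock.sendall((text + "\n").encode("utf-8"))


def recv_line(sock):
    """Read bytes until a newline arrives (or the peer closes) and return the text."""
    buf = bytearray()
    while not buf.endswith(b"\n"):
        chunk = sock.recv(1)  # one byte at a time: slow, but never reads past the line
        if not chunk:
            break
        buf += chunk
    return buf.decode("utf-8").rstrip("\n")


# A connected pair of sockets stands in for the client and the localhost:9955 server.
client, server = socket.socketpair()
send_line(client, "https://example.invalid/cover.jpgSample Title")
received = recv_line(server)
client.close()
server.close()
print(received)
```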
### Send a group message
def sendmsggroup():
    session = requests.session()
    data = {
        'groupid': '？？？',  # group number
        'msg': Getbenzi(),
    }
    result = session.post('https://icy？？？', data)
    print(result)
    return
### Send a private message
def sendmsgprivate():
    session = requests.session()
    data = {
        # 'groupid': '？？？',
        'msg': Getbenzi(),
        'time': '2021-01-15T15:25:44',
        'timemethod': 'exact',
        'everydaycb': 'false',
        'qqid': '？？？',  # the recipient's QQ number
        'sendto': 'priv'
    }
    result = session.post('https://icy？？？', data)
    print(result)
    return
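The 'time' field in sendmsgprivate() is hard-coded. A minimal sketch of computing it instead is shown here, assuming the endpoint accepts any timestamp in the same YYYY-MM-DDTHH:MM:SS shape (the post does not document this). The helper name build_private_payload and the sample message and QQ number are hypothetical.

```python
from datetime import datetime


def build_private_payload(msg, qqid, when=None):
    """Return the form fields used by sendmsgprivate(), with a computed 'time'."""
    if when is None:
        when = datetime.now()
    return {
        'msg': msg,
        # Same shape as the hard-coded '2021-01-15T15:25:44'
        'time': when.isoformat(timespec='seconds'),
        'timemethod': 'exact',
        'everydaycb': 'false',
        'qqid': qqid,
        'sendto': 'priv',
    }


# Placeholder message and QQ number, with a fixed time so the result is reproducible.
payload = build_private_payload('hello', '12345', datetime(2021, 1, 15, 15, 25, 44))
print(payload['time'])  # 2021-01-15T15:25:44
```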
