Learning Python

Common Python errors

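UnicodeDecodeError: 'gbk' codec can't decode byte 0x8c in position 22: illegal multibyte sequence
This means the file was opened without specifying its encoding, so Python fell back to the system default (gbk here). Pass the encoding explicitly to open():

encoding='UTF-8'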
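A minimal sketch of the fix, assuming a throwaway file created just for the demonstration (the file name and its contents are hypothetical): the same explicit encoding is used for writing and for reading, so the platform default never comes into play.

```python
import os
import tempfile

# Hypothetical sample file containing non-ASCII text.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("你好，世界\n")

# Reading with the same explicit encoding avoids relying on the
# platform default (gbk on a Chinese-locale Windows machine).
with open(path, encoding="utf-8") as f:
    text = f.read()

print(text)
```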

expected an indented block

The body of a statement such as if or for was not indented.

continue skips the rest of the current iteration and starts the next iteration of the loop.
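As a small illustration (the variable names are made up for this sketch), continue can be used to skip even numbers while collecting the odd ones:

```python
odds = []
for n in range(6):
    if n % 2 == 0:
        continue  # skip the rest of this pass and move on to the next n
    odds.append(n)

print(odds)  # [1, 3, 5]
```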

% places other values at specific positions in a string to produce a new string; the original variables are not changed.
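A short sketch of % formatting with hypothetical values; note that name and count still hold their original values afterwards.

```python
name = "world"
count = 3

# %s inserts a string, %d inserts an integer; the result is a new string.
message = "hello %s, you have %d new messages" % (name, count)

print(message)
print(name, count)  # the original variables are unchanged
```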

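In Python 2 the default encoding is ascii. When a program handles non-ASCII data, Python 2 often raises UnicodeDecodeError: 'ascii' codec can't decode byte 0x?? in position 1: ordinal not in range(128), because the implicit conversion cannot handle non-ASCII bytes. The usual workaround is to set the default encoding yourself, normally to utf8, by adding the following to the program:
import sys
reload(sys)
sys.setdefaultencoding('utf8')
In Python 3.x this workaround does not apply: sys.setdefaultencoding() was removed, and calling importlib.reload(sys) does not bring it back. Strings are Unicode and the default encoding is already UTF-8, so decode bytes explicitly instead.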
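For Python 3, a minimal sketch of explicit encoding and decoding (the sample string is arbitrary): bytes decode cleanly with the right codec, and decoding them as ascii reproduces the error quoted above.

```python
raw = "中文".encode("utf-8")   # str -> bytes
text = raw.decode("utf-8")     # bytes -> str, using the matching codec

try:
    raw.decode("ascii")        # wrong codec for non-ASCII bytes
except UnicodeDecodeError as exc:
    error = str(exc)

print(text)
print(error)
```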

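Error messages

No module named 'pandas': the module is not installed (install it, for example with pip install pandas).

TypeError: read() got an unexpected keyword argument 'encoding': remove encoding='UTF-8' from the read() call. For a file object, the encoding is passed to open() instead.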

Entering a password

If the getpass module does not work, fall back to input and assign its return value to a variable.

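class Person(object):
    def __init__(self, name, score):  # acts as the constructor
        self.name = name      # name is an attribute that has to be assigned
        self.score = score

student = Person()      # arguments should be passed here but are not, so this raises TypeError
print(student.name)
print(student.score)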

When methods of a class call one another, the call must go through self.
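A small sketch of one method calling another through self; the Greeter class and its method names are hypothetical.

```python
class Greeter:
    def __init__(self, name):
        self.name = name

    def build_message(self):
        return "hello, %s" % self.name

    def greet(self):
        # Calling build_message() without self. would raise NameError.
        return self.build_message().upper()


g = Greeter("alice")
result = g.greet()
print(result)
```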

Understanding multithreading

Suppose there are two functions that do different things. Normally one has to run to completion before the other starts; with multithreading the two run concurrently. When the thread running a function finishes, the thread count drops by one automatically. If the thread creation is placed inside a for loop, each function ends up being executed by several threads, which is not the result we want.

"""
# enumerate returns tuples, which are unpacked here
names = ["aa", "bb", "cc"]
for i, name in enumerate(names):
    print(i, name)
"""
import threading
import time


def test1():
    t = 0
    for i in range(10):
        print("-----test1-----%s\n" % i)
        time.sleep(1)
        t += 1
    print('total iterations: %d' % t)


# When the function a Thread was created with returns, that child thread ends.
def test2():
    for i in range(5):
        print("-----test2-----%s\n" % i)
        time.sleep(1)


def main():
    # for g in range(5):
    # Print the current thread list before creating any threads.
    print(threading.enumerate())
    # Create the threads.
    t1 = threading.Thread(target=test1)
    t2 = threading.Thread(target=test2)
    t1.start()
    t2.start()
    # Watch the thread count.
    while True:
        thread_num = len(threading.enumerate())
        print("thread count: %d" % thread_num)
        if thread_num <= 1:
            break
        time.sleep(1)


if __name__ == '__main__':
    main()
############################################################
"""
# enumerate returns tuples, which are unpacked here
names = ["aa", "bb", "cc"]
for i, name in enumerate(names):
    print(i, name)
"""
import threading
import time


def test1():
    t = 0
    for i in range(10):
        print("-----test1-----%s\n" % i)
        time.sleep(1)
        t += 1
    print('total iterations: %d' % t)


# When the function a Thread was created with returns, that child thread ends.
def test2():
    for i in range(5):
        print("-----test2-----%s\n" % i)
        time.sleep(1)


def main():
    for g in range(5):
        # Print the current thread list before creating the next pair of threads.
        print(threading.enumerate())
        # Create the threads.
        t1 = threading.Thread(target=test1)
        t2 = threading.Thread(target=test2)
        t1.start()
        t2.start()
    # Watch the thread count.
    while True:
        thread_num = len(threading.enumerate())
        print("thread count: %d" % thread_num)
        if thread_num <= 1:
            break
        time.sleep(1)


if __name__ == '__main__':
    main()

With the for loop in place, each function is run by five threads at once, so their output is interleaved.

Multithreading with functions

A function passed as a thread target is not called until the thread is started, because the script never calls it directly. This script differs from the one above: it also starts threads in a loop, but the loop is driven by names. Each names.pop() removes one item from the shared list, so the list shrinks globally; the threads do not each work on their own copy. One item is taken, and the next thread gets the next remaining item.
import time
import threading
import math


def showName(threadNum, name):
    nowTime = time.strftime('%H:%M:%S', time.localtime(time.time()))  # current time
    print('I am thread-%d ,My name is function-%s, now time: %s ' % (threadNum, name, nowTime))
    time.sleep(1)


if __name__ == '__main__':
    print('I am main...')
    names = list(range(20))
    threadNum = 1
    threadPool = list()
    while names:
        for i in range(6):
            try:
                name = names.pop()
            except IndexError as e:
                print('The list is empty')
                break
            # else:
            #     t=threading.Thread(target=showName,args=(i, name))
            #     threadPool.append(t)
            #     t.start()
        while threadPool:
            t = threadPool.pop()
            t.join()
        threadNum += 1
        print('----------------------\r\n')
    print('main is over')


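Thread synchronization lets multiple threads access a contended resource safely, and the simplest synchronization mechanism is a mutex. A mutex gives the resource a state: locked or unlocked. A thread that wants to change shared data locks it first; while the resource is locked, other threads cannot change it. Only after that thread releases the resource and its state returns to unlocked can another thread lock it. The mutex guarantees that only one thread writes at a time, which keeps the data correct under multithreading.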
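A minimal sketch of a mutex using threading.Lock; the counter, the function name, and the thread and iteration counts are all made up for illustration. Each thread takes the lock before updating the shared counter, so no increments are lost.

```python
import threading

counter = 0
lock = threading.Lock()


def add_many(times):
    """Increment the shared counter, holding the lock for each update."""
    global counter
    for _ in range(times):
        with lock:          # acquire; released automatically on exit
            counter += 1


threads = [threading.Thread(target=add_many, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000
```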

cannot join thread before it is started: this error means the start() call is missing.
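The error can be reproduced in a few lines. In this sketch (with a do-nothing target chosen only for the demonstration), join() is called first to trigger the RuntimeError, then the thread is started and joined in the correct order.

```python
import threading

t = threading.Thread(target=lambda: None)

try:
    t.join()                 # start() has not been called yet
except RuntimeError as exc:
    message = str(exc)

t.start()                    # correct order: start, then join
t.join()

print(message)
```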

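Running without a Python environment

PyInstaller packages Python code into an EXE file, which solves the problem of intranet machines that have no Python environment. On the current machine setup the command no longer has to be run from inside the pyinstaller directory.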

famous_person = "Steve Jobs once said"
message = "You've got to find what you love"
yiqi = famous_person + ',''"'+message+'"'
print(yiqi)

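Running a script with arguments from the terminal

sys.argv is simply a bridge for getting arguments from outside the program.
Suppose a function needs arguments:
def search(data, fofa):
The simple way to supply them:
import sys
data = sys.argv[1]
fofa = sys.argv[2]
search(data, fofa)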

Using add_argument() in Python

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--sparse', action='store_true', default=False, help='GAT with sparse version or not.')
parser.add_argument('--seed', type=int, default=72, help='Random seed.')
parser.add_argument('--epochs', type=int, default=10000, help='Number of epochs to train.')

args = parser.parse_args()

print(args.sparse)
print(args.seed)
print(args.epochs)

Output:
/home/user/anaconda3/bin/python3.6 /home/user/lly/pyGAT-master/test.py
False
72
10000

Process finished with exit code 0

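Each parameter is explained below:

name or flags - a name or a list of option strings, for example foo or -f, --foo.
action - what to do when the argument is encountered on the command line; the default is store.
    store_const: assigns the value given by const;
    append: stores the values in a list, so a repeated argument keeps every value;
    append_const: appends the value defined by const to a list;
    count: stores the number of times the argument occurs. You can also subclass argparse.Action to customize the parsing.
nargs - the number of command-line arguments to read. It can be a specific number; or ?, in which case a missing value falls back to default for a positional argument and to const for an optional argument; or *, meaning zero or more arguments; or +, meaning one or more arguments.
const - a constant value required by some action and nargs settings.
default - the value used when the argument is not given.
type - the type the command-line argument should be converted to.
choices - a container of the values allowed for the argument.
required - whether an optional argument may be omitted (optional arguments only).
help - the help text for the argument; when set to argparse.SUPPRESS, the argument is left out of the help output.
metavar - the argument's name in the usage message; for positional arguments it defaults to the argument name, for optional arguments to the upper-cased name.
dest - the attribute name after parsing; for optional arguments it defaults to the first long option name, with hyphens converted to underscores.
Applied to the program above: --sparse uses action='store_true', so args.sparse is True only when --sparse is given on the command line. The run shown passed no arguments, so it printed the default, False.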
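To see several of these parameters together, here is a hedged sketch that parses explicit argument lists instead of the real command line. The --sparse and --seed options come from the example above; --tag, -v and --mode are hypothetical options added only to show append, count and choices.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--sparse', action='store_true', default=False)
parser.add_argument('--seed', type=int, default=72)
parser.add_argument('--tag', action='append')              # hypothetical option
parser.add_argument('-v', action='count', default=0)       # hypothetical option
parser.add_argument('--mode', choices=['train', 'test'],   # hypothetical option
                    default='train')

# No arguments: every option takes its default.
defaults = parser.parse_args([])

# Explicit arguments: flags and repeated options are collected.
custom = parser.parse_args(
    ['--sparse', '--seed', '7', '--tag', 'a', '--tag', 'b', '-vv', '--mode', 'test']
)

print(defaults.sparse, defaults.seed, defaults.tag, defaults.v, defaults.mode)
print(custom.sparse, custom.seed, custom.tag, custom.v, custom.mode)
```

Passing a list to parse_args() is convenient for experimenting, since it avoids re-running the script with different command lines.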

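The platform module

platform.system() returns the operating system type: Windows, Linux, and so on.

platform.platform() returns the platform identifier, for example Darwin-9.8.0-i386-32bit.

platform.version() returns the system version information.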
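A brief sketch that collects the three values into a dictionary. The last step, choosing a screen-clearing command from the system name, is an assumed use case rather than something the notes prescribe.

```python
import platform

info = {
    "system": platform.system(),      # e.g. 'Windows', 'Linux', 'Darwin'
    "platform": platform.platform(),  # full platform identifier string
    "version": platform.version(),    # system version information
}

# Assumed use case: pick a platform-specific command to clear the terminal.
clear_cmd = "cls" if info["system"] == "Windows" else "clear"

for key, value in info.items():
    print(key, "=", value)
print("clear command:", clear_cmd)
```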

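os.system('clear')

os.system runs a string as a command on the machine where the script runs. Each os.system call creates a child process to execute the command line, and what happens in the child (for example a cd) cannot affect the main process.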
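A quick way to check this is to run cd through os.system and compare the working directory before and after. This is only a sketch: the directory change happens in the child shell, so the parent's working directory stays the same.

```python
import os

before = os.getcwd()

# The cd runs in a child shell process, which exits immediately afterwards.
os.system("cd ..")

after = os.getcwd()

print(before == after)  # the parent process is unaffected
```

To change the directory of the script itself, os.chdir() would be used instead.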

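readlines() reads all lines of a file and returns them as a list.

python -m SimpleHTTPServer <port> starts a simple web server on the given port (Python 2; in Python 3 the equivalent is python -m http.server <port>).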

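The raw_input function

raw_input([prompt]) reads one line from standard input and returns it as a string with the trailing newline removed (Python 2 only):

#!/usr/bin/python
# -*- coding: UTF-8 -*-
str = raw_input("Enter something: ")
print "You entered: ", str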

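The input function

In Python 2, input([prompt]) is similar to raw_input([prompt]), but it accepts a Python expression as input and returns the evaluated result. (In Python 3, input() behaves like Python 2's raw_input() and always returns a string.)

#!/usr/bin/python
# -*- coding: UTF-8 -*-
str = input("Enter something: ")
print "You entered: ", str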
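In Python 3 there is no evaluating input(). One possible substitute, shown here as a sketch rather than a drop-in replacement, is to parse the typed text with ast.literal_eval, which accepts only literals such as numbers, strings, lists and dicts and does not run arbitrary code. The helper name parse_user_value is hypothetical.

```python
import ast


def parse_user_value(text):
    """Turn text such as '[1, 2, 3]' or "'abc'" into the Python value it spells."""
    return ast.literal_eval(text)


# In a real script the text would come from input("Enter something: ").
value = parse_user_value("[1, 2, 3]")
word = parse_user_value("'abc'")

print(value, type(value))
print(word, type(word))
```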

Web requests

Request a URL and get the returned status code

requests.get(url+payload_linux).status_code  # url and payload_linux are defined by you

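Extracting data that requires login

import requests  # load the module
headers = {'cookie': 'cookie value (more cookies can be appended, separated by ;);result_per_page=20',
}
Pass the headers directly with the request: requests.get(urls, headers=headers).content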

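Handling errors when requesting pages

Use try to keep a failed request from stopping the script; the content also has to be decoded before it is printed.

try:
    data = requests.get(url=url, headers=headers)
    print(data.content.decode('utf-8'))
except Exception as e:
    time.sleep(0.5)
    pass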

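If the page content comes back garbled and cannot be written to a file, as with data above, decode it before writing:

data=requests.get(url=url,headers=headers).content.decode('utf-8')

code=requests.get(urls,headers=headers,proxies=proxy).status_code  # get the status code; proxies sets the proxy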

Encode the payload before sending the request

payload = base64.b64encode(payload.encode('utf-8'))  # payload is the request header value to send
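A small round-trip sketch of the encoding step, using a hypothetical payload string. Note that b64encode takes bytes and returns bytes, so the result usually needs .decode() before it is placed in a header or URL.

```python
import base64

payload = "id=1&name=test"                      # hypothetical payload

encoded = base64.b64encode(payload.encode("utf-8"))   # bytes in, bytes out
encoded_text = encoded.decode("ascii")                # str form for headers/URLs

# Decoding reverses the two steps.
decoded = base64.b64decode(encoded_text).decode("utf-8")

print(encoded_text)
print(decoded)
```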

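When requesting a URL, these arguments are usually added:

rep1=requests.get(url,headers=headers,allow_redirects=False,timeout=1)
allow_redirects=False and timeout=1 matter here: the first stops requests from following redirects, and the second gives up if the server does not respond within one second instead of hanging.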

Python execution order

https://www.cnblogs.com/cnXuYang/p/8336370.html

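Python file operations

When writing to a file, use 'w' to overwrite the existing content and 'a' to append instead.

    with open(r'xuexitong.txt', 'w', encoding='utf-8') as f:
        f.write(data)  # the with block closes the file, so no f.close() is needed

Reading a file:
file_name = 'resource.txt'
with open(file_name) as file_obj:
    lines = file_obj.readlines()
print(lines)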
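The difference between the two write modes can be seen with a throwaway file (the path and contents here are hypothetical): a second 'w' discards what was written before, while 'a' adds to the end.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "modes.txt")

with open(path, "w", encoding="utf-8") as f:
    f.write("first\n")

with open(path, "w", encoding="utf-8") as f:   # 'w' truncates: "first" is lost
    f.write("second\n")

with open(path, "a", encoding="utf-8") as f:   # 'a' appends to the end
    f.write("third\n")

with open(path, encoding="utf-8") as f:
    content = f.read()

print(content)
```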

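Fixing extra newlines when loading a file

Entries loaded from a wordlist end with the newline \n, which has to be removed. If the variable is bian, use bian.replace('\n', '').

A CSV file saved with csv.writer().writerow() can show an extra blank line after every row when opened (this happens on Windows); opening the output file with newline='' prevents it.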
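A minimal sketch of writing a CSV file with newline='', using made-up rows and a temporary path. Reading the file back shows exactly the rows that were written, with no blank rows in between.

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "scores.csv")
rows = [["name", "score"], ["aa", "90"], ["bb", "85"]]   # hypothetical data

# newline='' lets the csv module control line endings itself.
with open(path, "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    for row in rows:
        writer.writerow(row)

with open(path, newline="", encoding="utf-8") as f:
    read_back = list(csv.reader(f))

print(read_back)
```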

for paths in open('fs.txt', encoding='utf-8'):
    url = ''
    paths = paths.replace('\n', '')
############################################################
with open(path, mode='r+') as f1:
    for line in f1.readlines():
        subdomin = line.strip('\n')
        print(subdomin)
The difference between the two: the second reads the file into a list with readlines() and strips the newline from each line. In both cases the loop variable is overwritten on every pass, so after the loop subdomin (like paths) holds only the last line.
############################################################
import re
data = []
with open(r'student.txt', 'r', encoding="utf-8") as f:
    for line in f.readlines():
        line = line.strip('\n')
        data.append(line)
print(data)
This also works:
f = open('student.txt', encoding='utf-8')
data = [line.strip() for line in f.readlines()]
f.close()
print(data)

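Reading a file into a list, then writing the list to a file with newlines added

import re
data = []
with open(r'student.txt', 'r', encoding="utf-8") as f:
    for line in f.readlines():
        line = line.strip('\n')
        data.append(line)
print(data)
tidata = re.compile(r'(138\d{8})[^\d]')  # 11-digit numbers starting with 138, followed by a non-digit
datas = tidata.findall(str(data))
print(datas)
# Open the output file once; opening it with 'w+' inside the loop would overwrite it on every pass.
with open(r'haoma.txt', 'w', encoding='utf-8') as f:
    for aa in datas:
        f.write(aa + '\n')

To drop the newline from a wordlist entry held in bian, use bian.replace('\n', ''); bian.strip('\n') also works.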

# When scraping web pages, the extracted content often looks like this:
# ['keep going', 'come on', 'hhh \n']
# It is stored as a list, so the following is needed to clean it up:
tiqus = '\n'.join(tiqu)
tiquss = tiqus.split()
############################################################
data = ['jksdf \n lksfja', 'kjasfdkjlk askjfdkfkkkkkkkkkkk']
result1 = '\n'.join(data)
print(result1)
result2 = result1.split()
print(result2)

Output:
jksdf 
 lksfja
kjasfdkjlk askjfdkfkkkkkkkkkkk
['jksdf', 'lksfja', 'kjasfdkjlk', 'askjfdkfkkkkkkkkkkk']

Extracting content with lxml

For extracting content from scraped pages, see: https://www.cnblogs.com/zhangxinqi/p/9210211.html

from lxml import etree
# import re
# regs=r"ab+"
# data="acabc"
# m=re.search(regs,data)
# print(m)
with open(r'xuexitong.txt', 'r', encoding='utf-8') as rede:
    xue = rede.read()
soup = etree.HTML(xue)
# print(type(soup))
# html=etree.parse('soup',etree.HTMLParser())
tiqu = soup.xpath('//ul[@class="nav fr"]/li[@class="appCode"]/text()')
# Find the ul element whose class is "nav fr", then take the text of its child li elements whose class is "appCode".
# Scraped content often comes back as a list such as ['keep going', 'come on', 'hhh \n']; the commented lines below clean it up.
# tiqus = '\n'.join(tiqu)
# tiquss=tiqus.split()
# print(tiquss)
with open(r'xueru.txt', 'a+', encoding='utf-8') as wc:
    for cichi in tiqu:
        wc.write(cichi + '\n')

from lxml import etree

text = '''
<div><ul><li class="item-0"><a href="link1.html">第一个</a></li><li class="item-1"><a href="link2.html">second item</a></li></ul></div>
'''
html = etree.HTML(text, etree.HTMLParser())
result = html.xpath('//li[@class="item-1"]/a/text()')  # text of the a node
result1 = html.xpath('//li[@class="item-0"]//text()')  # text of all descendants of the li with class item-0; "second item" is excluded because it sits under a different li
result2 = html.xpath('//div//text()')  # text of all descendants of div; the li elements are among them
result22 = '\n'.join(result2)  # join the list with \n
result222 = result22.split()  # split() splits on any whitespace, dropping every \n and space
result3 = html.xpath('//li/a/@href')  # href attribute of each a
result4 = html.xpath('//li[@class="item-1"]//a/@href')  # href of the a nodes under the li whose class is "item-1"
result5 = html.xpath('//li/a')  # //li selects all li nodes; /a selects their direct child a nodes
print(result)
print(result1)
print(result22)
print(result3)
print(result4)
print(result5)

Understanding split

processes = []
with open("file.txt", "r") as f:
    lines = f.readlines()
    # print(lines)
# Loop through all lines, ignoring header.
# Add last element to list (i.e. the process name)
for l in lines[1:]:
    processes.append(l.split()[-1])  # split on whitespace and keep the last field
print(processes)
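The same idea can be tried without a file. In this sketch the sample text is invented to resemble process-listing output: the header line is skipped and the last whitespace-separated field of each remaining line is kept.

```python
# Hypothetical sample resembling the output of a process listing.
sample = (
    "PID TTY      TIME     CMD\n"
    "101 pts/0    00:00:00 bash\n"
    "202 pts/0    00:00:01 python\n"
)

lines = sample.splitlines()

# split() with no argument splits on any run of whitespace.
processes = [line.split()[-1] for line in lines[1:] if line.strip()]

print(processes)  # ['bash', 'python']
```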

Extracting content with regular expressions

https://blog.csdn.net/u013074465/article/details/44310427?locationnum=7&fps=1

https://blog.csdn.net/ysy_1_2/article/details/104790079

https://www.cnblogs.com/huxi/archive/2010/07/04/1771073.html

https://www.runoob.com/python/python-reg-expressions.html

Online regular expression tester: https://c.runoob.com/front-end/854/

Regular expression practice site: https://codejiaonang.com/#/course/regex_chapter1/0/0

Python regular expression reference table: Python正则表达式指南 - AstralWind - 博客园 (cnblogs.com)

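Lessons learned

1. When extracting content from text, generally do not convert it to a list first.

2. When matching with a regular expression, if several items in front of the target data also have to be matched, generally do not use a lookahead assertion; use (?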
