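pyinstxtractor: Source Code Analysis and Fixing a Pitfall

pyinstxtractor is a script for unpacking EXE files that were built with PyInstaller.

PyInstaller: Python script --> EXE that runs without a Python installation.

pyinstxtractor: EXE --> the packaged Python files (as .pyc), which a decompiler can then turn back into the original script.

pyinstxtractor download: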

https://github.com/extremecoders-re/pyinstxtractor

"""
PyInstaller Extractor v2.0 (Supports pyinstaller 3.6, 3.5, 3.4, 3.3, 3.2, 3.1, 3.0, 2.1, 2.0)
Author : Extreme Coders
E-mail : extremecoders(at)hotmail(dot)com
Web    : https://0xec.blogspot.com
Date   : 26-March-2020
Url    : https://github.com/extremecoders-re/pyinstxtractor
For any suggestions, leave a comment on
https://forum.tuts4you.com/topic/34455-pyinstaller-extractor/
This script extracts a pyinstaller generated executable file.
Pyinstaller installation is not needed. The script has it all.
For best results, it is recommended to run this script in the
same version of python as was used to create the executable.
This is just to prevent unmarshalling errors(if any) while
extracting the PYZ archive.
Usage : Just copy this script to the directory where your exe resides
        and run the script with the exe file name as a parameter
C:\path\to\exe\>python pyinstxtractor.py <filename>
$ /path/to/exe/python pyinstxtractor.py <filename>
Licensed under GNU General Public License (GPL) v3.
You are free to modify this source.
CHANGELOG
================================================
Version 1.1 (Jan 28, 2014)
-------------------------------------------------
- First Release
- Supports only pyinstaller 2.0
Version 1.2 (Sept 12, 2015)
-------------------------------------------------
- Added support for pyinstaller 2.1 and 3.0 dev
- Cleaned up code
- Script is now more verbose
- Executable extracted within a dedicated sub-directory
(Support for pyinstaller 3.0 dev is experimental)
Version 1.3 (Dec 12, 2015)
-------------------------------------------------
- Added support for pyinstaller 3.0 final
- Script is compatible with both python 2.x & 3.x (Thanks to Moritz Kroll @ Avira Operations GmbH & Co. KG)
Version 1.4 (Jan 19, 2016)
-------------------------------------------------
- Fixed a bug when writing pyc files >= version 3.3 (Thanks to Daniello Alto: https://github.com/Djamana)
Version 1.5 (March 1, 2016)
-------------------------------------------------
- Added support for pyinstaller 3.1 (Thanks to Berwyn Hoyt for reporting)
Version 1.6 (Sept 5, 2016)
-------------------------------------------------
- Added support for pyinstaller 3.2
- Extractor will use a random name while extracting unnamed files.
- For encrypted pyz archives it will dump the contents as is. Previously, the tool would fail.
Version 1.7 (March 13, 2017)
-------------------------------------------------
- Made the script compatible with python 2.6 (Thanks to Ross for reporting)
Version 1.8 (April 28, 2017)
-------------------------------------------------
- Support for sub-directories in .pyz files (Thanks to Moritz Kroll @ Avira Operations GmbH & Co. KG)
Version 1.9 (November 29, 2017)
-------------------------------------------------
- Added support for pyinstaller 3.3
- Display the scripts which are run at entry (Thanks to Michael Gillespie @ malwarehunterteam for the feature request)
Version 2.0 (March 26, 2020)
-------------------------------------------------
- Project migrated to github
- Supports pyinstaller 3.6
- Added support for Python 3.7, 3.8
- The header of all extracted pyc's are now automatically fixed
"""

from __future__ import print_function
import os
import struct
import marshal
import zlib
import sys
from uuid import uuid4 as uniquename

# imp is deprecated in Python3 in favour of importlib
if sys.version_info.major == 3:
    from importlib.util import MAGIC_NUMBER
    pyc_magic = MAGIC_NUMBER
else:
    import imp
    pyc_magic = imp.get_magic()


class CTOCEntry:
    def __init__(self, position, cmprsdDataSize, uncmprsdDataSize, cmprsFlag, typeCmprsData, name):
        self.position = position
        self.cmprsdDataSize = cmprsdDataSize
        self.uncmprsdDataSize = uncmprsdDataSize
        self.cmprsFlag = cmprsFlag
        self.typeCmprsData = typeCmprsData
        self.name = name


class PyInstArchive:
    PYINST20_COOKIE_SIZE = 24           # For pyinstaller 2.0
    PYINST21_COOKIE_SIZE = 24 + 64      # For pyinstaller 2.1+
    MAGIC = b'MEI\014\013\012\013\016'  # Magic number which identifies pyinstaller

    def __init__(self, path):
        self.filePath = path

    def open(self):
        try:
            self.fPtr = open(self.filePath, 'rb')
            self.fileSize = os.stat(self.filePath).st_size
        except:
            print('[!] Error: Could not open {0}'.format(self.filePath))
            return False
        return True

    def close(self):
        try:
            self.fPtr.close()
        except:
            pass

    def checkFile(self):
        print('[+] Processing {0}'.format(self.filePath))
        # Check if it is a 2.0 archive
        self.fPtr.seek(self.fileSize - self.PYINST20_COOKIE_SIZE, os.SEEK_SET)
        magicFromFile = self.fPtr.read(len(self.MAGIC))

        if magicFromFile == self.MAGIC:
            self.pyinstVer = 20     # pyinstaller 2.0
            print('[+] Pyinstaller version: 2.0')
            return True

        # Check for pyinstaller 2.1+ before bailing out
        self.fPtr.seek(self.fileSize - self.PYINST21_COOKIE_SIZE, os.SEEK_SET)
        magicFromFile = self.fPtr.read(len(self.MAGIC))

        if magicFromFile == self.MAGIC:
            print('[+] Pyinstaller version: 2.1+')
            self.pyinstVer = 21     # pyinstaller 2.1+
            return True

        print('[!] Error : Unsupported pyinstaller version or not a pyinstaller archive')
        return False

    def getCArchiveInfo(self):
        try:
            if self.pyinstVer == 20:
                self.fPtr.seek(self.fileSize - self.PYINST20_COOKIE_SIZE, os.SEEK_SET)

                # Read CArchive cookie
                (magic, lengthofPackage, toc, tocLen, self.pyver) = \
                    struct.unpack('!8siiii', self.fPtr.read(self.PYINST20_COOKIE_SIZE))

            elif self.pyinstVer == 21:
                self.fPtr.seek(self.fileSize - self.PYINST21_COOKIE_SIZE, os.SEEK_SET)

                # Read CArchive cookie
                (magic, lengthofPackage, toc, tocLen, self.pyver, pylibname) = \
                    struct.unpack('!8siiii64s', self.fPtr.read(self.PYINST21_COOKIE_SIZE))

        except:
            print('[!] Error : The file is not a pyinstaller archive')
            return False

        print('[+] Python version: {0}'.format(self.pyver))

        # Overlay is the data appended at the end of the PE
        self.overlaySize = lengthofPackage
        self.overlayPos = self.fileSize - self.overlaySize
        self.tableOfContentsPos = self.overlayPos + toc
        self.tableOfContentsSize = tocLen

        print('[+] Length of package: {0} bytes'.format(self.overlaySize))
        return True

    def parseTOC(self):
        # Go to the table of contents
        self.fPtr.seek(self.tableOfContentsPos, os.SEEK_SET)

        self.tocList = []
        parsedLen = 0

        # Parse table of contents
        while parsedLen < self.tableOfContentsSize:
            (entrySize, ) = struct.unpack('!i', self.fPtr.read(4))
            nameLen = struct.calcsize('!iiiiBc')

            (entryPos, cmprsdDataSize, uncmprsdDataSize, cmprsFlag, typeCmprsData, name) = \
                struct.unpack(
                    '!iiiBc{0}s'.format(entrySize - nameLen),
                    self.fPtr.read(entrySize - 4))

            name = name.decode('utf-8').rstrip('\0')
            if len(name) == 0:
                name = str(uniquename())
                print('[!] Warning: Found an unamed file in CArchive. Using random name {0}'.format(name))

            self.tocList.append(
                CTOCEntry(
                    self.overlayPos + entryPos,
                    cmprsdDataSize,
                    uncmprsdDataSize,
                    cmprsFlag,
                    typeCmprsData,
                    name
                ))

            parsedLen += entrySize
        print('[+] Found {0} files in CArchive'.format(len(self.tocList)))

    def _writeRawData(self, filepath, data):
        nm = filepath.replace('\\', os.path.sep).replace('/', os.path.sep).replace('..', '__')
        nmDir = os.path.dirname(nm)
        if nmDir != '' and not os.path.exists(nmDir): # Check if path exists, create if not
            os.makedirs(nmDir)

        with open(nm, 'wb') as f:
            f.write(data)

    def extractFiles(self):
        print('[+] Beginning extraction...please standby')
        extractionDir = os.path.join(os.getcwd(), os.path.basename(self.filePath) + '_extracted')

        if not os.path.exists(extractionDir):
            os.mkdir(extractionDir)

        os.chdir(extractionDir)

        for entry in self.tocList:
            basePath = os.path.dirname(entry.name)
            if basePath != '':
                # Check if path exists, create if not
                if not os.path.exists(basePath):
                    os.makedirs(basePath)

            self.fPtr.seek(entry.position, os.SEEK_SET)
            data = self.fPtr.read(entry.cmprsdDataSize)

            if entry.cmprsFlag == 1:
                data = zlib.decompress(data)
                # Malware may tamper with the uncompressed size
                # Comment out the assertion in such a case
                assert len(data) == entry.uncmprsdDataSize # Sanity Check

            if entry.typeCmprsData == b's':
                # s -> ARCHIVE_ITEM_PYSOURCE
                # Entry point are expected to be python scripts
                print('[+] Possible entry point: {0}.pyc'.format(entry.name))
                self._writePyc(entry.name + '.pyc', data)

            elif entry.typeCmprsData == b'M' or entry.typeCmprsData == b'm':
                # M -> ARCHIVE_ITEM_PYPACKAGE
                # m -> ARCHIVE_ITEM_PYMODULE
                # packages and modules are pyc files with their header's intact
                self._writeRawData(entry.name + '.pyc', data)

            else:
                self._writeRawData(entry.name, data)

                if entry.typeCmprsData == b'z' or entry.typeCmprsData == b'Z':
                    self._extractPyz(entry.name)

    def _writePyc(self, filename, data):
        with open(filename, 'wb') as pycFile:
            pycFile.write(pyc_magic)            # pyc magic

            if self.pyver >= 37:                # PEP 552 -- Deterministic pycs
                pycFile.write(b'\0' * 4)        # Bitfield
                pycFile.write(b'\0' * 8)        # (Timestamp + size) || hash

            else:
                pycFile.write(b'\0' * 4)      # Timestamp
                if self.pyver >= 33:
                    pycFile.write(b'\0' * 4)  # Size parameter added in Python 3.3

            pycFile.write(data)

    def _extractPyz(self, name):
        dirName = name + '_extracted'
        # Create a directory for the contents of the pyz
        if not os.path.exists(dirName):
            os.mkdir(dirName)

        with open(name, 'rb') as f:
            pyzMagic = f.read(4)
            assert pyzMagic == b'PYZ\0' # Sanity Check

            pycHeader = f.read(4) # Python magic value

            # Skip PYZ extraction if not running under the same python version
            if pyc_magic != pycHeader:
                print('[!] Warning: This script is running in a different Python version than the one used to build the executable.')
                print('[!] Please run this script in Python{0} to prevent extraction errors during unmarshalling'.format(self.pyver))
                print('[!] Skipping pyz extraction')
                return

            (tocPosition, ) = struct.unpack('!i', f.read(4))
            f.seek(tocPosition, os.SEEK_SET)

            try:
                toc = marshal.load(f)
            except:
                print('[!] Unmarshalling FAILED. Cannot extract {0}. Extracting remaining files.'.format(name))
                return

            print('[+] Found {0} files in PYZ archive'.format(len(toc)))

            # From pyinstaller 3.1+ toc is a list of tuples
            if type(toc) == list:
                toc = dict(toc)

            for key in toc.keys():
                (ispkg, pos, length) = toc[key]
                f.seek(pos, os.SEEK_SET)
                fileName = key

                try:
                    # for Python > 3.3 some keys are bytes object some are str object
                    fileName = fileName.decode('utf-8')
                except:
                    pass

                # Prevent writing outside dirName
                fileName = fileName.replace('..', '__').replace('.', os.path.sep)

                if ispkg == 1:
                    filePath = os.path.join(dirName, fileName, '__init__.pyc')

                else:
                    filePath = os.path.join(dirName, fileName + '.pyc')

                fileDir = os.path.dirname(filePath)
                if not os.path.exists(fileDir):
                    os.makedirs(fileDir)

                try:
                    data = f.read(length)
                    data = zlib.decompress(data)
                except:
                    print('[!] Error: Failed to decompress {0}, probably encrypted. Extracting as is.'.format(filePath))
                    open(filePath + '.encrypted', 'wb').write(data)
                else:
                    self._writePyc(filePath, data)


def main():
    if len(sys.argv) < 2:
        print('[+] Usage: pyinstxtractor.py <filename>')

    else:
        arch = PyInstArchive(sys.argv[1])
        if arch.open():
            if arch.checkFile():
                if arch.getCArchiveInfo():
                    arch.parseTOC()
                    arch.extractFiles()
                    arch.close()
                    print('[+] Successfully extracted pyinstaller archive: {0}'.format(sys.argv[1]))
                    print('')
                    print('You can now use a python decompiler on the pyc files within the extracted directory')
                    return

            arch.close()


if __name__ == '__main__':
    main()

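The script is simple to use: on the command line, pass the file to unpack as the argument after the script name.

The walkthrough below uses a program packaged with Python 3.8 as the example.

Analysis of the PyInstArchive class:

def __init__(self, path): Initializer. Stores the path passed on the command line in the member filePath.

def open(self): Opens the file in binary mode and stores the file object and the file size in the members fPtr and fileSize.

File size: 9,930,109 bytes (0x97857D)

def close(self): Closes the file object.

def checkFile(self): Determines whether the file was packaged by PyInstaller and, if so, which PyInstaller version produced it.

Pyinstaller version: 2.0. The magic string 'MEI\014\013\012\013\016' starts 24 bytes before the end of the file.

Pyinstaller version: 2.1+. From 2.1 onward, the magic string 'MEI\014\013\012\013\016' starts 88 (= 24 + 64) bytes before the end of the file.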
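The two checks can be tried out on an in-memory buffer. The following is a minimal sketch rather than the script's own code: `detect_cookie` and the two synthetic buffers are hypothetical names introduced for illustration, and only the magic string and the two cookie sizes are taken from the script.

```python
MAGIC = b'MEI\014\013\012\013\016'   # same marker as PyInstArchive.MAGIC
COOKIE_SIZE_20 = 24                  # PyInstaller 2.0
COOKIE_SIZE_21 = 24 + 64             # PyInstaller 2.1+


def detect_cookie(blob):
    """Return 20, 21, or None depending on where the magic is found.

    Hypothetical helper: follows the order of checks in checkFile(),
    but works on a bytes object instead of an open file.
    """
    for version, size in ((20, COOKIE_SIZE_20), (21, COOKIE_SIZE_21)):
        if len(blob) < size:
            continue
        start = len(blob) - size
        if blob[start:start + len(MAGIC)] == MAGIC:
            return version
    return None


# Synthetic "files": an arbitrary payload followed by a zero-filled cookie.
old_style = b'payload' + MAGIC + b'\0' * (COOKIE_SIZE_20 - len(MAGIC))
new_style = b'payload' + MAGIC + b'\0' * (COOKIE_SIZE_21 - len(MAGIC))

print(detect_cookie(old_style))                   # 20
print(detect_cookie(new_style))                   # 21
print(detect_cookie(b'not a PyInstaller file'))   # None
```

The 2.0 position is tested first, as in the script, so a file is only treated as 2.1+ when the magic is absent 24 bytes from the end.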

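Tail data:

The shaded region is the structure stored at the tail of the file.

def getCArchiveInfo(self): Parses the tail structure (the cookie). All fields are big-endian.

Pyinstaller version: 2.0 layout (the values shown come from the Python 3.8 sample):

magic: offset 0, length 8. The magic string MEI...

lengthofPackage: offset 8, length 4. Size of the appended data: 0x0093C97D

toc: offset 12, length 4. 0x00930A15 // the appended data starts at overlayPos = fileSize - lengthofPackage = 0x3BC00

tocLen: offset 16, length 4. 0xBF10 // tableOfContentsPos = overlayPos + toc = 0x96C615

self.pyver: offset 20, length 4. 0x26 (decimal 38, i.e. Python 3.8)

Versions 2.1 and later are almost the same; they only add a 64-byte pylibname field, which in this file holds python38.dll.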
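The arithmetic above can be checked with `struct`. This is a small sketch that rebuilds a 2.1+ cookie from the values quoted in the walkthrough and unpacks it with the same format string the script uses; the constant and variable names are illustrative, not part of the script.

```python
import struct

# Values taken from the walkthrough above (Python 3.8 sample).
FILE_SIZE = 0x97857D
MAGIC = b'MEI\014\013\012\013\016'
COOKIE_FORMAT = '!8siiii64s'     # '!' selects big-endian (network) byte order

# Rebuild the cookie: magic, package length, TOC offset, TOC length,
# Python version, then the library name (NUL-padded to 64 bytes by struct).
cookie = struct.pack(COOKIE_FORMAT, MAGIC, 0x0093C97D, 0x00930A15,
                     0xBF10, 38, b'python38.dll')

magic, length_of_package, toc, toc_len, pyver, pylibname = \
    struct.unpack(COOKIE_FORMAT, cookie)

overlay_pos = FILE_SIZE - length_of_package   # start of the appended data
toc_pos = overlay_pos + toc                   # start of the table of contents

print(len(cookie))          # 88
print(hex(overlay_pos))     # 0x3bc00
print(hex(toc_pos))         # 0x96c615
# With these sample values the TOC ends exactly where the cookie begins.
print(toc_pos + toc_len == FILE_SIZE - len(cookie))   # True
```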

def parseTOC(self): Parses the table of contents (TOC).

1. Seek to tableOfContentsPos (0x96C615), the position where the shaded region in the figure above begins.

            (entrySize, ) = struct.unpack('!i', self.fPtr.read(4))
            # calcsize returns how many bytes the given format occupies; here nameLen = 18
            nameLen = struct.calcsize('!iiiiBc')

            (entryPos, cmprsdDataSize, uncmprsdDataSize, cmprsFlag, typeCmprsData, name) = \
                struct.unpack(
                    '!iiiBc{0}s'.format(entrySize - nameLen),
                    self.fPtr.read(entrySize - 4))

            name = name.decode('utf-8').rstrip('\0')

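The shaded part in the figure above is the first entry.

entrySize = 0x20

entryPos = 0

cmprsdDataSize = 0x105, the size after compression

uncmprsdDataSize = 0x157, the size before compression

cmprsFlag = 1, the compression flag

typeCmprsData = 0x6D // each value has a different meaning, listed below:

s (0x73): ARCHIVE_ITEM_PYSOURCE. A Python script; entry points are expected to be of this type.

m (0x6D): ARCHIVE_ITEM_PYMODULE. Packages and modules are pyc files with their headers intact.

M (0x4D): ARCHIVE_ITEM_PYPACKAGE

Z (0x5A), z (0x7A): extractFiles writes these entries to disk as-is and then passes them to _extractPyz.

name: length 0x20 - 18 = 14 bytes; the string is struct, padded with NUL bytes

The green line marks the next entry.

The red line plus the gray line mark the entry after that.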
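To make the entry layout concrete, here is a minimal sketch that packs the first entry described above and parses it back the same way parseTOC does. `build_entry` and `parse_entry` are hypothetical helpers written for this illustration; the format strings and field values are the ones from the walkthrough.

```python
import struct

ENTRY_HEADER = '!iiiiBc'                      # entrySize plus the fixed fields
HEADER_LEN = struct.calcsize(ENTRY_HEADER)    # 18 bytes


def build_entry(pos, csize, usize, flag, kind, name, entry_size):
    """Pack one TOC entry; struct NUL-pads the name to fill entry_size."""
    name_len = entry_size - HEADER_LEN
    return struct.pack('!iiiiBc{0}s'.format(name_len),
                       entry_size, pos, csize, usize, flag, kind, name)


def parse_entry(buf, offset=0):
    """Unpack one TOC entry at offset; returns (fields, next_offset)."""
    (entry_size,) = struct.unpack_from('!i', buf, offset)
    name_len = entry_size - HEADER_LEN
    pos, csize, usize, flag, kind, name = struct.unpack_from(
        '!iiiBc{0}s'.format(name_len), buf, offset + 4)
    fields = (pos, csize, usize, flag, kind,
              name.decode('utf-8').rstrip('\0'))
    return fields, offset + entry_size


# First entry from the walkthrough: 0x20 bytes, type 'm', name 'struct'.
raw = build_entry(0, 0x105, 0x157, 1, b'm', b'struct', 0x20)
fields, next_offset = parse_entry(raw)
print(HEADER_LEN, len(raw), next_offset)   # 18 32 32
print(fields)
```

Because every entry starts with its own size, `next_offset` is all that is needed to step to the following entry, which is how the loop in parseTOC advances.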

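def _writeRawData(self, filepath, data): Writes data to a file. It normalizes the path separators, replaces '..' with '__', and creates the target directory if needed.

def extractFiles(self): Extracts the files. It walks the entries collected by parseTOC, decompresses each one when its compression flag is set, and writes it to disk according to its type.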
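The per-entry logic of extractFiles can be summarized in a short sketch. `classify` and `load_entry` are hypothetical names used only here, and raising `ValueError` on a size mismatch is a choice made for this example (the script itself uses an `assert`).

```python
import zlib


def classify(type_code):
    """Map a TOC type code to the action extractFiles() takes for it."""
    if type_code == b's':
        return 'write pyc with rebuilt header'     # entry-point script
    if type_code in (b'M', b'm'):
        return 'write as .pyc unchanged'           # package / module
    if type_code in (b'z', b'Z'):
        return 'write raw, then unpack as PYZ'
    return 'write raw'


def load_entry(stored, compressed_flag, expected_size):
    """Decompress one entry if its flag is 1 and check the declared size."""
    if compressed_flag != 1:
        return stored
    data = zlib.decompress(stored)
    if len(data) != expected_size:
        raise ValueError('size mismatch: TOC says {0}, got {1}'.format(
            expected_size, len(data)))
    return data


payload = b'print("hello")\n' * 4      # stand-in for real entry data
stored = zlib.compress(payload)
print(load_entry(stored, 1, len(payload)) == payload)   # True
print(classify(b's'))
print(classify(b'z'))
```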

def _writePyc(self, filename, data): Writes a pyc file. This is where the pitfall lies.

# The pitfall is here
# imp is deprecated in Python3 in favour of importlib
if sys.version_info.major == 3:
    from importlib.util import MAGIC_NUMBER
    pyc_magic = MAGIC_NUMBER
else:
    import imp
    pyc_magic = imp.get_magic()

def _writePyc(self, filename, data):
    with open(filename, 'wb') as pycFile:
        pycFile.write(pyc_magic)            # pyc magic

        if self.pyver >= 37:                # PEP 552 -- Deterministic pycs
            pycFile.write(b'\0' * 4)        # Bitfield
            pycFile.write(b'\0' * 8)        # (Timestamp + size) || hash

        else:
            pycFile.write(b'\0' * 4)      # Timestamp
            if self.pyver >= 33:
                pycFile.write(b'\0' * 4)  # Size parameter added in Python 3.3

        pycFile.write(data)

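The function is not complicated: it writes the pyc header first and then the data that follows. The pitfall is pyc_magic. When writing the four magic bytes of the header, the script does not use the magic of the Python version that built the file being unpacked; it writes the magic of the interpreter that is currently running the script.

For example, if the environment is Python 3.8, pyc_magic is b'\x55\x0d\x0d\x0a', no matter which Python version the target program was built with.

If the environment is Python 2.7, pyc_magic is b'\x03\xf3\x0d\x0a', no matter which Python version the target program was built with.

In this example the environment is Python 3.8 and the exe happens to be built with 3.8 as well, so the extracted pyc files decompile without trouble.

Had the environment been 2.7, the extracted pyc files could not be decompiled.

https://github.com/zrax/pycdc/blob/master/PythonBytecode.txt

The magic for each version is listed below. The table shows each magic as a 32-bit value that is stored in little-endian order, so for 3.8 the bytes actually written should be b'\x55\x0d\x0d\x0a'.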

Python  MAGIC           Python  MAGIC           Python  MAGIC
1.0     0x00999902      2.0     0x0A0DC687      3.0     0x0A0D0C3A
1.1     0x00999903      2.1     0x0A0DEB2A      3.1     0x0A0D0C4E
1.2     0x00999903      2.2     0x0A0DED2D      3.2     0x0A0D0C6C
1.3     0x0A0D2E89      2.3     0x0A0DF23B      3.3     0x0A0D0C9E
1.4     0x0A0D1704      2.4     0x0A0DF26D      3.4     0x0A0D0CEE
1.5     0x0A0D4E99      2.5     0x0A0DF2B3      3.5     0x0A0D0D16
1.6     0x0A0DC4FC      2.6     0x0A0DF2D1      3.5.3   0x0A0D0D17
                        2.7     0x0A0DF303      3.6     0x0A0D0D33
                                                3.7     0x0A0D0D42
                                                3.8     0x0A0D0D55
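Turning a table value into the four bytes found at the start of a pyc file is a single little-endian pack. A minimal sketch follows; `MAGIC_VALUES` holds only a few rows of the table, and `magic_bytes` is a hypothetical helper name.

```python
import struct

# A few rows of the table above, keyed as (major, minor).
MAGIC_VALUES = {
    (2, 7): 0x0A0DF303,
    (3, 6): 0x0A0D0D33,
    (3, 7): 0x0A0D0D42,
    (3, 8): 0x0A0D0D55,
}


def magic_bytes(version):
    """Return the 4 header bytes for a (major, minor) version tuple."""
    # '<I' = little-endian unsigned 32-bit: the low byte is written first.
    return struct.pack('<I', MAGIC_VALUES[version])


print(magic_bytes((3, 8)) == b'\x55\x0d\x0d\x0a')   # True
print(magic_bytes((2, 7)) == b'\x03\xf3\x0d\x0a')   # True
```

The trailing `\x0d\x0a` is common to every entry from Python 1.3 onward in the table; only the first two bytes distinguish those versions.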

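def _extractPyz(self, name): Unpacks a PYZ archive. It checks the PYZ\0 signature, compares the Python magic stored in the archive with pyc_magic (and skips extraction if they differ), unmarshals the archive's table of contents, then decompresses each module and writes it out through _writePyc.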
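The layout that _extractPyz reads can be illustrated with a toy archive. The sketch below is an assumption-laden illustration: `build_pyz` and `read_pyz` are hypothetical helpers, the payloads are plain byte strings rather than marshalled code objects, and the layout (signature, magic, big-endian TOC position, compressed items, marshalled TOC) is inferred from the reads in the script above.

```python
import marshal
import struct
import zlib


def build_pyz(magic, modules):
    """Assemble a tiny PYZ-like blob from {name: (ispkg, payload)}."""
    body = b''
    toc = []
    offset = 12                      # 4 (PYZ\0) + 4 (magic) + 4 (TOC position)
    for name, (ispkg, payload) in sorted(modules.items()):
        packed = zlib.compress(payload)
        toc.append((name, (ispkg, offset, len(packed))))
        body += packed
        offset += len(packed)
    header = b'PYZ\0' + magic + struct.pack('!i', offset)
    return header + body + marshal.dumps(toc)


def read_pyz(blob):
    """Return (magic, {name: payload}), following the steps of _extractPyz()."""
    if blob[:4] != b'PYZ\0':
        raise ValueError('not a PYZ archive')
    magic = blob[4:8]
    (toc_pos,) = struct.unpack('!i', blob[8:12])
    toc = marshal.loads(blob[toc_pos:])
    if isinstance(toc, list):        # list of tuples, as in PyInstaller 3.1+
        toc = dict(toc)
    out = {}
    for name, (ispkg, pos, length) in toc.items():
        out[name] = zlib.decompress(blob[pos:pos + length])
    return magic, out


blob = build_pyz(b'\x55\x0d\x0d\x0a',
                 {'pkg': (1, b'package body'), 'pkg.mod': (0, b'module body')})
magic, modules = read_pyz(blob)
print(magic == b'\x55\x0d\x0d\x0a')   # True
print(sorted(modules))                # ['pkg', 'pkg.mod']
```

In the real script the dotted module name is turned into a path (`pkg.mod` becomes `pkg/mod.pyc`, and a package becomes `pkg/__init__.pyc`) before the data is written.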

That concludes the source walkthrough.

The function that needs fixing is:

def _writePyc(self, filename, data):
    with open(filename, 'wb') as pycFile:
        pycFile.write(pyc_magic)            # pyc magic

// pyc_magic should instead be chosen by looking up self.pyver in the magic table shown earlier, so that the header matches the Python version that built the exe.

The fix itself is not listed here.
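For readers who want a starting point, one possible shape of such a fix is sketched below. It is an illustration under stated assumptions, not the author's fix and not upstream code: `MAGIC_BY_PYVER`, `build_pyc_header`, and `write_pyc` are hypothetical names, the dictionary covers only some rows of the magic table, and `pyver` is assumed to be the cookie value (major * 10 + minor, such as 38).

```python
import io
import struct

# Magic values from the table, keyed by the cookie's pyver field.
# Assumption: only these versions are needed; extend the mapping as required.
MAGIC_BY_PYVER = {
    27: 0x0A0DF303,
    33: 0x0A0D0C9E,
    34: 0x0A0D0CEE,
    35: 0x0A0D0D16,   # 3.5.3 uses 0x0A0D0D17; pyver alone cannot tell them apart
    36: 0x0A0D0D33,
    37: 0x0A0D0D42,
    38: 0x0A0D0D55,
}


def build_pyc_header(pyver):
    """Build the pyc header for the Python version that built the exe."""
    header = struct.pack('<I', MAGIC_BY_PYVER[pyver])   # KeyError if unknown
    if pyver >= 37:
        header += b'\0' * 4      # PEP 552 bitfield
        header += b'\0' * 8      # (timestamp + size) or hash
    else:
        header += b'\0' * 4      # timestamp
        if pyver >= 33:
            header += b'\0' * 4  # source size, added in Python 3.3
    return header


def write_pyc(stream, data, pyver):
    """Write the header followed by the marshalled code bytes."""
    stream.write(build_pyc_header(pyver))
    stream.write(data)


out = io.BytesIO()
write_pyc(out, b'<marshalled code>', 38)
print(out.getvalue()[:4] == b'\x55\x0d\x0d\x0a')   # True
print(len(build_pyc_header(27)),
      len(build_pyc_header(36)),
      len(build_pyc_header(38)))                   # 8 12 16
```

Looking the magic up from `pyver` removes the dependency on the interpreter running the script for the top-level pyc files. The magic comparison inside _extractPyz is a separate check and is not changed by this sketch.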
