Python re-assembler

Re-assemble Python disassembly text to bytecode

This project serves a pretty unique use case, where all you have is the text output of the dis.dis() function, in the form of disassembled bytecode:

  0 LOAD_GLOBAL              0 (print)
  2 LOAD_CONST               1 ('Hello,')
  4 LOAD_CONST               2 ('world!')
  6 CALL_FUNCTION            2
  8 POP_TOP
 10 LOAD_CONST               0 (None)
 12 RETURN_VALUE

Using this tool, the above can be parsed and turned back into raw bytecode. This allows decompilers like uncompyle6, decompyle3 and pycdc to work with the bytes.
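To make the idea concrete, here is a minimal sketch of parsing such a listing and packing it back into bytes. It is not this tool's actual implementation: `parse_listing`, `assemble`, and `PY38_OPMAP` are hypothetical names, and the opcode numbers are assumed to be those of CPython 3.8 (the full table is available as `dis.opmap` on a matching interpreter). The sketch only handles the simple two-bytes-per-instruction layout shown above and ignores line-number columns, `>>` jump markers, and `EXTENDED_ARG`.

```python
import re

# The example listing from above.
LISTING = """\
  0 LOAD_GLOBAL              0 (print)
  2 LOAD_CONST               1 ('Hello,')
  4 LOAD_CONST               2 ('world!')
  6 CALL_FUNCTION            2
  8 POP_TOP
 10 LOAD_CONST               0 (None)
 12 RETURN_VALUE
"""

# Assumption: opcode numbers as in CPython 3.8, limited to the names used here.
PY38_OPMAP = {
    "POP_TOP": 1,
    "RETURN_VALUE": 83,
    "LOAD_CONST": 100,
    "LOAD_GLOBAL": 116,
    "CALL_FUNCTION": 131,
}

# Byte offset, opcode name, optional integer argument.
# The trailing "(repr)" annotation is ignored.
LINE_RE = re.compile(r"^\s*(\d+)\s+([A-Z][A-Z0-9_]*)(?:\s+(\d+))?")


def parse_listing(text):
    """Return a list of (offset, opname, arg) tuples; arg defaults to 0."""
    instructions = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip blank or unrecognised lines
        offset, opname, arg = match.groups()
        instructions.append((int(offset), opname, int(arg) if arg else 0))
    return instructions


def assemble(instructions, opmap):
    """Pack instructions into two-byte (opcode, arg) units."""
    out = bytearray()
    for offset, opname, arg in instructions:
        if opname not in opmap:
            raise ValueError(f"unknown opcode name: {opname}")
        if arg > 0xFF:
            raise ValueError("arguments above 255 need EXTENDED_ARG (not handled)")
        if offset != len(out):
            raise ValueError(f"unexpected offset {offset}, expected {len(out)}")
        out += bytes([opmap[opname], arg])
    return bytes(out)


instructions = parse_listing(LISTING)
bytecode = assemble(instructions, PY38_OPMAP)
print(bytecode.hex())  # 7400640164016402... style hex string of 14 bytes
```

Raw bytecode alone is not yet a code object: the constants and names in the parenthesised annotations (`print`, `'Hello,'`, `None`) also have to be collected into the matching tables, which this sketch leaves out.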

$ python-reassembler debug/example.txt -q
Opening 'example.txt' and reassembling the code object...
[SUCCESS] Code object written to 'output.pyc'
Now use a decompiler like `uncompyle6`, `decompyle3`, or `pycdc` on the output to decompile it

$ decompyle3 output.pyc
# decompyle3 version 3.9.1
print("Hello,", "world!")
# okay decompiling output.pyc
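For readers curious what "writing a code object to a .pyc" involves, the following is a rough sketch using only the standard library. It is an illustration under assumptions, not the tool's own code: `write_pyc` is a hypothetical helper, the 16-byte header layout is the one used by CPython 3.7 and later, and the magic number is taken from the running interpreter, whereas a re-assembler has to use the magic number of the Python version the disassembly came from.

```python
import importlib.util
import marshal
import os
import struct
import tempfile


def write_pyc(code, path):
    """Write a code object as a timestamp-based .pyc (CPython 3.7+ layout)."""
    header = importlib.util.MAGIC_NUMBER  # 4 bytes, identifies the bytecode version
    header += struct.pack("<I", 0)        # flags: 0 means timestamp-based validation
    header += struct.pack("<I", 0)        # source mtime (unknown here, left as 0)
    header += struct.pack("<I", 0)        # source size (unknown here, left as 0)
    with open(path, "wb") as handle:
        handle.write(header)
        handle.write(marshal.dumps(code))


# Stand-in code object; a re-assembler would build one from the parsed listing.
code = compile('greeting = "Hello, " + "world!"', "<example>", "exec")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "output.pyc")
    write_pyc(code, path)
    with open(path, "rb") as handle:
        data = handle.read()

# Everything after the 16-byte header is the marshalled code object.
restored = marshal.loads(data[16:])
namespace = {}
exec(restored, namespace)
print(namespace["greeting"])
```

Decompilers read the magic number first to decide which bytecode version they are looking at, so a mismatch between header and bytecode is a likely cause of decompilation failures.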

The inspiration for this tool came from a PicoCTF 2024 challenge named 'weirdSnake', where players received a text file containing disassembled Python bytecode. The easiest solution there is to read through the disassembly by hand, which becomes quite manageable with some educated guesses. But this made me wonder: would it be possible to turn this back into clean, readable source code?
The answer: Yes! You can solve the challenge by re-assembling the input file at debug/4.txt, and decompiling it with decompyle3:

$ python-reassembler debug/4.txt
[SUCCESS] Code object written to 'output.pyc'
$ decompyle3 output.pyc | tee output.py
# decompyle3 version 3.9.1
input_list = [
 4,54,41,0,112,32,25,49,33,3,0,0,57,32,108,23,48,4,9,70,7,110,36,8,108,7,49,10,4,86,43,102,126,92,0,16,58,41,89,78]
key_str = "J"
key_str = "_" + key_str
key_str = key_str + "o"
key_str = key_str + "3"
key_str = "t" + key_str
key_list = [ord(char) for char in key_str]
while len(key_list) < len(input_list):
    key_list.extend(key_list)

result = [a ^ b for a, b in zip(input_list, key_list)]
result_text = "".join(map(chr, result))
# okay decompiling output.pyc
$ python -i output.py
>>> result_text
'picoCTF{...}'

Installation

git clone https://github.com/JorianWoltjer/python-reassembler.git && cd python-reassembler
python3 -m pip install -e .

Then run the python-reassembler command on any file to re-assemble it.

Testing

To ensure correctness of the output this tool generates, several tests are included in the tests/ folder. These can be run with pytest.

First, the tests compare real source code against the re-assembled and decompiled output to make sure the result is the same. Then they also compare compiled and decompiled code against the re-assembled and decompiled output, to make sure any differences aren't just bugs in the decompyle3 decompiler.
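One way to compare two pieces of source code while ignoring formatting, quoting style, and comments is to compare their syntax trees. The sketch below illustrates that idea only; it is an assumption about how such a check could be written, not a description of the tests in this repository, and `same_program` is a hypothetical helper.

```python
import ast


def same_program(source_a, source_b):
    """Return True if both sources parse to the same syntax tree."""
    # ast.dump leaves out line and column information by default, and
    # comments never reach the tree, so only the program structure is compared.
    return ast.dump(ast.parse(source_a)) == ast.dump(ast.parse(source_b))


original = 'print("Hello,", "world!")\n'
decompiled = (
    "# decompyle3 version 3.9.1\n"
    "print('Hello,', 'world!')\n"
    "# okay decompiling output.pyc\n"
)

print(same_program(original, decompiled))
print(same_program(original, 'print("Hello,")\n'))
```

A tree comparison like this is stricter than it may look: a decompiler that emits equivalent but differently structured code (for example a `while` loop rewritten as a `for` loop) would still count as a mismatch.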

If you find any input that does not re-assemble correctly, please add a new .py file that reproduces the failure to the tests/ folder, and create an Issue showing the input and/or a Pull Request that fixes it!
