Python 3.10 New Feature: Putting Structural Pattern Matching to Work in VM Reverse Engineering
2024-2-12 17:15:28 Author: mp.weixin.qq.com


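Preface

This style of parsing was first used in hgarrereyn's (DiceGang) writeup for eldar from Google CTF 2022: https://ctf.harrisongreen.me/2022/googlectf/eldar/. There, too, it was used to parse a virtual machine, though that one was an ELF metadata-driven Turing weird machine.

Later, deeprev in the 2022 Qiangwang Cup (强网杯) in China reused the same ELF metadata-driven Turing weird machine. I used this approach to write a parser for that relocation machine as well, and it worked very well; it is no exaggeration to say it felt like magic.

At the time I wrote in my to-do list that parsing an ordinary VM with the new Structural Pattern Matching feature would surely be easy. Work then got in the way and I never finished it, so the item gathered dust in my to-do list for nearly a year. That whole year I was pushed along by work, executing each day, like a robot, the instructions I had written the day before. My memory seems to have gotten worse and I often forget things. Only at the end of the year, after some projects had been delivered, did I find time for my own things. Running a startup really is hard.

Back to the topic: later, in the writeup for the dicectf-2022-breach challenge (https://github.com/reductor/dice-ctf-2022-breach-writeup), the technique was formally applied to parsing a conventional VM.

It is only now that I have come back to write this up, although I have already published quite a few posts on VM parsing. In short, this method is an upgraded disassembler, far better than the disassemblers I posted before. Is it better than a decompiler? I cannot give a definite answer, since a decompiler follows a different idea: abstracting the code into a high-level language.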

Learn Structural Pattern Matching

Introduction to Structural Pattern Matching

PEP 634 – Structural Pattern Matching: Specification (https://peps.python.org/pep-0634/): specifies the match syntax and the supported patterns
PEP 635 – Structural Pattern Matching: Motivation and Rationale (https://peps.python.org/pep-0635/): explains why the syntax was designed this way
PEP 636 – Structural Pattern Matching: Tutorial (https://peps.python.org/pep-0636/): a tutorial covering the concepts, syntax, and semantics
match patterns:
Mapping patterns: match mapping structures like dictionaries.
Sequence patterns: match sequence structures like tuples and lists.
Capture patterns: bind values to names.
AS patterns: bind the value of subpatterns to names.
OR patterns: match one of several different subpatterns.
Wildcard patterns: match anything.
Class patterns: match class structures.
Value patterns: match values stored in attributes.
Literal patterns: match literal values.

Capture patterns

Match a pattern and bind the matched value to a name:
def sum_list(numbers):
    match numbers:
        case []:  # matches an empty list
            return 0
        case [first, *rest]:  # a sequence pattern made of two capture patterns; *rest matches all remaining elements
            return first + sum_list(rest)

def average(*args):
    match args:
        case [x, y]:  # captures the two elements of a sequence
            return (x + y) / 2
        case [x]:  # captures the only element of a sequence
            return x
        case []:
            return 0
        case a:  # captures the entire sequence
            return sum(a) / len(a)

Guards (adding conditions to patterns)

A guard further restricts when a pattern matches, for example:
# sort in ascending order
def sort(seq):
    match seq:
        case [] | [_]:  # matches an empty sequence [] or a sequence with exactly one element [_]
            return seq
        case [x, y] if x <= y:
            return seq
        case [x, y]:
            return [y, x]
        case [x, y, z] if x <= y <= z:
            return seq
        case [x, y, z] if x >= y >= z:
            return [z, y, x]
        case [p, *rest]:
            a = sort([x for x in rest if x <= p])  # sort the elements not greater than p
            b = sort([x for x in rest if p < x])  # sort the elements greater than p
            return a + [p] + b

AS Patterns

An AS pattern gives a subpattern an alias, so that a restricting pattern can be combined with a bound name.
Subpatterns can be combined flexibly inside match.
In : def as_pattern(obj):
...:     match obj:
...:         case str() as s:
...:             print(f'Got str: {s=}')
...:         case [0, int() as i]:
...:             print(f'Got int: {i=}')
...:         case [tuple() as tu]:
...:             print(f'Got tuple: {tu=}')
...:         case list() | set() | dict() as iterable:
...:             print(f'Got iterable: {iterable=}')
...:
...:

In : as_pattern('sss')
Got str: s='sss'

In : as_pattern([0, 1])
Got int: i=1

In : as_pattern([(1,)])
Got tuple: tu=(1,)

In : as_pattern([1, 2, 3])
Got iterable: iterable=[1, 2, 3]

In : as_pattern({'a': 1})
Got iterable: iterable={'a': 1}

# Num and UnaryOp are expression node classes assumed to be defined elsewhere.
def simplify_expr(tokens):
    match tokens:
        case [('('|'[') as l, *expr, (')'|']') as r] if (l+r) in ('()', '[]'):
            return simplify_expr(expr)
        case [0, ('+'|'-') as op, right]:
            return UnaryOp(op, right)
        case [(int() | float() as left) | Num(left), '+', (int() | float() as right) | Num(right)]:
            return Num(left + right)
        case [(int() | float()) as value]:
            return Num(value)

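OR Patterns

Alternatives are separated with |. With literals:
case 401 | 403 | 404:
    print("Some HTTP error")
Parenthesized:
case ("a"|"b"|"c"):
Combined with an AS pattern:
case ("a"|"b"|"c") as letter:
The following spellings are not OR patterns. They were proposed while the feature was being designed and were rejected (see "Rejected Ideas" in PEP 622). A comma-separated list is valid syntax, but it is a sequence pattern that matches a three-element sequence:
case 401, 403, 404:
    print("Some HTTP error")
Stacked cases in the style of C are a syntax error:
case 401:
case 403:
case 404:
    print("Some HTTP error")
So is case in:
case in 401, 403, 404:
    print("Some HTTP error")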

Literal Patterns

Literal patterns match against Python's built-in literals, such as strings, numbers, booleans, and None.
match number:
    case 0:
        print('zero')
    case 1:
        print('one')
    case 2:
        print('two')

def simplify(expr):
    match expr:
        case ('+', 0, x):  # 0 + x
            return x
        case ('+' | '-', x, 0):  # x +- 0
            return x
        case ('and', True, x):  # True and x
            return x
        case ('and', False, x):
            return False
        case ('or', False, x):
            return x
        case ('or', True, x):
            return True
        case ('not', ('not', x)):
            return x
    return expr

Wildcard Pattern

The wildcard pattern behaves like a special capture pattern: it accepts any value but does not bind it to a name (in effect, it ignores the positions you do not care about).
def is_closed(sequence):
    match sequence:
        case [_]:  # any sequence with a single element
            return True
        case [start, *_, end]:  # a sequence with at least two elements
            return start == end
        case _:  # anything
            return False

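Value Patterns

This pattern mainly matches constants or the members of an enum from the enum module:
In : from enum import Enum

In : class Color(Enum):
...:     RED = 1
...:     GREEN = 2
...:     BLUE = 3
...:

In : class NewColor:
...:     YELLOW = 4
...:

In : def constant_value(color):
...:     match color:
...:         case Color.RED:
...:             print('Red')
...:         case NewColor.YELLOW:
...:             print('Yellow')
...:         case new_color:
...:             print(new_color)
...:

In : constant_value(Color.RED)  # matches the first case
Red

In : constant_value(NewColor.YELLOW)  # matches the second case
Yellow

In : constant_value(Color.GREEN)  # matches the third case
Color.GREEN

In : constant_value(4)  # an equal constant also matches the second case
Yellow

In : constant_value(10)  # any other constant
10

Note that because case binds names, a bare constant such as YELLOW cannot be used directly, as in the following:
YELLOW = 4

def constant_value(color):
    match color:
        case YELLOW:
            print('Yellow')
# This does not test color == 4: the bare name YELLOW is a capture pattern, so it matches
# any value and rebinds YELLOW inside the function. If more case clauses followed it,
# Python would reject the code with a SyntaxError.

So how are names whose values should be compared told apart from the names bound by capture patterns? By the dot.
Only dotted names are treated as constants to compare against.
class Codes:
    SUCCESS = 200
    NOT_FOUND = 404

def handle(retcode):
    match retcode:
        case Codes.SUCCESS:
            print('success')
        case Codes.NOT_FOUND:
            print('not found')
        case _:
            print('unknown')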

Sequence Patterns

Patterns written in list or tuple form can be used in match.
[a, b, c], (a, b, c) and a, b, c are not distinguished; they are equivalent. To check the type explicitly, use list([a, b, c]).
A starred subpattern matches any number of elements, but a sequence pattern may contain at most one star, so (*_, 3, *_) for "any sequence containing 3" is not valid. Use a guard instead, for example case [*xs] if 3 in xs.
An iterator is never consumed by matching; all elements are accessed by index and slice.
In : def sequence(collection):
...:     match collection:
...:         case 1, [x, *others]:
...:             print(f"Got 1 and a nested sequence: {x=}, {others=}")
...:         case (1, x):
...:             print(f"Got 1 and {x}")
...:         case [x, y, z]:
...:             print(f"{x=}, {y=}, {z=}")
...:

In : sequence([1])

In : sequence([1, 2])
Got 1 and 2

In : sequence([1, 2, 3])
x=1, y=2, z=3

In : sequence([1, [2, 3]])
Got 1 and a nested sequence: x=2, others=[3]

In : sequence([1, [2, 3, 4]])
Got 1 and a nested sequence: x=2, others=[3, 4]

In : sequence([2, 3])

In : sequence((1, 2))
Got 1 and 2

Mapping Patterns

For efficiency, keys must be constants (literals or value patterns).
In other words, case supports matching against dictionaries.
In : def mapping(config):
...:     match config:
...:         case {'sub': sub_config, **rest}:
...:             print(f'Sub: {sub_config}')
...:             print(f'OTHERS: {rest}')
...:         case {'route': route}:
...:             print(f'ROUTE: {route}')
...:

In : mapping({})

In : mapping({'route': '/auth/login'})
ROUTE: /auth/login

# Matches a dict that has a 'sub' key; its value is bound to sub_config and the rest of the dict to rest
In : mapping({'route': '/auth/login', 'sub': {'a': 1}})
Sub: {'a': 1}
OTHERS: {'route': '/auth/login'}

def change_red_to_blue(json_obj):
    match json_obj:
        case { 'color': ('red' | '#FF0000') }:
            json_obj['color'] = 'blue'
        case { 'children': children }:
            for child in children:
                change_red_to_blue(child)

Class Patterns

Class patterns serve two main purposes: checking that an object is an instance of a class, and extracting data from specific attributes of the object.
# case can match against arbitrary objects. Let's start with an example that fails:

In : class Point:
...:     def __init__(self, x, y):
...:         self.x = x
...:         self.y = y
...:

In : def class_pattern(obj):
...:     match obj:
...:         case Point(x, y):
...:             print(f'Point({x=},{y=})')
...:

In : class_pattern(Point(1, 2))
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Input In [], in <cell line: 1>()
----> 1 class_pattern(Point(1, 2))

Input In [], in class_pattern(obj)
      1 def class_pattern(obj):
      2     match obj:
----> 3         case Point(x, y):
      4             print(f'Point({x=},{y=})')

TypeError: Point() accepts 0 positional sub-patterns (2 given)

# This fails because Point does not declare which attributes the positional sub-patterns refer to.
# One fix is to use keyword patterns:

In : def class_pattern(obj):
...:     match obj:
...:         case Point(x=1, y=2):
...:             print(f'match')
...:

In : class_pattern(Point(1, 2))
match

# The other fix, which lets a custom class be matched positionally, is to define __match_args__,
# a tuple of attribute names in positional order, like this:
In : class Point:
...:     __match_args__ = ('x', 'y')
...:
...:     def __init__(self, x, y):
...:         self.x = x
...:         self.y = y
...:

# Another option is a dataclass: Point2 here uses the standard library's dataclasses.dataclass decorator,
# which provides the __match_args__ attribute, so it can be used directly
In : from dataclasses import dataclass

In : @dataclass
...: class Point2:
...:     x: int
...:     y: int
...:

In : def class_pattern(obj):
...:     match obj:
...:         case Point(x, y):
...:             print(f'Point({x=},{y=})')
...:         case Point2(x, y):
...:             print(f'Point2({x=},{y=})')
...:

In : class_pattern(Point(1, 2))
Point(x=1,y=2)

In : class_pattern(Point2(1, 2))
Point2(x=1,y=2)

def eval_expr(expr):
    """Evaluate an expression and return the result."""
    match expr:
        case BinaryOp('+', left, right):
            return eval_expr(left) + eval_expr(right)
        case BinaryOp('-', left, right):
            return eval_expr(left) - eval_expr(right)
        case BinaryOp('*', left, right):
            return eval_expr(left) * eval_expr(right)
        case BinaryOp('/', left, right):
            return eval_expr(left) / eval_expr(right)
        case UnaryOp('+', arg):
            return eval_expr(arg)
        case UnaryOp('-', arg):
            return -eval_expr(arg)
        case VarExpr(name):
            raise ValueError(f"Unknown value of: {name}")
        case float() | int():
            return expr
        case _:
            raise ValueError(f"Invalid expression value: {repr(expr)}")
Another example:
match media_object:
    case Image(type="jpg"):
        return media_object
    case Image(type="png") | Image(type="gif"):
        return render_as(media_object, "jpg")
    case Video():
        raise ValueError("Can't extract frames from video yet")
    case other_type:
        raise Exception(f"Media type {media_object} can't be handled yet")
A namedtuple example, which is also a class pattern:
from collections import namedtuple
Mov = namedtuple('mov', ['dst', 'src', 'sz', 'ridx'])
match op:
    case Mov(dst, src, 8, ridx):
        pass

Type Unions, Aliases, and Guards

numbers is annotated as a list whose elements may be float or int.
def mean(numbers: list[float | int]) -> float:
    return sum(numbers) / len(numbers)
Type aliases can also be defined; both type checkers and programmers can recognize them:
from typing import TypeAlias

Card: TypeAlias = tuple[str, str] # ('', '')
Deck: TypeAlias = list[Card] # [('', '')]

Type guards are used to narrow a type union.


A disassembler like this is usually refined step by step, until its output can finally be passed to https://docs.pwntools.com/en/stable/asm.html#pwnlib.asm.make_elf_from_assembly and assembled directly into an ELF.

1: Define the instruction types and write parse

◆Ezmachine-disassembler-parsefunc.py
from collections import namedtuple
from dataclasses import dataclass


@dataclass
class Regs(object):
    idx: int

    def __repr__(self):
        if self.idx == 0:
            return "eax"
        elif self.idx == 1:
            return "ebx"
        elif self.idx == 2:
            return "ecx"
        elif self.idx == 3:
            return "edx"
        else:
            return "unknown reg {}".format(self.idx)


Nop = namedtuple("Nop", ["addr"])  # case 0: nop
MovReg = namedtuple("MovReg", ["addr", "dst", "imm"])  # case 1: mov reg, imm
PushImm = namedtuple("PushImm", ["addr", "imm"])  # case 2: push imm
PushReg = namedtuple("PushReg", ["addr", "reg"])  # case 3: push reg
PopReg = namedtuple("PopReg", ["addr", "reg"])  # case 4: pop reg
# case 5: print str by edx: 0:'right', 1:'wrong', 3:'plz input:', 4:'hacker'
PrintStr = namedtuple("PrintStr", ["addr"])

AddReg = namedtuple("AddReg", ["addr", "dst", "src"])  # case 6: add reg, reg
SubReg = namedtuple("SubReg", ["addr", "dst", "src"])  # case 7: sub reg, reg
MulReg = namedtuple("MulReg", ["addr", "dst", "src"])  # case 8: mul reg, reg
DivReg = namedtuple("DivReg", ["addr", "dst", "src"])  # case 9: div reg, reg
XorReg = namedtuple("XorReg", ["addr", "dst", "src"])  # case 10: xor reg, reg

Jmp = namedtuple("Jmp", ["addr", "target"])  # case 11: jmp addr
Cmp = namedtuple("Cmp", ["addr", "dst", "src"])  # case 12: cmp reg, reg
Jz = namedtuple("Jz", ["addr", "target"])  # case 13: jz addr
Jnz = namedtuple("Jnz", ["addr", "target"])  # case 14: jnz addr
Jg = namedtuple("Jg", ["addr", "target"])  # case 15: jg addr
Jl = namedtuple("Jl", ["addr", "target"])  # case 16: jl addr

# case 17: gets(mem); eax=strlen(mem);
InputStr = namedtuple("InputStr", ["addr"])

InitMem = namedtuple(
    "InitMem", ["addr", "mem_addr", "sz"])  # case 18: memset(mem_addr, 0, sz)

MovRegStack = namedtuple(
    "MovRegStack", ["addr", "dst", "src"])  # case 19: mov reg, [ebp-src]

MovRegMem = namedtuple(
    "MovRegMem", ["addr", "dst", "src"])  # case 20: mov reg, mem[src]

Exit = namedtuple("Exit", ["addr"])  # case 0xff: exit(0)


def parse(buffer):
    instructions = []

    pc = 0
    while pc < len(buffer):
        opcode = buffer[pc]

        match opcode:
            case 0:
                instructions.append(Nop(pc))
                pc += 1
            case 1:
                dst = buffer[pc + 1]
                imm = buffer[pc + 2]
                instructions.append(MovReg(pc, Regs(dst), imm))
                pc += 3
            case 2:
                imm = buffer[pc + 1]
                instructions.append(PushImm(pc, imm))
                pc += 3
            case 3:
                reg = buffer[pc + 1]
                instructions.append(PushReg(pc, Regs(reg)))
                pc += 3
            case 4:
                reg = buffer[pc + 1]
                instructions.append(PopReg(pc, Regs(reg)))
                pc += 3
            case 5:
                instructions.append(PrintStr(pc))
                pc += 3
            case 6:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(AddReg(pc, Regs(dst), Regs(src)))
                pc += 3
            case 7:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(SubReg(pc, Regs(dst), Regs(src)))
                pc += 3
            case 8:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(MulReg(pc, Regs(dst), Regs(src)))
                pc += 3
            case 9:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(DivReg(pc, Regs(dst), Regs(src)))
                pc += 3
            case 10:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(XorReg(pc, Regs(dst), Regs(src)))
                pc += 3
            case 11:
                target = 3 * buffer[pc + 1] - 3
                instructions.append(Jmp(pc, target))
                pc += 3
            case 12:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(Cmp(pc, Regs(dst), Regs(src)))
                pc += 3
            case 13:
                target = 3 * buffer[pc + 1] - 3
                instructions.append(Jz(pc, target))
                pc += 3
            case 14:
                target = 3 * buffer[pc + 1] - 3
                instructions.append(Jnz(pc, target))
                pc += 3
            case 15:
                target = 3 * buffer[pc + 1] - 3
                instructions.append(Jg(pc, target))
                pc += 3
            case 16:
                target = 3 * buffer[pc + 1] - 3
                instructions.append(Jl(pc, target))
                pc += 3
            case 17:
                instructions.append(InputStr(pc))
                pc += 3
            case 18:
                mem_addr = buffer[pc + 1]
                sz = buffer[pc + 2]
                instructions.append(InitMem(pc, mem_addr, sz))
                pc += 3
            case 19:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(MovRegStack(pc, Regs(dst), Regs(src)))
                pc += 3
            case 20:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(MovRegMem(pc, Regs(dst), Regs(src)))
                pc += 3
            case 255:
                instructions.append(Exit(pc))
                pc += 3
            case _:
                raise Exception(f"unknown opcode: {opcode} at {pc}")
                break

    return instructions


if __name__ == '__main__':
    opcode = [0x01, 0x03, 0x03, 0x05, 0x00, 0x00, 0x11, 0x00, 0x00, 0x01, 0x01, 0x11, 0x0C, 0x00, 0x01, 0x0D, 0x0A, 0x00, 0x01, 0x03, 0x01, 0x05, 0x00, 0x00, 0xFF, 0x00, 0x00, 0x01, 0x02, 0x00, 0x01, 0x00, 0x11, 0x0C, 0x00, 0x02, 0x0D, 0x2B, 0x00, 0x14, 0x00, 0x02, 0x01, 0x01, 0x61, 0x0C, 0x00, 0x01, 0x10, 0x1A, 0x00, 0x01, 0x01, 0x7A, 0x0C, 0x00, 0x01, 0x0F, 0x1A, 0x00, 0x01, 0x01, 0x47, 0x0A, 0x00, 0x01, 0x01, 0x01, 0x01, 0x06, 0x00, 0x01, 0x0B, 0x24, 0x00, 0x01, 0x01, 0x41, 0x0C, 0x00, 0x01, 0x10, 0x24, 0x00, 0x01, 0x01, 0x5A, 0x0C, 0x00, 0x01, 0x0F, 0x24, 0x00, 0x01, 0x01, 0x4B, 0x0A, 0x00, 0x01, 0x01, 0x01, 0x01, 0x07, 0x00, 0x01, 0x01, 0x01, 0x10, 0x09, 0x00, 0x01, 0x03, 0x01, 0x00, 0x03, 0x00, 0x00, 0x01, 0x01, 0x01, 0x06, 0x02, 0x01, 0x0B, 0x0B, 0x00, 0x02, 0x07, 0x00, 0x02, 0x0D, 0x00, 0x02, 0x00, 0x00, 0x02, 0x05, 0x00, 0x02, 0x01, 0x00, 0x02, 0x0C, 0x00, 0x02, 0x01, 0x00, 0x02, 0x00, 0x00, 0x02, 0x00, 0x00, 0x02, 0x0D, 0x00, 0x02, 0x05, 0x00, 0x02, 0x0F, 0x00, 0x02, 0x00, 0x00, 0x02, 0x09, 0x00, 0x02, 0x05, 0x00, 0x02, 0x0F, 0x00, 0x02, 0x03, 0x00, 0x02, 0x00, 0x00, 0x02, 0x02, 0x00, 0x02, 0x05, 0x00, 0x02, 0x03, 0x00, 0x02, 0x03, 0x00, 0x02, 0x01, 0x00, 0x02, 0x07, 0x00, 0x02, 0x07, 0x00, 0x02, 0x0B, 0x00, 0x02, 0x02, 0x00, 0x02, 0x01, 0x00, 0x02, 0x02, 0x00, 0x02, 0x07, 0x00, 0x02, 0x02, 0x00, 0x02, 0x0C, 0x00, 0x02, 0x02, 0x00, 0x02, 0x02, 0x00, 0x01, 0x02, 0x01, 0x13, 0x01, 0x02, 0x04, 0x00, 0x00, 0x0C, 0x00, 0x01, 0x0E, 0x5B, 0x00, 0x01, 0x01, 0x22, 0x0C, 0x02, 0x01, 0x0D, 0x59, 0x00, 0x01, 0x01, 0x01, 0x06, 0x02, 0x01, 0x0B, 0x4E, 0x00, 0x01, 0x03, 0x00, 0x05, 0x00, 0x00, 0xFF, 0x00, 0x00, 0x01, 0x03, 0x01, 0x05, 0x00, 0x00, 0xFF, 0x00, 0x00, 0x00]
    instructions = parse(opcode)
    for ins in instructions:
        print(ins)
◆Ezmachine-disassembler-parsefunc.out
MovReg(addr=0, dst=edx, imm=3)
PrintStr(addr=3)
InputStr(addr=6)
MovReg(addr=9, dst=ebx, imm=17)
Cmp(addr=12, dst=eax, src=ebx)
Jz(addr=15, target=27)
MovReg(addr=18, dst=edx, imm=1)
PrintStr(addr=21)
Exit(addr=24)
MovReg(addr=27, dst=ecx, imm=0)
MovReg(addr=30, dst=eax, imm=17)
Cmp(addr=33, dst=eax, src=ecx)
Jz(addr=36, target=126)
MovRegMem(addr=39, dst=eax, src=ecx)
MovReg(addr=42, dst=ebx, imm=97)
Cmp(addr=45, dst=eax, src=ebx)
Jl(addr=48, target=75)
MovReg(addr=51, dst=ebx, imm=122)
Cmp(addr=54, dst=eax, src=ebx)
Jg(addr=57, target=75)
MovReg(addr=60, dst=ebx, imm=71)
XorReg(addr=63, dst=eax, src=ebx)
MovReg(addr=66, dst=ebx, imm=1)
AddReg(addr=69, dst=eax, src=ebx)
Jmp(addr=72, target=105)
MovReg(addr=75, dst=ebx, imm=65)
Cmp(addr=78, dst=eax, src=ebx)
Jl(addr=81, target=105)
MovReg(addr=84, dst=ebx, imm=90)
Cmp(addr=87, dst=eax, src=ebx)
Jg(addr=90, target=105)
MovReg(addr=93, dst=ebx, imm=75)
XorReg(addr=96, dst=eax, src=ebx)
MovReg(addr=99, dst=ebx, imm=1)
SubReg(addr=102, dst=eax, src=ebx)
MovReg(addr=105, dst=ebx, imm=16)
DivReg(addr=108, dst=eax, src=ebx)
PushReg(addr=111, reg=ebx)
PushReg(addr=114, reg=eax)
MovReg(addr=117, dst=ebx, imm=1)
AddReg(addr=120, dst=ecx, src=ebx)
Jmp(addr=123, target=30)
PushImm(addr=126, imm=7)
PushImm(addr=129, imm=13)
PushImm(addr=132, imm=0)
PushImm(addr=135, imm=5)
PushImm(addr=138, imm=1)
PushImm(addr=141, imm=12)
PushImm(addr=144, imm=1)
PushImm(addr=147, imm=0)
PushImm(addr=150, imm=0)
PushImm(addr=153, imm=13)
PushImm(addr=156, imm=5)
PushImm(addr=159, imm=15)
PushImm(addr=162, imm=0)
PushImm(addr=165, imm=9)
PushImm(addr=168, imm=5)
PushImm(addr=171, imm=15)
PushImm(addr=174, imm=3)
PushImm(addr=177, imm=0)
PushImm(addr=180, imm=2)
PushImm(addr=183, imm=5)
PushImm(addr=186, imm=3)
PushImm(addr=189, imm=3)
PushImm(addr=192, imm=1)
PushImm(addr=195, imm=7)
PushImm(addr=198, imm=7)
PushImm(addr=201, imm=11)
PushImm(addr=204, imm=2)
PushImm(addr=207, imm=1)
PushImm(addr=210, imm=2)
PushImm(addr=213, imm=7)
PushImm(addr=216, imm=2)
PushImm(addr=219, imm=12)
PushImm(addr=222, imm=2)
PushImm(addr=225, imm=2)
MovReg(addr=228, dst=ecx, imm=1)
MovRegStack(addr=231, dst=ebx, src=ecx)
PopReg(addr=234, reg=eax)
Cmp(addr=237, dst=eax, src=ebx)
Jnz(addr=240, target=270)
MovReg(addr=243, dst=ebx, imm=34)
Cmp(addr=246, dst=ecx, src=ebx)
Jz(addr=249, target=264)
MovReg(addr=252, dst=ebx, imm=1)
AddReg(addr=255, dst=ecx, src=ebx)
Jmp(addr=258, target=231)
MovReg(addr=261, dst=edx, imm=0)
PrintStr(addr=264)
Exit(addr=267)
MovReg(addr=270, dst=edx, imm=1)
PrintStr(addr=273)
Exit(addr=276)
Nop(addr=279)
The point of producing parsefunc.out is to check that parse and the instruction type definitions are sound.

2: Write a first-pass dump

◆Ezmachine-disassembler-version0.py
from collections import namedtuple
from dataclasses import dataclass


@dataclass
class Regs(object):
    idx: int

    def __repr__(self):
        if self.idx == 0:
            return "eax"
        elif self.idx == 1:
            return "ebx"
        elif self.idx == 2:
            return "ecx"
        elif self.idx == 3:
            return "edx"
        else:
            return "unknown reg {}".format(self.idx)


Nop = namedtuple("Nop", ["addr"])  # case 0: nop
MovReg = namedtuple("MovReg", ["addr", "dst", "imm"])  # case 1: mov reg, imm
PushImm = namedtuple("PushImm", ["addr", "imm"])  # case 2: push imm
PushReg = namedtuple("PushReg", ["addr", "reg"])  # case 3: push reg
PopReg = namedtuple("PopReg", ["addr", "reg"])  # case 4: pop reg
# case 5: print str by edx: 0:'right', 1:'wrong', 3:'plz input:', 4:'hacker'
PrintStr = namedtuple("PrintStr", ["addr"])

AddReg = namedtuple("AddReg", ["addr", "dst", "src"])  # case 6: add reg, reg
SubReg = namedtuple("SubReg", ["addr", "dst", "src"])  # case 7: sub reg, reg
MulReg = namedtuple("MulReg", ["addr", "dst", "src"])  # case 8: mul reg, reg
DivReg = namedtuple("DivReg", ["addr", "dst", "src"])  # case 9: div reg, reg
XorReg = namedtuple("XorReg", ["addr", "dst", "src"])  # case 10: xor reg, reg

Jmp = namedtuple("Jmp", ["addr", "target"])  # case 11: jmp addr
Cmp = namedtuple("Cmp", ["addr", "dst", "src"])  # case 12: cmp reg, reg
Jz = namedtuple("Jz", ["addr", "target"])  # case 13: jz addr
Jnz = namedtuple("Jnz", ["addr", "target"])  # case 14: jnz addr
Jg = namedtuple("Jg", ["addr", "target"])  # case 15: jg addr
Jl = namedtuple("Jl", ["addr", "target"])  # case 16: jl addr

# case 17: gets(mem); eax=strlen(mem);
InputStr = namedtuple("InputStr", ["addr"])

InitMem = namedtuple(
    "InitMem", ["addr", "mem_addr", "sz"])  # case 18: memset(mem_addr, 0, sz)

MovRegStack = namedtuple(
    "MovRegStack", ["addr", "dst", "src"])  # case 19: mov reg, [ebp-src]

MovRegMem = namedtuple(
    "MovRegMem", ["addr", "dst", "src"])  # case 20: mov reg, mem[src]

Exit = namedtuple("Exit", ["addr"])  # case 0xff: exit(0)


def parse(buffer):
    instructions = []

    pc = 0
    while pc < len(buffer):
        opcode = buffer[pc]

        match opcode:
            case 0:
                instructions.append(Nop(pc))
                pc += 1
            case 1:
                dst = buffer[pc + 1]
                imm = buffer[pc + 2]
                instructions.append(MovReg(pc, Regs(dst), imm))
                pc += 3
            case 2:
                imm = buffer[pc + 1]
                instructions.append(PushImm(pc, imm))
                pc += 3
            case 3:
                reg = buffer[pc + 1]
                instructions.append(PushReg(pc, Regs(reg)))
                pc += 3
            case 4:
                reg = buffer[pc + 1]
                instructions.append(PopReg(pc, Regs(reg)))
                pc += 3
            case 5:
                instructions.append(PrintStr(pc))
                pc += 3
            case 6:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(AddReg(pc, Regs(dst), Regs(src)))
                pc += 3
            case 7:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(SubReg(pc, Regs(dst), Regs(src)))
                pc += 3
            case 8:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(MulReg(pc, Regs(dst), Regs(src)))
                pc += 3
            case 9:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(DivReg(pc, Regs(dst), Regs(src)))
                pc += 3
            case 10:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(XorReg(pc, Regs(dst), Regs(src)))
                pc += 3
            case 11:
                target = 3 * buffer[pc + 1] - 3
                instructions.append(Jmp(pc, target))
                pc += 3
            case 12:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(Cmp(pc, Regs(dst), Regs(src)))
                pc += 3
            case 13:
                target = 3 * buffer[pc + 1] - 3
                instructions.append(Jz(pc, target))
                pc += 3
            case 14:
                target = 3 * buffer[pc + 1] - 3
                instructions.append(Jnz(pc, target))
                pc += 3
            case 15:
                target = 3 * buffer[pc + 1] - 3
                instructions.append(Jg(pc, target))
                pc += 3
            case 16:
                target = 3 * buffer[pc + 1] - 3
                instructions.append(Jl(pc, target))
                pc += 3
            case 17:
                instructions.append(InputStr(pc))
                pc += 3
            case 18:
                mem_addr = buffer[pc + 1]
                sz = buffer[pc + 2]
                instructions.append(InitMem(pc, mem_addr, sz))
                pc += 3
            case 19:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(MovRegStack(pc, Regs(dst), src))
                pc += 3
            case 20:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(MovRegMem(pc, Regs(dst), src))
                pc += 3
            case 255:
                instructions.append(Exit(pc))
                pc += 3
            case _:
                raise Exception(f"unknown opcode: {opcode} at {pc}")
                break

    return instructions


def dump(instructions):
    for ins in instructions:
        match ins:
            case Nop(addr):
                print(f"_0x{addr:04x}: nop")
            case MovReg(addr, dst, imm):
                print(f"_0x{addr:04x}: mov {dst}, 0x{imm:02x}")
            case PushImm(addr, imm):
                print(f"_0x{addr:04x}: push 0x{imm:02x}")
            case PushReg(addr, reg):
                print(f"_0x{addr:04x}: push {reg}")
            case PopReg(addr, reg):
                print(f"_0x{addr:04x}: pop {reg}")
            case PrintStr(addr):
                print(f"_0x{addr:04x}: print_str")
            case AddReg(addr, dst, src):
                print(f"_0x{addr:04x}: add {dst}, {src}")
            case SubReg(addr, dst, src):
                print(f"_0x{addr:04x}: sub {dst}, {src}")
            case MulReg(addr, dst, src):
                print(f"_0x{addr:04x}: mul {dst}, {src}")
            case DivReg(addr, dst, src):
                print(f"_0x{addr:04x}: div {dst}, {src}")
            case XorReg(addr, dst, src):
                print(f"_0x{addr:04x}: xor {dst}, {src}")
            case Jmp(addr, target):
                print(f"_0x{addr:04x}: jmp _0x{target:04x}")
            case Cmp(addr, dst, src):
                print(f"_0x{addr:04x}: cmp {dst}, {src}")
            case Jz(addr, target):
                print(f"_0x{addr:04x}: jz _0x{target:04x}")
            case Jnz(addr, target):
                print(f"_0x{addr:04x}: jnz _0x{target:04x}")
            case Jg(addr, target):
                print(f"_0x{addr:04x}: jg _0x{target:04x}")
            case Jl(addr, target):
                print(f"_0x{addr:04x}: jl _0x{target:04x}")
            case InputStr(addr):
                print(f"_0x{addr:04x}: input_str")
            case InitMem(addr, mem_addr, sz):
                print(f"_0x{addr:04x}: memset(0x{mem_addr:02x},0,{sz})")
            case MovRegStack(addr, dst, src):
                print(f"_0x{addr:04x}: mov {dst}, [ebp-{src}]")
            case MovRegMem(addr, dst, src):
                print(f"_0x{addr:04x}: mov {dst}, mem[{src}]")
            case Exit(addr):
                print(f"_0x{addr:04x}: exit(0)")
            case _:
                raise Exception(f"unknown instruction: {ins}")
                break


if __name__ == '__main__':
    opcode = [0x01, 0x03, 0x03, 0x05, 0x00, 0x00, 0x11, 0x00, 0x00, 0x01, 0x01, 0x11, 0x0C, 0x00, 0x01, 0x0D, 0x0A, 0x00, 0x01, 0x03, 0x01, 0x05, 0x00, 0x00, 0xFF, 0x00, 0x00, 0x01, 0x02, 0x00, 0x01, 0x00, 0x11, 0x0C, 0x00, 0x02, 0x0D, 0x2B, 0x00, 0x14, 0x00, 0x02, 0x01, 0x01, 0x61, 0x0C, 0x00, 0x01, 0x10, 0x1A, 0x00, 0x01, 0x01, 0x7A, 0x0C, 0x00, 0x01, 0x0F, 0x1A, 0x00, 0x01, 0x01, 0x47, 0x0A, 0x00, 0x01, 0x01, 0x01, 0x01, 0x06, 0x00, 0x01, 0x0B, 0x24, 0x00, 0x01, 0x01, 0x41, 0x0C, 0x00, 0x01, 0x10, 0x24, 0x00, 0x01, 0x01, 0x5A, 0x0C, 0x00, 0x01, 0x0F, 0x24, 0x00, 0x01, 0x01, 0x4B, 0x0A, 0x00, 0x01, 0x01, 0x01, 0x01, 0x07, 0x00, 0x01, 0x01, 0x01, 0x10, 0x09, 0x00, 0x01, 0x03, 0x01, 0x00, 0x03, 0x00, 0x00, 0x01, 0x01, 0x01, 0x06, 0x02, 0x01, 0x0B, 0x0B, 0x00, 0x02, 0x07, 0x00, 0x02, 0x0D, 0x00, 0x02, 0x00, 0x00, 0x02, 0x05, 0x00, 0x02, 0x01, 0x00, 0x02, 0x0C, 0x00, 0x02, 0x01, 0x00, 0x02, 0x00, 0x00, 0x02, 0x00, 0x00, 0x02, 0x0D, 0x00, 0x02, 0x05, 0x00, 0x02, 0x0F, 0x00, 0x02, 0x00, 0x00, 0x02, 0x09, 0x00, 0x02, 0x05, 0x00, 0x02, 0x0F, 0x00, 0x02, 0x03, 0x00, 0x02, 0x00, 0x00, 0x02, 0x02, 0x00, 0x02, 0x05, 0x00, 0x02, 0x03, 0x00, 0x02, 0x03, 0x00, 0x02, 0x01, 0x00, 0x02, 0x07, 0x00, 0x02, 0x07, 0x00, 0x02, 0x0B, 0x00, 0x02, 0x02, 0x00, 0x02, 0x01, 0x00, 0x02, 0x02, 0x00, 0x02, 0x07, 0x00, 0x02, 0x02, 0x00, 0x02, 0x0C, 0x00, 0x02, 0x02, 0x00, 0x02, 0x02, 0x00, 0x01, 0x02, 0x01, 0x13, 0x01, 0x02, 0x04, 0x00, 0x00, 0x0C, 0x00, 0x01, 0x0E, 0x5B, 0x00, 0x01, 0x01, 0x22, 0x0C, 0x02, 0x01, 0x0D, 0x59, 0x00, 0x01, 0x01, 0x01, 0x06, 0x02, 0x01, 0x0B, 0x4E, 0x00, 0x01, 0x03, 0x00, 0x05, 0x00, 0x00, 0xFF, 0x00, 0x00, 0x01, 0x03, 0x01, 0x05, 0x00, 0x00, 0xFF, 0x00, 0x00, 0x00]
    instructions = parse(opcode)
    dump(instructions)
◆Ezmachine-disassembler-dumpfunc-version0.out
_0x0000: mov edx, 0x03
_0x0003: print_str
_0x0006: input_str
_0x0009: mov ebx, 0x11
_0x000c: cmp eax, ebx
_0x000f: jz _0x001b
_0x0012: mov edx, 0x01
_0x0015: print_str
_0x0018: exit(0)
_0x001b: mov ecx, 0x00
_0x001e: mov eax, 0x11
_0x0021: cmp eax, ecx
_0x0024: jz _0x007e
_0x0027: mov eax, mem[2]
_0x002a: mov ebx, 0x61
_0x002d: cmp eax, ebx
_0x0030: jl _0x004b
_0x0033: mov ebx, 0x7a
_0x0036: cmp eax, ebx
_0x0039: jg _0x004b
_0x003c: mov ebx, 0x47
_0x003f: xor eax, ebx
_0x0042: mov ebx, 0x01
_0x0045: add eax, ebx
_0x0048: jmp _0x0069
_0x004b: mov ebx, 0x41
_0x004e: cmp eax, ebx
_0x0051: jl _0x0069
_0x0054: mov ebx, 0x5a
_0x0057: cmp eax, ebx
_0x005a: jg _0x0069
_0x005d: mov ebx, 0x4b
_0x0060: xor eax, ebx
_0x0063: mov ebx, 0x01
_0x0066: sub eax, ebx
_0x0069: mov ebx, 0x10
_0x006c: div eax, ebx
_0x006f: push ebx
_0x0072: push eax
_0x0075: mov ebx, 0x01
_0x0078: add ecx, ebx
_0x007b: jmp _0x001e
_0x007e: push 0x07
_0x0081: push 0x0d
_0x0084: push 0x00
_0x0087: push 0x05
_0x008a: push 0x01
_0x008d: push 0x0c
_0x0090: push 0x01
_0x0093: push 0x00
_0x0096: push 0x00
_0x0099: push 0x0d
_0x009c: push 0x05
_0x009f: push 0x0f
_0x00a2: push 0x00
_0x00a5: push 0x09
_0x00a8: push 0x05
_0x00ab: push 0x0f
_0x00ae: push 0x03
_0x00b1: push 0x00
_0x00b4: push 0x02
_0x00b7: push 0x05
_0x00ba: push 0x03
_0x00bd: push 0x03
_0x00c0: push 0x01
_0x00c3: push 0x07
_0x00c6: push 0x07
_0x00c9: push 0x0b
_0x00cc: push 0x02
_0x00cf: push 0x01
_0x00d2: push 0x02
_0x00d5: push 0x07
_0x00d8: push 0x02
_0x00db: push 0x0c
_0x00de: push 0x02
_0x00e1: push 0x02
_0x00e4: mov ecx, 0x01
_0x00e7: mov ebx, [ebp-2]
_0x00ea: pop eax
_0x00ed: cmp eax, ebx
_0x00f0: jnz _0x010e
_0x00f3: mov ebx, 0x22
_0x00f6: cmp ecx, ebx
_0x00f9: jz _0x0108
_0x00fc: mov ebx, 0x01
_0x00ff: add ecx, ebx
_0x0102: jmp _0x00e7
_0x0105: mov edx, 0x00
_0x0108: print_str
_0x010b: exit(0)
_0x010e: mov edx, 0x01
_0x0111: print_str
_0x0114: exit(0)
_0x0117: nop
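The Ezmachine-disassembler-dumpfunc-version0.out obtained here is in fact about the same as what our earlier disassemblers produced.
The purpose of producing dumpfunc-version0.out is to have a reference to optimize against.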

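3: Optimize

- (1) Add a function prologue and epilogue

![Image description](https://bbs.kanxue.com/upload/attach/202311/921830_QUTPQFYHWRH7AJ4.webp)

Because the code starts and ends directly with instructions and has no stack frame, we add one: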

```python
from collections import namedtuple
from dataclasses import dataclass

......

# Optimization (1): add a prologue and epilogue for the main function
prologue = namedtuple("prologue", [])
epilogue = namedtuple("epilogue", [])
def add_main_prologue_epilogue(instructions):
    instructions.insert(0, prologue())
    instructions.append(epilogue())
    return instructions

def dump(instructions):
    for ins in instructions:
        match ins:
            case prologue():
                print(f"push ebp")
                print(f"mov ebp, esp")
            case epilogue():
                print(f"mov esp, ebp")
                print(f"pop ebp")
                print(f"ret")
            ......
            case _:
                raise Exception(f"unknown instruction: {ins}")
                break

if __name__ == '__main__':
    opcode = [0x01, 0x03, 0x03, 0x05, 0x00, 0x00, 0x11, 0x00, 0x00, 0x01, 0x01, 0x11, 0x0C, 0x00, 0x01, 0x0D, 0x0A, 0x00, 0x01, 0x03, 0x01, 0x05, 0x00, 0x00, 0xFF, 0x00, 0x00, 0x01, 0x02, 0x00, 0x01, 0x00, 0x11, 0x0C, 0x00, 0x02, 0x0D, 0x2B, 0x00, 0x14, 0x00, 0x02, 0x01, 0x01, 0x61, 0x0C, 0x00, 0x01, 0x10, 0x1A, 0x00, 0x01, 0x01, 0x7A, 0x0C, 0x00, 0x01, 0x0F, 0x1A, 0x00, 0x01, 0x01, 0x47, 0x0A, 0x00, 0x01, 0x01, 0x01, 0x01, 0x06, 0x00, 0x01, 0x0B, 0x24, 0x00, 0x01, 0x01, 0x41, 0x0C, 0x00, 0x01, 0x10, 0x24, 0x00, 0x01, 0x01, 0x5A, 0x0C, 0x00, 0x01, 0x0F, 0x24, 0x00, 0x01, 0x01, 0x4B, 0x0A, 0x00, 0x01, 0x01, 0x01, 0x01, 0x07, 0x00, 0x01, 0x01, 0x01, 0x10, 0x09, 0x00, 0x01, 0x03, 0x01, 0x00, 0x03, 0x00, 0x00, 0x01, 0x01, 0x01, 0x06, 0x02, 0x01, 0x0B, 0x0B, 0x00, 0x02, 0x07, 0x00, 0x02, 0x0D, 0x00, 0x02, 0x00, 0x00, 0x02, 0x05, 0x00, 0x02, 0x01, 0x00, 0x02, 0x0C, 0x00, 0x02, 0x01, 0x00, 0x02, 0x00, 0x00, 0x02, 0x00, 0x00, 0x02, 0x0D, 0x00, 0x02, 0x05, 0x00, 0x02, 0x0F, 0x00, 0x02, 0x00, 0x00, 0x02, 0x09, 0x00, 0x02, 0x05, 0x00, 0x02, 0x0F, 0x00, 0x02, 0x03, 0x00, 0x02, 0x00, 0x00, 0x02, 0x02, 0x00, 0x02, 0x05, 0x00, 0x02, 0x03, 0x00, 0x02, 0x03, 0x00, 0x02, 0x01, 0x00, 0x02, 0x07, 0x00, 0x02, 0x07, 0x00, 0x02, 0x0B, 0x00, 0x02, 0x02, 0x00, 0x02, 0x01, 0x00, 0x02, 0x02, 0x00, 0x02, 0x07, 0x00, 0x02, 0x02, 0x00, 0x02, 0x0C, 0x00, 0x02, 0x02, 0x00, 0x02, 0x02, 0x00, 0x01, 0x02, 0x01, 0x13, 0x01, 0x02, 0x04, 0x00, 0x00, 0x0C, 0x00, 0x01, 0x0E, 0x5B, 0x00, 0x01, 0x01, 0x22, 0x0C, 0x02, 0x01, 0x0D, 0x59, 0x00, 0x01, 0x01, 0x01, 0x06, 0x02, 0x01, 0x0B, 0x4E, 0x00, 0x01, 0x03, 0x00, 0x05, 0x00, 0x00, 0xFF, 0x00, 0x00, 0x01, 0x03, 0x01, 0x05, 0x00, 0x00, 0xFF, 0x00, 0x00, 0x00]
    instructions = parse(opcode)
    instructions = add_main_prologue_epilogue(instructions)
    dump(instructions)
```

- (2) Handle the VM's mem and strings

```python
.....

# Memory used by the VM
def dump_data():
    print("\n")
    print("""right:\n .asciz "right" """)
    print("""wrong:\n .asciz "wrong" """)
    print("""plz_input:\n .asciz "plz input:" """)
    print("""hacker:\n .asciz "hacker" """)
    print("""mem:\n .space 0x100 """)

if __name__ == '__main__':
    opcode = [...]
    instructions = parse(opcode)
    instructions = add_main_prologue_epilogue(instructions)
    dump(instructions)
    dump_data()
```
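The same data section can also be generated from a table, which keeps the labels and the string contents in one place. This is only a sketch: the `STRINGS` table and the `data_section` helper are hypothetical names, and the labels and contents are copied from `dump_data` above.

```python
# label -> string content, copied from dump_data (hypothetical table name)
STRINGS = {
    "right": "right",
    "wrong": "wrong",
    "plz_input": "plz input:",
    "hacker": "hacker",
}


def data_section(strings, mem_size=0x100):
    """Build the assembler data section as one string."""
    lines = []
    for label, text in strings.items():
        lines.append(f"{label}:")
        lines.append(f'    .asciz "{text}"')
    lines.append("mem:")
    lines.append(f"    .space {mem_size:#x}")
    return "\n".join(lines)


print(data_section(STRINGS))
```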

- (3) Handle print_str

![Image description](https://bbs.kanxue.com/upload/attach/202311/921830_4U5PDW3GE26ETHF.webp)

The assembly we produced contains statements like this:

```python
# case 5: print str by edx: 0:'right', 1:'wrong', 3:'plz input:', 4:'hacker'
PrintStr = namedtuple("PrintStr", ["addr"])
```

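Its job is simply to print a different string depending on the value of edx.

A function call is unavoidable here; we can borrow pwntools' shellcraft to generate it: https://docs.pwntools.com/en/stable/shellcraft/i386.html#module-pwnlib.shellcraft.i386.linux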

```python
from collections import namedtuple
from dataclasses import dataclass

.....
write_func_call = namedtuple("write_func_call", ["addr", "str_idx"])
# Optimization (3): handle print_str
def handle_print_str(instructions):
    """
    _0x0000: mov edx, 0x03
    _0x0003: print_str

    _0x0012: mov edx, 0x01
    _0x0015: print_str

    _0x0105: mov edx, 0x00
    _0x0108: print_str

    _0x010e: mov edx, 0x01
    _0x0111: print_str
    """
    idx = 0
    while idx < len(instructions):
        # Look at a two-instruction window: `mov edx, imm` followed by `print_str`
        match instructions[idx: idx+2]:
            case [
                MovReg(addr1, Regs(3), imm),
                PrintStr(addr2)
            ] if imm in (0x00, 0x01, 0x03, 0x04):
                # Fuse the pair into a single write call, keeping print_str's address
                instructions[idx: idx+2] = [write_func_call(addr2, imm)]
        idx += 1

def dump(instructions):
    for ins in instructions:
        match ins:
            ......
            case write_func_call(addr, str_idx):
                if str_idx == 0:
                    print_right = f"""/* write(fd=1, buf='right', n=5) */
_0x{addr:04x}: pushad
    push 1
    pop ebx
    mov ecx, right
    push 5
    pop edx
    push SYS_write /* 4 */
    pop eax
    int 0x80
    popad
"""
                    print(print_right)
                elif str_idx == 1:
                    print_wrong = f"""/* write(fd=1, buf='wrong', n=5) */
_0x{addr:04x}: pushad
    push 1
    pop ebx
    mov ecx, wrong
    push 5
    pop edx
    push SYS_write /* 4 */
    pop eax
    int 0x80
    popad
"""
                    print(print_wrong)
                elif str_idx == 3:
                    print_plz_input = f"""/* write(fd=1, buf='plz input:', n=10) */
_0x{addr:04x}: pushad
    push 1
    pop ebx
    mov ecx, plz_input
    push 10
    pop edx
    push SYS_write /* 4 */
    pop eax
    int 0x80
    popad
"""
                    print(print_plz_input)
                elif str_idx == 4:
                    print_hacker = f"""/* write(fd=1, buf='hacker', n=6) */
_0x{addr:04x}: pushad
    push 1
    pop ebx
    mov ecx, hacker
    push 6
    pop edx
    push SYS_write /* 4 */
    pop eax
    int 0x80
    popad
"""
                    print(print_hacker)
            case Nop(addr):
                print(f"_0x{addr:04x}: nop")
            case MovReg(addr, dst, imm):
                print(f"_0x{addr:04x}: mov {dst}, 0x{imm:02x}")
            case PushImm(addr, imm):
                print(f"_0x{addr:04x}: push 0x{imm:02x}")
            case PushReg(addr, reg):
                print(f"_0x{addr:04x}: push {reg}")
            case PopReg(addr, reg):
                print(f"_0x{addr:04x}: pop {reg}")
            case PrintStr(addr):
                print(f"_0x{addr:04x}: print_str")
            case AddReg(addr, dst, src):
                print(f"_0x{addr:04x}: add {dst}, {src}")
            case SubReg(addr, dst, src):
                print(f"_0x{addr:04x}: sub {dst}, {src}")
            case MulReg(addr, dst, src):
                print(f"_0x{addr:04x}: mul {dst}, {src}")
            case DivReg(addr, dst, src):
                print(f"_0x{addr:04x}: div {dst}, {src}")
            case XorReg(addr, dst, src):
                print(f"_0x{addr:04x}: xor {dst}, {src}")
            case Jmp(addr, target):
                print(f"_0x{addr:04x}: jmp _0x{target:04x}")
            case Cmp(addr, dst, src):
                print(f"_0x{addr:04x}: cmp {dst}, {src}")
            case Jz(addr, target):
                print(f"_0x{addr:04x}: jz _0x{target:04x}")
            case Jnz(addr, target):
                print(f"_0x{addr:04x}: jnz _0x{target:04x}")
            case Jg(addr, target):
                print(f"_0x{addr:04x}: jg _0x{target:04x}")
            case Jl(addr, target):
                print(f"_0x{addr:04x}: jl _0x{target:04x}")
            case InputStr(addr):
                print(f"_0x{addr:04x}: input_str")
            case InitMem(addr, mem_addr, sz):
                print(f"_0x{addr:04x}: memset(0x{mem_addr:02x},0,{sz})")
            case MovRegStack(addr, dst, src):
                print(f"_0x{addr:04x}: mov {dst}, [ebp-{src}]")
            case MovRegMem(addr, dst, src):
                print(f"_0x{addr:04x}: mov {dst}, mem[{src}]")
            case Exit(addr):
                print(f"_0x{addr:04x}: exit(0)")
            case _:
                raise Exception(f"unknown instruction: {ins}")

......
```

- (4) Handle input_str

![图片描述](https://bbs.kanxue.com/upload/attach/202311/921830_X3P96QF84NKZ4PR.webp)

```python
# case 17: gets(mem); eax=strlen(mem);
InputStr = namedtuple("InputStr", ["addr"])
```

![图片描述](https://bbs.kanxue.com/upload/attach/202311/921830_JAXQMQ975NMVWXV.webp)

```python
from collections import namedtuple
from dataclasses import dataclass

......

read_strlen_func_call = namedtuple("read_strlen_func_call", ["addr"])
# Optimization (4): handle input_str
def handle_input_str(instructions):
    """
    _0x0006: input_str
    """
    idx = 0
    while idx < len(instructions):
        match instructions[idx: idx+1]:
            case [
                InputStr(addr)
            ]:
                # Replace input_str with a read + strlen call at the same address
                instructions[idx: idx+1] = [read_strlen_func_call(addr)]
        idx += 1

def dump(instructions):
    for ins in instructions:
        match ins:
            ......
            case read_strlen_func_call(addr):
                print_read_strlen = f"""/* read(fd=0, buf=mem, n=0x100) */
_0x{addr:04x}: push eax
    push ebx
    push ecx
    push edx
    xor ebx, ebx
    mov ecx, mem
    push 0x100
    pop edx
    push SYS_read /* 3 */
    pop eax
    int 0x80

    /* strlen(mem) */
    mov edi, mem
    xor eax, eax
    push -1
    pop ecx
    repnz scas al, BYTE PTR [edi]
    inc ecx
    inc ecx
    neg ecx
    /* keep the length in edi while the saved registers are restored */
    mov edi, ecx
    pop edx
    pop ecx
    pop ebx
    pop eax
    mov eax, edi
"""
                print(print_read_strlen)
            case Nop(addr):
                print(f"_0x{addr:04x}: nop")
            case MovReg(addr, dst, imm):
                print(f"_0x{addr:04x}: mov {dst}, 0x{imm:02x}")
            case PushImm(addr, imm):
                print(f"_0x{addr:04x}: push 0x{imm:02x}")
            case PushReg(addr, reg):
                print(f"_0x{addr:04x}: push {reg}")
            case PopReg(addr, reg):
                print(f"_0x{addr:04x}: pop {reg}")
            case PrintStr(addr):
                print(f"_0x{addr:04x}: print_str")
            case AddReg(addr, dst, src):
                print(f"_0x{addr:04x}: add {dst}, {src}")
            case SubReg(addr, dst, src):
                print(f"_0x{addr:04x}: sub {dst}, {src}")
            case MulReg(addr, dst, src):
                print(f"_0x{addr:04x}: mul {dst}, {src}")
            case DivReg(addr, dst, src):
                print(f"_0x{addr:04x}: div {dst}, {src}")
            case XorReg(addr, dst, src):
                print(f"_0x{addr:04x}: xor {dst}, {src}")
            case Jmp(addr, target):
                print(f"_0x{addr:04x}: jmp _0x{target:04x}")
            case Cmp(addr, dst, src):
                print(f"_0x{addr:04x}: cmp {dst}, {src}")
            case Jz(addr, target):
                print(f"_0x{addr:04x}: jz _0x{target:04x}")
            case Jnz(addr, target):
                print(f"_0x{addr:04x}: jnz _0x{target:04x}")
            case Jg(addr, target):
                print(f"_0x{addr:04x}: jg _0x{target:04x}")
            case Jl(addr, target):
                print(f"_0x{addr:04x}: jl _0x{target:04x}")
            case InputStr(addr):
                print(f"_0x{addr:04x}: input_str")
            case InitMem(addr, mem_addr, sz):
                print(f"_0x{addr:04x}: memset(0x{mem_addr:02x},0,{sz})")
            case MovRegStack(addr, dst, src):
                print(f"_0x{addr:04x}: mov {dst}, [ebp-{src}]")
            case MovRegMem(addr, dst, src):
                print(f"_0x{addr:04x}: mov {dst}, mem[{src}]")
            case Exit(addr):
                print(f"_0x{addr:04x}: exit(0)")
            case _:
                raise Exception(f"unknown instruction: {ins}")

# Optimization (2): memory used by the VM
def dump_data():
    print("\n")
    print("""right:\n .asciz "right" """)
    print("""wrong:\n .asciz "wrong" """)
    print("""plz_input:\n .asciz "plz input:" """)
    print("""hacker:\n .asciz "hacker" """)
    print("""mem:\n .space 0x100 """)

if __name__ == '__main__':
    opcode = [.....]
    instructions = parse(opcode)
    instructions = add_main_prologue_epilogue(instructions)
    handle_print_str(instructions)
    handle_input_str(instructions)
    dump(instructions)
    dump_data()
```
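The strlen half of the stub relies on `repnz scas` counting ecx down from -1. A small Python model of that arithmetic with 32-bit wraparound may make the `inc ecx; inc ecx; neg ecx` tail easier to follow. The function name `scasb_strlen` is hypothetical, and the model uses an index into a bytes object instead of a real pointer:

```python
MASK = 0xFFFFFFFF


def scasb_strlen(buf: bytes) -> int:
    """Model `push -1; pop ecx; repnz scas; inc ecx; inc ecx; neg ecx` with al = 0."""
    ecx = MASK                      # ecx = -1 as an unsigned 32-bit value
    edi = 0                         # index into buf instead of a real address
    while ecx != 0:
        byte = buf[edi]
        edi += 1
        ecx = (ecx - 1) & MASK      # one decrement per byte scanned
        if byte == 0:               # the scan stops after the byte equal to al
            break
    ecx = (ecx + 2) & MASK          # inc ecx twice
    return (-ecx) & MASK            # neg ecx


if __name__ == "__main__":
    print(scasb_strlen(b"abc\x00"))
    print(scasb_strlen(b"flag\n" + bytes(4)))
```

After scanning `len + 1` bytes, ecx holds `-(len + 2)`, so two increments and a negation leave `len`. One thing the model makes visible: `read` keeps the trailing newline in `mem`, whereas the VM's handler is described as `gets`, so the length computed this way is one larger whenever the input ends with a newline.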

- (5) Handle exit(0)

![图片描述](https://bbs.kanxue.com/upload/attach/202311/921830_XYKFP696Y5UC9UJ.webp)
```python
case Exit(addr):
    print(f"""/* exit(status=0) */
_0x{addr:04x}: xor ebx, ebx
    push SYS_exit /* 1 */
    pop eax
    int 0x80
""")
```

- (6) Fix up mov ebx, [ebp-ecx]

This form fails to assemble, because an x86 memory operand cannot subtract a register:

![图片描述](https://bbs.kanxue.com/upload/attach/202311/921830_MVYBK4BXJK84HFF.webp)

Replace it with the following:

![图片描述](https://bbs.kanxue.com/upload/attach/202311/921830_9RHVZ9B8HUX8NJE.webp)

![图片描述](https://bbs.kanxue.com/upload/attach/202311/921830_4BK2G6J8HPXACSM.webp)

```python
case MovRegStack(addr, dst, src):
    # print(f"_0x{addr:04x}: mov {dst}, [ebp-{src}]")
    print(f"_0x{addr:04x}: mov {dst}, ebp")
    print(f" sub {dst}, {src}")
    print(f" mov {dst}, [{dst}]")
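To check that the three emitted instructions load from the same address as the intended `[ebp-src]`, the sequence can be modeled on a toy register file. This is only an illustrative sketch: `mov_reg_stack`, the `regs` dict and the `mem` dict are hypothetical and not part of the disassembler.

```python
MASK = 0xFFFFFFFF


def mov_reg_stack(regs, mem, dst, src):
    """Model the three instructions emitted for `mov dst, [ebp-src]`."""
    regs[dst] = regs["ebp"]                       # mov dst, ebp
    regs[dst] = (regs[dst] - regs[src]) & MASK    # sub dst, src
    regs[dst] = mem[regs[dst]]                    # mov dst, [dst]


if __name__ == "__main__":
    regs = {"ebp": 0x1000, "ebx": 0, "ecx": 8}
    mem = {0x1000 - 8: 0x41}
    mov_reg_stack(regs, mem, "ebx", "ecx")
    print(hex(regs["ebx"]))
```

The rewrite uses `dst` as its own scratch register, so it assumes `dst` and `src` are different registers. If they were the same, the first `mov` would overwrite the offset before it is used and the load would come from address 0 in this model.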

- (7) Fix up _0x006c: div eax, ebx

![图片描述](https://bbs.kanxue.com/upload/attach/202311/921830_2T8PCPWA695R3CV.webp)

After a normal x86 div ebx, the quotient is stored in eax and the remainder in edx.

The VM's div is different: it stores the results in eax and ebx.

![图片描述](https://bbs.kanxue.com/upload/attach/202311/921830_P25BCYVPP4BEK4M.webp)

So we also need to add a mov ebx, edx after div eax, ebx.

![图片描述](https://bbs.kanxue.com/upload/attach/202311/921830_PV5TZ5K9T466DWK.webp)
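The difference between the two divisions can be checked with a small model. This sketch assumes unsigned 32-bit operands and that the VM puts the quotient in eax and the remainder in ebx, as described above; `vm_div` and `x86_div_ebx_then_fix` are hypothetical names used only for this illustration.

```python
MASK = 0xFFFFFFFF


def vm_div(regs):
    """The VM's div as described above: quotient -> eax, remainder -> ebx."""
    q, r = divmod(regs["eax"], regs["ebx"])
    regs["eax"], regs["ebx"] = q, r


def x86_div_ebx_then_fix(regs):
    """Model x86 `div ebx` followed by `mov ebx, edx` on unsigned values."""
    dividend = (regs["edx"] << 32) | regs["eax"]   # x86 divides edx:eax
    q, r = divmod(dividend, regs["ebx"])
    if q > MASK:
        raise OverflowError("quotient does not fit in eax")
    regs["eax"], regs["edx"] = q, r                # div ebx
    regs["ebx"] = regs["edx"]                      # mov ebx, edx


if __name__ == "__main__":
    a = {"eax": 100, "ebx": 7, "edx": 0}
    b = dict(a)
    vm_div(a)
    x86_div_ebx_then_fix(b)
    print(a["eax"], a["ebx"], b["eax"], b["ebx"])
```

The model also makes explicit that x86 takes its dividend from edx:eax, so the two versions only agree when edx is zero before the division, and that the x86 version leaves the remainder in edx as a side effect. Whether the generated assembly guarantees a zero edx at that point is not shown in the article.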

Ezmachine-disassembler.py(https://prod-files-secure.s3.us-west-2.amazonaws.com/461378ca-73a1-498e-b83e-fbb0244aa01b/0c2d246f-a2d4-484c-8671-4d65e9ac8fa1/Ezmachine-disassembler.py)
Ezmachine-disassembler-out.asm(https://prod-files-secure.s3.us-west-2.amazonaws.com/461378ca-73a1-498e-b83e-fbb0244aa01b/1edf8fac-54c2-42ba-84ac-1e46937eaf1e/Ezmachine-disassembler-out.asm)

4: Call pwntools make_elf

Ezmachine-asm_compile.py(https://prod-files-secure.s3.us-west-2.amazonaws.com/461378ca-73a1-498e-b83e-fbb0244aa01b/a6df480c-d615-4ab5-a2bd-dee5da074416/Ezmachine-asm_compile.py)
from pwn import *

code = """
push ebp
mov ebp, esp
.....
ret

right:
.asciz "right"
wrong:
.asciz "wrong"
plz_input:
.asciz "plz input:"
hacker:
.asciz "hacker"
mem:
.space 0x100
"""

elf = make_elf_from_assembly(code)
print(elf)
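
Because `dump` and `dump_data` print their output, the assembly has to be collected into a string before it can be handed to `make_elf_from_assembly`. One way to avoid copy-pasting is to capture stdout. The sketch below uses only the standard library; `capture_asm`, `fake_dump` and `fake_dump_data` are hypothetical stand-ins for the real functions.

```python
import contextlib
import io


def capture_asm(*emitters):
    """Run print-based emitters and return everything they printed as one string."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        for emit in emitters:
            emit()
    return buf.getvalue()


# Hypothetical stand-ins for dump(instructions) and dump_data()
def fake_dump():
    print("_0x0000: nop")
    print("    ret")


def fake_dump_data():
    print('right:\n    .asciz "right"')


if __name__ == "__main__":
    code = capture_asm(fake_dump, fake_dump_data)
    print(code)
    # `code` could then be passed to make_elf_from_assembly(code) as in the script above
```

With the real functions, a call such as `capture_asm(lambda: dump(instructions), dump_data)` would produce the `code` string used above.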

Result:


Kanxue ID: SYJ-Re

https://bbs.kanxue.com/user-home-921830.htm

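*This article is a featured post from the Kanxue forum, originally written by SYJ-Re. Please credit the Kanxue community when reposting.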



Source: https://mp.weixin.qq.com/s?__biz=MjM5NTc2MDYxMw==&mid=2458542081&idx=1&sn=d06b0af07c2a9be62b841386bc79e4c6&chksm=b18d508b86fad99d371828cd445249441b7933114a59c7d1d0c6087babdce264a22a2d6fa974&scene=58&subscene=0#rd