[experiment]: Use capstone to implement ELF.libc_start_main_ret #2580

Open
tesuji wants to merge 5 commits into Gallopsled:dev from tesuji:capstone-disasm

tesuji (Contributor) commented Apr 24, 2025

Use capstone to implement ELF.libc_start_main_ret.

Reasons:

  • Avoid text-based searching on objdump's output to ease maintenance.
  • Make it easier to support other architectures.
    powerpc64 still fails after this PR.
  • For fun.
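
The first reason can be illustrated with a minimal sketch. It assumes decoded instruction records that expose `address`, `size`, and `mnemonic` fields (capstone's instruction objects provide attributes with these names). The `Insn` tuple and the `find_calls` helper are hypothetical stand-ins for illustration, not code from this PR.

```python
from collections import namedtuple

# Hypothetical stand-in for a decoded instruction; capstone's instruction
# objects expose attributes with the same names.
Insn = namedtuple("Insn", "address size mnemonic op_str")

def find_calls(instructions, call_mnemonics):
    """Return (index, insn) pairs whose mnemonic is a call instruction."""
    return [(i, insn) for i, insn in enumerate(instructions)
            if insn.mnemonic in call_mnemonics]

# Toy x86-64 listing; the addresses and operands are made up.
listing = [
    Insn(0x1000, 3, "mov", "rdi, rax"),
    Insn(0x1003, 5, "call", "0x2000"),
    Insn(0x1008, 2, "mov", "edi, eax"),
    Insn(0x100a, 5, "call", "0x3000"),
]

calls = find_calls(listing, {"call"})
print([hex(insn.address) for _, insn in calls])
```

Comparing the mnemonic field directly avoids splitting text lines and matching tokens that could also appear in operands or symbol names.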

I marked this PR as a draft to get some early feedback and to find some volunteer testers.

Testing

This needs more testing, and some design work to make the code cleaner.

This PR has been manually tested on ( is pass, X is failure):

arch 2.28 2.36 2.31
x86_64
i386
mips
mips64el
arm64
armel
armhf
ppc64el
s390 N/A

Failure cases:

  • mips: Not really a failure, but upgrading capstone to v6.alpha should make it possible to remove a condition.

Failing architectures on dev:

  • armhf:
    Fails to disassemble this target; it should use arm-linux-gnueabihf-objdump.

tesuji marked this pull request as ready for review April 24, 2025 17:13
tesuji changed the title from "[experiment]: Use capstone to refactor ELF.libc_start_main" to "[experiment]: Use capstone to implement ELF.libc_start_main" Apr 24, 2025
tesuji changed the title from "[experiment]: Use capstone to implement ELF.libc_start_main" to "[experiment]: Use capstone to implement ELF.libc_start_main_ret" Apr 24, 2025

if arch in bfdnames:
return bfdnames[arch]
else:
name = bfdnames.get(arch)
Member

I like it.

    self.config['version'] = self.version

def cs_disasm(self, md: cs.Cs, address, n_bytes):
    if self.arch == 'arm' and address & 1:
Member

Suggested change
- if self.arch == 'arm' and address & 1:
+ if self.arch in ('arm','thumb'):
+     address &= ~1
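
For context, an odd function address on ARM conventionally marks Thumb code, and the instruction bytes start at the address with the low bit cleared. A minimal sketch of that masking follows; the helper name `split_thumb_bit` is hypothetical and not part of this PR.

```python
def split_thumb_bit(address):
    """Return (aligned_address, is_thumb) for an ARM symbol address."""
    is_thumb = bool(address & 1)
    # Clearing bit 0 is a no-op for even addresses, so no condition is needed.
    return address & ~1, is_thumb

print(split_thumb_bit(0x10401))
print(split_thumb_bit(0x10400))
```

Because the mask leaves even addresses unchanged, it can be applied unconditionally, as in the suggestion.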

if self.arch in ['arm', 'thumb']:
    call_instructions = set(['blx', 'bl'])
    # FIXME: I have no idea why setting self.arch = 'armhf' does not work
    if b'armhf' in self.linker: eabi = 'hf'
Member

I don't like this. Maybe always use just arm and try eabihf first for disassembly, as it handles a strict superset of instructions?
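
One way to read this suggestion is as an ordered fallback: try the most capable decoder first and accept the first one that decodes the whole byte range. The sketch below uses toy one-byte "instructions" and hypothetical decoder callables, so it only illustrates the control flow, not real ARM decoding.

```python
def disasm_with_fallback(decoders, code):
    """Try each (name, decode) pair in order; return the first full decode.

    `decode(code)` is a hypothetical callable returning a list of
    (size, mnemonic) tuples and stopping at the first undecodable byte.
    """
    best = ("", [])
    for name, decode in decoders:
        insns = decode(code)
        if sum(size for size, _ in insns) == len(code):
            return name, insns      # everything decoded
        if len(insns) > len(best[1]):
            best = (name, insns)    # remember the longest partial decode
    return best

def make_toy_decoder(known):
    """Build a toy decoder that understands only the opcodes in `known`."""
    def decode(code):
        out = []
        for byte in code:
            if byte not in known:
                break
            out.append((1, known[byte]))
        return out
    return decode

# Toy instruction sets: "hard" is a strict superset of "soft".
soft = make_toy_decoder({0x01: "mov", 0x02: "bl"})
hard = make_toy_decoder({0x01: "mov", 0x02: "bl", 0x03: "vmov"})

code = bytes([0x01, 0x03, 0x02])
name, insns = disasm_with_fallback([("eabihf", hard), ("eabi", soft)], code)
print(name, [mnemonic for _, mnemonic in insns])
```

If the superset decoder is always tried first, the linker-name check would no longer be needed to choose between the two.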

    call_instructions = set(['bal', 'jalr'])
    # Account for the delay slot.
    call_return_offset = 2
elif self.arch in ['i386', 'amd64', 'ia64']:
Member

Maybe move the empty suites to the end or remove them altogether and also remove the error case?
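
A table with a default row is one possible way to drop both the empty branches and the error case. In the sketch below, the arm/thumb and mips mnemonics and the MIPS offset of 2 come from the snippets quoted in this review; the default row (`call`, offset 1), the offset of 1 for arm/thumb, and all names are assumptions for illustration.

```python
# Per-architecture overrides: (call mnemonics, number of instructions to
# skip to reach the return address). The default row is an assumption.
DEFAULT = (frozenset({"call"}), 1)
OVERRIDES = {
    "arm":   (frozenset({"blx", "bl"}), 1),
    "thumb": (frozenset({"blx", "bl"}), 1),
    "mips":  (frozenset({"bal", "jalr"}), 2),  # 2 accounts for the delay slot
}

def call_config(arch):
    """Return the (mnemonics, return offset) pair for `arch`."""
    return OVERRIDES.get(arch, DEFAULT)

def return_address(addresses, call_index, arch):
    """Address where execution resumes after the call at `call_index`."""
    _, offset = call_config(arch)
    return addresses[call_index + offset]

# Toy address lists; the call sits at index 1 in both.
mips_addresses = [0x400, 0x404, 0x408, 0x40c]
x86_addresses = [0x1000, 0x1003, 0x1008]

print(hex(return_address(mips_addresses, 1, "mips")))
print(hex(return_address(x86_addresses, 1, "amd64")))
```

On MIPS the instruction after the call is the delay slot, so the return address is the second instruction after the call rather than the first.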

calls = [(index, line) for index, line in enumerate(lines) if set(line.split()) & call_instructions]
from pwnlib.asm import get_cs_disassembler
md = get_cs_disassembler(arch=self.arch, endian=self.endian, bits=self.bits, eabi=eabi)
dis = list(self.cs_disasm(md, func.address, func.size))
Member

Maybe this should be made an API of the Function object, like disasm? It will be more useful across pwntools.
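
A rough sketch of what such a method could look like, using a simplified stand-in class rather than pwntools' actual `Function` type. The `read` and `decode` callables and the method name `cs_disasm` on this class are hypothetical.

```python
class Function:
    """Simplified, hypothetical stand-in for a function record."""

    def __init__(self, name, address, size, read):
        self.name = name
        self.address = address
        self.size = size
        self._read = read  # callable: (address, n_bytes) -> bytes

    def cs_disasm(self, decode):
        """Decode this function's bytes with `decode(code, address)`."""
        code = self._read(self.address, self.size)
        return list(decode(code, self.address))

# Toy backing memory and decoder: one "instruction" per byte.
memory = bytes(range(16))

def read(address, n_bytes):
    return memory[address:address + n_bytes]

def toy_decode(code, address):
    for offset, byte in enumerate(code):
        yield (address + offset, byte)

func = Function("f", 4, 3, read)
print(func.cs_disasm(toy_decode))
```

With capstone, the `disasm` method of a `Cs` instance takes the code bytes and the starting address in this order, so it could be passed as `decode`.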
