This is the mail archive of the
binutils@sourceware.org
mailing list for the binutils project.
Disassembler/Decompiler using libbfd
- From: Jakub Zawadzki <darkjames at darkjames dot ath dot cx>
- To: binutils at sourceware dot org
- Date: Wed, 28 Mar 2007 02:39:00 +0200
- Subject: Disassembler/Decompiler using libbfd
Hello,
Lately I have been thinking about writing a simple asm-to-C decompiler (or
just a simple disassembler that can guess functions and their arguments,
display imports from other shared libraries, etc.).
I don't like the idea of implementing all the opcodes for Intel x86 and
other archs inside my program, so I thought about libbfd. I looked at the
objdump sources, cleaned up the interesting parts, and now I have a 272-line
disassembler. libbfd is great for this kind of thing, but now I'm stuck.
I can print the decoded opcodes + args to the screen or to a file. I can print
all decoded opcodes from one call up to a leave/retn/another call (using
sscanf()). OK, but I have no idea how I should parse the decoded opcodes + args.
I need to make the output more parsable. [Not with sscanf()!]
Yeah, I know I can do:
init_disassemble_info(&info, my_own_data, (fprintf_ftype) my_own_function);
And I do. When I just print the format string and pipe the output through
sort and uniq, the only formats I get are "%s" and ",".
So we would have:
/* lock_opcode global variable */
lock_opcode = 1; /* lock -> new opcode */
(*disassemble_fn)(section->vma + start_offset, &info); /* call disassemble_fn */
lock_opcode = 0; /* unlock */
and then in my_own_function() we can check whether lock_opcode == 1: the first
string we get is then the decoded name of the opcode. After that we get the
first param [if available], and if the opcode has more params we get:
',' next_param_in_%s, ',' next_param_in_%s, etc.,
until lock_opcode goes back to 0.
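
A minimal sketch of such a token-collecting callback, for illustration only. The names struct parsed_insn, insn_reset(), collect_token() and fake_disassemble() are hypothetical and not part of libbfd or libopcodes. fake_disassemble() is only a stand-in for the real (*disassemble_fn)() call so that the example runs on its own. The sketch assumes what was observed above: the decoder emits the opcode name first and then the operands, separated by "," calls. Whether every arch behaves this way is exactly the open question. Instead of a global lock_opcode flag, the per-instruction state lives in the struct passed as the stream pointer.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define MAX_OPERANDS 8
#define MAX_TEXT     64

/* Hypothetical container for one instruction split into its parts. */
struct parsed_insn {
    char mnemonic[MAX_TEXT];
    char operands[MAX_OPERANDS][MAX_TEXT];
    int  n_operands;
    int  have_mnemonic;  /* 0 until the first real token has been seen */
};

/* Call before each instruction (the "lock_opcode = 1" step). */
static void insn_reset(struct parsed_insn *insn)
{
    memset(insn, 0, sizeof *insn);
}

/* fprintf-like callback: (stream, format, ...). */
static int collect_token(void *stream, const char *fmt, ...)
{
    struct parsed_insn *insn = stream;
    char text[MAX_TEXT];
    char *start = text, *end;
    va_list ap;
    int len;

    assert(insn != NULL);
    va_start(ap, fmt);
    len = vsnprintf(text, sizeof text, fmt, ap);
    va_end(ap);
    if (len < 0)
        return len;

    /* Trim padding blanks around the token. */
    while (*start == ' ' || *start == '\t')
        start++;
    end = start + strlen(start);
    while (end > start && (end[-1] == ' ' || end[-1] == '\t'))
        end--;
    *end = '\0';

    /* Separators carry no information of their own. */
    if (*start == '\0' || strcmp(start, ",") == 0)
        return len;

    if (!insn->have_mnemonic) {
        /* First real token: assumed to be the opcode name. */
        snprintf(insn->mnemonic, sizeof insn->mnemonic, "%s", start);
        insn->have_mnemonic = 1;
    } else if (insn->n_operands < MAX_OPERANDS) {
        /* Every later token: assumed to be one operand. */
        snprintf(insn->operands[insn->n_operands++], MAX_TEXT, "%s", start);
    }
    return len;
}

/* Stand-in for the real decoder: emits "mov %eax,%ebx" the way the
   formats observed above ("%s" and ",") suggest. Purely illustrative. */
static int fake_disassemble(void *stream,
                            int (*out)(void *, const char *, ...))
{
    out(stream, "%s", "mov    ");
    out(stream, "%s", "%eax");
    out(stream, ",");
    out(stream, "%s", "%ebx");
    return 2;  /* pretend two bytes were consumed */
}
```

To try it, call insn_reset(&insn) and then fake_disassemble(&insn, collect_token); afterwards insn.mnemonic holds the opcode name and insn.operands[] holds the operands. With the real library, &insn and collect_token would take the places of my_own_data and my_own_function in the init_disassemble_info() call.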
At first I thought it would be a hack. Now I think it's quite a good idea, but
I don't know whether it would work for every file and on every arch.
[For now I'm trying to disassemble a win32 PE file, a good file, not a `broken` one.
`Broken` -> obfuscated, compressed, etc. files are not my concern at
all. I only want to disassemble/decompile good files.]
So I have some questions:
- Is this method acceptable, i.e. decoding the opcode first and then the args?
Do all arch/system opcode decoders work this way?
- Can/should libbfd be used this way [for writing a
decompiler/disassembler]?
- Is there another way to do what I want? I don't know, maybe something
in the disassemble_info struct; there are some *results of instruction
decoders* there.
I don't really like the idea of copying code from bfd or implementing my
own instruction translators. [SPOT rule, reinventing the wheel,
etc.: a really, really bad idea :(]
I would be grateful for any reply, even `it's senseless/stupid/whatever
to write a decompiler`.
My English is not the best, so if you don't understand some part, a big
part, or even the whole thing, sorry. I'll try to explain again.