Binary diffing

Yesterday, a new feature was pushed to radare2: offset-based function diffing. We'd like to take this opportunity to write a bit about radare2's diffing features before showing the shiny new one.

As examples, let's use a crackme together with a cracked copy of it, as well as the /bin/true and /bin/false binaries.

Without any options, radiff2 shows which bytes changed, along with the corresponding offsets.

$ radiff2 genuine cracked
0x000081e0 85c00f94c0 => 9090909090 0x000081e0  
0x0007c805 85c00f84c0 => 9090909090 0x0007c805

$ rasm2 -d 85c00f94c0
test eax, eax  
sete al

Notice how both checks have been replaced with NOPs (0x90).

For bulk processing, you may want a higher-level overview of the differences. This is why radiff2 can compute the distance and the similarity ratio between two files with the -s option:

$ radiff2 -s /bin/true /bin/false
similarity: 0.97  
distance: 743  

If you want more concrete data, you can also count the differences with the -c option:

$ radiff2 -c genuine cracked
2  

If you're not sure whether you're dealing with similar binaries, you can check whether their functions match with the -C option. Each line shows the function name and offset in the first file, the match status with its similarity ratio, and the offset and function name in the second file.

$ radiff2 -C /bin/false /bin/true 
                        entry0  0x4013e8 |   MATCH  (0.904762) | 0x4013e2  entry0
     sym.imp.__libc_start_main  0x401190 |   MATCH  (1.000000) | 0x401190  sym.imp.__libc_start_main
                  fcn.00401196  0x401196 |   MATCH  (1.000000) | 0x401196  fcn.00401196
                  fcn.0040103c  0x40103c |   MATCH  (1.000000) | 0x40103c  fcn.0040103c
                  fcn.00401046  0x401046 |   MATCH  (1.000000) | 0x401046  fcn.00401046
                  [...]

And now the cool feature: radare2 supports graph diffing, à la DarunGrim, with the -g option. You can either give a symbol name or specify two offsets, in case the function you want to diff doesn't have the same name in both files.

For example, radiff2 -g main /bin/true /bin/false | xdot - will show the differences between the main functions of true and false. You can compare it to radiff2 -g main /bin/false /bin/true (notice the order of the arguments) to get both versions.

This is the result:
[Figure: /bin/true and /bin/false graph diff]

Yellow blocks indicate that some offsets don't match, grey blocks are perfect matches, and red blocks highlight strong differences. If you look closely, you'll see that the left one is mov edi, 0x1; call sym.imp.exit, while the right one is xor edi, edi; call sym.imp.exit.

Binary diffing is an important feature for reverse engineering. It can be used to analyze security updates, infected binaries, firmware changes and more.

We have only shown the code analysis diffing functionality, but radare2 supports other kinds of diffing between two binaries: byte-level diffing, deltified similarities, and more to come.

We plan to implement more kinds of bindiffing functionality in r2 and, why not, add support for ASCII-art graph diffing and better integration with the rest of the toolkit.

Rop'n'roll

As attackers move forward, so does the defense. For a couple of years now, every decent operating system has had a non-executable stack, defeating the classic 'put your shellcode on the stack and execute it' modus operandi.

This is why attackers are now using (among other things) Return Oriented Programming, also known as ROP, to bypass this protection.

Because radare2 (also) aims to be useful to exploit writers and to make their lives easier, it can now:

  • hunt for gadgets, with a configurable depth
  • filter gadgets
  • do this for multiple architectures
[0x08048320]> /R
Do you want to print 468.9K chars? (y/N)  

Well, let's filter.

[0x08048320]> /R pop,pop,ret
0x080484b0 ret  
  0x080484a8 df83c41c5b5e  fild word [ebx+0x5e5b1cc4]
  0x080484ae           5f  pop edi
  0x080484af           5d  pop ebp
  0x080484b0           c3  ret

0x080484b0 ret  
  0x080484aa       c41c5b  les ebx, [ebx+ebx*2]
  0x080484ad           5e  pop esi
  0x080484ae           5f  pop edi
  0x080484af           5d  pop ebp
  0x080484b0           c3  ret

0x080484b0 ret  
  0x080484ab         1c5b  sbb al, 0x5b
  0x080484ad           5e  pop esi
  0x080484ae           5f  pop edi
  0x080484af           5d  pop ebp
  0x080484b0           c3  ret

0x080484b0 ret  
  0x080484ac           5b  pop ebx
  0x080484ad           5e  pop esi
  0x080484ae           5f  pop edi
  0x080484af           5d  pop ebp
  0x080484b0           c3  ret

0x080484b0 ret  
  0x080484ad           5e  pop esi
  0x080484ae           5f  pop edi
  0x080484af           5d  pop ebp
  0x080484b0           c3  ret

0x080484b0 ret  
  0x080484ae           5f  pop edi
  0x080484af           5d  pop ebp
  0x080484b0           c3  ret

You can change the search depth to speed up the hunt:

[0x08048320]> e search.roplen = 4
[0x08048320]> /R mov ebp,call eax
0x08048386 call eax  
  0x0804837a         89e5  mov ebp, esp
  0x0804837c       83ec18  sub esp, 0x18
  0x0804837f c7042420a00408  mov dword [esp], 0x804a020
  0x08048386         ffd0  call eax

0x0804840f call eax  
  0x08048403         89e5  mov ebp, esp
  0x08048405       83ec18  sub esp, 0x18
  0x08048408 c70424109f0408  mov dword [esp], 0x8049f10
  0x0804840f         ffd0  call eax

[0x08048320]>

The next steps might be ROP chain assembly and manipulation, and why not automatic construction, à la mona.py? Contributions are welcome!

Payloads in C

Writing an exploit requires several steps to achieve the final purpose of the attack:

  • find a vulnerability
  • reverse engineer the bug
  • achieve code execution
  • write the payload
  • profit

This post will focus on the fourth step: writing the payload.

The payload can spawn a shell, reuse a socket or connect back. But sometimes we need a more complex payload that opens a file, changes some permissions, calls mmap, etc.

There are different tools that can do this.

But r2 already provides its own tool for this: ragg2, which implements the following features:

  • generic shellcodes
  • xor encoder
  • tiny binary construction via rabin2
  • a specific low-level language
  • a C-compatible language compiled to shellcode (based on shellforge)

We will focus on the last option. It's the most flexible, but it has a runtime dependency on gcc/clang.

Halp

Let's first run ragg2-cc -h to see what it can do for us:

$ ragg2-cc -h
Usage: ragg2-cc [-dabkoscxv] [file.c]
  -d       enable debug mode
  -a x86   set arch (x86, arm)
  -b 32    bits (32, 64)
  -k linux set kernel (darwin, linux)
  -o file  set output file
  -s       generate assembly
  -c       generate compiled shellcode
  -x       show hexpair bytes
  -v       show version

Compiling a shellcode

At this point we are probably not interested in memory constraints, but we still need a simple way to express code.

Shellforge was abandoned a while ago, and it was written in Python. Ragg2 replicates the same functionality in a few lines of shell script and aims to keep the include directory up to date.

We just need to write C code, using syscalls, and avoiding libc or other library calls. For example:

$ cat hi.c
int main() {
  write (1,"Hello!\n",7);
  exit(0);
}

We can now compile this with ragg2-cc:

$ ragg2-cc -x hi.c
eb00488d351d000000bf01000000ba07000000b8010000000f0531ffb83c0000000f0531c0c348656c6c6f210a00

The -x flag will dump the shellcode in hexadecimal to stdout.

We can then disassemble this using rasm2 -d.

Under the hood

What ragg2-cc does is basically replace the system include path (-isystem) and compile the program as position-independent code (-fPIE).

Then it dumps the whole text section, prepended with a single jmp to main.

We can then feed those hexpairs to ragg2 or Metasploit to encrypt or encode them before sending the payload to the target.

--pancake