Challenge RE #8
At this point we don’t need further introduction, I’ll bring you my solution to the 8th challenge.
Description
The description of the challenge can be found in Challenge RE #8. In the previous ones I forgot the share the link, my bad.
Introduction
Assembly code to understand
<f>:
0: push r12
2: test rsi,rsi
5: mov r12,rdi
8: push rbp
9: mov rbp,rdx
c: push rbx
d: mov rbx,rsi
10: je 32
12: nop WORD PTR [rax+rax*1+0x0]
18: mov rsi,QWORD PTR [rbx]
1b: mov rdi,rbp
1e: call r12
21: test eax,eax
23: je 56
25: js 40
27: mov rbx,QWORD PTR [rbx+0x18]
2b: test rbx,rbx
2e: xchg ax,ax
30: jne 18
32: pop rbx
33: pop rbp
34: xor eax,eax
36: pop r12
38: ret
39: nop DWORD PTR [rax+0x0]
40: mov rbx,QWORD PTR [rbx+0x10]
44: test rbx,rbx
47: je 32
49: mov rsi,QWORD PTR [rbx]
4c: mov rdi,rbp
4f: call r12
52: test eax,eax
54: jne 25
56: mov rax,rbx
59: pop rbx
5a: pop rbp
5b: pop r12
5d: ret
This time f
it’s way more longer than in our previous challenges, instead of complaining about this and give up at the first moment. It’s better to try to understand small pieces of this code.
Let’s start from instructions in memory 0 to instruction in memory 0x10. From these instructions we can notice, that our parameters came from registers rdi
, rsi
and rdx
. We perform a check on one of them, to see if it’s equal to 0.
Now interesting enough, we have that in rdi
we receive a callback function. If you take a look, rdi
it’s moved to r12
who later it’s been called on instruction in 1e. For that reason we can infer, that one of the arguments passed to f
it’s a callback function.
What can we tell about the signature of this callback function? Well tracking the use of register r12
you can notice that the function takes two arguments, rsi
and rdi
and return the value int eax
. Every call r12
it’s followed by a check on eax
, where if we have zero or NULL value we jump to the end of the program, otherwise we jump to position 0x25. Looking at the use of its arguments, we can take a save assumption that the first one can be number while the second…well the second one is not so straight forward, let’s take a look at its use in the program.
Identifying rbx
layout
The second argument it’s been taken from rbx
, this register it’s only modified once, when we copied rsi
into it on instruction mov rbx,rsi
. After that we’ve been calling it in this three times:
;; first time
mov rsi,QWORD PTR [rbx]
;; more down in position 0x27
mov rbx,QWORD PTR [rbx+0x18]
;; more down in position 0x40
mov rbx,QWORD PTR [rbx+0x10]
In the last two cases, we are overwriting the value of rbx
, with QWORD PTR [rbx+0x18]
and QWORD PTR [rbx+0x10]
. The padding or offset from rbx
it’s 0x18 = 24 and 0x10 = 16, so we can assume the data stored in rbx
has to following arrangement.
[0......|0x10.....|0x18....]
And the data in [rbx+0x10]
and [rbx+0x18]
can overwrite rbx
. I don’t know you, but two me this smells like a binary tree, where [rbx+0x10]
and [rbx+0x18]
are the left and right children of the node rbx
.
This is a huge insight. If we are on the right path, we can be looking at a binary search here, but’s let’s not take hasty conclusions.
First resume
At the moment, we have that:
r12
contains a callback functionrbx
contains a binary tree
With this in mind we can start declaring our Node
structure and our callback function.
typedef struct Node
{
int data;
struct Node *left;
struct Node *right;
} Node;
// signature of callback function
int g_callback(int val, Node *n);
Signature of f
We already know that the arguments for f
comes from registers:
rdi
, which is a callback functionrsi
, which is our binary treerdx
, which it’s data of a node of our binary tree, let’s assume the data are just numbers, for simplicity. In any case later we can check if this is the case.
What it’s missing it’s our returned value. For this, looking at the last ret
you can notice that what we return it’s actually the value in rbx
which hold a Node
of our binary tree.
With this in mind we can assume the following signature for f
Node *f(int (int, Node *), Node *, int);
Main logic
With the signature, will be easier to infer the logic in the program. Having a clear idea of what are the structures involved also help us a lot. I mean, we haven’t even looked at the logic, and you might be guessing that this has to be some kind of search in a binary tree. Maybe I’m the only one imagining that, given that I’m my only user in my blog 😟. Anyway let’s continue.
Let’s gather all our finding and put it in C code
typedef struct Node
{
int data;
struct Node *left;
struct Node *right;
} Node;
int g_callback(int val, Node *n);
Node *f(int cb(int, Node *), struct Node *n, int data)
{
return NULL;
}
Ok, this looks good. Does it? Don’t doubt on yourself great Gealber, this is amazing.
Let’s analyze the logic by small chunks, first chunk
18: mov rsi,QWORD PTR [rbx]
1b: mov rdi,rbp
1e: call r12
21: test eax,eax
23: je 56
25: js 40
;; in 56 we just return the Node
Nothing fancy, now that we know more about the layout of rbx
, we know it’s a Node
, so this code can be interpreted as a call to to the callback function we passed as an argument. In C code would be:
int ret = cb(n, data);
if (ret == 0)
return n;
if (ret < 0) {
// do something
}
Second chunk of code…maybe chunk is not the right word Gealber 🤔
27: mov rbx,QWORD PTR [rbx+0x18]
2b: test rbx,rbx
2e: xchg ax,ax
30: jne 18
In this case, we first overwrite our current node for our right child, and later perform a check on it, followed by a jump to 18 if it’s not NULL
. Interesting, the code in 0x18 it’s our previous chunk of code. This seems as a recursion call, because the code in 0x18 it’s actually the start of our f
method. Given the nature of binary trees and the operations we normally perform with them, is not a crazy assumption to say that this is a recursion call.
Taking this into C, would be like this
Node *f(int cb(int, Node *), struct Node *n, int data)
{
int ret = cb(n, data);
if (ret == 0)
return n;
if (ret < 0) {
// do something
}
n = n->right;
if (n != NULL)
n = f(cb, n, data);
return NULL;
}
The next most important part, is the code in 0x40. This code it’s quite similar to our previous one, just that in this case we check with our left node. The way to reach this code is from instruction 0x25, with js 40
.
A way to put this in C would be
n = n->left;
if (n != NULL)
n = f(cb, n, data);
We already can make a correct forecast of what this function does, in a simple way this code performs a search in a binary tree, with a supplied comparison callback function.
Let’s put all this together in a clean way
typedef struct Node
{
int data;
struct Node *left;
struct Node *right;
} Node;
// f performs a search in a binary tree
// using cb callback function as our comparison data
Node *f(int cb(int, Node *), struct Node *n, int data)
{
int ret = cb(n, data);
if (ret == 0)
return n;
if (ret < 0) {
return f(cb, n->left, data);
}
return f(cb, n->right, data);
}
Final notes
Take into account that data
doesn’t has to be an int, given that we are supplying a comparison function we can declare data as void *
. As long as the comparison function can handle it, won’t be a problem.
As a personal note, I found this challenge quite instructive, not because it taught me a new data structure. What I found valuable it’s that it showed me how a structure is defined in assembly, basically there’s just an offset between field and another. The more I do this challenges the more I realize that C is so close to assembly 😅.