# Challenge RE # 5

## Introduction

My solution to the 5th reverse engineering challenge from challenges.re. The assembly code to understand is the following:

``````
f:
cmp	rcx, rsi
ja	.L10
sub	rsi, rcx
mov	r11, rsi
je	.L10
test	rcx, rcx
jne	.L16
mov	rax, rdi
ret
.L10:
xor	eax, eax
ret
.L16:
push	rbx
xor	r10d, r10d
mov	r9d, 1
.L4:
lea	rax, [rdi+r10]
xor	esi, esi
xor	r8d, r8d
.L8:
movzx	ebx, BYTE PTR [rdx+rsi]
cmp	BYTE PTR [rax+rsi], bl
cmovne	r8d, r9d
cmp	rsi, rcx
jne	.L8
test	r8d, r8d
je	.L12
cmp	r10, r11
jne	.L4
xor	eax, eax
.L12:
pop	rbx
ret

``````

A little bit longer than the previous one. Let’s start analyzing it.

## Analysis

### Inferring f signature

As in our previous challenges we have a function `f`, now this function has more jumps than our previous challenges. Looking at our first lines

``````	cmp	rcx, rsi
ja	.L10
sub	rsi, rcx
mov	r11, rsi
je	.L10
test	rcx, rcx
jne	.L16
mov	rax, rdi
ret
``````

We can infer the following, first it’s expected that one of the parameters to be smaller than the other one. More precisely `rcx` should be less than `rsi`. Also the value in `rcx` shouldn’t be equal to `rsi`. In both of these case the program will jump to the end of execution and will return a zeroed `eax` register. For that reason we can assume that `rcx` and `rsi` are parameters passed in our function, and both of them numbers. Then we have

``````	// ...
if (b >= a)
return NULL;
``````

Now one thing particular interesting here, is that after this check we perform two instructions which equivalent in C would be something like this:

``````int d = a - b + 1;
``````

Which give us a clue that maybe we are dealing here with a counting of elements in an array or string. This is quite similar, if we want to count the amount of elements between an index i and j of an array, including both. For example, the elements between index i = 0, and j = 3, would be 3 - 0 + 1 = 4. Also it’s also possible, that if we are dealing with strings instead of arrays(which in C is just an array of `char` type), what we perform with this operation can be seen as the difference in size between two stings. Let’s keep this thought in mind. It’s also save to assume that we are receiving in our arguments both of these arrays or strings. The main reason for this assumption, is that there’s no `malloc` call, so the program is not allocating memory in the heap for any of these 2 arrays or strings. Now we need to decide which one of these two options should be. Let’s explore a bit more before deciding that.

Next instructions, give us more clues about what could be happening here, for example:

``````test rcx, rcx
jne .L16
mov rax, rdi
ret
``````

Which basically it’s checking if one of our parameter it’s 0 we can return the value in register `rdi`. Interesting, what’s in `rdi`? Knowing this, we can infer the return type of `f`. Making a search in the program you can notice that the way, `rdi` it’s used in the program, gave us the impression that `rdi` it’s first of all a parameter also passed into `f` and a string. The reason for been a string, is specially this instruction:

``````.L4:
lea rax, [rdi + r10]
;; ... after .L8 tag
cmp BYTE PTR [rax + rsi], bl
``````

Which can be interpreted as please load the address of rdi into rax register, and later we will perform a comparison of its character. From this we have that `rdi` contains our first string. With this we can infer the most important part, the signature of `f`. Which should be as follows:

``````char *f(char *str1, int str1_size, char *str2, int str2_size)
``````

Putting all previously explained into into C code, at the moment we have:

``````char *f(char *str1, int str1_size, char *str2, int str2_size)
{
if (str2_size >= str1_size)
return NULL;

int diff = str1_size - str2_size + 1;

if (str2_size != 0) {
// something goes here...
}

return  str1;
}

``````

Nice this is a great progress. One key note to take here is that the signature of the function is one of the first thing you should try to figure out.

### Inferring main logic

Now here comes the part that will give us the whole part of the puzzle. At the moment we can describe what we have in a formal language as: the function f receives two strings, if the second one it’s longer than the first one we return a NULL response. In case the the second one has length 0, we return the first string. Late on, we will give a better formal description, now we need to figure out the main logic.

Ok, so the main logic happens under .L16 tag. First thing to notice here, is that we initialize a counter on register `r10d`, and we use that counter to iterate over the string in register `rdi`, which is our first string. The instructions that gave us this clue are the following:

``````.L16:
push rbx ;; push into stack to preserve its value, given that later we modify rbx register
xor r10d, r10d
mov r9d, 1
.L4:
;; .... more down in the code after .L8 tag
cmp r10, r11
jne .L4
``````

Yes! We can identify here a loop which stop condition it’s that the counter in `r10` register get the value of `r11`. What is it in the `r11` register? Looking back, we can remember that in `r11` we stored the result of `str1_size - str2_size + 1`. Then we can infer here, the following C code:

``````	// previous code

int diff = str1_size - str2_size + 1;

if (str2_size != 0) {
int i = 0;
for (i = 0; i < diff; i++) {
// something goes here...
}
}

``````

In the same way we can identify that there’s also a second loop, which stop condition it’s that the counter reach `str2_size`. These can be seen in instructions:

``````.L4:
lea rax, [rdi + r10]
xor	esi, esi
xor	r8d, r8d
.L8:
;; instructions goes here...
cmp	rsi, rcx
jne	.L8
``````

This an be written in C, in this way:

``````
int diff = str1_size - str2_size + 1;

if (str2_size != 0) {
int i = 0, j = 0;
for (i = 0; i < diff; i++) {
int flag =0;
for (j = 0; j < str2_size; j++) {
if (str1[j] != str2[j]) {
flag = 1;
}
}

if (flag == 0) {
return str1;
}

str1++;
}
}
``````

Here I added to subsequent logic, to resume a bit. The instructions

``````	movzx	ebx, BYTE PTR [rdx+rsi]
cmp	BYTE PTR [rax+rsi], bl
cmovne	r8d, r9d
;; ...
test r8d, r8d
je .L12
;; ...
.L12:
pop rbx
ret
``````

Gave us the clue for the code inside the loops. The whole code in C, of function `f` would be the following:

``````char *f(char *str1, int str1_size, char *str2, int str2_size)
{

// cmp	rcx, rsi
// ja	.L10
// and later on...
// je	.L10
if (str2_size >= str1_size)
// .L10:
//     xor	eax, eax
//     ret
return NULL;

// sub	rsi, rcx
// mov	r11, rsi
int diff = str1_size - str2_size + 1;

if (str2_size != 0) {
int i = 0, j = 0;
for (i = 0; i < diff; i++) {
int flag =0;
// movzx	ebx, BYTE PTR [rdx+rsi]
// cmp	BYTE PTR [rax+rsi], bl
// cmovne	r8d, r9d
// cmp	rsi, rcx
// jne	.L8
for (j = 0; j < str2_size; j++) {
if (str1[j] != str2[j]) {
flag = 1;
}
}

if (flag == 0) {
return str1;
}

str1++;
}

return NULL;
}

return  str1;
}
``````

### Formal description

The idea behind these challenges is not actually to translate all the code in C, but to been able to describe what this assembly code does with words. In a way that you can use that description as a function documentation for example. Then a proper description for this function would be:

Formal description or documentation:

Given two strings and their length, f checks if the second string its a substring of the first one. In case is not we will receive a NULL value, otherwise we receive the pointer to the position where the substring begins

## Final notes

2. Have a good understanding of what’s the difference between `lea` and `mov` instructions. In my particular case, I always mess up with these two simple instructions. You can checkout this answer in SO.