Challenge RE #7
Introduction
As you may have noticed from my previous posts, I’ve been poisoning myself with small doses of assembly language and C. It’s the best combination for a fast effect 😁, and soon I will be immune to it. To be honest, I’ve been enjoying the challenges because all of them are accessible, in the sense David Hilbert described:
A mathematical problem should be difficult in order to entice us, yet not completely inaccessible, lest it mock at our efforts. It should be to us a guide post on the mazy paths to hidden truths, and ultimately a reminder of our pleasure in the successful solution.
A challenge should be enjoyable overall, without frustrating you. I wanted to point that out because Dennis Yurichev, who created these challenges, did a great job on them.
Enough talk. Let’s look at the 7th challenge. The assembly code to understand is the following:
<f>:
0: movzx edx,BYTE PTR [rdi]
3: mov rax,rdi
6: mov rcx,rdi
9: test dl,dl
b: je 29
d: nop DWORD PTR [rax]
10: lea esi,[rdx-0x41]
13: cmp sil,0x19
17: ja 1e
19: add edx,0x20
1c: mov BYTE PTR [rcx],dl
1e: add rcx,0x1
22: movzx edx,BYTE PTR [rcx]
25: test dl,dl
27: jne 10
29: repz ret
Analysis
The first four instructions give the impression that we are dealing with a string whose address is in the rdi register, especially the load of a single byte and the jump to the end of the function when that byte is zero. The character is copied into the edx register (zero-extended by movzx), while the pointer itself is copied into rax and rcx. Let’s keep describing the program before we settle on a complete signature for f.
Next, we can see the following instructions:
lea esi,[rdx-0x41]
cmp sil,0x19
ja 1e
Something interesting here is the lea esi,[rdx-0x41] instruction, which hints that rdx is expected to hold a value of 65 or more. Why 65? The magic here is that 0x41, or 65, is the ASCII code for the character 'A'. The cmp that follows compares the low byte of the result (sil) with 0x19, or 25, which is the distance from 'A' to 'Z'. Since ja is an unsigned comparison, characters below 'A' wrap around to a large value and take the jump as well. Combining these instructions, what we are checking is whether the character is NOT between the ASCII characters 'A' and 'Z', that is, whether it is anything other than an uppercase letter of the English alphabet. If that’s the case, we jump to address 1e.
At that address we have the following instructions:
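The subtract-then-compare trick can be sketched in C. This is only a minimal illustration, not the challenge’s original source, and the helper name is_ascii_upper is made up here. The cast to unsigned char plays the role of reading only sil, so anything below 'A' wraps around to a large value and fails the single comparison.

```c
#include <stdbool.h>

/* Hypothetical helper: true only for 'A'..'Z'.
 * Mirrors lea esi,[rdx-0x41] / cmp sil,0x19 / ja:
 * subtract 'A', keep the low 8 bits, then do one unsigned comparison. */
static bool is_ascii_upper(unsigned char c)
{
    unsigned char offset = (unsigned char)(c - 0x41); /* wraps for c < 'A' */
    return offset <= 0x19;                            /* 0x19 == 'Z' - 'A' */
}
```

One comparison replaces the usual pair (c >= 'A' && c <= 'Z'), which is presumably why the compiler emitted it this way.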
add rcx,0x1
movzx edx,BYTE PTR [rcx]
test dl,dl
jne 10
These move on to the next character in the sequence and continue the loop as long as that character is not the '\0' terminator.
The last instructions to analyze are the following:
add edx,0x20
mov BYTE PTR [rcx],dl
Remember that at this point we have already skipped every character that is not an uppercase letter, so what we have here must be an uppercase letter. When we add 0x20 to an uppercase letter, we get its corresponding lowercase letter.
For example:
The ASCII code of 'A' is 65; after adding 0x20 (32) it becomes 97, which is indeed the ASCII code for 'a'.
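The same arithmetic holds for every letter, since uppercase and lowercase ASCII letters sit exactly 0x20 apart. A small sketch, using a hypothetical helper name ascii_upper_to_lower and assuming the caller has already established that the input is in 'A'..'Z':

```c
/* Hypothetical helper: only meaningful for c in 'A'..'Z'. */
static char ascii_upper_to_lower(char c)
{
    /* 'A' (0x41) becomes 'a' (0x61), 'Z' (0x5a) becomes 'z' (0x7a). */
    return (char)(c + 0x20);
}
```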
With this we already know what the code does: it lowercases the provided string in place. The code in C would look like this:
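void f(char *str)
{
    if (*str == '\0')
        return;
    while (*str != '\0') {
        // unsigned 8-bit compare, like cmp sil,0x19 / ja: skip anything that is not 'A'..'Z'
        if ((unsigned char)(*str - 0x41) > 0x19) {
            str++;
            continue;
        }
        // lowercase the character, which is an uppercase Latin letter
        *str += 0x20;
        str++;
    }
}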
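Which can be written more compactly as:
void f(char *str)
{
    for (; *str != '\0'; str++) {
        // the cast keeps the comparison unsigned and 8 bits wide, as in the assembly
        if ((unsigned char)(*str - 0x41) <= 0x19) {
            *str += 0x20;
        }
    }
}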
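To try the result out, here is a self-contained sketch. It is a variant written under a few assumptions rather than the challenge’s original source: the function is renamed lowercase_ascii, and the subtraction is explicitly truncated to unsigned char so that the comparison behaves like cmp sil,0x19 followed by the unsigned ja. The matches_tolower helper is also made up here; it compares the result against the standard tolower, assuming the default "C" locale.

```c
#include <ctype.h>

/* Lowercase ASCII 'A'..'Z' in place; every other byte is left alone. */
static void lowercase_ascii(char *str)
{
    for (; *str != '\0'; str++) {
        /* Truncate to 8 bits so bytes below 'A' wrap around and fail the test. */
        if ((unsigned char)(*str - 0x41) <= 0x19) {
            *str += 0x20;
        }
    }
}

/* Returns 1 if lowercase_ascii agrees with tolower() for every non-zero byte. */
static int matches_tolower(void)
{
    for (int c = 1; c < 256; c++) {
        char buf[2] = { (char)c, '\0' };
        lowercase_ascii(buf);
        if ((unsigned char)buf[0] != (unsigned char)tolower(c)) {
            return 0;
        }
    }
    return 1;
}
```

Called on a writable buffer holding "Hello, WORLD 42!", lowercase_ascii leaves "hello, world 42!". The truncation matters: with a plain int subtraction, ' ' - 0x41 is negative and would pass a signed <= 0x19 test, so spaces, digits and punctuation would be shifted by 0x20 too.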
Conclusion
That’s it!! The code lowercases an ASCII string in place.