Challenge #18
This challenge has more code, but according to its description it is not a big deal. Here is the short description, which you can always find here:
Now this is easy. Keep in mind, the code is 64-bit, because it uses 64-bit value(s). That is why I’ve omitted code fragments for 32-bit ARM and MIPS. So what does it do?
As always with large listings, I will avoid displaying the whole code in this article, because large amounts of assembly can cause brain damage. I bet there’s a “legit” study about that…academy. Also, you can find the full code on the original website here.
Analysis
As always, let’s break this code into blocks in order to understand it better. The first block I’m going to analyze is this one:
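;; check whether the input string (1st argument,
;; passed in rdi, also kept in rbx) has length 36
sub rsp, 24
mov QWORD PTR [rsp], r8 ;; save the 5th argument on the stack
mov QWORD PTR [rsp+8], r9 ;; save the 6th argument on the stack
call strlen
cmp rax, 36
mov edx, -1 ;; return value in case of failure
jne .L28 ;; RETURN (with edx = -1)
mov r12, rbx ;; r12 = cursor over the string
xor ebp, ebp ;; ebp = 0 (position counter)
jmp .L35 ;; enter the loop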
I added some annotations to the code block. Here we have a call to strlen, the famous C function that returns the length of a string (assuming it is \0 terminated). If our string’s length is not 36, we jump to label .L28, the exit point of the function, with -1 already loaded into edx as the return value. This counts as the failure case, so on failure we return -1.
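For readers who think in C, here is a minimal sketch of what this first block does. The function name check_length is hypothetical (the listing does not show the real name); it only models the length test and the -1 failure code.

```c
#include <string.h>

/* Hypothetical name. Models the first block: strlen() must report
 * exactly 36 characters, otherwise return -1 (mov edx, -1 / jne .L28). */
int check_length(const char *s)
{
    if (strlen(s) != 36)
        return -1;   /* failure: wrong length */
    return 0;        /* continue: counter (ebp) = 0, cursor (r12) = s */
}
```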
Another thing to identify here is the presence of a loop between labels .L33 and .L42. What does this loop iterate over? What is its stop condition? It’s easy to notice that we are iterating over the string we just copied into the r12 register. Take a look at this snippet:
;; START OF A LOOP
;; ------------------>
.L33:
;; load the current char; if it is NOT a hex digit, return -1
movsx edi, BYTE PTR [r12]
call isxdigit
test eax, eax
je .L37 ;; RETURN -1
;; ... other labels come between these two
.L42:
;; check whether the current char is the \0 terminator
cmp BYTE PTR [r12], 0
jne .L33
;; <<<<<<<<<<<<<<<<<<<
;; END OF A LOOP
;; <<<<<<<<<<<<<<<<<<<
As you may have noticed, label .L42 contains a check for the amazing \0 null terminator. Also, at the start of label .L33, we load one character from the string in r12 (movsx sign-extends it into edi, the argument to isxdigit). This gives us the certainty that we are iterating over the string in r12.
Another point to notice here is the call to isxdigit, which checks whether the current character is a hexadecimal digit. If it is not, we also finish with -1. Given that this is inside a loop, we can infer that the characters in the string should be valid hexadecimal digits (with some exceptions we will see shortly); otherwise the function returns the failure code -1.
We haven’t analyzed everything yet, but we already have an idea: this code checks whether the characters of the string are hex digits. Cool, let’s continue.
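At this point, a first approximation of the loop in C could look like the sketch below. The name all_hex_digits is hypothetical, and this version deliberately ignores the counter checks we haven’t analyzed yet, so it would still reject the '-' characters of a real input. One small deviation: it casts each char to unsigned char before calling isxdigit, as the C standard requires for negative values, whereas the compiled code sign-extends the char with movsx.

```c
#include <ctype.h>

/* Hypothetical first approximation of the .L33 ... .L42 loop:
 * walk the string until '\0', requiring every char to be a hex digit. */
int all_hex_digits(const char *s)
{
    const char *p = s;                      /* r12 */
    while (*p != '\0') {                    /* cmp BYTE PTR [r12], 0 */
        if (!isxdigit((unsigned char)*p))   /* call isxdigit */
            return -1;                      /* je .L37: return -1 */
        p++;                                /* add r12, 1 */
    }
    return 0;
}
```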
The loop
The body of the loop gives us some insights as well. For example, label .L32 contains the loop increments: we have a counter in the ebp register, and, of course, we also advance the string pointer in r12.
.L32:
;; increase counter and string pointer position
add ebp, 1
add r12, 1
cmp ebp, 37
je .L34
.L35:
cmp ebp, 8
jne .L43
.L29:
;; if the char is '-', skip it (jump to the increment);
;; otherwise fall through into .L37 and fail
cmp BYTE PTR [r12], 45 ;; 45 = '-' in ASCII, a separator here
je .L32
.L37:
;; if we get here, the function returns -1
mov edx, -1
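Another operation here is the comparison of our counter in ebp against 37. Remember that our string’s length must be 36, and keep in mind that C strings end with a null terminator: the 36 characters plus the \0 occupy indices 0 to 36, i.e. 37 positions. Once the counter reaches 37, every position has been checked and we jump to label .L34.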
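The 36 versus 37 difference can be checked directly in C with the sample UUID that appears at the end of this article: strlen counts only the visible characters, while the array also stores the terminator. The helper names are hypothetical.

```c
#include <string.h>

/* Sample UUID: 36 visible characters plus the '\0' at index 36. */
static const char sample[] = "45ab0c9c-873f-422e-963c-27d13c3fdac9";

/* strlen() counts only the visible characters: 36 */
size_t visible_len(void) { return strlen(sample); }

/* the array also holds the terminator: 37 bytes */
size_t storage_len(void) { return sizeof sample; }
```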
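We can also see that at label .L29 we check for the '-' character and, if we find it, skip it, literally jumping to the increment step at .L32. If the character is not a '-', execution falls through into .L37 and the function returns -1, so the dash is mandatory at the positions where this check is performed.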
There’s something interesting here as well: at label .L35 we compare our counter with 8, and if it’s not equal we jump to label .L43, which performs several more checks on the counter in ebp. Together with the test at .L35, the checks can be summarized this way: if the counter is 8, 13, 18 or 23, check for '-' at label .L29; if the counter is 36, we have reached the end of the string; in the default case, we check the next character of our string at label .L33.
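Putting the counter checks together, the validation loop can be sketched in C as follows. This is a reconstruction under assumptions: the function name is hypothetical, and since the code of .L43 is not shown above, the order of the checks follows the summary rather than the exact instructions. Given the strlen check, position 36 always holds the \0, so that branch only confirms the end of the string.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical reconstruction of the validation loop. */
int uuid_format_ok(const char *s)
{
    if (strlen(s) != 36)
        return -1;
    for (int i = 0; i < 37; i++) {                  /* ebp; cmp ebp, 37 */
        char c = s[i];                              /* BYTE PTR [r12] */
        if (i == 8 || i == 13 || i == 18 || i == 23) {
            if (c != '-')                           /* .L29 */
                return -1;                          /* falls into .L37 */
        } else if (i == 36) {
            if (c != '\0')                          /* end of the string */
                return -1;
        } else if (!isxdigit((unsigned char)c)) {   /* .L33 */
            return -1;
        }
    }
    return 0;                                       /* counter hit 37: .L34 */
}
```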
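The first question that comes to my mind is: why 8, 13, 18 and 23? That question is easy to answer when you look at label .L34. I’ll add it here as well, with some comments. In this part of the code, we use strtoul to convert chunks of the string from their hexadecimal string representation to an unsigned long in C. Since strtoul stops at the first character that is not a valid digit (here, the next '-'), each call converts exactly one chunk. The chunks are the 5 ranges between the dashes: characters 0 to 7, 9 to 12, 14 to 17, 19 to 22 and 24 to 35 (8, 4, 4, 4 and 12 hex digits), the last one being converted with strtoull. The results are stored through the pointers held in r15, r14, r13 and rcx (which is loaded twice, with the two pointers saved on the stack at the beginning).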
;; ANALYZING ON 5 STEPS
.L34:
;; rbx points to the 36-char string; '-' separators at rbx+8, +13, +18, +23
;; ------------------------------------------------------------------------------
;; field:   1            2              3              4              5
;; digits:  8            4              4              4              12
;; chars:   rbx..rbx+7   rbx+9..rbx+12  rbx+14..rbx+17 rbx+19..rbx+22 rbx+24..rbx+35
;; ------------------------------------------------------------------------------
;; 1st field: convert the start of the string (rbx) from hex into a number
mov edx, 16 ;; 3rd argument, base of the number to convert
xor esi, esi ;; 2nd argument, endptr = NULL: we don't need the address of the first unconverted char
mov rdi, rbx ;; 1st argument, the string
call strtoul
lea rdi, [rbx+9] ;; 1st arg
mov edx, 16 ;; 3rd arg
xor esi, esi ;; 2nd arg
mov DWORD PTR [r15], eax ;; store the 1st result (32 bits) through r15
call strtoul
lea rdi, [rbx+14] ;; 1st arg
mov edx, 16 ;; 3rd arg
xor esi, esi ;; 2nd arg
mov WORD PTR [r14], ax ;; store the 2nd result (16 bits) through r14
call strtoul
lea rdi, [rbx+19] ;; 1st arg
mov edx, 16 ;; 3rd arg
xor esi, esi ;; 2nd arg
mov WORD PTR [r13+0], ax ;; store the 3rd result (16 bits) through r13
call strtoul
mov rcx, QWORD PTR [rsp] ;; rcx = pointer saved from r8 (5th argument)
lea rdi, [rbx+24] ;; 1st arg
mov edx, 16 ;; 3rd arg
xor esi, esi ;; 2nd arg
mov WORD PTR [rcx], ax ;; store the 4th result (16 bits) through rcx
call strtoull ;; last field (12 digits) uses the unsigned long long variant
mov rcx, QWORD PTR [rsp+8] ;; rcx = pointer saved from r9 (6th argument)
xor edx, edx ;; return value 0 (success)
mov QWORD PTR [rcx], rax ;; store the 5th result (64 bits) through rcx
jmp .L28
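The same five conversions can be written in C as in the sketch below. To keep it short, it collects the five results in a hypothetical struct instead of the five separate output pointers used by the real code; the casts mirror the DWORD, WORD and QWORD stores, which keep only the low 32, 16 and 64 bits of each result.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical container for the five converted fields. */
struct uuid_fields {
    uint32_t f1;   /* 8 digits,  mov DWORD PTR [r15], eax */
    uint16_t f2;   /* 4 digits,  mov WORD PTR [r14], ax   */
    uint16_t f3;   /* 4 digits,  mov WORD PTR [r13], ax   */
    uint16_t f4;   /* 4 digits,  mov WORD PTR [rcx], ax   */
    uint64_t f5;   /* 12 digits, mov QWORD PTR [rcx], rax */
};

/* strtoul() stops at the '-' after each field, so every call
 * converts exactly one chunk. Assumes s was already validated. */
void convert_fields(const char *s, struct uuid_fields *out)
{
    out->f1 = (uint32_t)strtoul(s, NULL, 16);        /* rbx      */
    out->f2 = (uint16_t)strtoul(s + 9, NULL, 16);    /* rbx + 9  */
    out->f3 = (uint16_t)strtoul(s + 14, NULL, 16);   /* rbx + 14 */
    out->f4 = (uint16_t)strtoul(s + 19, NULL, 16);   /* rbx + 19 */
    out->f5 = (uint64_t)strtoull(s + 24, NULL, 16);  /* rbx + 24 */
}
```

With the sample UUID 45ab0c9c-873f-422e-963c-27d13c3fdac9, the fields come out as 0x45ab0c9c, 0x873f, 0x422e, 0x963c and 0x27d13c3fdac9.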
Flow
Let’s put this into a flow diagram; it will be easier to understand.
Looking at the flowchart, what we have is easier to understand. We are iterating over a string with this format: <8 hex digits>-<4 hex digits>-<4 hex digits>-<4 hex digits>-<12 hex digits>. Now we have it!!! This is the format of a UUID, for example 45ab0c9c-873f-422e-963c-27d13c3fdac9.
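The format above can also be expressed as a template, where x stands for any hex digit and '-' must match literally. This is an alternative technique, not what the compiled code does (it dispatches on the counter instead); the names are hypothetical, and the sketch keeps the challenge’s convention of returning 0 on success and -1 on failure.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical template: 'x' = any hex digit, '-' = literal dash. */
static const char UUID_TEMPLATE[] = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx";

int matches_uuid_template(const char *s)
{
    if (strlen(s) != strlen(UUID_TEMPLATE))
        return -1;                           /* wrong length */
    for (size_t i = 0; UUID_TEMPLATE[i] != '\0'; i++) {
        if (UUID_TEMPLATE[i] == 'x') {
            if (!isxdigit((unsigned char)s[i]))
                return -1;                   /* expected a hex digit */
        } else if (s[i] != UUID_TEMPLATE[i]) {
            return -1;                       /* expected a literal '-' */
        }
    }
    return 0;
}
```

The template makes the expected layout readable at a glance, at the cost of one extra table lookup per character.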
Formal description
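The function checks whether the provided string has the UUID format (36 characters: five groups of 8, 4, 4, 4 and 12 hex digits separated by dashes). If it does, it converts the five groups into numbers, stores them through the output pointers and returns 0; otherwise it returns -1.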
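Finally, a hypothetical C reconstruction of the whole function. The name parse_uuid and the parameter types are assumptions inferred from the listing: the results are stored with DWORD, WORD, WORD, WORD and QWORD writes, and the last two pointers arrive in r8 and r9, the 5th and 6th integer arguments in the System V AMD64 calling convention. Which arguments end up in r15, r14 and r13 is not shown in the excerpt, so the order of the first three outputs is an assumption.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reconstruction: validate the UUID layout, then
 * convert the five hex groups. Returns 0 on success, -1 on failure. */
int parse_uuid(const char *s, uint32_t *a, uint16_t *b, uint16_t *c,
               uint16_t *d, uint64_t *e)
{
    if (strlen(s) != 36)
        return -1;

    for (int i = 0; i < 37; i++) {
        if (i == 8 || i == 13 || i == 18 || i == 23) {
            if (s[i] != '-')
                return -1;          /* separator missing */
        } else if (i == 36) {
            if (s[i] != '\0')
                return -1;          /* cannot happen after strlen() */
        } else if (!isxdigit((unsigned char)s[i])) {
            return -1;              /* not a hex digit */
        }
    }

    *a = (uint32_t)strtoul(s, NULL, 16);        /* through r15 (assumed) */
    *b = (uint16_t)strtoul(s + 9, NULL, 16);    /* through r14 (assumed) */
    *c = (uint16_t)strtoul(s + 14, NULL, 16);   /* through r13 (assumed) */
    *d = (uint16_t)strtoul(s + 19, NULL, 16);   /* saved r8 */
    *e = (uint64_t)strtoull(s + 24, NULL, 16);  /* saved r9 */
    return 0;                                   /* xor edx, edx */
}
```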
Conclusion
It’s quite interesting to see how things are implemented in assembly.