x86-64 TUTORIAL: FACTORIAL WITH RECURSION

The calculation of a factorial of a number can be done using recursion. Below is the algorithm:

FACTORIAL(num) {
    if (num==0)
         return 1;
    else
        return num * FACTORIAL(num-1);
}

We can do it in assembly in possibly multiple ways. One way is to use the stack to store every return value from FACTORIAL() so as to calculate the product. Below is the assembly version of the above algorithm. An unsigned long (8 bytes) is used to store the factorial, although ideally factorials for numbers > 20 can be huge and exceed the 64-bit space allotted to them.

I have not figured out how to take care of that yet and that requires Bignum implementation which is out of scope for this tutorial.

section .rodata
    prompt1 db  "Enter a positive number:",0
    num_fmt db  "%lu",0
    prompt2 db  "The factorial of %lu is %lu ",10,0

section .text
    global  main, factorial
    extern  printf, scanf

    main:
        push    rbp
        mov     rbp, rsp
        sub     rsp, 8
        pushfq
        push    rbx
        push    r12
        push    r13
        push    r14
        push    r15
        
        ; Read in a number
        mov     rdi, dword prompt1
        xor     rax, rax
        call    printf
        lea     rsi, [rbp-8]
        mov     rdi, dword num_fmt
        xor     rax,rax
        call    scanf

        ; Calculate Factorial. Result returned in RDX:RAX
        mov     rdi, [rbp-8]
        xor     rax, rax
        call    factorial
 
        ; Print result (this part is flawed. will be explained later)
        mov     rdx, rax
        mov     rsi, [rbp-8]
        mov     rdi, dword prompt2
        xor     rax, rax
        call    printf
 
        pop     r15
        pop     r14
        pop     r13
        pop     r12
        pop     rbx
        popfq
        mov     rsp,rbp
        pop     rbp
        ret

    factorial:
        push    rbp
        mov     rbp, rsp
        ; Check if the number in RDI is 0. 
        ; If yes, return with value 1 in RAX as factorial(0). 
        mov     rax, 1
        xor     rdx,rdx
        cmp     rdi, 0
        je      return_here
        ; If value in RDI > 0, push on stack for multiplication,
        ; decrement and call factorial again.
        push    rdi
        dec     rdi
        xor     rax, rax
        call    factorial
        ; Eventually when factorial returns, it will have RAX = 1 when RDI = 0.
        ; So we pop the stack and do unsigned multiply with RAX and return that value in RDX:RAX.
        pop     rsi
        mul     rsi
    return_here:
        mov     rsp, rbp
        pop     rbp
        ret

As we see above, the code is fairly simple. There are a few points to note however.

Recursion is handled by use of the stack - we could try not to use the stack, but then we would run out of registers where we would store every value of the number in RDI when factorial gets called multiple times. Moreover, it would be very hard to keep track of those registers. The stack gives us an easy way to handle recursion, with the hardware and the LIFO concept coming to our rescue.
Results need to be in the same register everytime for the recursive return to work - the final result keeps getting multiplied with the numbers from the stack and placed back into the same register. This will keep the factorial function independent of the number being multiplied with.

If we look closely at the factorial function, we see that we do not handle overflow. When the unsigned multiply instruction MUL is used, it multiplies RSI with RAX and places the resulting value in RDX:RAX.

But when a particular call to factorial returns, we are always multiplying with the value in RAX and neglecting the value in RDX. So in cases where the value of the factorial of the input number overflows into RDX we get the wrong result.

While printing out the result in the main function, again we do not take care of the overflow and move the value from RAX to RDX so that the x86-64 ABI rules can be followed. This needs to be fixed and we should be able to print a 128-bit number as the result. This limits the maximum number to which the correct factorial can be calculated to 20.

If the above assembly program was called factorial.asm below would be the steps to assembling and linking it into an executable factorial.out.

$ yasm -f elf64 factorial.asm
$ ld -m elf_x86_64 -dynamic-linker /lib64/ld-linux-x86-64.so.2 \
    /usr/lib/x86_64-linux-gnu/crt1.o /usr/lib/x86_64-linux-gnu/crti.o \
    factorial.o  /usr/lib/x86_64-linux-gnu/crtn.o -lc -o factorial.out

Download factorial.asm.

x86-64 TUTORIAL: FACTORIAL WITH RECURSION

SUPPORT THIS SITE