The calculation of a factorial of a number can be done using recursion. Below is the algorithm:
FACTORIAL(num) {
if (num==0)
return 1;
else
return num * FACTORIAL(num1);
}
We can do it in assembly in possibly multiple ways. One way is to use the stack
to store every return value from FACTORIAL()
so as to calculate the product.
Below is the assembly version of the above algorithm. An unsigned long
(8 bytes)
is used to store the factorial, although ideally factorials for numbers > 20 can
be huge and exceed the 64bit space allotted to them.
I have not figured out how to take care of that yet and that requires Bignum implementation which is out of scope for this tutorial.
section .rodata
prompt1 db "Enter a positive number:",0
num_fmt db "%lu",0
prompt2 db "The factorial of %lu is %lu ",10,0
section .text
global main, factorial
extern printf, scanf
main:
push rbp
mov rbp, rsp
sub rsp, 8
pushfq
push rbx
push r12
push r13
push r14
push r15
; Read in a number
mov rdi, dword prompt1
xor rax, rax
call printf
lea rsi, [rbp8]
mov rdi, dword num_fmt
xor rax,rax
call scanf
; Calculate Factorial. Result returned in RDX:RAX
mov rdi, [rbp8]
xor rax, rax
call factorial
; Print result (this part is flawed. will be explained later)
mov rdx, rax
mov rsi, [rbp8]
mov rdi, dword prompt2
xor rax, rax
call printf
pop r15
pop r14
pop r13
pop r12
pop rbx
popfq
mov rsp,rbp
pop rbp
ret
factorial:
push rbp
mov rbp, rsp
; Check if the number in RDI is 0.
; If yes, return with value 1 in RAX as factorial(0).
mov rax, 1
xor rdx,rdx
cmp rdi, 0
je return_here
; If value in RDI > 0, push on stack for multiplication,
; decrement and call factorial again.
push rdi
dec rdi
xor rax, rax
call factorial
; Eventually when factorial returns, it will have RAX = 1 when RDI = 0.
; So we pop the stack and do unsigned multiply with RAX and return that value in RDX:RAX.
pop rsi
mul rsi
return_here:
mov rsp, rbp
pop rbp
ret
As we see above, the code is fairly simple. There are a few points to note however.

Recursion is handled by use of the stack  we could try not to use the stack, but then we would run out of registers where we would store every value of the number in
RDI
when factorial gets called multiple times. Moreover, it would be very hard to keep track of those registers. The stack gives us an easy way to handle recursion, with the hardware and the LIFO concept coming to our rescue. 
Results need to be in the same register everytime for the recursive return to work  the final result keeps getting multiplied with the numbers from the stack and placed back into the same register. This will keep the factorial function independent of the number being multiplied with.
If we look closely at the factorial function, we see that we do not
handle overflow. When the unsigned multiply instruction MUL
is used,
it multiplies RSI
with RAX
and places the resulting
value in RDX:RAX
.
But when a particular call to factorial returns, we are always
multiplying with the value in RAX
and neglecting the value in RDX
. So in
cases where the value of the factorial of the input number overflows
into RDX
we get the wrong result.
While printing out the result in the main function, again we do not take
care of the overflow and move the value from RAX
to RDX
so that the x8664 ABI
rules can be followed. This needs to be fixed and we should be able to print a 128bit number as the result.
This limits the maximum number to which the correct factorial can be calculated to 20.
If the above assembly program was called factorial.asm
below would be
the steps to assembling and linking it into an executable factorial.out
.
$ yasm f elf64 factorial.asm
$ ld m elf_x86_64 dynamiclinker /lib64/ldlinuxx8664.so.2 \
/usr/lib/x86_64linuxgnu/crt1.o /usr/lib/x86_64linuxgnu/crti.o \
factorial.o /usr/lib/x86_64linuxgnu/crtn.o lc o factorial.out
Download factorial.asm.