The aim of this problem is to learn how to use the XMM registers to multiply
floating-point numbers. Matrix multiplication is a slow calculation, especially
when it is done on the x87 floating-point unit, so packed floating-point
calculations, which process four single-precision values in one XMM register
at a time, might be much faster when double precision is not required. This
program tests that.
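As a rough illustration in C rather than assembly, the SSE intrinsics below compile to instructions such as MULPS and ADDPS, which multiply and add four single-precision floats held in one XMM register at once. This is a sketch only: the 4x4 row-major layout and the function names matmul4_sse and matmul4_scalar are assumptions made for the example, not the program's own code.

```c
#include <xmmintrin.h>  /* SSE intrinsics: __m128 and the _mm_*_ps functions */

/* C = A * B for 4x4 row-major single-precision matrices.
   Row i of C is the sum over k of A[i][k] * (row k of B), so each row
   of B is handled as one packed XMM value holding four floats. */
void matmul4_sse(const float a[4][4], const float b[4][4], float c[4][4]) {
    for (int i = 0; i < 4; i++) {
        __m128 row = _mm_setzero_ps();
        for (int k = 0; k < 4; k++) {
            __m128 aik = _mm_set1_ps(a[i][k]);  /* broadcast one scalar to all 4 lanes */
            __m128 bk = _mm_loadu_ps(b[k]);     /* unaligned load of row k of B */
            row = _mm_add_ps(row, _mm_mul_ps(aik, bk));  /* MULPS then ADDPS */
        }
        _mm_storeu_ps(c[i], row);               /* write row i of C */
    }
}

/* Scalar reference version: one multiply and one add per element. */
void matmul4_scalar(const float a[4][4], const float b[4][4], float c[4][4]) {
    for (int i = 0; i < 4; i++) {
        for (int j = 0; j < 4; j++) {
            float sum = 0.0f;
            for (int k = 0; k < 4; k++) {
                sum += a[i][k] * b[k][j];
            }
            c[i][j] = sum;
        }
    }
}
```

Each inner step of the SSE version is one packed multiply and one packed add instead of four scalar ones. Timing both functions over many matrices is one way to test the speed claim.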
This program does not use any fixed memory locations for the head or tail of the
linked list, but uses all the registers available to it. However, some of its
functions do not follow the convention of saving the callee-saved registers RBX
and R12-R15 on the stack at every function call, because some of these registers
hold the pointers to the head and tail of the linked list. Even if they did, the
program would hardly change.
The Labouchere system for roulette is played as follows. Write down a list of
numbers, usually 1, 2, 3, 4. Bet the sum of the first and last numbers, i.e.
1 + 4 = 5, on red. If you win, delete the first and last numbers from the list.
If you lose, add the amount that you just bet to the end of the list. Then bet
the sum of the first and last numbers of the new list (if only one number is
left, bet that amount). Continue until your list becomes empty. If that happens,
you will always have won the sum of the original list, 1 + 2 + 3 + 4 = 10: every
win removes numbers that add up to exactly the amount won, and every loss adds
the amount lost to the list, so your net winnings plus the sum of the list
always equal 10. The program below simulates this system. Execute the program,
and see if you always win!
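The rules can be sketched in C with a doubly linked list that keeps explicit head and tail pointers, mirroring the head and tail pointers the assembly program keeps in registers. The names below, and the idea of feeding the spins in as an array of win/loss results, are assumptions for this sketch.

```c
#include <assert.h>
#include <stdlib.h>

/* Doubly linked list with explicit head and tail pointers. */
typedef struct node {
    long value;
    struct node *prev, *next;
} node;

typedef struct {
    node *head, *tail;
    size_t len;
} list;

void push_back(list *l, long v) {
    node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->value = v;
    n->next = NULL;
    n->prev = l->tail;
    if (l->tail) l->tail->next = n; else l->head = n;
    l->tail = n;
    l->len++;
}

void pop_front(list *l) {  /* caller guarantees the list is not empty */
    node *n = l->head;
    l->head = n->next;
    if (l->head) l->head->prev = NULL; else l->tail = NULL;
    free(n);
    l->len--;
}

void pop_back(list *l) {   /* caller guarantees the list is not empty */
    node *n = l->tail;
    l->tail = n->prev;
    if (l->tail) l->tail->next = NULL; else l->head = NULL;
    free(n);
    l->len--;
}

/* Plays the system starting from 1, 2, 3, 4. wins[i] is nonzero if spin i
   came up red. Stops when the list is empty or the spins run out.
   Returns the net winnings and sets *finished to 1 if the list emptied. */
long labouchere(const int *wins, int spins, int *finished) {
    list l = {NULL, NULL, 0};
    for (long v = 1; v <= 4; v++) push_back(&l, v);
    long net = 0;
    for (int i = 0; i < spins && l.len > 0; i++) {
        long bet = (l.len == 1) ? l.head->value : l.head->value + l.tail->value;
        if (wins[i]) {
            net += bet;
            pop_front(&l);                 /* delete the first number ... */
            if (l.len > 0) pop_back(&l);   /* ... and the last, if one remains */
        } else {
            net -= bet;
            push_back(&l, bet);            /* append the lost bet */
        }
    }
    *finished = (l.len == 0);
    while (l.len > 0) pop_front(&l);       /* release any remaining nodes */
    return net;
}
```

For a simulation, wins[i] could be generated with something like rand() % 37 < 18, which models a single-zero wheel with 18 red pockets out of 37. Adding net to the sum of the list after any spin should always give 10.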
x86-64 TUTORIAL: CONDITIONAL OPERATIONS WITHOUT BRANCHING
The unconditional JMP and conditional Jcc jump instructions change the flow of
execution, the latter depending on the runtime state of certain flag bits in the
RFLAGS register. The x86 and x86-64 processors pipeline instructions: they
prefetch a certain number of instructions and start evaluating them ahead of
time. The number of instructions prefetched depends on the size of the prefetch
input queue (PIQ). When a conditional jump goes a different way than the
processor expected, the work done on the prefetched instructions has to be
discarded, which is why avoiding branches can make code faster.
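One way to avoid a jump is to compute the result arithmetically. x86-64 also provides CMOVcc and SETcc for this, and C compilers often (but not always) compile an expression such as cond ? a : b into CMOV. The C sketch below shows the same idea with a bit mask; select_i32 and max_i32 are hypothetical names chosen for the example.

```c
#include <stdint.h>

/* Returns a if cond is nonzero, otherwise b, without a conditional jump.
   -(cond != 0) is either 0 or -1 (all bits set), which acts as a bit mask. */
int32_t select_i32(int cond, int32_t a, int32_t b) {
    int32_t mask = -(int32_t)(cond != 0);
    return (a & mask) | (b & ~mask);
}

/* Branchless maximum: the comparison result feeds the mask instead of a Jcc. */
int32_t max_i32(int32_t a, int32_t b) {
    return select_i32(a > b, a, b);
}
```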
Logical and arithmetic shifts are operations in which the bits of a register or
memory location are moved to the left or right, either by an immediate count or
by the count held in the CL register. A left shift is also a very quick way to
multiply by a power of 2, and a logical right shift a very quick way to divide
an unsigned value by a power of 2, since each involves just a shift of bits.
For general purpose registers there are 4 shift instructions (SHL, SAL, SHR,
SAR), 4 rotate instructions (ROL, ROR, RCL, RCR) and 2 double precision shift
instructions (SHLD, SHRD).
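The C functions below are a sketch of what several of these instructions compute on 32-bit values; compilers may recognize these patterns and emit SHL, SHR, ROL or SHLD, though that is not guaranteed. The function names are hypothetical. The counts are masked to 0..31, which is what the processor does for 32-bit operands and which also keeps the C shifts well defined.

```c
#include <stdint.h>

/* SHL: shifting left by k multiplies an unsigned value by 2^k (modulo 2^32). */
uint32_t mul_pow2(uint32_t x, unsigned k) { return x << (k & 31); }

/* SHR: a logical right shift divides an unsigned value by 2^k, rounding down. */
uint32_t div_pow2(uint32_t x, unsigned k) { return x >> (k & 31); }

/* ROL: bits shifted out on the left re-enter on the right. */
uint32_t rotl32(uint32_t x, unsigned k) {
    k &= 31;
    return (x << k) | (x >> ((32 - k) & 31));
}

/* SHLD: shift dst left, filling the vacated low bits from the top bits of src. */
uint32_t shld32(uint32_t dst, uint32_t src, unsigned k) {
    k &= 31;
    return k ? (dst << k) | (src >> (32 - k)) : dst;
}
```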
Below is a code snippet that prints a list of prime numbers, one on each line,
up to a limit entered by the user. It uses both while loops and if-else
conditional branches. We shall convert it to an assembly program to
demonstrate how these control flow structures are implemented in x86-64 assembly.
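A minimal C sketch of such a program is shown here, assuming trial division as the primality test; the snippet the tutorial actually converts may differ, and the names is_prime and print_primes are hypothetical. The while loops and if-else statements are the structures that become compare-and-jump sequences in assembly.

```c
#include <stdio.h>

/* Trial division: n is prime if no d with d * d <= n divides it.
   Testing d <= n / d avoids overflowing d * d for large n. */
int is_prime(unsigned n) {
    if (n < 2) {
        return 0;
    }
    unsigned d = 2;
    while (d <= n / d) {
        if (n % d == 0) {
            return 0;
        } else {
            d++;
        }
    }
    return 1;
}

/* Prints every prime up to limit, one per line.
   limit is assumed to be less than UINT_MAX so that n++ cannot wrap. */
void print_primes(unsigned limit) {
    unsigned n = 2;
    while (n <= limit) {
        if (is_prime(n)) {
            printf("%u\n", n);
        }
        n++;
    }
}
```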
In the Hello World sample program we used the instructions REPNZ and SCASB to
calculate the length of the string being printed at runtime. In this program
we use NASM’s equ directive to calculate the length at assembly time instead.
The constant promptlen is an example.
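A rough C analogue of the two approaches, using a prompt string chosen for illustration: sizeof is evaluated at compile time, much as NASM evaluates an expression like promptlen equ $ - prompt at assembly time (which works only when promptlen is defined immediately after the string, since $ is the current assembly position), while strlen scans the string at runtime the way REPNZ SCASB does. The names prompt, PROMPT_LEN and runtime_prompt_len are hypothetical.

```c
#include <string.h>

static const char prompt[] = "Enter a number: ";

/* Compile-time length: sizeof counts the trailing '\0', so subtract 1. */
enum { PROMPT_LEN = sizeof prompt - 1 };

/* The compiler can check the value before the program ever runs (C11). */
_Static_assert(PROMPT_LEN == 16, "unexpected prompt length");

/* Runtime alternative: scan for the terminating '\0' every time. */
size_t runtime_prompt_len(void) {
    return strlen(prompt);
}
```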
Here are some print functions for strings, integers and newline characters.
There is also a function for reading an integer. All the code is in
The macros prologue and epilogue are used to save space and avoid repetition.
NOTE: Remember that the registers RBP, RBX and R12-R15 are callee-saved: a function that modifies them must restore their original values before it returns.
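A print-integer routine typically has to turn a binary number into ASCII digits by dividing by 10 repeatedly; in assembly, DIV leaves the quotient in RAX and the remainder in RDX. The C sketch below shows that conversion. The function name format_int and its buffer convention are assumptions, and the resulting bytes would still have to be passed to the write system call to appear on screen.

```c
#include <stddef.h>
#include <stdint.h>

/* Converts n to decimal ASCII in buf, which needs room for 21 bytes:
   a sign, up to 19 digits and a terminating '\0'. Returns the length.
   Digits are produced least significant first, so they are reversed. */
size_t format_int(int64_t n, char *buf) {
    char tmp[20];
    size_t i = 0, len = 0;
    /* Negate in unsigned arithmetic so INT64_MIN does not overflow. */
    uint64_t u = (n < 0) ? 0 - (uint64_t)n : (uint64_t)n;
    do {
        tmp[i++] = (char)('0' + u % 10);  /* remainder -> next digit */
        u /= 10;                          /* quotient -> value still to convert */
    } while (u != 0);
    if (n < 0) {
        buf[len++] = '-';
    }
    while (i > 0) {
        buf[len++] = tmp[--i];
    }
    buf[len] = '\0';
    return len;
}
```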
Below is a program that prints "Hello World!" on the screen followed by a newline
character. In the data section we first store the string "Hello World!",
followed by the newline character, which has an ASCII value of 10, and the null
character, which has the value 0. The null character is needed because of the
way we calculate the string length: the program scans the string at runtime
until it reaches the 0 byte. There are other ways to calculate the string length
as well, such as NASM’s equ directive, but we shall use that in another sample
program.
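A common form of the runtime approach loads RCX with -1, clears AL, points RDI at the string and runs REPNZ SCASB, which stops at the first 0 byte; NOT RCX followed by DEC RCX then yields the length. The C function below models that register arithmetic to show why the terminating 0 is needed. It is a sketch, and the name scasb_strlen is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Models "mov rcx, -1 / xor al, al / repnz scasb / not rcx / dec rcx".
   Without the terminating 0 byte the scan would run past the string. */
size_t scasb_strlen(const char *s) {
    uint64_t rcx = UINT64_MAX;   /* mov rcx, -1: effectively unlimited count */
    const char *rdi = s;         /* RDI points at the string */
    while (rcx != 0) {           /* REPNZ stops when RCX reaches 0 ... */
        rcx--;                   /* each SCASB step decrements RCX */
        if (*rdi++ == '\0') {    /* ... or when the byte equals AL (0) */
            break;
        }
    }
    return (size_t)(~rcx - 1);   /* not rcx; dec rcx */
}
```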
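The x86 and x86-64 instruction sets have an instruction called CPUID that tells the program who made the CPU and which features it supports. The program selects the kind of information it wants (the leaf) in EAX, and CPUID returns the answer in EAX, EBX, ECX and EDX.
We try to get that information using x86-64 assembly in this tutorial.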
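In C, GCC and Clang expose the instruction through the compiler-specific header cpuid.h, whose __get_cpuid function returns 0 when the requested leaf is not supported. The sketch below assumes one of those compilers on an x86-64 machine; cpu_vendor and has_sse2 are hypothetical names. Leaf 0 places the vendor string in EBX, EDX and ECX, and leaf 1 reports SSE2 support in bit 26 of EDX.

```c
#include <cpuid.h>   /* GCC/Clang wrapper around the CPUID instruction */
#include <string.h>

/* Leaf 0: highest basic leaf in EAX, 12-byte vendor string in EBX, EDX, ECX
   (in that order). vendor must hold 13 bytes. Returns 0 on failure. */
int cpu_vendor(char vendor[13]) {
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx)) {
        return 0;
    }
    memcpy(vendor, &ebx, 4);
    memcpy(vendor + 4, &edx, 4);
    memcpy(vendor + 8, &ecx, 4);
    vendor[12] = '\0';
    return 1;
}

/* Leaf 1: feature flags; EDX bit 25 is SSE and bit 26 is SSE2. */
int has_sse2(void) {
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        return 0;
    }
    return (int)((edx >> 26) & 1);
}
```

On most machines cpu_vendor yields GenuineIntel or AuthenticAMD. has_sse2 should always return 1 on x86-64, since SSE2 is part of the baseline 64-bit instruction set.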
System calls are made using the syscall instruction on an x86-64 version of GNU/Linux, as opposed to int 0x80 on an x86 version of GNU/Linux.
All programs run in 64-bit long mode. Depending on the type of GNU/Linux system you use, the list of system calls can be found in /usr/include/asm/unistd_64.h for Debian-based systems or in /usr/include/asm-x86_64/unistd.h for Slackware, etc.
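The register convention for the syscall instruction can be seen in a small GCC-style inline assembly sketch (Clang accepts the same syntax). The call number comes from SYS_write in sys/syscall.h, which mirrors the numbers in the unistd header mentioned above. raw_write is a hypothetical name, and ordinary C code would call the library function write instead.

```c
#include <sys/syscall.h>   /* SYS_write */

/* Issues write(fd, buf, len) directly with the syscall instruction.
   x86-64 Linux convention: RAX = system call number, arguments in RDI,
   RSI, RDX (then R10, R8, R9); the result comes back in RAX, and the
   kernel overwrites RCX and R11. A negative result is -errno. */
long raw_write(int fd, const void *buf, unsigned long len) {
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"((long)SYS_write), "D"((long)fd), "S"(buf), "d"(len)
                      : "rcx", "r11", "memory");
    return ret;
}
```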
In 2006, YASM was the only assembler that supported both the 32-bit x86 and the 64-bit x86-64 (amd64) instruction sets; NASM supported only the 32-bit x86 instruction set. Today, in 2020, NASM also supports the 64-bit x86-64 instruction set. Both YASM and NASM are
under active development, with YASM being fully cross-platform and working with Visual Studio 2019 as of this update in August 2020. Wherever possible, we have made sure that the programs run under both YASM and NASM.
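ABI is an abbreviation for Application Binary Interface. Each platform, meaning a processor architecture together with an operating system, defines an ABI that fixes conventions such as how arguments are passed in registers and on the stack, which registers a function must preserve, and how data is laid out in memory. Following it lets the developer's code interoperate correctly with compiled libraries and the operating system. The x86-64 ABI document can be found at the following links: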