Consider a computer with a processor and memory. The processor has registers, which store data. The processor takes the data in these registers as input, executes arithmetic and logical operations on that data and writes the output to the registers. The memory is a sequence of bits, where each byte (group of 8 bits) has a unique address. The processor can load data from memory into its registers and write data from registers into memory. We provide instructions to the processor via assembly. There are different assembly languages for different computer architectures. In this post, we focus on the x86-64 computer architecture. In assembly, we refer to a register by its name. For example, rax is the name of one of the general purpose, 64-bit registers. We can also use eax to refer to the lower 32 bits of that register, ax to refer to the lower 16 bits, or al to refer to the lower 8 bits.
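As a rough illustration of how the names rax, eax, ax and al relate, the following Python sketch treats a register as a 64-bit integer and derives the narrower views by masking. The helper name register_views and the sample value are made up for this example.

```python
# Hypothetical helper: model the x86-64 views of one 64-bit register.
def register_views(rax: int) -> dict[str, int]:
    rax &= 0xFFFFFFFFFFFFFFFF  # keep only 64 bits
    return {
        "rax": rax,               # all 64 bits
        "eax": rax & 0xFFFFFFFF,  # lower 32 bits
        "ax": rax & 0xFFFF,       # lower 16 bits
        "al": rax & 0xFF,         # lower 8 bits
    }

# Arbitrary sample value chosen so every byte is distinguishable.
views = register_views(0x1122334455667788)
for name, value in views.items():
    print(f"{name} = {value:#x}")
```

All four names refer to the same storage; the narrower names only select fewer of its low-order bits.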
Consider the following assembly program:
global _main
_main:
MOV rax, 42
RET
Suppose we save the program in a file called main.asm. We can then use the nasm assembler to translate the assembly code to machine code (run brew install nasm to install it):
nasm -fmacho64 main.asm
We use the -fmacho64 flag to specify the 64-bit Mach-O object file format used by macOS (the operating system for Mac computers).
The command creates an object file (in this case: main.o), which contains machine code as well as other data generated by the assembler. The object file is usually not executable, because it may contain references to functions or data defined in other object files or libraries (i.e., a collection of object files).
We then call ld (the linker; on macOS this is Apple's linker rather than GNU ld), which combines the object file with libraries to create an executable file (in this case: a.out).
ld -lSystem -L/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/lib main.o
We use the -lSystem flag to include the core system library on macOS and the -L/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/lib flag to add the macOS SDK lib directory to the search path for libraries.
We can then execute the program:
./a.out
The program does not produce any output. To see the value of the rax register change, we can step through the program using the LLDB debugger:
lldb ./a.out
(lldb) b main
(lldb) r
(lldb) reg read rax
rax = 0x00000002000b2d20 dyld`lsl::sPoolBytes + 24736
(lldb) n
(lldb) reg read rax
rax = 0x000000000000002a
We use LLDB to set a breakpoint, run the program up to the breakpoint, read the initial value of the register, execute the next instruction and then read the value of the register again.
2a is 42 in hexadecimal (2 * 16 + 10 = 42).
global is a directive, i.e., a command to the assembler that does not get translated into machine code. global _main makes the _main label visible to the linker. A label is a symbolic name assigned to a specific memory address, typically the location of an instruction or some data. _main is a special label that defines the entry point for the program.
The MOV instruction copies 42 into the rax register. The RET instruction returns control to the runtime so that the process can cleanly exit.
We can also disassemble the executable file:
objdump -M intel -d a.out
We use -M intel to get Intel instead of AT&T assembly syntax.
We get:
a.out: file format mach-o 64-bit x86-64
Disassembly of section __TEXT,__text:
0000000100003fb2 <_main>:
100003fb2: b8 2a 00 00 00 mov eax, 42
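100003fb7: c3 ret
Note that the assembler encoded MOV rax, 42 as mov eax, 42: writing to a 32-bit register zeroes the upper 32 bits of the full 64-bit register, so the shorter encoding has the same effect.
The rax register is 64 bits wide. Consider the following program that zeroes rax and then fills the register with 64 1s: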
global _main
_main:
XOR rax, rax
MOV rax, 0xffffffffffffffff
RET
Stepping through the program:
lldb ./a.out
(lldb) b main
(lldb) r
(lldb) n
(lldb) reg read rax
rax = 0x0000000000000000
(lldb) n
(lldb) reg read rax
rax = 0xffffffffffffffff
If we prepend another 1 to the numeric constant above and try to move 65 1s into the rax register:
global _main
_main:
XOR rax, rax
MOV rax, 0x1ffffffffffffffff
RET
Then we get: "main.asm:4: warning: numeric constant 0x1ffffffffffffffff does not fit in 64 bits [-w+number-overflow]".
And stepping through the program, we see that only the lower 64 bits are kept:
lldb ./a.out
(lldb) b main
(lldb) r
(lldb) n
(lldb) reg read rax
rax = 0x0000000000000000
(lldb) n
(lldb) reg read rax
rax = 0xffffffffffffffff
We can change the lowest-order 8 bits of the rax register as follows:
global _main
_main:
MOV rax, 0xffffffffffffffff
MOV al, 1
RET
Stepping through the program:
lldb ./a.out
(lldb) b main
(lldb) r
(lldb) n
(lldb) reg read rax
rax = 0xffffffffffffffff
(lldb) n
(lldb) reg read rax
rax = 0xffffffffffffff01
Each hexadecimal digit is 4 bits. We see that the lower 8 bits have been replaced by 0x01, i.e., 00000001, while the upper 56 bits are unchanged.
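The two behaviours seen so far (an oversized constant losing its 65th bit, and a write to al leaving the upper 56 bits alone) can be mimicked with integer masks. This is only a sketch: truncate_to_64 and write_al are hypothetical helper names, not anything the assembler or processor exposes.

```python
MASK64 = (1 << 64) - 1  # 64 one-bits

def truncate_to_64(value: int) -> int:
    """Keep only the lower 64 bits, as happened to the 65-bit constant."""
    return value & MASK64

def write_al(rax: int, byte: int) -> int:
    """Replace the lowest 8 bits of rax and leave the upper 56 bits unchanged."""
    return (rax & ~0xFF & MASK64) | (byte & 0xFF)

truncated = truncate_to_64(0x1FFFFFFFFFFFFFFFF)  # 65 one-bits
after_al = write_al(0xFFFFFFFFFFFFFFFF, 1)       # models MOV al, 1

print(hex(truncated))
print(hex(after_al))
```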
We can't use MOV to copy the value of a smaller register to a larger one:
global _main
_main:
XOR rax, rax
MOV bl, 0x80
MOV rax, bl
RET
The assembler throws the error: "main.asm:5: error: invalid combination of opcode and operands".
But we can copy the value of a smaller register to a larger one with zero extension:
global _main
_main:
XOR rax, rax
MOV bl, 0x80
MOVZX rax, bl
RET
Stepping through the program, we see that zero extension fills all the upper bits of the destination register with zeroes:
lldb ./a.out
(lldb) b main
(lldb) r
(lldb) n
(lldb) n
(lldb) n
(lldb) reg read rax bl
rax = 0x0000000000000080
bl = 0x80
We can also copy the value of a smaller register to a larger one with sign extension:
global _main
_main:
XOR rax, rax
MOV bl, 0x80
MOVSX rax, bl
RET
Stepping through the program, we see that sign extension copies the sign bit of bl (its most significant bit) into all the upper bits of rax:
lldb ./a.out
(lldb) b main
(lldb) r
(lldb) n
(lldb) n
(lldb) n
(lldb) reg read rax bl
rax = 0xffffffffffffff80
bl = 0x80
0x80, or 10000000, interpreted as a signed integer using two's complement is -128. 0xffffffffffffff80 interpreted as a signed integer using two's complement is also -128, so sign extension preserves the signed value.
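To make the difference between the two extensions concrete, here is a small Python model of what MOVZX and MOVSX do to an 8-bit source. The function names are invented for this sketch and the model ignores everything except the bit patterns.

```python
def zero_extend(value: int, from_bits: int) -> int:
    """Model MOVZX: keep the low from_bits bits, upper bits become 0."""
    return value & ((1 << from_bits) - 1)

def sign_extend(value: int, from_bits: int, to_bits: int = 64) -> int:
    """Model MOVSX: copy the source's top bit into every upper bit."""
    value &= (1 << from_bits) - 1
    if value & (1 << (from_bits - 1)):  # sign bit set: value is negative
        value -= 1 << from_bits
    return value & ((1 << to_bits) - 1)

def as_signed(value: int, bits: int) -> int:
    """Interpret a bit pattern as a two's complement signed integer."""
    return value - (1 << bits) if value & (1 << (bits - 1)) else value

print(hex(zero_extend(0x80, 8)))
print(hex(sign_extend(0x80, 8)))
print(as_signed(0x80, 8), as_signed(sign_extend(0x80, 8), 64))
print(hex(sign_extend(0x7F, 8)))
```

When the top bit of the source is 0, as with 0x7f, both functions return the same pattern.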
global _main
_main:
XOR rax, rax
MOV bl, 0x7f
MOVSX rax, bl
RET
If instead we copy 0x7f into bl, as in the program above, we get the same result as zero extension:
lldb ./a.out
(lldb) b main
(lldb) r
(lldb) n
(lldb) n
(lldb) n
(lldb) reg read rax bl
rax = 0x000000000000007f
bl = 0x7f
We can read multiple bytes from memory into a register:
global _main
_main:
MOV rax, [rel message]
RET
section .data
message:
DB "abc"
Stepping through the program:
lldb ./a.out
(lldb) b main
(lldb) r
(lldb) n
(lldb) reg read rax
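rax = 0x0000000000636261
The bytes appear in reverse order (0x61 is "a", 0x62 is "b", 0x63 is "c") because x86-64 is little-endian: the byte at the lowest address ends up in the lowest-order byte of the register.
Or 1 byte into a register: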
global _main
_main:
MOVZX rax, byte [rel message]
RET
section .data
message:
DB "abc"
Stepping through the program:
lldb ./a.out
(lldb) b main
(lldb) r
(lldb) n
(lldb) reg read rax
rax = 0x0000000000000061
The following program adds 2 numbers together:
global _main
_main:
MOV rax, 1
ADD rax, 2
RET
The MOV instruction copies 1 into the rax register. The ADD instruction adds 2 to the value in the rax register and writes the result back to the rax register.
Stepping through the program:
lldb ./a.out
(lldb) b main
(lldb) r
(lldb) reg read rax
rax = 0x00000002000b2d20
(lldb) n
(lldb) reg read rax
rax = 0x0000000000000001
(lldb) n
(lldb) reg read rax
rax = 0x0000000000000003
Consider the following assembly program to compute the volume of a cone:
section .text
global _main
_main:
MOVSS xmm0, [rel x]
MOVSS xmm1, [rel y]
MULSS xmm0, xmm0
MULSS xmm0, xmm1
MULSS xmm0, [rel one_third]
MULSS xmm0, [rel pi]
done:
RET
section .data
x:
; 1.234
DD 0x3f9df3b6
y:
; 5.678
DD 0x40b5b22d
one_third:
; 1./3
DD 0x3EAAAAAB
pi:
; 3.1415
DD 0x40490E56
Or almost equivalently:
section .text
global _main
_main:
MOVSS xmm0, [rel x]
MOVSS xmm1, [rel y]
MULSS xmm0, xmm0
MULSS xmm0, xmm1
MULSS xmm0, [rel one_third]
MULSS xmm0, [rel pi]
done:
RET
section .data
x:
DD 1.234
y:
DD 5.678
one_third:
DD 0.33333
pi:
DD 3.1415
Stepping through the program:
lldb ./a.out
(lldb) b done
(lldb) r
(lldb) reg read xmm0 -f float32[]
xmm0 = {9.05393 0 0 0}
Here x is the radius and y is the height, so the program computes (1/3) * pi * x^2 * y.
The processor has registers dedicated to floating-point values. These registers are xmm0, xmm1, ..., xmm15, and each is 128 bits wide. The program also uses instructions dedicated to floating-point arithmetic. For example, it uses MOVSS (move scalar single-precision floating-point value) instead of MOV and MULSS (multiply scalar single-precision floating-point value) instead of MUL.
The floating point values are encoded using the 32-bit IEEE 754 single-precision floating-point number format. Take 1./3 as an example. We encode it as 0x3EAAAAAB, or 00111110101010101010101010101011. We interpret it as follows (note that the function does not handle all edge cases):
def bin32_to_float(bin32: str) -> float:
    # Expect exactly 32 characters, each "0" or "1".
    assert len(bin32) == 32
    assert set(bin32) <= {"0", "1"}
    # Bit 0 is the sign, bits 1-8 the biased exponent, bits 9-31 the mantissa.
    sign = -1 if bin32[0] == "1" else 1
    exponent = int(bin32[1:9], 2) - 127
    mantissa = 0
    for i in range(len(bin32[9:])):
        mantissa += int(bin32[9+i]) * (2**(-(i+1)))
    return sign * (2**exponent) * (1 + mantissa)
# 0.3333333432674408
print(bin32_to_float("00111110101010101010101010101011"))
We can't use immediate values with these instructions, but instead of using a .data section, we could do the following:
section .text
global _main
_main:
MOV eax, 0x3f9df3b6
MOVD xmm0, eax
MOV eax, 0x40b5b22d
MOVD xmm1, eax
MULSS xmm0, xmm0
MULSS xmm0, xmm1
MOV ebx, 0x3EAAAAAB
MOVD xmm1, ebx
MULSS xmm0, xmm1
MOV ebx, 0x40490E56
MOVD xmm1, ebx
MULSS xmm0, xmm1
done:
RET
The MOVD instruction copies a doubleword from the source operand to the destination operand. We use a doubleword because (1) a word in x86-64 is 16 bits and (2) we're using single-precision (32-bit) floating-point values.
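One way to obtain the hexadecimal constants used above is to ask Python's struct module for the IEEE 754 single-precision encoding of each value. The helper names float32_bits and bits_float32 are made up for this sketch.

```python
import struct

def float32_bits(x: float) -> int:
    """Return the 32-bit IEEE 754 single-precision encoding of x."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_float32(bits: int) -> float:
    """Decode a 32-bit pattern back into a Python float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

# The four values used by the cone program.
for value in (1.234, 5.678, 1 / 3, 3.1415):
    print(f"{value} -> {float32_bits(value):#010x}")

# Decoding shows the rounding error introduced by single precision.
print(bits_float32(0x3EAAAAAB))
```

The decoded value of 0x3EAAAAAB is the closest single-precision number to 1/3, which is why it differs slightly from 0.33333.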
Consider the following program:
global _main
_main:
MOV rax, 42
CMP rax, 0
JG done
RET
done:
MOV rax, 40
RET
The CMP instruction compares the first operand (rax) to the second operand (0) and sets the flags in the rflags register according to the results. In particular, the comparison is "performed by subtracting the second operand from the first operand and then setting the status flags in the same manner as the SUB instruction". The SUB instruction sets the "OF, SF, ZF, AF, PF, and CF flags...according to the result". The JG instruction jumps "if greater (ZF=0 and SF=OF)".
Stepping through the program:
lldb ./a.out
(lldb) b main
(lldb) r
(lldb) reg read rax
rax = 0x00000002000b2d20
(lldb) reg read rflags -f binary
rflags = 0b0000000000000000000000000000000000000000000000000000001001000110
(lldb) n
(lldb) n
(lldb) n
(lldb) reg read rax
rax = 0x000000000000002a
(lldb) reg read rflags -f binary
rflags = 0b0000000000000000000000000000000000000000000000000000001000000010
(lldb) n
(lldb) reg read rax
rax = 0x0000000000000028
(lldb) reg read rflags -f binary
rflags = 0b0000000000000000000000000000000000000000000000000000001000000010
Here are the values of bits 0 through 11 of the rflags register after the CMP instruction:
0: 0 CF
1: 1 Reserved
2: 0 PF
3: 0 Reserved
4: 0 AF
5: 0 Reserved
6: 0 ZF
7: 0 SF
8: 0 TF
9: 1 IF
10: 0 DF
11: 0 OF
The flag definitions are here.
In this case, ZF equals 0 and SF equals OF, so we jump to done.
Modifying the program to trigger the other branch:
global _main
_main:
MOV rax, 42
CMP rax, 43
JG done
RET
done:
MOV rax, 40
RET
And stepping through the program:
lldb ./a.out
(lldb) b main
(lldb) r
(lldb) reg read rax
rax = 0x00000002000b2d20
(lldb) reg read rflags -f binary
rflags = 0b0000000000000000000000000000000000000000000000000000001001000110
(lldb) n
(lldb) n
(lldb) n
(lldb) reg read rax
rax = 0x000000000000002a
(lldb) reg read rflags -f binary
rflags = 0b0000000000000000000000000000000000000000000000000000001010010111
(lldb) n
(lldb) reg read rax
rax = 0x000000000000002a
(lldb) reg read rflags -f binary
rflags = 0b0000000000000000000000000000000000000000000000000000001010010111
Here again are the values of bits 0 through 11 of the rflags register after the CMP instruction:
0: 1 CF
1: 1 Reserved
2: 1 PF
3: 0 Reserved
4: 1 AF
5: 0 Reserved
6: 0 ZF
7: 1 SF
8: 0 TF
9: 1 IF
10: 0 DF
11: 0 OF
In this case, SF does not equal OF, so we immediately return instead of jumping to done.
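The two rflags values read in LLDB can be decoded mechanically. The sketch below uses the bit positions from the tables above; decode_flags and jg_taken are hypothetical names, and jg_taken simply evaluates the quoted JG condition (ZF=0 and SF=OF).

```python
# Bit positions of the status flags within rflags.
FLAG_BITS = {"CF": 0, "PF": 2, "AF": 4, "ZF": 6, "SF": 7, "OF": 11}

def decode_flags(rflags: int) -> dict[str, int]:
    """Extract each status flag bit from an rflags value."""
    return {name: (rflags >> bit) & 1 for name, bit in FLAG_BITS.items()}

def jg_taken(rflags: int) -> bool:
    """JG jumps if ZF = 0 and SF = OF."""
    flags = decode_flags(rflags)
    return flags["ZF"] == 0 and flags["SF"] == flags["OF"]

after_cmp_0 = 0b001000000010   # rflags after CMP rax, 0 with rax = 42
after_cmp_43 = 0b001010010111  # rflags after CMP rax, 43 with rax = 42

for value in (after_cmp_0, after_cmp_43):
    print(decode_flags(value), "jump taken:", jg_taken(value))
```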
Finally as another example, which uses similar concepts, consider this program to compute 1 + 2 + 3 + 4:
global _main
section .text
_main:
MOV rdi, 4
MOV rax, 0
iteration:
; compare rdi to 0 and set a flag
CMP rdi, 0
; jump to done if rdi <= 0
JLE done
; add rdi to rax
ADD rax, rdi
; subtract 1 from rdi
SUB rdi, 1
; compare rdi to 0 and set a flag
CMP rdi, 0
; jump to iteration if rdi > 0
JG iteration
done:
RET
To see the result:
lldb ./a.out
(lldb) b done
(lldb) r
(lldb) reg read rax
rax = 0x000000000000000a
a is 10 in hexadecimal.
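For comparison, here is roughly the same control flow in Python, with each branch annotated by the instructions it stands for. The function name sum_to is invented, and the variables are named after the registers purely for readability.

```python
def sum_to(n: int) -> int:
    """Mirror the assembly loop that adds n + (n - 1) + ... + 1."""
    rdi, rax = n, 0          # MOV rdi, n / MOV rax, 0
    while True:
        if rdi <= 0:         # CMP rdi, 0 / JLE done
            break
        rax += rdi           # ADD rax, rdi
        rdi -= 1             # SUB rdi, 1
        if not rdi > 0:      # CMP rdi, 0 / JG iteration (not taken)
            break
    return rax               # done: RET

result = sum_to(4)
print(result, hex(result))
```

The first check only matters when the initial value of rdi is zero or negative; after that, the check at the bottom of the loop decides whether to iterate again.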
Some of the assembly programs above do not specify a section, because the default section is the text section. Consider the following program with an explicit text section and data section:
global _main
section .text
_main:
MOV rax, num
MOV rbx, [rax]
RET
section .data
num:
DB 42
section is a directive that switches to a specific type of section. The assembler treats the code that follows the directive differently depending on the type of section. The text section defines instructions to execute. The data section defines variables to initialize at the start of the program. The DB pseudo-instruction defines a byte. num is a label representing the address where 42 is stored. [rax] reads the value at the address stored in the rax register.
Stepping through the program:
lldb ./a.out
(lldb) b main
(lldb) r
(lldb) reg read rax rbx
rax = 0x00000002000b71e0
rbx = 0x00000002000b0c28
(lldb) n
(lldb) reg read rax rbx
rax = 0x0000000100004000
rbx = 0x00000002000b0c28
(lldb) n
(lldb) reg read rax rbx
rax = 0x0000000100004000
rbx = 0x000000000000002a
Note that if we had written MOV rax, [num], we would get the error "Mach-O 64-bit format does not support 32-bit absolute addresses", because that form needs a 32-bit absolute address. MOV rax, num instead loads the full 64-bit absolute address of num into rax as an immediate value (0x0000000100004000 above does not fit in 32 bits). In the next line, we dereference the 64-bit register. We could also do the following:
global _main
section .text
_main:
MOV rax, [rel num]
RET
section .data
num:
DB 42
The rel keyword directs the assembler to use RIP-relative addressing, i.e., the assembler does not encode the absolute address. Instead, it computes and encodes the offset from the value that the rip register holds while the instruction executes, which is the address of the next instruction. At runtime, the processor adds this offset to the rip register to find the address of num. RIP-relative addressing uses a 32-bit signed offset instead of a 64-bit absolute address.
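The address arithmetic behind RIP-relative addressing can be sketched in a few lines of Python. The address of num is taken from the LLDB output earlier; the instruction address and the 7-byte instruction length are assumptions made for illustration, and encode_disp32 and resolve are hypothetical names.

```python
MASK64 = (1 << 64) - 1

def encode_disp32(next_instruction_address: int, target_address: int) -> int:
    """What the assembler and linker compute: a signed 32-bit offset to the target."""
    disp = target_address - next_instruction_address
    if not -(1 << 31) <= disp < (1 << 31):
        raise ValueError("target is out of range for a 32-bit displacement")
    return disp

def resolve(rip: int, disp: int) -> int:
    """What the processor computes at runtime: rip plus the encoded offset."""
    return (rip + disp) & MASK64

instruction_address = 0x100003FB0  # assumed address of MOV rax, [rel num]
instruction_length = 7             # assumed encoded length of that instruction
num_address = 0x100004000          # address of num seen in LLDB above

rip = instruction_address + instruction_length  # address of the next instruction
disp = encode_disp32(rip, num_address)
print(disp, hex(resolve(rip, disp)))
```

Because only the offset is encoded, the same machine code still finds num if the whole program is loaded at a different base address.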
The following assembly program reserves space in memory, stores data in that reserved space and reads that data from memory:
section .text
global _main
_main:
MOV rax, 42
MOV [rel storage], rax
MOV rbx, [rel storage]
RET
section .bss
storage:
RESQ 1
The .bss section ("Block Started by Symbol") holds uninitialized variables. RESQ 1 reserves space for one quadword (8 bytes).
Stepping through the program:
lldb ./a.out
(lldb) b main
(lldb) r
(lldb) n
(lldb) n
(lldb) n
(lldb) reg read rax rbx
rax = 0x000000000000002a
rbx = 0x000000000000002a
The program below follows the System V AMD64 ABI calling convention, where the first two integer function arguments are passed in the rdi and rsi registers and where "Integer return values up to 64 bits in size are stored in RAX...". Note that floating-point values use different registers: arguments go in xmm0, xmm1, ..., xmm7 and return values in xmm0 and xmm1.
global _main
_add:
MOV rax, rdi
ADD rax, rsi
RET
_main:
MOV rdi, 5
MOV rsi, 3
CALL _add
RET
The CALL instruction "pushes the value of the EIP register (which contains the offset of the instruction following the CALL instruction) on the stack (for use later as a return-instruction pointer)" and "then branches to the address in the current code segment specified by the target operand".
The eip register (rip in 64-bit mode), also called the instruction pointer or program counter, is a special register that stores the address of the next instruction to execute. The processor also has a stack pointer register (rsp in 64-bit mode) that stores the address of the top of the stack. Pushing to the stack decrements the value of this register to reserve space for a new item and then stores that item in the reserved space.
For a RET instruction returning to a calling procedure within the current code segment, "the processor pops the return instruction pointer (offset) from the top of the stack into the EIP register and begins program execution at the new instruction pointer".
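The interplay between CALL, RET and the stack pointer can be modelled with a toy machine. Everything here (the Machine class, the starting stack address 0x1000, the code addresses 0x105 and 0x200) is invented for illustration; only the push and pop order follows the descriptions quoted above.

```python
class Machine:
    """Toy model of rsp, rip and a stack of 8-byte slots."""

    def __init__(self, rsp: int = 0x1000):
        self.memory: dict[int, int] = {}  # address -> 8-byte value
        self.rsp = rsp                    # stack pointer
        self.rip = 0                      # instruction pointer

    def push(self, value: int) -> None:
        self.rsp -= 8                     # reserve space (the stack grows down)
        self.memory[self.rsp] = value     # store the item in that space

    def pop(self) -> int:
        value = self.memory.pop(self.rsp) # load the item at the top of the stack
        self.rsp += 8                     # release the space
        return value

    def call(self, target: int, return_address: int) -> None:
        self.push(return_address)         # remember where to resume
        self.rip = target                 # branch to the function

    def ret(self) -> None:
        self.rip = self.pop()             # resume after the CALL

m = Machine()
m.call(target=0x200, return_address=0x105)
state_after_call = (m.rip, m.rsp, m.memory[m.rsp])
m.ret()
state_after_ret = (m.rip, m.rsp)
print([hex(v) for v in state_after_call], [hex(v) for v in state_after_ret])
```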
Stepping through the program:
lldb ./a.out
(lldb) b main
(lldb) r
(lldb) n
(lldb) n
(lldb) n
(lldb) reg read rax
rax = 0x0000000000000008
The following assembly program computes the nth Fibonacci number for a given integer n:
global _main
section .text
fib:
CMP rdi, 1
JLE base_case
PUSH rdi
DEC rdi
CALL fib
POP rdi
PUSH rax
SUB rdi, 2
CALL fib
MOV rbx, rax
POP rax
ADD rax, rbx
RET
base_case:
MOV rax, rdi
RET
_main:
MOV rdi, 10
CALL fib
RET
The PUSH instruction "Decrements the stack pointer and then stores the source operand on the top of the stack". The POP instruction "Loads the value from the top of the stack to the location specified with the destination operand (or explicit opcode) and then increments the stack pointer".
Stepping through the program:
lldb ./a.out
(lldb) b main
(lldb) r
(lldb) n
(lldb) n
(lldb) reg read rax
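rax = 0x0000000000000037
3 * 16 + 7 = 55.
Stepping through the program by hand for n = 2 instead of n = 10 to make it simpler (the stack is shown with its top on the left, and a return address is written as the instruction it points to):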
MOV rdi, 2 | rdi: 2 | stack: []
PUSH rdi | rdi: 2 | stack: [2]
DEC rdi | rdi: 1 | stack: [2]
CALL fib | rdi: 1 | stack: [POP rdi, 2]
MOV rax, rdi | rdi: 1, rax: 1 | stack: [POP rdi, 2]
RET | rdi: 1, rax: 1 | stack: [2]
POP rdi | rdi: 2, rax: 1 | stack: []
PUSH rax | rdi: 2, rax: 1 | stack: [1]
SUB rdi, 2 | rdi: 0, rax: 1 | stack: [1]
CALL fib | rdi: 0, rax: 1 | stack: [MOV rbx ..., 1]
MOV rax, rdi | rdi: 0, rax: 0 | stack: [MOV rbx ..., 1]
RET | rdi: 0, rax: 0 | stack: [1]
MOV rbx, rax | rdi: 0, rax: 0, rbx: 0 | stack: [1]
POP rax | rdi: 0, rax: 1, rbx: 0 | stack: []
ADD rax, rbx | rdi: 0, rax: 1, rbx: 0 | stack: []
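The same sequence of pushes and pops can be replayed in Python. This sketch mirrors the assembly line by line, with local variables named after the registers and a list standing in for the data pushed on the stack; return addresses are not modelled because Python's own call stack takes care of them.

```python
def fib(rdi: int, stack: list[int]) -> int:
    """Line-by-line mirror of the assembly fib routine."""
    if rdi <= 1:               # CMP rdi, 1 / JLE base_case
        return rdi             # MOV rax, rdi / RET
    stack.append(rdi)          # PUSH rdi (save n)
    rdi -= 1                   # DEC rdi
    rax = fib(rdi, stack)      # CALL fib -> fib(n - 1)
    rdi = stack.pop()          # POP rdi (restore n)
    stack.append(rax)          # PUSH rax (save fib(n - 1))
    rdi -= 2                   # SUB rdi, 2
    rax = fib(rdi, stack)      # CALL fib -> fib(n - 2)
    rbx = rax                  # MOV rbx, rax
    rax = stack.pop()          # POP rax (restore fib(n - 1))
    rax += rbx                 # ADD rax, rbx
    return rax                 # RET

stack: list[int] = []
result = fib(10, stack)
print(result, hex(result), stack)
```

The list is empty again once the outermost call returns, which reflects the fact that every PUSH in the routine is matched by a POP.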
The following program prints out "Hello, World!" using a system call:
global _main
section .text
_main:
MOV rax, 0x02000004 ; system call for write
MOV rdi, 1 ; file handle 1 is stdout
MOV rsi, message ; address of string to output
MOV rdx, 13 ; number of bytes
SYSCALL ; invoke operating system to do the write
RET
section .data
message:
DB "Hello, World!"
We can also use a system call to exit:
global _main
section .text
_main:
MOV rax, 0x02000004 ; system call for write
MOV rdi, 1 ; file handle 1 is stdout
MOV rsi, message ; address of string to output
MOV rdx, 13 ; number of bytes
SYSCALL ; invoke operating system to do the write
MOV rax, 0x02000001 ; system call for exit
XOR rdi, rdi ; exit code 0
SYSCALL ; invoke operating system to exit
section .data
message:
DB "Hello, World!"
We can also call an assembly function from C. We create add.asm with the following code:
global _add
_add:
MOV rax, rdi
ADD rax, rsi
RET
We create main.c with the following code:
#include <stdio.h>
extern int add(int a, int b);
int main(void) {
    int x = 2;
    int y = 3;
    int result = add(x, y);
    printf("Adding %d + %d = %d\n", x, y, result);
    return 0;
}
We execute it:
nasm -fmacho64 add.asm -o add.o && gcc -arch x86_64 main.c add.o && ./a.out
We can also step through the program:
lldb ./a.out
(lldb) b add
(lldb) r
- "Low‑level programming" (https://csprimer.com/courses/systems/)
- https://www.felixcloutier.com/x86/