Assembly

Preliminaries

Consider a computer with a processor and memory. The processor has registers, which store data. The processor takes the data in these registers as input, executes arithmetic and logical operations on that data and writes the output to the registers. The memory is a sequence of bits, where each byte (group of 8 bits) has a unique address. The processor can load data from memory into its registers and write data from registers into memory. We provide instructions to the processor via assembly. There are different assembly languages for different computer architectures. In this post, we focus on the x86-64 computer architecture. In assembly, we refer to a register by its name. For example, rax is the name of one of the general purpose, 64-bit registers. We can also use eax to refer to the lower 32 bits of that register, ax to refer to the lower 16 bits, or al to refer to the lower 8 bits.
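As a rough illustration of how the names rax, eax, ax and al overlap, the following Python sketch masks a 64-bit value down to each width. The function name sub_registers and the sample value are made up for this example; Python integers only model the bit patterns, not the processor itself.

```python
def sub_registers(rax: int) -> dict[str, int]:
    """Return the views of a 64-bit value exposed by the names rax, eax, ax and al."""
    rax &= 0xFFFFFFFFFFFFFFFF  # keep 64 bits
    return {
        "rax": rax,
        "eax": rax & 0xFFFFFFFF,  # lower 32 bits
        "ax": rax & 0xFFFF,       # lower 16 bits
        "al": rax & 0xFF,         # lower 8 bits
    }


views = sub_registers(0x1122334455667788)  # arbitrary sample value
for name, value in views.items():
    print(f"{name:>3} = {value:#x}")
```

Each shorter name is a window onto the low-order bits of the same register, not a separate storage location.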

Consider the following assembly program:

global	_main
_main:
	MOV	rax, 42
	RET

Suppose we save the program in a file called main.asm. We can then use the nasm assembler to translate the assembly code to machine code (brew install nasm to install it):

nasm -fmacho64 main.asm

We use the -fmacho64 flag to specify the Mach-O 64 bit object file format used by macOS (the operating system for Mac computers).

The command creates an object file (in this case: main.o), which contains machine code as well as other data generated by the assembler. The object file is usually not executable, because it may contain references to functions or data defined in other object files or libraries (i.e., a collection of object files).

We then call ld (i.e., the linker; on macOS this is Apple's linker rather than the GNU linker), which combines the object file with libraries to create an executable file (in this case: a.out).

ld -lSystem -L/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/lib main.o

We use the -lSystem flag to include the core system library on macOS and the -L/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/lib flag to add the macOS SDK lib directory to the search path for libraries.

We can then execute the program:

./a.out

The program does not have any output. To see the value of the rax register change, we can step through the program using the LLDB debugger:

lldb ./a.out
(lldb) b main
(lldb) r
(lldb) reg read rax
     rax = 0x00000002000b2d20  dyld`lsl::sPoolBytes + 24736
(lldb) n 
(lldb) reg read rax
     rax = 0x000000000000002a

We use LLDB to set a breakpoint, run the program up until the breakpoint, read the initial value of the register, execute the next line of the program and then read the value of the register again.

2a is 42 in hexadecimal (2 * 16 + 10 = 42).

global is a directive, i.e., a command to the assembler that does not get translated into machine code. global _main makes the _main label visible to the linker. A label is a symbolic name assigned to a specific memory address, typically representing the location of an instruction or some data. _main is a special label that defines the entry point for the program.

The MOV instruction copies 42 into the rax register. The RET instruction returns control to the runtime so that the process can cleanly exit.

We can also disassemble the executable file:

objdump -M intel -d a.out

We use -M intel to get Intel instead of AT&T assembly syntax.

We get:

a.out:	file format mach-o 64-bit x86-64

Disassembly of section __TEXT,__text:

0000000100003fb2 <_main>:
100003fb2: b8 2a 00 00 00              	mov	eax, 42
100003fb7: c3                          	ret

Registers

The rax register is 64 bits wide. Consider the following program that zeroes rax and then fills the register with 64 1s:

global _main
_main:
	XOR rax, rax
	MOV rax, 0xffffffffffffffff
	RET

Stepping through the program:

lldb ./a.out
(lldb) b main
(lldb) r
(lldb) n
(lldb) reg read rax
     rax = 0x0000000000000000
(lldb) n
(lldb) reg read rax    
     rax = 0xffffffffffffffff

If we prepend another 1 to the numeric constant above and try to move 65 1s into the rax register:

global _main
_main:
	XOR rax, rax
	MOV rax, 0x1ffffffffffffffff
	RET

Then we get: "main.asm:4: warning: numeric constant 0x1ffffffffffffffff does not fit in 64 bits [-w+number-overflow]".

And stepping through the program, we see that only the lower 64 bits are kept:

lldb ./a.out
(lldb) b main
(lldb) r
(lldb) n
(lldb) reg read rax
     rax = 0x0000000000000000
(lldb) n
(lldb) reg read rax    
     rax = 0xffffffffffffffff
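The truncation can be reproduced with a bit mask. This is only a sketch of the arithmetic (the name MASK_64 is made up here), not of what the assembler does internally.

```python
MASK_64 = (1 << 64) - 1  # 64 one bits

constant = 0x1FFFFFFFFFFFFFFFF  # 65 one bits, as in the program above
kept = constant & MASK_64       # drop everything above bit 63

print(constant.bit_length())  # 65
print(hex(kept))              # 0xffffffffffffffff
```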

We can change the lower order 8 bits of the rax register as follows:

global _main
_main:
	MOV rax, 0xffffffffffffffff
	MOV al, 1
	RET

Stepping through the program:

lldb ./a.out
(lldb) b main
(lldb) r
(lldb) n
(lldb) reg read rax
     rax = 0xffffffffffffffff
(lldb) n
(lldb) reg read rax
     rax = 0xffffffffffffff01

Each hexadecimal digit is 4 bits. We see that the lower 8 bits have been replaced by 0x01, i.e., 00000001.
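The effect of writing to al can be modelled in Python as clearing the low 8 bits and then OR-ing in the new byte. The helper write_al is a hypothetical name for this sketch.

```python
def write_al(rax: int, value: int) -> int:
    """Model MOV al, value: replace the low 8 bits of rax and keep the upper 56."""
    upper = rax & 0xFFFFFFFFFFFFFF00  # clear the low byte
    return upper | (value & 0xFF)


rax = write_al(0xFFFFFFFFFFFFFFFF, 1)
print(hex(rax))             # 0xffffffffffffff01
print(f"{rax & 0xFF:08b}")  # 00000001
```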

We can't copy the value of a smaller register to a larger one with a plain MOV, because the operand sizes must match:

global _main
_main:
	XOR rax, rax
	MOV bl, 0x80
	MOV rax, bl
	RET

The assembler throws the error: "main.asm:5: error: invalid combination of opcode and operands".

But we can copy the value of a smaller register to a larger one with zero extension:

global _main
_main:
	XOR rax, rax
	MOV bl, 0x80
	MOVZX rax, bl
	RET

Stepping through the program, we see that zero extension sets all the upper bits of the destination register to zero:

lldb ./a.out
(lldb) b main
(lldb) r
(lldb) n
(lldb) n
(lldb) n
(lldb) reg read rax bl
     rax = 0x0000000000000080
      bl = 0x80

We can also copy the value of a smaller register to a larger one with sign extension:

global _main
_main:
	XOR rax, rax
	MOV bl, 0x80
	MOVSX rax, bl
	RET

Stepping through the program, we see that sign extension copies the sign bit (the most significant bit of bl) into all the upper bits of rax:

lldb ./a.out
(lldb) b main
(lldb) r
(lldb) n
(lldb) n
(lldb) n
(lldb) reg read rax bl
     rax = 0xffffffffffffff80
      bl = 0x80

0x80, or 10000000, interpreted as a signed integer using two's complement is -128. 0xffffffffffffff80 interpreted as a 64-bit signed integer using two's complement is also -128, so sign extension preserves the signed value.

If instead we copy 0x7f into bl, whose sign bit is 0:

global _main
_main:
	XOR rax, rax
	MOV bl, 0x7f
	MOVSX rax, bl
	RET

Then we get the same result as zero extension:

lldb ./a.out
(lldb) b main
(lldb) r
(lldb) n
(lldb) n
(lldb) n
(lldb) reg read rax bl
     rax = 0x000000000000007f
      bl = 0x7f
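The two kinds of extension can be sketched in Python as follows. The function names mirror the instruction mnemonics but are otherwise made up for this example, and the default widths (8-bit source, 64-bit destination) match the programs above.

```python
def movzx(value: int, src_bits: int = 8) -> int:
    """Zero extension: keep the source bits, every upper bit becomes 0."""
    return value & ((1 << src_bits) - 1)


def movsx(value: int, src_bits: int = 8, dst_bits: int = 64) -> int:
    """Sign extension: copy the source's top bit into every upper bit."""
    value &= (1 << src_bits) - 1
    if value >> (src_bits - 1):           # sign bit is set
        value -= 1 << src_bits            # the same number as a negative Python int
    return value & ((1 << dst_bits) - 1)  # back to a dst_bits-wide bit pattern


def as_signed(pattern: int, bits: int) -> int:
    """Interpret a bit pattern as a two's complement integer."""
    return pattern - (1 << bits) if pattern >> (bits - 1) else pattern


print(hex(movzx(0x80)))  # 0x80
print(hex(movsx(0x80)))  # 0xffffffffffffff80
print(hex(movsx(0x7F)))  # 0x7f
print(as_signed(0x80, 8), as_signed(movsx(0x80), 64))  # -128 -128
```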

We can read multiple bytes into a register:

global _main
_main:
	MOV rax, [rel message]
	RET
section .data
message:
	DB "abc"

Stepping through the program:

lldb ./a.out
(lldb) b main
(lldb) r
(lldb) n
(lldb) reg read rax
     rax = 0x0000000000636261
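The bytes appear in reverse order in rax because x86-64 is little-endian: the byte at the lowest address ("a", 0x61) becomes the least significant byte of the register. A small Python sketch of the same 8-byte load, assuming the five bytes after "abc" are zero (which the output above suggests):

```python
# 'a', 'b', 'c' followed by five zero bytes (the zero padding is an assumption)
data = b"abc" + bytes(5)

rax = int.from_bytes(data, byteorder="little")
print(hex(rax))         # 0x636261
print(hex(rax & 0xFF))  # 0x61, the byte stored at the lowest address
```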

Or 1 byte into a register:

global _main
_main:
	MOVZX rax, byte [rel message]
	RET
section .data
message:
	DB "abc"

Stepping through the program:

lldb ./a.out
(lldb) b main
(lldb) r
(lldb) n
(lldb) reg read rax
     rax = 0x0000000000000061

Basic arithmetic

The following program adds 2 numbers together:

global	_main
_main:
	MOV	rax, 1
	ADD rax, 2
	RET

The MOV instruction copies 1 into the rax register. The ADD instruction adds 2 to the value in the rax register and writes the output to the rax register.

Stepping through the program:

lldb ./a.out
(lldb) b main
(lldb) r
(lldb) reg read rax
     rax = 0x00000002000b2d20
(lldb) n
(lldb) reg read rax
     rax = 0x0000000000000001
(lldb) n
(lldb) reg read rax
     rax = 0x0000000000000003

Floating point arithmetic

Consider the following assembly program to compute the volume of a cone:

section .text
global _main
_main:
	MOVSS xmm0, [rel x]
	MOVSS xmm1, [rel y]
	MULSS xmm0, xmm0
	MULSS xmm0, xmm1
	MULSS xmm0, [rel one_third]
	MULSS xmm0, [rel pi]
done:
 	RET
section .data
x:
	; 1.234
	DD 0x3f9df3b6
y:
	; 5.678
	DD 0x40b5b22d
one_third:
	; 1./3
	DD 0x3EAAAAAB
pi:
	; 3.1415
	DD 0x40490E56

Or almost equivalently:

section .text
global _main
_main:
	MOVSS xmm0, [rel x]
	MOVSS xmm1, [rel y]
	MULSS xmm0, xmm0
	MULSS xmm0, xmm1
	MULSS xmm0, [rel one_third]
	MULSS xmm0, [rel pi]
done:
 	RET
section .data
x:
	DD 1.234
y:
	DD 5.678
one_third:
	DD 0.33333
pi:
	DD 3.1415

Stepping through the program:

lldb ./a.out
(lldb) b done
(lldb) r
(lldb) reg read xmm0 -f float32[]
    xmm0 = {9.05393 0 0 0}
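As a sanity check on the value lldb reports, the same sequence of multiplications can be modelled in Python, rounding to single precision after every step. The helpers f32 and mulss are names invented for this sketch. Here x is treated as the radius and y as the height (x is the squared term), and the constants are those of the second program.

```python
import struct


def f32(value: float) -> float:
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def mulss(a: float, b: float) -> float:
    """Model MULSS: multiply two single precision values and round the product."""
    return f32(f32(a) * f32(b))


radius, height = 1.234, 5.678    # x and y in the program
volume = mulss(radius, radius)   # MULSS xmm0, xmm0
volume = mulss(volume, height)   # MULSS xmm0, xmm1
volume = mulss(volume, 0.33333)  # MULSS xmm0, [rel one_third]
volume = mulss(volume, 3.1415)   # MULSS xmm0, [rel pi]
print(round(volume, 5))
```

The result should agree with the lldb output to within single precision rounding.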

The processor has registers devoted to storing floating point values (some reasons why here). These registers are xmm0, xmm1, ..., xmm15, and each is 128 bits wide. The program also uses instructions devoted to manipulating floating point values. For example, it uses MOVSS (i.e., move scalar single precision floating point) instead of MOV and MULSS (i.e., multiply scalar single precision floating point) instead of MUL.

The floating point values are encoded using the 32-bit IEEE 754 single-precision floating-point number format. Take 1./3 as an example. We encode it as 0x3EAAAAAB, or 00111110101010101010101010101011. We interpret it as follows (note that the function does not handle all edge cases):

def bin32_to_float(bin32: str) -> float:
	assert len(bin32) == 32
	assert set(bin32) <= {"0", "1"}  # only binary digits (all-0 or all-1 strings are valid too)
	sign = -1 if bin32[0] == "1" else 1
	exponent = int(bin32[1:9], 2) - 127
	mantissa = 0
	for i in range(len(bin32[9:])):
		mantissa += int(bin32[9+i]) * (2**(-(i+1)))
	return sign * (2**exponent) * (1 + mantissa)

# 0.3333333432674408
print(bin32_to_float("00111110101010101010101010101011"))
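The hexadecimal constants used in the .data section can be cross-checked with Python's struct module, which packs and unpacks IEEE 754 single precision values. The helper names below are made up for this sketch.

```python
import struct


def float_to_hex32(value: float) -> str:
    """Encode a number as IEEE 754 single precision and return the 32 bits as hex."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    return f"0x{bits:08x}"


def hex32_to_float(bits: int) -> float:
    """Decode a 32-bit pattern as an IEEE 754 single precision number."""
    (value,) = struct.unpack(">f", struct.pack(">I", bits))
    return value


for constant in (1.234, 5.678, 1 / 3, 3.1415):
    print(constant, float_to_hex32(constant))

print(hex32_to_float(0x3EAAAAAB))  # 0.3333333432674408
```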

We can't use immediate values with these instructions, but instead of using a .data section, we could do the following:

section .text
global _main
_main:
	MOV eax, 0x3f9df3b6
	MOVD xmm0, eax
	MOV eax, 0x40b5b22d
	MOVD xmm1, eax
	MULSS xmm0, xmm0
	MULSS xmm0, xmm1
	MOV ebx, 0x3EAAAAAB
	MOVD xmm1, ebx
	MULSS xmm0, xmm1
	MOV ebx, 0x40490E56
	MOVD xmm1, ebx
	MULSS xmm0, xmm1
done:
 	RET

The MOVD instruction copies a double word from the source operand to the destination operand. We use a double word because (1) a word in x86-64 is 16 bits, so a double word is 32 bits, and (2) we're using single precision (32 bit) floating point values.

Control flow

Consider the following program:

global	_main
_main:
	MOV rax, 42
	CMP	rax, 0
	JG done
	RET
done:
	MOV rax, 40
	RET

The CMP instruction compares the first operand (rax) to the second operand (0) and sets the flags in the rflags register according to the results. In particular, the comparison is "performed by subtracting the second operand from the first operand and then setting the status flags in the same manner as the SUB instruction". The SUB instruction sets the "OF, SF, ZF, AF, PF, and CF flags...according to the result". The JG instruction jumps "if greater (ZF=0 and SF=OF)".
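To make the flag rules concrete, here is a Python sketch of how the six status flags could be derived for a 64-bit CMP. The function cmp_flags is a simplified model written for this post, not a description of how the hardware is built.

```python
MASK_64 = (1 << 64) - 1


def sign_bit(value: int) -> int:
    """Most significant bit of a 64-bit pattern."""
    return (value >> 63) & 1


def cmp_flags(a: int, b: int) -> dict[str, int]:
    """Model the status flags that CMP a, b sets for 64-bit operands."""
    a &= MASK_64
    b &= MASK_64
    result = (a - b) & MASK_64  # CMP subtracts but discards the result
    return {
        "CF": int(a < b),                                   # unsigned borrow
        "PF": int(bin(result & 0xFF).count("1") % 2 == 0),  # even parity of the low byte
        "AF": int((a & 0xF) < (b & 0xF)),                   # borrow out of bit 3
        "ZF": int(result == 0),
        "SF": sign_bit(result),
        # signed overflow: operands differ in sign and the result's sign differs from a's
        "OF": int(sign_bit(a) != sign_bit(b) and sign_bit(result) != sign_bit(a)),
    }


print(cmp_flags(42, 0))
print(cmp_flags(42, 43))
```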

Stepping through the program:

lldb ./a.out
(lldb) b main
(lldb) r
(lldb) reg read rax
     rax = 0x00000002000b2d20
(lldb) reg read rflags -f binary
  rflags = 0b0000000000000000000000000000000000000000000000000000001001000110
(lldb) n
(lldb) n
(lldb) n
(lldb) reg read rax
     rax = 0x000000000000002a
(lldb) reg read rflags -f binary
  rflags = 0b0000000000000000000000000000000000000000000000000000001000000010
(lldb) n
(lldb) reg read rax
     rax = 0x0000000000000028
(lldb) reg read rflags -f binary
  rflags = 0b0000000000000000000000000000000000000000000000000000001000000010

Here are the values of the lowest 12 bits (bits 0 through 11) of the rflags register after the CMP instruction:

0: 0 CF
1: 1 Reserved
2: 0 PF
3: 0 Reserved
4: 0 AF
5: 0 Reserved
6: 0 ZF
7: 0 SF
8: 0 TF
9: 1 IF
10: 0 DF
11: 0 OF

The flag definitions are here.

In this case, ZF equals 0 and SF equals OF, so we jump to done.

Modifying the program to trigger the other branch:

global	_main
_main:
	MOV rax, 42
	CMP	rax, 43
	JG done
	RET
done:
	MOV rax, 40
	RET

And stepping through the program:

lldb ./a.out
(lldb) b main
(lldb) r
(lldb) reg read rax
     rax = 0x00000002000b2d20
(lldb) reg read rflags -f binary
  rflags = 0b0000000000000000000000000000000000000000000000000000001001000110
(lldb) n
(lldb) n
(lldb) n
(lldb) reg read rax
     rax = 0x000000000000002a
(lldb) reg read rflags -f binary
  rflags = 0b0000000000000000000000000000000000000000000000000000001010010111
(lldb) n
(lldb) reg read rax
     rax = 0x000000000000002a
(lldb) reg read rflags -f binary
  rflags = 0b0000000000000000000000000000000000000000000000000000001010010111

Here again are the values of the lowest 12 bits (bits 0 through 11) of the rflags register after the CMP instruction:

0: 1 CF
1: 1 Reserved
2: 1 PF
3: 0 Reserved
4: 1 AF
5: 0 Reserved
6: 0 ZF
7: 1 SF
8: 0 TF
9: 1 IF
10: 0 DF
11: 0 OF

In this case, SF does not equal OF, so we immediately return instead of jumping to done.
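The two rflags readings above can be decoded with a short Python sketch. The names FLAG_BITS, decode_rflags and jg_taken are made up for this example; the bit positions are the ones listed in the tables above.

```python
# Bit position of each named flag in the low 12 bits of rflags
FLAG_BITS = {"CF": 0, "PF": 2, "AF": 4, "ZF": 6, "SF": 7, "TF": 8, "IF": 9, "DF": 10, "OF": 11}


def decode_rflags(rflags: int) -> dict[str, int]:
    """Extract the named flags from an rflags value."""
    return {name: (rflags >> bit) & 1 for name, bit in FLAG_BITS.items()}


def jg_taken(rflags: int) -> bool:
    """JG jumps if ZF = 0 and SF = OF."""
    flags = decode_rflags(rflags)
    return flags["ZF"] == 0 and flags["SF"] == flags["OF"]


after_cmp_42_0 = 0b001000000010   # low 12 bits after CMP rax, 0
after_cmp_42_43 = 0b001010010111  # low 12 bits after CMP rax, 43

print(decode_rflags(after_cmp_42_0), jg_taken(after_cmp_42_0))
print(decode_rflags(after_cmp_42_43), jg_taken(after_cmp_42_43))
```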

As a final example that uses similar concepts, consider this program to compute 1 + 2 + 3 + 4:

global _main
section .text
_main:
	MOV rdi, 4
	MOV rax, 0
iteration:
	; compare rdi to 0 and set a flag
	CMP rdi, 0
	; jump to done if rdi <= 0
	JLE done
	; add rdi to rax
	ADD rax, rdi
	; subtract 1 from rdi
	SUB rdi, 1
	; compare rdi to 0 and set a flag
	CMP rdi, 0
	; jump to iteration if rdi > 0
	JG iteration
done:
	RET

To see the result:

lldb ./a.out
(lldb) b done
(lldb) r
(lldb) reg read rax
     rax = 0x000000000000000a

a is 10 in hexadecimal.
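For comparison, the same loop can be written in Python with variables named after the registers. The function name sum_down_from is made up, and the control flow deliberately mirrors the two conditional jumps instead of using an idiomatic sum(range(...)).

```python
def sum_down_from(n: int) -> int:
    """Mirror the assembly loop: add n, n - 1, ..., 1 into rax."""
    rdi, rax = n, 0
    while True:
        if rdi <= 0:  # CMP rdi, 0 / JLE done
            break
        rax += rdi    # ADD rax, rdi
        rdi -= 1      # SUB rdi, 1
        if rdi > 0:   # CMP rdi, 0 / JG iteration
            continue
        break         # fall through to done
    return rax


result = sum_down_from(4)
print(result, hex(result))  # 10 0xa
```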

Sections

Many of the assembly programs above do not specify a section, because the default section is the text section. Consider the following program, which has both a text section and a data section:

global	_main
section	.text
_main:
	MOV rax, num
	MOV rbx, [rax]
	RET
section .data
num:
	DB 42

section is a directive to switch to a specific type of section. The assembler treats the code that follows the directive differently depending on the type of section. The text section defines instructions to execute. The data section defines variables to initialize at the start of the program. The DB pseudo-instruction defines a byte. num is a label representing the address where 42 is stored. [rax] takes the value at the address stored in the register. Because rbx is a 64-bit register, MOV rbx, [rax] reads 8 bytes starting at that address, not just the 1 byte defined by DB.

Stepping through the program:

lldb ./a.out
(lldb) b main
(lldb) r
(lldb) reg read rax rbx
     rax = 0x00000002000b71e0
     rbx = 0x00000002000b0c28
(lldb) n
(lldb) reg read rax rbx
     rax = 0x0000000100004000
     rbx = 0x00000002000b0c28
(lldb) n
(lldb) reg read rax rbx
     rax = 0x0000000100004000
     rbx = 0x000000000000002a

Note that if we had done MOV rax, [num], then we would get the error "Mach-O 64-bit format does not support 32-bit absolute addresses". MOV rax, num instead loads the absolute address of num into the 64-bit register as an immediate value (0x0000000100004000 in the output above, which does not fit in 32 bits). In the next line, we dereference the 64-bit register. We could also do the following:

global	_main
section	.text
_main:
	MOV rax, [rel num]
	RET
section .data
num:
	DB 42

The rel keyword directs the assembler to use RIP-relative addressing, i.e., the assembler does not encode the absolute address. Instead, the instruction encodes the offset from the value of the rip register, which holds the address of the next instruction, to num. At runtime, the processor adds this offset to the rip register to find the address of num. RIP-relative addressing uses a 32-bit signed offset instead of a 64-bit absolute address.
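The offset arithmetic can be sketched in Python. The address of num is taken from the lldb output above; the address of the next instruction is a hypothetical value chosen for illustration, as is the function name.

```python
def rip_relative_displacement(target: int, next_instruction: int) -> int:
    """Offset that, added to rip, yields the target address."""
    displacement = target - next_instruction
    if not -(1 << 31) <= displacement < (1 << 31):
        raise ValueError("target is out of range for a 32-bit signed displacement")
    return displacement


num_address = 0x100004000       # address of num, from the lldb output above
next_instruction = 0x100003FB0  # hypothetical address of the instruction after the MOV

displacement = rip_relative_displacement(num_address, next_instruction)
print(hex(displacement))  # 0x50

# At runtime the processor adds the displacement back to rip
print(hex(next_instruction + displacement))  # 0x100004000
```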

Memory

The following assembly program reserves space in memory, stores data in that reserved space and reads that data from memory:

section .text
global _main
_main:
    MOV rax, 42
    MOV [rel storage], rax
    MOV rbx, [rel storage]
    RET
section .bss
storage:
	RESQ 1

The .bss section ("Block Started by Symbol") stores uninitialized variables. RESQ 1 reserves space for one quadword (8 bytes).

Stepping through the program:

lldb ./a.out
(lldb) b main
(lldb) r
(lldb) n
(lldb) n
(lldb) n
(lldb) reg read rax rbx
     rax = 0x000000000000002a
     rbx = 0x000000000000002a
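A loose Python analogy for the store and the load, using a bytearray to stand in for the 8 reserved bytes. Modelling the reserved space as zero-filled bytes and the little-endian "<Q" format are choices made for this sketch.

```python
import struct

bss = bytearray(8)  # RESQ 1: one quadword, modelled as 8 zeroed bytes

rax = 42
struct.pack_into("<Q", bss, 0, rax)        # MOV [rel storage], rax
(rbx,) = struct.unpack_from("<Q", bss, 0)  # MOV rbx, [rel storage]

print(bss.hex())  # 2a00000000000000
print(hex(rbx))   # 0x2a
```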

Functions

The program below follows the System V AMD64 ABI calling convention, where the first two integer function arguments are passed in the rdi and rsi registers and where "Integer return values up to 64 bits in size are stored in RAX...". Note that floating point values use different registers: floating point arguments are passed in xmm0, xmm1, ..., xmm7 and return values in xmm0 and xmm1.

global	_main
_add:
	MOV rax, rdi
	ADD rax, rsi
	RET
_main:
	MOV rdi, 5
	MOV rsi, 3
	CALL _add
	RET

The CALL instruction "pushes the value of the EIP register (which contains the offset of the instruction following the CALL instruction) on the stack (for use later as a return-instruction pointer)" and "then branches to the address in the current code segment specified by the target operand".

The quoted text uses the 32-bit name EIP; in x86-64 the register is called rip. The instruction pointer register, or program counter register, is a special register that stores the address of the next instruction to execute. The processor also has a stack pointer register (rsp in x86-64) that stores the address of the top of the stack. Pushing to the stack decrements the value of this register to reserve space for a new item and then stores the new item in that space.

For a RET instruction returning to a calling procedure within the current code segment, "the processor pops the return instruction pointer (offset) from the top of the stack into the EIP register and begins program execution at the new instruction pointer".
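The interplay of CALL, RET and the stack pointer can be sketched with a toy model in Python. The Stack class and all addresses below are hypothetical; only the push and pop behaviour follows the descriptions above.

```python
class Stack:
    """Toy stack: a mapping from addresses to 8-byte values, plus a stack pointer."""

    def __init__(self, rsp: int) -> None:
        self.rsp = rsp
        self.memory: dict[int, int] = {}

    def push(self, value: int) -> None:
        self.rsp -= 8                  # reserve space for the new item
        self.memory[self.rsp] = value  # store the item itself in that space

    def pop(self) -> int:
        value = self.memory.pop(self.rsp)
        self.rsp += 8
        return value


stack = Stack(rsp=0x7FF0)  # hypothetical initial stack pointer
return_address = 0x1008    # hypothetical address of the instruction after CALL _add
add_address = 0x2000       # hypothetical address of _add

# CALL _add: push the return address, then jump to the target
stack.push(return_address)
rip = add_address

# Body of _add
rdi, rsi = 5, 3
rax = rdi + rsi

# RET: pop the return address into the instruction pointer
rip = stack.pop()
print(hex(rip), hex(stack.rsp), rax)  # 0x1008 0x7ff0 8
```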

Stepping through the program:

lldb ./a.out
(lldb) b main
(lldb) r
(lldb) n
(lldb) n
(lldb) n
(lldb) reg read rax
     rax = 0x0000000000000008

Recursive functions

The following assembly program computes the nth Fibonacci number for a given integer n (here n = 10):

global _main
section .text
fib:
    CMP rdi, 1
    JLE base_case
    PUSH rdi
    DEC rdi
    CALL fib
    POP rdi
    PUSH rax
    SUB rdi, 2
    CALL fib
    MOV rbx, rax
    POP rax
    ADD rax, rbx
    RET
base_case:
    MOV rax, rdi
    RET
_main:
    MOV rdi, 10
    CALL fib
    RET

The PUSH instruction "Decrements the stack pointer and then stores the source operand on the top of the stack". The POP instruction "Loads the value from the top of the stack to the location specified with the destination operand (or explicit opcode) and then increments the stack pointer".

Stepping through the program:

lldb ./a.out
(lldb) b main
(lldb) r
(lldb) n
(lldb) n
(lldb) reg read rax
     rax = 0x0000000000000037

3 * 16 + 7 = 55.

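Stepping through the program by hand for n = 2 instead of n = 10 to make it simpler (the CMP and JLE steps are omitted, a return address pushed by CALL is written as the instruction it points to, and the top of the stack is on the left):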

MOV rdi, 2 | rdi: 2 | stack: []
PUSH rdi | rdi: 2 | stack: [2]
DEC rdi | rdi: 1 | stack: [2]
CALL fib | rdi: 1 | stack: [POP rdi, 2]
MOV rax, rdi | rdi: 1, rax: 1 | stack: [POP rdi, 2]
RET | rdi: 1, rax: 1 | stack: [2]
POP rdi | rdi: 2, rax: 1 | stack: []
PUSH rax | rdi: 2, rax: 1 | stack: [1]
SUB rdi, 2 | rdi: 0, rax: 1 | stack: [1]
CALL fib | rdi: 0, rax: 1 | stack: [MOV rbx ..., 1]
MOV rax, rdi | rdi: 0, rax: 0 | stack: [MOV rbx ..., 1]
RET | rdi: 0, rax: 0 | stack: [1]
MOV rbx, rax | rdi: 0, rax: 0, rbx: 0 | stack: [1]
POP rax | rdi: 0, rax: 1, rbx: 0 | stack: []
ADD rax, rbx | rdi: 0, rax: 1, rbx: 0 | stack: []
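The same logic can be mirrored in Python with an explicit list standing in for the values that PUSH and POP move around. This is a sketch: return addresses are handled by Python's own call stack, so only the saved rdi and rax values appear in the list.

```python
def fib(rdi: int, stack: list[int]) -> int:
    """Mirror the assembly routine; the return value plays the role of rax."""
    if rdi <= 1:               # CMP rdi, 1 / JLE base_case
        return rdi             # MOV rax, rdi / RET
    stack.append(rdi)          # PUSH rdi
    rax = fib(rdi - 1, stack)  # DEC rdi / CALL fib
    rdi = stack.pop()          # POP rdi
    stack.append(rax)          # PUSH rax
    rbx = fib(rdi - 2, stack)  # SUB rdi, 2 / CALL fib / MOV rbx, rax
    rax = stack.pop()          # POP rax
    return rax + rbx           # ADD rax, rbx / RET


stack: list[int] = []
print(fib(2, stack))   # 1
print(fib(10, stack))  # 55
```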

System calls

The following program prints out "Hello, World!" using a system call:

global _main
section .text
_main:    
	MOV rax, 0x02000004  ; system call for write
	MOV rdi, 1  ; file handle 1 is stdout
	MOV rsi, message  ; address of string to output
	MOV rdx, 13  ; number of bytes
	SYSCALL  ; invoke operating system to do the write
	RET
section .data
message:
	DB "Hello, World!"

We can also use a system call to exit:

global _main
section .text
_main:    
	MOV rax, 0x02000004  ; system call for write
	MOV rdi, 1  ; file handle 1 is stdout
	MOV rsi, message  ; address of string to output
	MOV rdx, 13  ; number of bytes
	SYSCALL  ; invoke operating system to do the write
	MOV rax, 0x02000001  ; system call for exit
	XOR rdi, rdi  ; exit code 0
	SYSCALL  ; invoke operating system to exit
section .data
message:
	DB "Hello, World!"
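For comparison, Python's os.write is a thin wrapper around the same write system call and takes the same pieces of information: a file descriptor and the bytes to write (the byte count is implied by the length of the bytes object). The function name write_message is made up for this sketch.

```python
import os

message = b"Hello, World!"


def write_message(fd: int) -> int:
    """Write the message to a file descriptor and return the number of bytes written."""
    return os.write(fd, message)


written = write_message(1)  # file descriptor 1 is stdout
```

The return value is the number of bytes written, which corresponds to the 13 passed in rdx in the assembly version.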

C functions

We create add.asm with the following code:

global	_add
_add:
	MOV rax, rdi
	ADD rax, rsi
	RET

We create main.c with the following code:

#include <stdio.h>

extern int add(int a, int b);

int main() {
	int x = 2;
	int y = 3;
	int result = add(x, y);
	printf("Adding %d + %d = %d\n", x, y, result);
	return 0;
}

We execute it:

nasm -fmacho64 add.asm -o add.o && gcc -arch x86_64 main.c add.o && ./a.out

We can also step through the program:

lldb ./a.out
(lldb) b add
(lldb) r

Sources

"Low‑level programming" (https://csprimer.com/courses/systems/)
https://www.felixcloutier.com/x86/
