Back to Assembly Language: Demystifying Data Access with Addressing Modes

By Ranti · March 21, 2023 · 4 min read

Accessing data efficiently is a crucial aspect of programming, and it's facilitated by understanding various addressing modes. Here, I'll explore some of the different methods to access data, including absolute addressing, pointers, pointer+offset, and indexed addressing.

  • Accessing data using address / addressing modes
  • We can access data using an "immediate" value
$$ \text{mov } \$23, \%rax \text{ (numeric literal = 23)} $$
  • Register
$$ \text{add } \%rax, \%rcx \\ \text{Adds the contents of \%rax to \%rcx} \\ \text{\%rcx will now contain the sum of \%rax + \%rcx} $$
  • Absolute Address

Absolute Addressing:

Absolute addressing involves specifying the memory address directly. However, in practice, you rarely write absolute addresses by hand, because the address of a piece of data depends on where the programme is loaded into memory. Instead, labels (names) are used for global variables, and the assembler and linker work out the actual addresses.

  • For accessing data to and from memory
  • Fetch = retrieving data from memory
  • Store = sending data to memory

In this case, the address is specified as a number:

$$ \text{mov } 1000, \%rdx \\ \text{ - moves the contents at the address 1000 into \%rdx} \\ \text{ - moves 64 bits (8 bytes) starting at address 1000 } $$
  • You will almost never know the absolute address of something in advance, because it depends on how the programme is loaded into memory
  • You will rarely write a number as an absolute address; instead, the assembler lets you use labels (names) for global variables
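
To see the label idea from the C side, here is a minimal sketch. The names `total`, `store_total`, `fetch_total`, and `demo_total` are made up for illustration, and the instructions mentioned in the comments are only what a compiler might typically emit on x86-64 Linux (often in a `%rip`-relative form), not guaranteed output.

```c
#include <assert.h>

/* A global variable. In the generated assembly it is referred to by the
 * label `total`; its numeric address is only decided at link/load time. */
long total;

/* Store: send a value to memory.
 * Possibly something like:  movq %rdi, total(%rip) */
void store_total(long v) { total = v; }

/* Fetch: retrieve a value from memory.
 * Possibly something like:  movq total(%rip), %rax */
long fetch_total(void) { return total; }

/* Small self-check: a store followed by a fetch of the same variable. */
void demo_total(void) {
    store_total(23);
    assert(fetch_total() == 23);
}
```

Nowhere in this code does a number like 1000 appear as an address; the name `total` stands in for it.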

  • Pointer

A pointer contains an address, and dereferencing it allows access to the data at that address. In assembly language it's important to distinguish between constant values, which are marked with the '$' symbol, and addresses, which are not.

  • A register contains an address, and that address is dereferenced

Remember, there is a difference between $1000 and 1000. The '$' dollar sign means it is a constant; without the dollar sign, the assembler treats the number as an address.

$$ \text{addq } \%rbx, (\%rax) \\ (\%rax) \text{ is the place in memory that \%rax points to} $$
add %rbx, %rax # is different from
add %rbx, (%rax)
ASM
x = x + 3; // in the same way this is different from
*x = *x + 3;
C
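
The same distinction can be written as two small C functions. This is only a sketch: `add_to_target` and `advance_pointer` are hypothetical names, and the comments map each one loosely onto the two `add` forms above.

```c
#include <assert.h>
#include <stddef.h>

/* Like  add %rbx, (%rax):  changes the value stored at the address in p. */
void add_to_target(long *p, long n) {
    assert(p != NULL);
    *p = *p + n;
}

/* Like  add %rbx, %rax:  changes the address itself and leaves memory alone.
 * Note: C scales pointer arithmetic by sizeof(long); the assembly add does not. */
long *advance_pointer(long *p, long n) {
    return p + n;
}
```

Calling `add_to_target(&data[0], 3)` modifies `data[0]`, while `advance_pointer(&data[0], 2)` just returns the address of `data[2]` without touching the array.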
  • Pointer + Offset

This mode allows you to access memory at an offset from the address pointed to by a register. This is particularly useful for working with data structures like structs in C. For example, given a struct NODE with various members, you can access them by adding fixed offsets to the base pointer.

  • Access memory at some offset from where the register points to
$$ \text{mov 12(\%rsi), \%rbx} \\ \text{ specifies an address that is at \%rsi + 12} \\ \text{So, if \%rsi contains 1000, the memory location accessed is: 1000 + 12 = 1012} $$
  • This is especially useful for structs in C
typedef struct node {
    long value;
    struct node *left;
    struct node *right; // each of the three members is 8 bytes
} NODE;

NODE *p = malloc(sizeof(NODE));
C

Assuming p is in the %rcx register:

$$ \text{If you want a fixed offset from a pointer} \\ ( \% rcx) = 0(\% rcx) \text{ gives us p} \rightarrow \text{value} \\ 8(\%rcx) \text{ gives us p} \rightarrow \text{left} \\ 16(\%rcx) \text{ gives us p} \rightarrow \text{right} $$
  • The offset can also be negative
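
To check the offsets rather than take them on faith, here is a rough sketch that reads a member by adding a byte offset to the base pointer, which is what `8(%rcx)` does. `load_pointer_at` is a hypothetical helper, not something from a library, and the 0/8/16 layout assumes a typical 64-bit target where `long` and pointers are 8 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct node {
    long value;
    struct node *left;
    struct node *right;
} NODE;

/* Read a pointer-sized member located `offset` bytes past p,
 * mimicking  mov offset(%rcx), %rbx  with p held in %rcx. */
NODE *load_pointer_at(const NODE *p, size_t offset) {
    NODE *result;
    assert(offset + sizeof result <= sizeof(NODE)); /* stay inside the struct */
    memcpy(&result, (const char *)p + offset, sizeof result);
    return result;
}
```

`load_pointer_at(p, offsetof(NODE, left))` should give the same pointer as `p->left`. Using `offsetof` instead of a hard-coded 8 keeps the sketch correct even where the layout differs.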

  • Indexed Addressing

Indexed addressing is employed for efficient access within arrays. It combines a base address, an index, and the size of each element to calculate the target address. This mode is essential for iterating through arrays and performing operations on their elements.

$$ \text{subq \%rdx, (\%rsi, \%rdi, 4)} \\ \text{The address used here is } \%rsi + (\%rdi \times 4) \\ \text{So, if \%rsi contains 1000 and \%rdi contains 30 then} \\ \%rsi + (\%rdi \times 4) = 1000 + (30 \times 4) = 1000 + 120 = 1120 \text{ (this is the address)} $$

So the value in %rdx will be subtracted from whatever is in the 64 bits starting at address 1120.

Indexed addressing is useful for accessing elements inside an array

$$ (\%rsi, \%rdi, 4) \\ \% rsi = \text{Where the array starts} \\ \% rdi = \text{ The index } \\ 4 = \text{Size of each element} $$
$$ (\%r8, \%rbx, 8) \\ \%r8 = \text{Points to start of an array} \\ \%rbx = \text{ Contains the index into the array (must be a 64-bit register)} \\ 8 = \text{ size of each element of the array} $$
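
The address arithmetic can be sketched in C as well. Both function names here are my own, chosen for illustration; the first just reproduces the base + index × scale calculation, and the second shows that C array indexing lands on the same address.

```c
#include <assert.h>
#include <stdint.h>

/* Address computed by (base, index, scale): base + index * scale.
 * On x86-64 the scale has to be 1, 2, 4, or 8. */
uint64_t effective_address(uint64_t base, uint64_t index, uint64_t scale) {
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return base + index * scale;
}

/* The same calculation, done on a real array of int:
 * start of the array plus index times the element size. */
int *element_address(int *array, uint64_t index) {
    return (int *)((char *)array + index * sizeof(int));
}
```

With the numbers from the example, `effective_address(1000, 30, 4)` gives 1120, and `element_address(arr, 5)` is the same address as `&arr[5]`.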

Let's consider an example where we want to sum up the elements of an array. Assuming %rdi points to the start of the array, %rsi contains the size of the array, and each element is an int (4 bytes), we can use indexed addressing to iterate through the array efficiently.

Example: Adding up the elements of an array

  • Assume: %rdi points to the start of the array
  • Assume that %rsi contains the size (number of elements) of the array
  • Assume that each element is an int (4 bytes)
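
Under those assumptions, one way the loop could look is sketched below in C, with a possible hand-written assembly equivalent in the comment. The function name `sum_array`, the use of %rcx as the index, %eax as the running total, and the labels `top` and `done` are my own choices for illustration, not part of the original example.

```c
#include <assert.h>
#include <stddef.h>

/* A possible hand-written equivalent (AT&T syntax), not compiler output:
 *
 *       movl  $0, %eax              # sum = 0
 *       movq  $0, %rcx              # i = 0
 * top:
 *       cmpq  %rsi, %rcx            # compare i with the element count
 *       jge   done                  # if i >= count, stop
 *       addl  (%rdi,%rcx,4), %eax   # sum += array[i]  (indexed addressing)
 *       incq  %rcx                  # i++
 *       jmp   top
 * done:
 *       ret                         # result is returned in %eax
 */
int sum_array(const int *a, long n) {
    assert(a != NULL || n <= 0);
    int sum = 0;
    for (long i = 0; i < n; i++) {
        sum += a[i]; /* a[i] lives at a + i * 4 */
    }
    return sum;
}
```

For an array holding 1, 2, 3, 4, 5, `sum_array(v, 5)` returns 15. The scale of 4 in `(%rdi,%rcx,4)` matches the 4-byte `int` elements; an array of `long` would use 8 instead.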

Conclusion:

Understanding addressing modes is fundamental to efficient data access in programming. Whether it's absolute addressing, pointer dereferencing, pointer+offset, or indexed addressing, each mode has its unique applications. By leveraging these modes appropriately, you can optimise your code for performance and readability.

Addressing modes play a crucial role in low-level programming, and mastering them can greatly enhance your ability to work with memory effectively.
