Displaying all argv in x64 assembly

Recently I’ve been doing some x64 assembly hacking, and one thing I had to Google for and piece together from a few places is how to iterate over all command-line arguments (colloquially known as argv, from C) and do something with each of them.

I already discussed how arguments get passed into a program in the past (not the C main, mind you, but rather the real entry point of a program – _start), so what was left is just a small matter of implementation. Here it is, in GNU Assembly (gas) syntax for Linux. This is pure assembly code – it does not use the C standard library or runtime at all. It demonstrates a lot of interesting concepts such as reading command-line arguments, issuing Linux system calls and string processing.

```asm
#---------------- DATA ----------------#

    .data

    # We need buf_for_itoa to be large enough to contain a 64-bit integer.
    # endbuf_for_itoa will point to the end of buf_for_itoa and is useful
    # for passing to itoa.
    .set BUFLEN, 32
buf_for_itoa:
    .space BUFLEN, 0x0
    .set endbuf_for_itoa, buf_for_itoa + BUFLEN - 1

newline_str:
    .asciz "\n"

argc_str:
    .asciz "argc: "

#---------------- CODE ----------------#

    .globl _start
    .text

_start:
    # On entry to _start, argc is in (%rsp), argv[0] in 8(%rsp),
    # argv[1] in 16(%rsp) and so on.
    lea argc_str, %rdi
    call print_cstring

    mov (%rsp), %r12               # save argc in r12

    # Convert the argc value to a string and print it out
    mov %r12, %rdi
    lea endbuf_for_itoa, %rsi
    call itoa
    mov %rax, %rdi
    call print_cstring

    lea newline_str, %rdi
    call print_cstring

    # In a loop, pick argv[n] for 0 <= n < argc and print it out,
    # followed by a newline. r13 holds n.
    xor %r13, %r13
.L_argv_loop:
    mov 8(%rsp, %r13, 8), %rdi      # argv[n] is in (rsp + 8 + 8*n)
    call print_cstring

    lea newline_str, %rdi
    call print_cstring

    inc %r13
    cmp %r12, %r13
    jl .L_argv_loop

    # exit(0)
    mov $60, %rax
    mov $0, %rdi
    syscall
```
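Here is a rough C model of what _start does, to make the stack layout and the loop concrete. It is a sketch under stated assumptions, not part of the program. The array `sp` stands in for the 64-bit words at %rsp on entry: sp[0] is argc and sp[1 + n] is argv[n]. argv[argc] is a NULL pointer, and the environment pointers follow it. The helpers `stack_argc`, `stack_argv` and `format_args` are hypothetical names. They format into a buffer instead of calling write, and they assume 64-bit pointers.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* sp models %rsp on entry to _start: sp[0] is argc, and sp[1 + n] holds
   argv[n]. That is the quadword at byte offset 8 + 8*n, which is exactly
   what 8(%rsp, %r13, 8) loads when r13 == n. */
static long stack_argc(const uint64_t *sp) { return (long)sp[0]; }

static const char *stack_argv(const uint64_t *sp, long n) {
    return (const char *)(uintptr_t)sp[1 + n];
}

/* Produce the same text the assembly program prints, but into `out`.
   Returns the snprintf-style total length. */
static int format_args(const uint64_t *sp, char *out, size_t cap) {
    long argc = stack_argc(sp);
    int len = snprintf(out, cap, "argc: %ld\n", argc);
    /* The assembly loop tests at the bottom, so it assumes argc >= 1.
       This loop checks n < argc before reading argv[n]. */
    for (long n = 0; n < argc && len >= 0 && (size_t)len < cap; n++)
        len += snprintf(out + len, cap - (size_t)len, "%s\n", stack_argv(sp, n));
    return len;
}
```

Take a fake stack of {2, pointer to "./prog", pointer to "foo", 0}. For it, format_args produces "argc: 2\n./prog\nfoo\n", which is 19 bytes.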

This code uses a couple of support functions. The first is print_cstring:

```asm
# Function print_cstring
#   Print a null-terminated string to stdout.
# Arguments:
#   rdi     address of string
# Returns: void
print_cstring:
    # Find the terminating null
    mov %rdi, %r10
.L_find_null:
    cmpb $0, (%r10)
    je .L_end_find_null
    inc %r10
    jmp .L_find_null
.L_end_find_null:
    # r10 points to the terminating null, so r10-rdi is the length
    sub %rdi, %r10

    # Now that we have the length, we can call sys_write
    # sys_write(unsigned fd, char* buf, size_t count)
    mov $1, %rax
    # Populate address of string into rsi first, because the later
    # assignment of fd clobbers rdi.
    mov %rdi, %rsi
    mov $1, %rdi
    mov %r10, %rdx
    syscall
    ret
```
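For comparison, here are the same two steps in C. This is a sketch that mirrors the assembly, not the author's code. It scans for the terminating null to get the length, then passes the pointer and length to the POSIX write function on file descriptor 1 (stdout). It assumes a POSIX system that provides <unistd.h>.

```c
#include <stddef.h>
#include <unistd.h>

/* Mirrors print_cstring: find the length by scanning for the NUL, then
   pass it to write(2), which takes an explicit count and ignores the NUL.
   Returns write's result (bytes written, or -1 on error). */
static ssize_t print_cstring(const char *s) {
    const char *end = s;
    while (*end != '\0')    /* same scan as .L_find_null */
        end++;
    return write(1, s, (size_t)(end - s));    /* fd 1 is stdout */
}
```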

More interestingly, here is itoa. It is a bit more general than what the main program needs, because it also supports negative numbers; it can convert any value that fits in a 64-bit register. Note the unusual API. The caller passes the address of the *last* byte of the target buffer, and itoa returns the address of the first character of the result. It is natural for an itoa implementation to produce the digits in reverse order. Writing them into the buffer from the end towards the beginning avoids a separate string-reversal step.

```asm
# Function itoa
#   Convert an integer to a null-terminated string in memory.
#   Assumes that there is enough space allocated in the target
#   buffer for the representation of the integer. Since the number itself
#   is accepted in the register, its value is bounded.
# Arguments:
#   rdi:    the integer
#   rsi:    address of the *last* byte in the target buffer
# Returns:
#   rax:    address of the first byte in the target string that
#           contains valid information.
itoa:
    movb $0, (%rsi)         # Write the terminating null. rsi is decremented
                            # before each character is stored below.

    # If the input number is negative, we mark it by placing 1 into r9
    # and negate it. In the end we check if r9 is 1 and add a '-' in front.
    # (neg leaves INT64_MIN unchanged, but since div is unsigned it is
    # still converted correctly.)
    mov $0, %r9
    cmp $0, %rdi
    jge .L_input_positive
    neg %rdi
    mov $1, %r9
.L_input_positive:
    mov %rdi, %rax          # Place the number into rax for the division.
    mov $10, %r8            # The base is in r8
.L_next_digit:
    # Prepare rdx:rax for division by clearing rdx. rax remains from the
    # previous div. rax will be rax / 10, rdx will be the next digit to
    # write out.
    xor %rdx, %rdx
    div %r8
    # Write the digit to the buffer, in ascii
    dec %rsi
    add $0x30, %dl
    movb %dl, (%rsi)
    cmp $0, %rax            # We're done when the quotient is 0.
    jne .L_next_digit

    # If we marked in r9 that the input is negative, it's time to add that
    # '-' in front of the output.
    cmp $1, %r9
    jne .L_itoa_done
    dec %rsi
    movb $0x2d, (%rsi)
.L_itoa_done:
    mov %rsi, %rax          # rsi points to the first byte now; return it.
    ret
```
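For reference, here is a C sketch of the same backward-writing algorithm. The name `itoa_backwards` is hypothetical. Like the assembly, it takes a pointer to the last byte of the buffer and stores the null there. It then emits digits right to left and returns a pointer to the first character. Computing the magnitude in uint64_t plays the role of neg followed by an unsigned div, so INT64_MIN also converts correctly.

```c
#include <stdint.h>

/* Write `value` as a NUL-terminated decimal string that ends at `last`.
   Returns a pointer to its first character. The caller must provide
   at least 21 bytes ending at `last` (20 characters for INT64_MIN,
   plus the NUL). */
static char *itoa_backwards(int64_t value, char *last) {
    char *p = last;
    *p = '\0';
    /* Negating in unsigned arithmetic is well defined even for INT64_MIN. */
    uint64_t mag = value < 0 ? -(uint64_t)value : (uint64_t)value;
    do {
        *--p = (char)('0' + mag % 10);    /* remainder is the next digit */
        mag /= 10;
    } while (mag != 0);                   /* done when the quotient is 0 */
    if (value < 0)
        *--p = '-';
    return p;
}
```

INT64_MIN needs 20 characters plus the null, so the 32-byte buf_for_itoa in the program has room to spare.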

Some notes about the code:

GAS vs. Intel syntax: I used to believe the Intel syntax looks better, but I grew to tolerate GAS because it’s the default used by tools on Linux. After a very short time you get used to it and stop minding it. Yes, even the weird indirect addressing syntax grows on you. For example, mov 8(%rsp, %r13, 8), %rdi loads the 8-byte value at address rsp + 8 + r13*8 into rdi. In other words, focus on the code, not the syntax.
I could pick any representation for strings, but ended up going with C-like null-terminated strings. If you look carefully at print_cstring you’ll notice that a length-prefixed representation could be better. The write system call doesn’t care about the null and wants the length passed explicitly. However, real-life assembly code often has to interoperate with C, so null-terminated strings make more sense.
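Here is a minimal sketch of the length-carrying alternative, written in C for brevity. The `struct lstr` type, the `LSTR` macro and `print_lstr` are hypothetical names. Because the length travels with the pointer, printing is a single write call with no scan for the null.

```c
#include <stddef.h>
#include <unistd.h>

/* A string that carries its own length; no terminator is needed. */
struct lstr {
    size_t len;
    const char *data;
};

/* Build an lstr from a string literal. This works only for literals,
   because sizeof must see the array. */
#define LSTR(lit) ((struct lstr){ sizeof(lit) - 1, (lit) })

static ssize_t print_lstr(struct lstr s) {
    return write(1, s.data, s.len);    /* length goes straight to write(2) */
}
```

The trade-off is the one described above. Such strings cannot be handed to C functions that expect a null terminator without copying them or appending one.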
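Even though my own functions could use any calling convention, I’m sticking with the System V AMD64 ABI. It’s a natural choice because Linux system calls pass arguments and return values in almost the same way. The syscall number goes in rax, and the arguments go in rdi, rsi, rdx, r10, r8 and r9. Here r10 replaces rcx as the fourth argument register. The result comes back in rax. The syscall instruction itself clobbers rcx and r11, so no live value should be kept in those registers (or in rax) across a system call.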
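To see these register conventions side by side, here is a sketch of a raw three-argument system call using GCC/Clang extended inline assembly. It assumes x86-64 Linux, and `raw_syscall3` is a hypothetical helper. The constraints place the syscall number in rax and the arguments in rdi, rsi and rdx. The clobber list tells the compiler that rcx and r11 do not survive the syscall instruction.

```c
/* Three-argument Linux system call on x86-64 (GCC/Clang extended asm).
   Inputs: number in rax ("a"), arguments in rdi ("D"), rsi ("S") and
   rdx ("d"). A 4th argument would go in r10, not rcx.
   Output: the kernel's return value, in rax.
   Clobbers: syscall stores the return address in rcx and the saved
   RFLAGS in r11; "memory" covers buffers the kernel may read or write. */
static long raw_syscall3(long nr, long a1, long a2, long a3) {
    long ret;
    __asm__ volatile("syscall"
                     : "=a"(ret)
                     : "a"(nr), "D"(a1), "S"(a2), "d"(a3)
                     : "rcx", "r11", "memory");
    return ret;
}
```

On x86-64 Linux, raw_syscall3(1, 1, (long)"ok\n", 3) performs write(1, "ok\n", 3) and should return 3. Number 39 is getpid, and 60, used in _start above, is exit.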

