How to Convert a C Program to Assembly
This tutorial will discuss converting a C language program into assembly language code.
We will briefly discuss the fundamentals of the Assembly and C languages. Later, we will see the conversion of a C program to Assembly code and the disassembly of object code back into Assembly code.
The Assembly Language
Assembly is a low-level language. Generally, a statement written in assembly language is translated into a single machine-level instruction.
However, it is much more readable than machine language because it uses mnemonics. The mnemonics are English-like instructions or operation codes.
For example, the mnemonic ADD is used to add two numbers. Similarly, MOV is used to perform data movements.
Likewise, CMP compares two operands, and JMP transfers execution control to a specific label or location marker.
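To make the roles of these mnemonics more concrete, here is a small sketch in C whose control flow is written the way an assembly program might express it. The function name sum_to and the labels loop and done are hypothetical, chosen only for illustration. The comments name the mnemonic each line roughly corresponds to; a real compiler may emit different instructions.

```c
/* Hypothetical example: adds the integers 1..n using explicit jumps. */
int sum_to(int n) {
    int total = 0;          /* roughly a MOV of 0 into a register */
    int i = 1;              /* roughly a MOV of 1 into a register */
loop:
    if (i > n) goto done;   /* roughly a CMP followed by a conditional jump */
    total += i;             /* roughly an ADD */
    i += 1;                 /* roughly an ADD */
    goto loop;              /* roughly a JMP back to the label */
done:
    return total;
}
```

Calling sum_to(4) walks the loop four times and returns 10.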
Assembly language is very close to the machine (hardware), so carefully written assembly programs can be very fast. However, the programmer needs much more hardware knowledge than a developer working in a high-level language.
Assembly language is typically used to write efficient system programs like device drivers, virus/anti-virus programs, embedded system software, and TSR (terminate-and-stay-resident) programs.
An assembler translates an assembly language program into a machine language program that can be executed on the machine.
The C Language
C is a high-level machine-independent programming language. Usually, writing C programs requires only a little hardware knowledge.
C has high-level statements and requires a compiler program that translates each statement of C language into one or multiple assembly language statements. For example, a simple instruction in C language, c = a + b, is translated into the following assembly language statements:
mov edx, DWORD PTR -12[rbp]
mov eax, DWORD PTR -8[rbp]
add eax, edx
mov DWORD PTR -4[rbp], eax
Here, the first and second statements move the values of the variables from memory into registers. The third statement, add, adds the two register values.
The fourth statement moves the result from the register back to a variable in memory.
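The four steps can also be mirrored in C itself. The sketch below is only an illustration: the function name add_via_registers is hypothetical, and the local variables edx and eax are ordinary C integers that stand in for the registers of the same name.

```c
/* Hypothetical helper: performs c = a + b one assembly-like step at a time. */
int add_via_registers(int a, int b) {
    int edx = a;    /* step 1: load the first variable into a "register" */
    int eax = b;    /* step 2: load the second variable into a "register" */
    eax += edx;     /* step 3: add the two register values */
    int c = eax;    /* step 4: store the result back into a variable */
    return c;
}
```

With a = 2 and b = 3, as in the example used later in this tutorial, the function returns 5.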
The compiler has to do a lot of work, but this makes the programmer's life simpler when working in C. The C language has a broad spectrum of applications, from high-level business applications to low-level utility programs.
Convert a C Program to Assembly Language
Typically, people use a sophisticated integrated development environment (IDE) to write, edit, compile, run, modify, and debug C language programs, or they use the gcc command to convert a C language program into an executable program.
These tools hide the steps involved in converting source code written in a high-level language like C into machine-executable code. Typically, the following steps are performed in between:
- Pre-Processing - A pre-processor program does three tasks. The first task is to include header files, the second task is to replace macros, and the third task is to remove comments from the source program
- Compiler - In the second step, the compiler translates high-level language programs into assembly language programs
- Assembler - In the third step, the assembler program takes an assembly language program (translated by the compiler) and assembles it into a machine executable form called object code
- Linker - In the fourth step, a linker program combines the object code with compiled library files to produce an executable program that can run on its own
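The three pre-processing tasks listed above can be seen in a small source fragment. This is a minimal sketch; the macro SQUARE and the function names are hypothetical. With GCC, the -E flag stops after pre-processing, so it can be used to inspect the result of this step.

```c
#include <limits.h>   /* task 1: the header's text is pasted in here */

#define SQUARE(x) ((x) * (x))   /* task 2: each use is replaced textually */

/* task 3: comments like this one are removed before compilation */
int area_of_square(int side) {
    return SQUARE(side);        /* the compiler sees ((side) * (side)) */
}

/* CHAR_BIT comes from the included header, not from this file. */
int bits_in_int(void) {
    return CHAR_BIT * (int)sizeof(int);
}
```

After pre-processing, the compiler, assembler, and linker steps operate on the expanded text, so none of them ever sees the macro name or the comments.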
Commands to Convert C Code to an Assembly Equivalent
Typically, command line users type gcc program_name.c, which generates an executable file (if there are no errors). If the output file name is not given, the executable is named a.out on UNIX-family operating systems or a.exe on the Windows operating system.
The gcc command has a vast list of options to perform specific tasks. This tutorial will discuss only the -S and -c flags. Note that these flags are case-sensitive.
The -S flag generates an assembly language program from the C source code. Let's understand this flag using the following example where we have Test.c as a source file:
// Test.c
int main() {
    int a = 2, b = 3, c;
    c = a + b;
    return 0;
}
The following command will generate the target Assembly language code in a file with the extension .s:
$ gcc -S Test.c
$ ls
Test.c Test.s
The command has not created machine language code; only the Assembly language code is generated. Let's display the contents of this generated Assembly code using the cat command in Bash:
$ cat Test.s
.file "Test.c"
.text
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
endbr64
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl $2, -12(%rbp)
movl $3, -8(%rbp)
movl -12(%rbp), %edx
movl -8(%rbp), %eax
addl %edx, %eax
movl %eax, -4(%rbp)
...
The generated Assembly code uses AT&T syntax, which is the GCC default and may look unfamiliar to programmers who are used to writing x86 Assembly code in Intel syntax.
If we want the target Assembly code in Intel syntax, the following command will do this for us:
$ gcc -S -masm=intel Test.c
Again, the output will be generated in the Test.s file, which can be viewed using the cat command in the Bash terminal. In Windows, we can open it in an editor like Notepad or a better editor.
Anyway, let’s see the contents of the Assembly code generated by the above command:
$ cat Test.s
.file "Test.c"
.intel_syntax noprefix
.text
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
endbr64
push rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
mov rbp, rsp
.cfi_def_cfa_register 6
mov DWORD PTR -12[rbp], 2
mov DWORD PTR -8[rbp], 3
mov edx, DWORD PTR -12[rbp]
mov eax, DWORD PTR -8[rbp]
add eax, edx
mov DWORD PTR -4[rbp], eax
...
The output is slightly different: in Intel syntax, the destination operand comes first and registers carry no % prefix, so the mov and add instructions are easier to read.
Disassemble Object Code
Besides converting a C language program into assembly language, one may want to disassemble binary code (machine code) to see the equivalent Assembly language code. We can use the objdump utility in Linux to do that.
Example:
Assume we execute the gcc -c Test.c command to compile the Test.c file in a Bash terminal. It creates an object file (machine language code) with the name Test.o.
Now, if we want to disassemble this object code into the equivalent Assembly code, we can do that using the following Bash command:
$ objdump -d Test.o
Test.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <main>:
0: f3 0f 1e fa endbr64
4: 55 push %rbp
5: 48 89 e5 mov %rsp,%rbp
8: c7 45 f4 02 00 00 00 movl $0x2,-0xc(%rbp)
f: c7 45 f8 03 00 00 00 movl $0x3,-0x8(%rbp)
16: 8b 55 f4 mov -0xc(%rbp),%edx
19: 8b 45 f8 mov -0x8(%rbp),%eax
1c: 01 d0 add %edx,%eax
1e: 89 45 fc mov %eax,-0x4(%rbp)
21: b8 00 00 00 00 mov $0x0,%eax
26: 5d pop %rbp
In this output, the first column is the offset of each instruction within the section, and the column after it is the machine code, shown as hexadecimal bytes. On the right-hand side, the equivalent assembly language code is visible in readable form.
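To see how the hexadecimal bytes relate to the readable form, the following sketch decodes two fields of the instruction at offset 8 in the listing, movl $0x2,-0xc(%rbp). The array and function names are hypothetical. The sketch assumes the usual x86-64 layout for this particular instruction: the third byte is a signed 8-bit offset from rbp, and the last four bytes are a 32-bit constant stored least significant byte first.

```c
#include <stdint.h>

/* Bytes copied from the objdump line "8: c7 45 f4 02 00 00 00". */
const uint8_t movl_bytes[7] = {0xc7, 0x45, 0xf4, 0x02, 0x00, 0x00, 0x00};

/* Interpret the third byte as a signed 8-bit offset (two's complement). */
int decode_disp8(const uint8_t *insn) {
    int value = insn[2];
    return value >= 128 ? value - 256 : value;
}

/* Assemble the last four bytes into a 32-bit constant, lowest byte first. */
uint32_t decode_imm32(const uint8_t *insn) {
    return (uint32_t)insn[3]
         | ((uint32_t)insn[4] << 8)
         | ((uint32_t)insn[5] << 16)
         | ((uint32_t)insn[6] << 24);
}
```

For these bytes, decode_disp8 yields -12 (the -0xc in the listing) and decode_imm32 yields 2 (the $0x2 in the listing).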