Have you ever wondered how a C/C++ compiler such as gcc converts your lines of C/C++ code into an executable that your Windows, Linux, or macOS machine can run?
In this article, we will go over the four stages your C/C++ program goes through to become a file that can be executed on the machine: preprocessing, compiling, assembling, and linking.
Throughout these steps, we will use the following C program as an example:
/**
 * @file main.c
 * @author freecoder
 * @brief this program prints the sum of the integers from 0 to 99
 *
 * @version 1.0
 * @date 10 oct. 2021
 *
 * @copyright Copyright (c) 2021
 *
 */
#include <stdio.h>
/* defines internal constants */
#define FIRST_NUMBER ((unsigned int)0)
#define LAST_NUMBER ((unsigned int)100)
#define CONSOLE_MSG "The sum equal to: %d\n"
/* main program entry */
int main(int argc, char **argv)
{
    /* local variables */
    unsigned int uiCtr;
    unsigned int uiSum = 0; /* must be initialized: a local variable starts with an indeterminate value */
    /* loop over the first 100 integers (0 to 99) */
    for (uiCtr = FIRST_NUMBER; uiCtr < LAST_NUMBER; uiCtr++)
    {
        /* accumulate the numbers */
        uiSum += uiCtr;
    }
    /* show the sum result message on the console */
    printf(CONSOLE_MSG, uiSum);
    return 0;
}
#Step 1. Preprocessing
In this first step, called preprocessing, the preprocessor handles all the preprocessor directives in the C/C++ program (roughly, everything that starts with the # symbol) and cleans up the source:
- Removes all comments from the program
- Inserts the contents of header files, including those of third-party libraries (#include <*.h> or #include "*.h")
- Replaces constants and macros with their definitions (#define)
- Enables or disables parts of the program with conditional compilation directives (#ifdef, #elif, #else, #endif)
To examine the output of this first step more closely, run gcc with the -E option, which stops after preprocessing:
gcc -E main.c -o main.i
This creates a *.i file containing the program with all preprocessing directives applied. The session below shows the new file and the first lines of its content:
➜ Article-X git:(master) ✗ gcc -E main.c -o main.i
➜ Article-X git:(master) ✗ ll
total 24K
-rw-r--r-- 1 root root 758 Nov 20 12:28 main.c
-rw-r--r-- 1 root root 17K Nov 20 12:48 main.i
➜ Article-X git:(master) ✗ cat main.i
# 0 "main.c"
# 0 "<built-in>"
# 0 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 0 "<command-line>" 2
# 1 "main.c"
# 13 "main.c"
# 1 "/usr/include/stdio.h" 1 3 4
# 27 "/usr/include/stdio.h" 3 4
# 1 "/usr/include/x86_64-linux-gnu/bits/libc-header-start.h" 1 3 4
# 33 "/usr/include/x86_64-linux-gnu/bits/libc-header-start.h" 3 4
# 1 "/usr/include/features.h" 1 3 4
# 461 "/usr/include/features.h" 3 4
# 1 "/usr/include/x86_64-linux-gnu/sys/cdefs.h" 1 3 4
# 452 "/usr/include/x86_64-linux-gnu/sys/cdefs.h" 3 4
# 1 "/usr/include/x86_64-linux-gnu/bits/wordsize.h" 1 3 4
# 453 "/usr/include/x86_64-linux-gnu/sys/cdefs.h" 2 3 4
# 1 "/usr/include/x86_64-linux-gnu/bits/long-double.h" 1 3 4
# 454 "/usr/include/x86_64-linux-gnu/sys/cdefs.h" 2 3 4
# 462 "/usr/include/features.h" 2 3 4
# 485 "/usr/include/features.h" 3 4
# 1 "/usr/include/x86_64-linux-gnu/gnu/stubs.h" 1 3 4
# 10 "/usr/include/x86_64-linux-gnu/gnu/stubs.h" 3 4
# 1 "/usr/include/x86_64-linux-gnu/gnu/stubs-64.h" 1 3 4
# 11 "/usr/include/x86_64-linux-gnu/gnu/stubs.h" 2 3 4
# 486 "/usr/include/features.h" 2 3 4
# 34 "/usr/include/x86_64-linux-gnu/bits/libc-header-start.h" 2 3 4
# 28 "/usr/include/stdio.h" 2 3 4
Note:
-E main.c : Run only the preprocessor on main.c and output the preprocessed code.
-o main.i : Specify the output file.
#Step 2. Compiling
This step is known as the compilation phase. The compiler checks the preprocessed C/C++ code generated in the previous step for syntax and semantic errors, then translates it into a set of assembly instructions that work directly with machine (CPU) resources such as registers, memory, and the stack.
Following this step, we'll have an intermediate file with assembly code.
To inspect this assembly file, run gcc with the -S option, which stops after compilation, on the file created in the previous stage (main.i):
gcc -S main.i -o main.s
Then you can look at the result in the console:
➜ Article-X git:(master) ✗ ll
total 24K
-rw-r--r-- 1 root root 758 Nov 20 12:28 main.c
-rw-r--r-- 1 root root 17K Nov 20 12:48 main.i
➜ Article-X git:(master) ✗ gcc -S main.i -o main.s
➜ Article-X git:(master) ✗ ll
total 28K
-rw-r--r-- 1 root root 758 Nov 20 12:28 main.c
-rw-r--r-- 1 root root 17K Nov 20 12:48 main.i
-rw-r--r-- 1 root root 677 Nov 20 22:14 main.s
➜ Article-X git:(master) ✗ cat main.s
.file "main.c"
.text
.section .rodata
.LC0:
.string "The sum equal to: %d\n"
.text
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
subq $32, %rsp
movl %edi, -20(%rbp)
movq %rsi, -32(%rbp)
movl $0, -4(%rbp)
jmp .L2
.L3:
movl -4(%rbp), %eax
addl %eax, -8(%rbp)
addl $1, -4(%rbp)
.L2:
cmpl $99, -4(%rbp)
jbe .L3
movl -8(%rbp), %eax
movl %eax, %esi
movl $.LC0, %edi
movl $0, %eax
call printf
movl $0, %eax
leave
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (GNU) 11.1.0"
.section .note.GNU-stack,"",@progbits
➜ Article-X git:(master) ✗
Note:
-S main.i : Generate the assembly file from the preprocessed input code (main.i).
-o main.s : Specify the output file (main.s).
#Step 3. Assembling
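In this step, the assembler translates all of the assembly instructions from the previous stage into low-level machine code, stored in an object (binary) file. Calls to functions from external libraries (e.g. printf()) are not resolved at this level; they are left for the linker to fill in.
To run only this phase, invoke gcc with the lowercase -c option, which stops after assembling and does not link:
gcc -c main.s -o main.o
The console session below, however, was recorded with an uppercase -C:
➜ Article-X git:(master) ✗ ll
total 28K
-rw-r--r-- 1 root root 758 Nov 20 12:28 main.c
-rw-r--r-- 1 root root 17K Nov 20 12:48 main.i
-rw-r--r-- 1 root root 677 Nov 20 22:14 main.s
➜ Article-X git:(master) ✗ gcc -C main.s -o main.o
➜ Article-X git:(master) ✗ ll
total 44K
-rw-r--r-- 1 root root 758 Nov 20 12:28 main.c
-rw-r--r-- 1 root root 17K Nov 20 12:48 main.i
-rwxr-xr-x 1 root root 16K Nov 20 22:37 main.o
-rw-r--r-- 1 root root 677 Nov 20 22:14 main.s
Note:
-c main.s : Assemble the input assembly code (main.s) into an object file, without linking.
-o main.o : Specify the output file (main.o).
The uppercase -C used in the session above is a preprocessor option that keeps comments in the preprocessed output; it does not stop gcc after assembling. gcc therefore went on to link, and the resulting main.o is really a complete executable, hence its 16K size and execute permissions. The objdump listing below confirms it: the file contains _init, the printf@plt stub and _start at absolute addresses such as 0x401000. An object file produced with -c would show only main, starting at address 0, with the call to printf still unresolved.
The content of the generated file can be checked with objdump: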
➜ Article-X git:(master) ✗ objdump -d main.o
main.o: file format elf64-x86-64
Disassembly of section .init:
0000000000401000 <_init>:
401000: 48 83 ec 08 sub $0x8,%rsp
401004: 48 8b 05 ed 2f 00 00 mov 0x2fed(%rip),%rax # 403ff8 <__gmon_start__>
40100b: 48 85 c0 test %rax,%rax
40100e: 74 02 je 401012 <_init+0x12>
401010: ff d0 callq *%rax
401012: 48 83 c4 08 add $0x8,%rsp
401016: c3 retq
Disassembly of section .plt:
0000000000401020 <.plt>:
401020: ff 35 e2 2f 00 00 pushq 0x2fe2(%rip) # 404008 <_GLOBAL_OFFSET_TABLE_+0x8>
401026: ff 25 e4 2f 00 00 jmpq *0x2fe4(%rip) # 404010 <_GLOBAL_OFFSET_TABLE_+0x10>
40102c: 0f 1f 40 00 nopl 0x0(%rax)
0000000000401030 <printf@plt>:
401030: ff 25 e2 2f 00 00 jmpq *0x2fe2(%rip) # 404018 <printf@GLIBC_2.2.5>
401036: 68 00 00 00 00 pushq $0x0
40103b: e9 e0 ff ff ff jmpq 401020 <.plt>
Disassembly of section .text:
0000000000401040 <_start>:
401040: 31 ed xor %ebp,%ebp
401042: 49 89 d1 mov %rdx,%r9
401045: 5e pop %rsi
401046: 48 89 e2 mov %rsp,%rdx
401049: 48 83 e4 f0 and $0xfffffffffffffff0,%rsp
40104d: 50 push %rax
40104e: 54 push %rsp
40104f: 49 c7 c0 d0 11 40 00 mov $0x4011d0,%r8
401056: 48 c7 c1 70 11 40 00 mov $0x401170,%rcx
40105d: 48 c7 c7 26 11 40 00 mov $0x401126,%rdi
401064: ff 15 86 2f 00 00 callq *0x2f86(%rip) # 403ff0 <__libc_start_main@GLIBC_2.2.5>
40106a: f4 hlt
40106b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
....
#Step 4. Linking
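In this final stage, the linker (invoked by gcc) takes the object file (main.o) generated in the previous phase, resolves the references to functions from external libraries (e.g. printf()) against their definitions, and produces the executable.
Running gcc without any stage option performs all four steps in one go and gives the final executable directly (an object file produced with gcc -c can be linked on its own with gcc main.o -o main):
gcc main.c -o main
The result on the output console is shown below: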
➜ Article-X git:(master) ✗ gcc main.c -o main
➜ Article-X git:(master) ✗ ll
total 60K
-rwxr-xr-x 1 root root 16K Nov 20 23:20 main
-rw-r--r-- 1 root root 758 Nov 20 12:28 main.c
-rw-r--r-- 1 root root 17K Nov 20 12:48 main.i
-rwxr-xr-x 1 root root 16K Nov 20 22:37 main.o
-rw-r--r-- 1 root root 677 Nov 20 22:14 main.s
➜ Article-X git:(master) ✗ size main
text data bss dec hex filename
1202 560 8 1770 6ea main
Note:
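gcc main.c : Preprocess, compile, assemble, and link main.c into an executable.
-o main : Specify the output executable file (main).
By default, gcc links external library functions such as printf() dynamically rather than statically. Their code is not copied into main, which only records that it needs the shared C library; the dynamic loader resolves them at run time. Passing -static links the library code into the executable instead.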
#Conclusion
I hope this short article sheds some light on the dark side of C/C++ compilers and all the gibberish behind them.
Now it's your turn to test this and put it into practice. From now on, if your program fails to build, you should be able to tell which of the four steps failed 🙂