GCC vs. Clang/LLVM: An In-Depth Comparison of C/C++ Compilers

Image for post
Image for post

Background

Visual C++, GNU Compiler Collection (GCC), and Clang/Low Level Virtual Machine (LLVM) are three mainstream C/C++ compilers in the industry. Visual C++ provides graphical user interfaces (GUIs) and is easy to debug, but it is not suitable for Linux platforms. Therefore, this document mainly compares GCC with Clang/LLVM.

Significance of a Good Compiler

Modern processors all have superscalar and long pipelines, and complex internal structures, and they support vector extension units in the Complex Instruction Set Computer (CISC) or Reduced Instruction Set Computer (RISC) architecture. For many programs that contain general computing-intensive kernels, programmers can use vector extension commands to greatly improve program execution performance. For example, in matrix and vector operations, the combined multiplication and addition commands are used to improve performance and accuracy. Bit mask commands are used for branch processing in vector operations. However, to achieve the highest performance, programmers and compilers still need to expend a lot of effort to handle tasks with complex memory access modes and non-standard kernels.

GCC Development History

Before learning GCC, you need to first understand the GNU Project. Richard Stallman launched the GNU Project in 1984 to build a UNIX-like open source software system. The GNU operating system has not evolved extensively over time. However, it has incubated many excellent and useful open source software tools, such as Make, Sed, Emacs, Glibc, GDB, and GCC as well. These GNU open source software and Linux kernels together constitute the GNU/Linux system. In the beginning, GCC provided stable and reliable compilers, based on the C programming language, for the GNU system. Its full name is GNU C Compiler. Later, more languages (such as Fortran, Obj-C, and Ada) were supported, and the full name of GCC changed to GNU Compiler Collection.

  • GCC-2.0: released in 1992 and supported C++. Later, the GCC community was split because Richard Stallman defined GCC as a reliable C compiler of the GNU system and thought that GCC at that time was sufficient for the GNU system and the development focus should be shifted from GCC to the GNU system itself. Other major developers hoped to continue improving GCC and make more radical developments and improvements in various aspects. These active developers left the GCC community in 1997 and developed the EGCS fork.
  • GCC-3.0: Obviously, developers generally had a strong desire for good compilers. The EGCS fork developed smoothly and became recognized by more and more developers. Eventually, EGCS was used as the new GCC backbone and GCC-3.0 was released in 2001. The split community was re-merged again, but Richard Stallman’s influence had been weakened to a certain extent. Additionally, the GCC Industrial Committee had begun to decide the development direction of GCC.
  • GCC-4.0: released in 2005. This version was integrated into Tree Serial Storage Architecture (SSA), and GCC evolved to be a modern compiler.
  • GCC-5.0: released in 2015. Later, the GCC version policy was adjusted and a major version was released each year. An unexpected benefit is that the version number corresponds with the year. For example, GCC-7 was released in 2017, and GCC-9 was released in 2019.

Development History of Clang and LLVM

LLVM

LLVM was originated from the research by Chris Lattner on UUIC in 2000. Chris Lattner wanted to create a dynamic compilation technology for all static and dynamic languages. LLVM is a type of open source software developed under the BSD License. The initial version 1.0 was released in 2003. In 2005, Apple Inc. hired Chris Lattner and his team to develop programming languages and compilers for Apple computers, after which the development of LLVM entered the fast lane. Starting from LLVM 2.5, two minor LLVM versions were released every year (generally in March and September). In November 2011, LLVM 3.0 was released to become the default XCode compiler. XCode 5 started to use Clang and LLVM 5.0 by default. The version policy was adjusted for LLVM 5.0 and later versions, and two major versions are released every year. The current stable version is 8.0.

Clang

Clang is designed to provide a frontend compiler that can replace GCC. Apple Inc. (including NeXT later) has been using GCC as the official compiler. GCC has always performed well as a standard compiler in the open source community. However, Apple Inc. has its own requirements for compilation tools. On the one hand, Apple Inc. added many new features for the Objective-C language (or even, later, the C language). However, GCC developers did not accept these features and assigned low priority to support for these features. Later, they were simply divided into two branches for separate development, and consequently the GCC version released by Apple Inc. is far earlier than the official version. On the other hand, the GCC code is highly coupled and hard to be developed separately. Additionally, in later versions, the code quality continues to decrease. However, many functions required by Apple Inc. (such as improved Integrated Development Environment (IDE) support) must call GCC as a module, but GCC never provides such support. Moreover, the GCC Runtime Library Exemptionfundamentally limits the development of LLVM GCC. Also limited by the license, Apple Inc. cannot use LLVM to further improve the code generation quality based on GCC. Therefore, Apple Inc. decided to write the frontend Clang of C, C++, and Objective-C languages from scratch to completely replace GCC.

Clang/LLVM and GCC Community

GCC Community

Like other open source software communities, the GCC community is dominated by free software enthusiasts and hackers. In the process of development, the GCC community management and participation mechanisms are gradually formed today. Currently, the GCC community is a relatively stable and well-defined acquaintance society in which each person has clear roles and duties:

  • GCC Industrial Committee: It manages the GCC community affairs, technology-independent GCC development topics, and the appointment and announcement of reviewers and maintainers. It currently has 13 members.
  • Global maintainers: They dominate GCC development activities. To some extent, they determine the development trend of GCC. Currently, there are 13 global maintainers, who do not all hold office in GCC Industrial Committee.
  • Frontend, middle-end, and backend maintainers: They are the maintainers of frontend, backend, and other modules. They are responsible for the code of the corresponding GCC module, and many of them are the main contributors to the module code. It is worth noting that reviewers are generally classified into this group. The difference is that reviewers cannot approve their own patch, while maintainers can submit their own modifications within their scope of responsibility without approval from reviewers.
  • Contributors: They are the most extensive developer groups in the GCC community. After signing the copyright agreement, any developers can apply for the Write after Approval permission from the community, and then submit the code by themselves.
  • Chip vendors, mainly including Intel, ARM, AMD, and IBM (PowerPC).
  • Specialized vendors, such as CodeSourcery and tool chain service providers like AdaCore based on the Ada language. CodeSourcery had a brilliant history and recruited many famous developers, but declined after it was acquired by Mentor.

LLVM Community

The LLVM community is a noob-friendly compiler community. It quickly responds to the questions of new users and patch reviews. This is also the basis and source for subsequent LLVM Foundation discussions and the adoption of the LLVM Community Code of Conduct, and causes a series of politically correct discussions.

  1. The creation of a package or release version containing LLVM. The association of LLVM with code authorized by all other major open source licenses (including BSD, MIT, GPLv2, and GPLv3).
  2. When distributing LLVM again, you must retain the copyright notice. You cannot delete or replace the copyright header. The binary file containing LLVM must contain the copyright notice.

Performance Comparison between GCC and LLVM

Test Server

Architecture: x86_64
Processor: Intel ® Xeon ® Platinum 8163 CPU @ 2.50 GHz
L1 data cache: 32 KB
L2 cache: 1,024 KB
L3 cache: 33,792 KB
Memory: 800 GB
Operating system: Alibaba Group Enterprise Linux Server release 7.2 (Paladin)
Kernel: 4.9.151–015.ali3000.alios7.x86_64
Compiler: Clang/LLVM 8.0 GCC8.3.1

Benchmark

SPEC CPU 2017 is a set of CPU subsystem test tools for testing the CPU, cache, memory, and compiler. It contains 43 tests of four categories, including SPECspeed 2017 INT and FP that test the integer speed and floating point operation speed and SPECrate 2017 INT and FP that test the integer concurrency rate and floating point concurrency rate. Clang does not support the Fortran language. Therefore, in this example, the C/C ++ programs in the SPEC Speed test set are used to test the single-core performance difference between the binary programs generated by Clang and GCC. The following table lists the SPEC CPU2017 C and C++ sets:

Image for post
Image for post

Test Methods

The LLVM-lnt automation framework is used to perform the test and compare the performance. It runs in the same way as runcpu of SPEC CPU. Before LLVM-lnt runs, cache (echo 3 > /proc/sys/vm/drop_caches) is cleared and then the test dataset runs. Next, the ref dataset runs three times. The average value of the three ref test run results is used as the final result. To reduce performance fluctuations caused by CPU migration or context switch, processes running on the test dataset and ref dataset are bound to a CPU core by using the CPU affinity tool. For the compile time test, this method uses thread 1 to build the test program and compare the test items that have been compiled for a long time. The compile time does not include the linker execution time. It only includes the time when all source files in all test programs are generated.

Compilation Performance Comparison

The GCC compilation process is as follows: read the source file, preprocess the source file, convert it into an IR, optimize and generate an assembly file. Then the assembler generates an object file. Clang and LLVM do not rely on independent compilers, but integrate self-implemented compilers at the backend. The process of generating assembly files is omitted in the process of generating object files. The object file is generated directly from the IR. Besides, compared with the GCC IR, the data structure of LLVM IR is more concise. It occupies less memory during compilation and supports faster traversal. Therefore, Clang and LLVM are advantageous in terms of the compilation time, which is proven by the data obtained from SPEC compilation, as shown in the figure below. Clang reduces the single-thread compilation time by 5% to 10% compared with GCC. Therefore, Clang offers more advantages for the construction of large projects.

Image for post
Image for post
Comparison of SPEC compilation time

Comparison of Execution Performance

Most cloud workloads require that the applications can run in different clusters. When creating these applications, do not specify machine-related parameters. To adapt to the fast iteration caused by demand changes, off-premises workloads must also be debuggable. Therefore, apart from some stable and common libraries that enable high compilation optimization levels, the workload itself has a low compilation and optimization level (O2 or below). To meet this requirement, this document compares the performance of different compilers at the O2 and O3 optimization levels for INT Speed programs, as shown in the following figure:

Image for post
Image for post
Performance comparison of the SPEC CPU2017 INT Speed
Image for post
Image for post
Performance comparison of SPEC CPU2017 FP Speed

Concluding Remarks

From the benchmarking tests above, we can see that Clang offers more advantages for the construction of large projects while GCC is always advantageous in performance optimization. The bla depends on your specific application

Advantages of GCC

  • GCC supports more traditional languages than Clang and LLVM, such as Ada, Fortran, and Go.
  • GCC supports more less-popular architectures, and supported RISC-V earlier than Clang and LLVM.
  • GCC supports more language extensions and more assembly language features than Clang and LLVM. GCC is still the only option for compiling the Linux kernel. Although research on kernel compilation by using Clang and LLVM is also reported in the industry, the kernel cannot be compiled without modifying the source code and compilation parameters.

Advantages of Clang and LLVM

  • Emerging languages are using the LLVM frameworks, such as Swift, Rust, Julia, and Ruby.
  • Clang and LLVM comply with the C and C ++ standards more strictly than GCC. GNU Inline and other problems during GCC upgrade do not occur.
  • Clang also supports some extensions, such as attributes for thread security check.
  • Clang provides additional useful tools, such as scan-build and clang static analyzer for static analysis, clang-format and clang-tidy for syntax analysis, as well as the editor plug-in Clangd.
  • Clang provides more accurate and friendly diagnostic information, and highlights error messages, error lines, error line prompts, and repair suggestions. Clang regards the diagnostic information as a feature. The diagnostic information began to be improved only from GCC 5.0, and became mature in GCC 8.

Original Source

Follow me to keep abreast with the latest technology news, industry insights, and developer trends.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store