A unity build can cut build times dramatically, yet it is HIGHLY underrated and easily dismissed by many senior software engineers (just like precompiled headers). In this post we will go over what it is, its pros and cons, and why a “dirty hack” might be worth it if it speeds up your builds by at least a factor of 2 - or perhaps even by double digits.
Why care about build times in the first place
Well… time is money! Let’s do the math: assuming an annual salary of $80k, waiting an extra 30 minutes a day for builds is 1/16 of a developer’s time ===> $5k per year. With 200 such employees we reach $1 million annually. But employees bring more value to the company than what they get as a salary (usually at least 3x - if the company is adequate), so the actual costs to the employer are even bigger.
Now add the fact that waiting for long builds discourages refactoring and experimentation and leads to mental context switches + distractions (which are always expensive), and we reach the only possible conclusion: any time spent reducing build times is worthwhile!
Introduction - a short summary of unity builds
A unity build is when a bunch of source files are `#include`'d into a single file which is then compiled:

```cpp
// unity_file.cpp
#include "widget.cpp"
#include "gui.cpp"
#include "test.cpp"
```
Also known as: SCU (single compilation unit), amalgamated or jumbo.
The main benefit is lower build times (compile + link) because:
- Commonly included headers get parsed/compiled only once.
- Less reinstantiation of the same templates, like `std::vector<int>`.
- Less work for the linker (for example, not having to remove N-1 copies of the same weak symbol - an inline function defined in a header and included in N source files).
- Fewer compiler invocations.
Note that we don’t have to include all sources in one unity file - as an example: 80 source files can be split into 8 unity files with 10 of the original sources included in each, and then they can be built in parallel on 8 cores.
Why redundant header parsing/compilation is slow:
Here is what happens after including a single header with 2 popular compilers and running only the preprocessor (in terms of file size and lines of code):
header | GCC 7 size | GCC 7 loc | MSVC 2017 size | MSVC 2017 loc |
---|---|---|---|---|
cstdlib | 43 kb | 1k loc | 158 kb | 11k loc |
cstdio | 60 kb | 1k loc | 251 kb | 12k loc |
iosfwd | 80 kb | 1.7k loc | 482 kb | 23k loc |
chrono | 180 kb | 6k loc | 700 kb | 31k loc |
variant | 282 kb | 10k loc | 1.1 mb | 43k loc |
vector | 320 kb | 13k loc | 950 kb | 45k loc |
algorithm | 446 kb | 16k loc | 880 kb | 41k loc |
string | 500 kb | 17k loc | 1.1 mb | 52k loc |
optional | 660 kb | 22k loc | 967 kb | 37k loc |
tuple | 700 kb | 23k loc | 857 kb | 33k loc |
map | 700 kb | 24k loc | 980 kb | 46k loc |
iostream | 750 kb | 26k loc | 1.1 mb | 52k loc |
memory | 760 kb | 26k loc | 857 kb | 40k loc |
random | 1.1 mb | 37k loc | 1.4 mb | 67k loc |
functional | 1.2 mb | 42k loc | 1.4 mb | 58k loc |
all of them | 2.2 mb | 80k loc | 2.1 mb | 88k loc |
And here are some (common) headers from Boost (version 1.66):
header | GCC 7 size | GCC 7 loc | MSVC 2017 size | MSVC 2017 loc |
---|---|---|---|---|
hana | 857 kb | 24k loc | 1.5 mb | 69k loc |
optional | 1.6 mb | 50k loc | 2.2 mb | 90k loc |
variant | 2 mb | 65k loc | 2.5 mb | 124k loc |
function | 2 mb | 68k loc | 2.6 mb | 118k loc |
format | 2.3 mb | 75k loc | 3.2 mb | 158k loc |
signals2 | 3.7 mb | 120k loc | 4.7 mb | 250k loc |
thread | 5.8 mb | 188k loc | 4.8 mb | 304k loc |
asio | 5.9 mb | 194k loc | 7.6 mb | 513k loc |
wave | 6.5 mb | 213k loc | 6.7 mb | 454k loc |
spirit | 6.6 mb | 207k loc | 7.8 mb | 563k loc |
geometry | 9.6 mb | 295k loc | 9.8 mb | 448k loc |
all of them | 18 mb | 560k loc | 16 mb | 975k loc |
The point here is not to discredit Boost - this is an issue with the language itself when building zero-cost abstractions.
So if we have a few 5 kb source files with 100 lines of code in each (because we write modular code) and we include some of these headers, the compiler can easily end up going through hundreds of thousands of lines of code (reaching megabyte sizes) for each source file of our tiny program. If some headers are commonly included in those source files, then by employing the unity build technique we compile the contents of each header just once - and this is where the biggest gains from unity builds come from.
A common misconception is that unity builds offer gains because of reduced disk I/O - after the first time a header is read it is cached by the filesystem (filesystems cache very aggressively, since a cache miss is a huge hit).
The PROS of unity builds:
- Up to 90+% faster builds (depends on modularity - stitching a few 10k loc files together wouldn’t yield much benefit) - the best gains come with short sources and lots of (heavy) includes.
- The same cross-translation-unit optimization opportunities as LTO (link-time optimization - also known as LTCG), but even faster than normal full builds! LTO builds usually take tremendously more time (though there are great improvements in that area, such as clang’s ThinLTO).
- ODR (One Definition Rule) violations get caught (see this) - there are still no reliable tools for that. Example: the following code will result in a runtime bug, since the linker will silently discard one of the 2 methods and use the other everywhere, as they appear to be identical:

```cpp
// a.cpp
struct Foo {
    int method() { return 42; } // implicitly inline
};
```

```cpp
// b.cpp
struct Foo {
    int method() { return 666; } // implicitly inline
};
```
- Enforces code hygiene, such as include guards (or `#pragma once`) in headers.
The CONS:
- Not all valid C++ continues to compile:
  - Clashes of symbols with identical names and internal linkage (in anonymous namespaces or marked `static`):

```cpp
// a.cpp
namespace { int local; }
```

```cpp
// b.cpp
static int local;
```

  - Overload ambiguities (and unexpected implicit conversions through non-explicit single-argument constructors)
  - Using-directives (`using namespace`) in source files can become a problem
  - Preprocessor identifiers defined in one source file leak into the ones after it
- Might slow down some workflows:
  - Minimal rebuilds - but if a source file can be excluded from the unity ones for faster iteration, it should all be OK
  - Might interfere with parallel compilation - but that can be tuned by better grouping of the sources to avoid “long poles” in compilation
- Might need a lot of RAM, depending on how many sources you combine.
- One scary caveat is a miscompilation - when the program compiles successfully but in a wrong way (perhaps a better-matching overload got chosen somewhere - or something to do with the preprocessor). Example:

```cpp
// a.cpp
struct MyStruct {
    MyStruct(int arg) : data(arg) {}
    int data;
};
int func(MyStruct arg) { return arg.data; }
int main() { return func(42); }
```

```cpp
// b.cpp
int func(int arg) { return arg * 2; }
```

If `b.cpp` ends up before `a.cpp` in the unity file, we would get 84 instead of 42, because `func(int)` becomes the better match for `func(42)`. However, I haven’t seen this mentioned anywhere - people don’t run into it that much. Also, good tests will definitely help.
How to maintain
We can manually maintain a set of unity source files - or automate that:
- CMake: either cotire or this
- FASTBuild
- Meson
- waf
- qmake
- Visual Studio Native Support
- Visual Studio Plugin
It is desirable to have control over:
- how many unity source files there are
- the order of source files in the unity files
- the ability to exclude certain files (if problematic or for iterating over them)
Projects using this technique
Unity builds have been used at Ubisoft for almost 14 years! Also in WebKit! And Unreal…
There are also efforts by people who build Chrome often (and have gotten very good results - parts of it get built in 30% of the original time) to bring native support for unity builds into clang, in order to minimize the code changes needed for the technique (it gives unique names to objects with internal linkage - static or in anonymous namespaces - undefines macros between source files, etc.).