Modules are coming in C++20, but it will take a while before they are widely adopted, optimized and supported by tooling - what can we do right now?
I recently consulted a company on this exact matter - luckily CMake 3.16 was just released and there was no need to resort to 3rd party CMake scripts for precompiled headers and unity builds (thanks to Cristian Adam for the hard work - MR 1, MR 2!!!). Here is what I told them:
Precompiled headers (PCH)
The idea is to precompile a bunch of common header files:
- to precompile means that the compiler will parse the C++ headers and save its intermediate representation (IR) into a file, and then when compiling the `.cpp` files of the target that IR will be prepended to them - as if the headers were included - the contents of the PCH are the first thing each translation unit sees
- easy to integrate - doesn't require any C++ code changes
- ~20-30% speedup with GCC/Clang (can be up to 50%+ with MSVC)
- worth it for targets with at least 10 `.cpp` files (a PCH takes space & time to compile)
What to put in a PCH
- STL & third-party libs like boost (used in at least ~30% of the sources)
- some rarely changing project-specific headers (at least 30% use)
  - for example if you have common utilities for logging/etc.
- each time any header which ends up in the PCH is changed - the entire PCH is recompiled along with the entire target which includes it
- careful not to put too much into a PCH - once it reaches ~150-200MB you might start hitting diminishing returns
- how to determine which are the most commonly used header files
  - option 1: do a few searches in the codebase/target for `<algorithm>`, `<vector>`, `<boost/asio.hpp>`, etc.
    - note that some header might be included in only a few other header files, but if those headers go everywhere, then that header gets included almost everywhere as well
  - option 2: use software to visualize includes & dependencies
How to use
`target_precompile_headers(<my_target> PRIVATE my_pch.h)`
- the PCH will be included automatically in every `.cpp` file
- adding a PCH to a target doesn't require that you remove the headers in it from all `.cpp` files - the C preprocessor is fast
- easiest if a single header includes the common ones - example (see the sketch after this list)
- you could have per-project precompiled header files or you could reuse a PCH from one CMake target in another - remember that each PCH takes ~50-200MB and takes some time to compile…
- you could list the headers which you want precompiled directly in the call to `target_precompile_headers` and even set them as `PUBLIC`/`PRIVATE` selectively so other targets which link to the current one would also precompile those, but I'm old fashioned and prefer to maintain the PCH for each target on my own.
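As a concrete illustration, here is a minimal sketch of such an aggregate header - the name `my_pch.h` matches the call above, but the specific includes (and the `logging.h` utility header) are placeholders for whatever is actually hot in your codebase:

```cpp
// my_pch.h - fed to target_precompile_headers(<my_target> PRIVATE my_pch.h)
#pragma once

// STL headers used in a large share of the sources
#include <algorithm>
#include <memory>
#include <string>
#include <vector>

// heavy, rarely changing third-party headers
#include <boost/asio.hpp>

// rarely changing project-specific utilities (hypothetical)
#include "logging.h"
```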
Some problems
- adding a header which was used in only 30% of the `.cpp` files to the precompiled header means that all `.cpp` files in the target will now have access to it - in time more files might start depending on it without you even noticing - the code might not build without the PCH anymore
  - do you care if it compiles successfully without a PCH? If so, make every `.cpp` explicitly include the precompiled header (see the sketch after this list). This would be problematic if the same `.cpp` file is used in 2 or more CMake targets with different PCHs, but in that case you should probably move that `.cpp` into a static lib, compile it only once and link against that!
- if you are using GCC but are using `ccls`/`cquery`/`clangd`/`rtags` (all based on Clang) as a language server - those tools might not work because they will try to read the `.gch` file produced by GCC - bug report
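The explicit include mentioned above is as simple as this sketch (reusing the `my_pch.h` name from earlier; the source file name is made up):

```cpp
// some_source.cpp - hypothetical file from <my_target>
#include "my_pch.h"  // first include; mirrors what the PCH injects anyway
// ... the rest of the file stays unchanged
```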
Unity builds
The idea is to cram the `.cpp` files of a CMake target into a few unity `.cpp` files which include the original `.cpp` files
- up to 7-8 times faster builds (usually x3 or x4)
- example: a project of 200 `.cpp` files might be divided into 16 unity `.cpp` (or `.cxx` - whatever) files, each including about 13 of the original `.cpp` files => 16 `.cpp` files to build in parallel and 16 `.obj` files to link
- the reasons for the speedup are:
  - common headers from the different `.cpp` files end up being included and parsed fewer times (this is beneficial even when using PCHs!)
  - common template instantiations with the same types in separate `.cpp` files (`vector<int>`) end up being done in fewer places
  - the linker has to stitch much fewer `.obj` files in the end - there are a lot fewer weak symbols to deduplicate (`inline`/template functions from headers end up in every `.obj` => linkers leave just 1)
    - surprisingly, incremental builds (changing a single `.cpp`) will probably also be faster instead of slower (even though you compile more `.cpp` files together) - precisely because of the reduced number of weak duplicated symbols!
  - fewer compiler invocations and fewer `.obj` files written to disk
- the most reliable way to detect ODR violations
- runtime (final binary) might even be faster! (free LTO) - because the compiler sees more symbols from different `.cpp` files together
- A detailed blog post about unity builds and why they make builds faster
How to use
- CMake 3.16 adds the `UNITY_BUILD` target property
  - you can set this property per target explicitly: `set_target_properties(<target> PROPERTIES UNITY_BUILD ON)`
  - or set it globally: `set(CMAKE_UNITY_BUILD ON)` (or call CMake with `-DCMAKE_UNITY_BUILD=ON`) and then you can explicitly disable it for some targets by setting their property to `OFF`
- the order in which the `.cpp` files go into the batches depends on the order they were given to a target in `add_library`/`add_executable`
  - if for some reason 2 `.cpp` files are hard to compile together, they can be separated into different batches by reordering the sources
  - or use `SKIP_UNITY_BUILD_INCLUSION` to exclude one of them
- about 10-20 `.cpp` files per unity source is optimal
  - this is controlled through the `UNITY_BUILD_BATCH_SIZE` target property - the default is 8 (can be set globally with `CMAKE_UNITY_BUILD_BATCH_SIZE`)
  - don't worry if a target has few `.cpp` files - if it has more than 1 it will benefit from a unity build, plus decent build systems like ninja will schedule `.obj` files from different targets to be built in parallel
- the unity `.cpp` files will go in the build directory - you don't have to maintain them or add them to version control
Initial problems when trying to compile a project as a unity build
- some headers will be missing include guards or `#pragma once`
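  A unity source can pull the same header into one translation unit through several of the original `.cpp` files, so every header needs a guard - a trivial sketch (the header name is made up):

  ```cpp
  // widget.h - hypothetical header with a classic include guard
  #ifndef WIDGET_H
  #define WIDGET_H
  struct Widget { int id; };
  #endif // WIDGET_H
  // (or simply use #pragma once at the top instead)
  ```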
- there will be static globals (or ones in anonymous namespaces) in different `.cpp` files with identical names which would clash
  - either rename them or put such globals into an additional namespace - perhaps with the name of the file: for `GraphVisitor.cpp` I would recommend `GRAPH_VISITOR_CPP` (putting static symbols inside a named namespace in a `.cpp` keeps their linkage internal, and the same is true when nesting anonymous namespaces into named ones)
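    A sketch of the clash and the fix, using the file name from above (the variable is made up):

    ```cpp
    // GraphVisitor.cpp
    // static int count = 0;  // before: clashes with a `count` from another .cpp in the same unity file
    namespace GRAPH_VISITOR_CPP {
        static int count = 0;  // after: unique name, linkage stays internal
    }
    ```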
- there will be symbol ambiguities
  - mostly because some `.cpp` file uses a namespace, and then some other `.cpp` file which ends up in the same unity `.cpp` cannot compile
  - either remove the `using namespace ...` stuff or fully qualify symbols where necessary (you can use `::` to mean the global scope)
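  A minimal sketch of how a `using namespace` in one source breaks the next one in the same unity file (all names are hypothetical):

  ```cpp
  // a.cpp
  int abs(int x) { return x < 0 ? -x : x; }  // a global helper

  // b.cpp - lands after a.cpp in the same unity source
  namespace mylib { int abs(int x) { return x < 0 ? -x : x; } }
  using namespace mylib;
  int call() { return abs(-1); }  // error: call to 'abs' is ambiguous
  // fix: remove the using-directive, or qualify: ::abs(-1) / mylib::abs(-1)
  ```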
- some macros might have to be explicitly `#undef`-ined at the end of the `.cpp` files where they are defined/used
  - take a look at `UNITY_BUILD_CODE_BEFORE_INCLUDE` and `UNITY_BUILD_CODE_AFTER_INCLUDE`
  - or rewrite them to C++ constructs - macros are $h1t anyway
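  The pattern looks like this sketch (the macro and the file are made up):

  ```cpp
  // table_gen.cpp - hypothetical example
  #include <string>
  #include <vector>
  #include <utility>

  static std::vector<std::pair<std::string, int>> table;
  #define ADD_ROW(name, value) table.emplace_back((name), (value))

  void fill() { ADD_ROW("answer", 42); }

  #undef ADD_ROW  // so it cannot leak into the next .cpp in the same unity file
  ```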
- another more obscure possible problem: if a static library gets built as a unity build => there are fewer `.obj` files stitched together, and if some other target links to the static library and uses only some symbols - from a few of the original `.obj` files - it might no longer link, because now there are other symbols in the same `.obj` file where the needed symbols reside, and those other symbols might need some other symbols which are in some other static library
  - solution: find what else needs to be linked in
- unity build failures are rare after the initial cleanup (and if everyone uses unity builds locally there wouldn’t be any)
- it took me about 2 full days to compile about 2000 `.cpp` files in 20+ different CMake targets as unity in my current company (NuoDB)
Recommended way to go about it
- I would recommend trying targets 1 by 1 by setting their `UNITY_BUILD` target property to `ON` - and not enabling it globally right away
- start with low batching - first try to get the project to compile with 4 `.cpp` files per unity source, and then increase
- if you desire to have 20 `.cpp` files per batch in the end - go to at least 40 or 50, clean the errors and then move back to 20
  - => future problems will be less likely - when a new `.cpp` file is added somewhere it changes which `.cpp` files get paired together
- unity builds should eventually become the default mode for building
- there should be a separate CI build that checks that the project still compiles without unity - to catch missing includes.
Some problems
If you use `-DCMAKE_EXPORT_COMPILE_COMMANDS=ON` you get a file called `compile_commands.json` generated by CMake in the build folder
- that file contains compile commands for each translation unit - with all definitions and includes necessary for parsing the C++
- that file is used by tools such as `ccls`/`cquery`/`clangd`/`rtags` - language servers which are usually integrated with editors and IDEs such as `vim`/`emacs`/`VSCode` for intellisense (code completion, refactoring and syntax highlighting)
for intellisense (code completion, refactoring and syntax highlighting) - when using unity builds the build information there will not contain compile commands for the specific
.cpp
files but only for the actual unity.cpp
files - these tools might stop working correctly (bug report). In my current company we have a wrapper script which does the calls to CMake like this:- calls CMake once with disabled unity builds + generation of the
compile_commands.json
file - calls CMake again with unity builds enabled + no
compile_commands.json
- => the only downside of this is that if later we invoke our build system directly and it detects changes in CMake and reconfigures/regenerates the build files, it will be done with unity enabled & the generation of
compile_commands.json
disabled - perhaps leaving it stale
- calls CMake once with disabled unity builds + generation of the
Some other good tips to make builds faster
- use ninja instead of GNU make - developed for building Google Chrome
  - superior scheduler, dependency tracking & change detection - optimal parallelization of building object files and linking targets
  - just a single standalone portable binary
- CMake can generate ninja build files instead of Makefiles
  - just pass `-G "Ninja"` when calling CMake
  - to build the code either call `cmake --build <path-to-build-dir>` (instead of `make -j`) or just `ninja -C <path-to-build-dir>`
- change your compilers and linkers
  - experiment with moving from GCC to Clang
  - move from `ld` (the default linker) to `gold` (`-fuse-ld=gold`) or even `lld` (part of the LLVM project)
- use dynamic linking instead of static - at least for internal builds
  - if concerned about runtime performance, you could experiment with `-fvisibility-inlines-hidden` - docs
- use doctest for unit tests (benchmarks) - migrate from googletest/Catch2
- include-what-you-use
  - based on Clang - 100% correct parsing - helps identify unnecessary headers & where a simple forward declaration would do
- Bloaty McBloatface - shows you a size profile of binaries
- great to identify where code bloat is coming from and which symbols take the most space - which are the most offending templates?
- look into `extern template` from C++11 (blog)
  - tells the compiler to not instantiate a specific template (for example `std::vector<int>`) in the current translation unit - see the sketch below
  - diagnosing which templates are a problem is easiest with Bloaty McBloatface, or any of these tools
- caching & distributed builds
- inspecting the physical structure of projects - targets & dependencies
  - Graphviz (in CMake) - `cmake --graphviz=<file>`
  - sourcetrail, or other tools
- PIMPL (1, 2), disabling inlining for some functions, rewriting templates… too much effort - do this as a last resort
- on the hardware side - more cores, more RAM… Duuh :D
- use RAM disks (filesystem in your RAM) - every OS supports those. Put the compiler and the temp & output directories there
Final thoughts
If it were up to me, most of the techniques listed here would be put to use - from top to bottom, they are sorted based on impact and cost to implement. Slow builds don't just waste time - they also break the 'flow' (context switching) and discourage refactoring and experimentation - how do you put a price on that?
Based on my The Hitchhiker's Guide to Faster Builds talk (slides).
You can check out the reddit discussion for this article here.
I’m available for hire to fix your C++ build times - you can expect something along the lines of whatever is in this blog post.